This template is designed for the effortless deployment of Apache Zookeeper monitoring by Zabbix via HTTP and doesn't require any external scripts.
This template works with standalone and cluster instances. Metrics are collected from each Zookeeper node by requests to AdminServer.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the AdminServer and configure the parameters according to the official documentation.

Set the hostname or IP address of the Apache Zookeeper host in the {$ZOOKEEPER.HOST} macro. You can also change the {$ZOOKEEPER.COMMAND_URL}, {$ZOOKEEPER.PORT}, and {$ZOOKEEPER.SCHEME} macros if necessary.
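For reference, a minimal sketch of the corresponding AdminServer settings in `zoo.cfg`, using the same values as the macro defaults below (the configuration file path is only an example and should be adjusted to your installation):

```bash
# Append AdminServer settings to zoo.cfg (example path; adjust to your installation),
# then restart the ZooKeeper service for the change to take effect.
cat >> /opt/zookeeper/conf/zoo.cfg <<'EOF'
admin.enableServer=true
# Port of the embedded Jetty server; corresponds to {$ZOOKEEPER.PORT}
admin.serverPort=8080
# Root URL for listing and issuing AdminServer commands; corresponds to {$ZOOKEEPER.COMMAND_URL}
admin.commandURL=/commands
EOF
```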
Name | Description | Default |
---|---|---|
{$ZOOKEEPER.HOST} | The hostname or IP address of the Apache Zookeeper host. | <SET ZOOKEEPER HOST> |
{$ZOOKEEPER.PORT} | The port the embedded Jetty server listens on (admin.serverPort). | 8080 |
{$ZOOKEEPER.COMMAND_URL} | The URL for listing and issuing commands relative to the root URL (admin.commandURL). | commands |
{$ZOOKEEPER.SCHEME} | Request scheme, which may be http or https. | http |
{$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN} | Maximum percentage of file descriptors usage alert threshold (for trigger expression). | 85 |
{$ZOOKEEPER.OUTSTANDING_REQ.MAX.WARN} | Maximum number of outstanding requests (for trigger expression). | 10 |
{$ZOOKEEPER.PENDING_SYNCS.MAX.WARN} | Maximum number of pending syncs from the followers (for trigger expression). | 10 |
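A quick way to verify that the AdminServer is reachable with these values is to request one of its commands manually. This sketch assumes the default scheme, port, and command URL above and uses the `monitor` command purely as an illustration; the template's HTTP items may request other commands:

```bash
# Replace <zookeeper-host> with the value of {$ZOOKEEPER.HOST}
curl -s http://<zookeeper-host>:8080/commands/monitor | head
```

A JSON response indicates that the AdminServer is enabled and reachable from the Zabbix server or proxy.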
Name | Description | Type | Key and additional info |
---|---|---|---|
Get server metrics | | HTTP agent | zookeeper.get_metrics |
Get connections stats | Get information on client connections to server. Note, depending on the number of client connections this operation may be expensive (i.e. impact server performance). | HTTP agent | zookeeper.get_connections_stats |
Server mode | Mode of the server. In an ensemble, this may either be leader or follower. Otherwise, it is standalone. | Dependent item | zookeeper.server_state Preprocessing |
Uptime | Uptime that a peer has been in a table leading/following/observing state. | Dependent item | zookeeper.uptime Preprocessing |
Version | Version of Zookeeper server. | Dependent item | zookeeper.version Preprocessing |
Approximate data size | Data tree size in bytes. The size includes the znode path and its value. | Dependent item | zookeeper.approximate_data_size Preprocessing |
File descriptors, max | Maximum number of file descriptors that a zookeeper server can open. | Dependent item | zookeeper.max_file_descriptor_count Preprocessing |
File descriptors, open | Number of file descriptors that a zookeeper server has open. | Dependent item | zookeeper.open_file_descriptor_count Preprocessing |
Outstanding requests | The number of queued requests when the server is under load and is receiving more sustained requests than it can process. | Dependent item | zookeeper.outstanding_requests Preprocessing |
Commit per sec | The number of commits performed per second. | Dependent item | zookeeper.commit_count.rate Preprocessing |
Diff syncs per sec | Number of diff syncs performed per second. | Dependent item | zookeeper.diff_count.rate Preprocessing |
Snap syncs per sec | Number of snap syncs performed per second. | Dependent item | zookeeper.snap_count.rate Preprocessing |
Looking per sec | Rate of transitions into looking state. | Dependent item | zookeeper.looking_count.rate Preprocessing |
Alive connections | Number of active clients connected to a zookeeper server. | Dependent item | zookeeper.num_alive_connections Preprocessing |
Global sessions | Number of global sessions. | Dependent item | zookeeper.global_sessions Preprocessing |
Local sessions | Number of local sessions. | Dependent item | zookeeper.local_sessions Preprocessing |
Drop connections per sec | Rate of connection drops. | Dependent item | zookeeper.connection_drop_count.rate Preprocessing |
Rejected connections per sec | Rate of connections rejected. | Dependent item | zookeeper.connection_rejected.rate Preprocessing |
Revalidate connections per sec | Rate of connection revalidations. | Dependent item | zookeeper.connection_revalidate_count.rate Preprocessing |
Revalidate per sec | Rate of revalidations. | Dependent item | zookeeper.revalidate_count.rate Preprocessing |
Latency, max | The maximum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.max_latency Preprocessing |
Latency, min | The minimum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.min_latency Preprocessing |
Latency, avg | The average amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.avg_latency Preprocessing |
Znode count | The number of znodes in the ZooKeeper namespace (the data). | Dependent item | zookeeper.znode_count Preprocessing |
Ephemeral nodes count | Number of ephemeral nodes that a zookeeper server has in its data tree. | Dependent item | zookeeper.ephemerals_count Preprocessing |
Watch count | Number of watches currently set on the local ZooKeeper process. | Dependent item | zookeeper.watch_count Preprocessing |
Packets sent per sec | The number of zookeeper packets sent from a server per second. | Dependent item | zookeeper.packets_sent Preprocessing |
Packets received per sec | The number of zookeeper packets received by a server per second. | Dependent item | zookeeper.packets_received.rate Preprocessing |
Bytes received per sec | Number of bytes received per second. | Dependent item | zookeeper.bytes_received_count.rate Preprocessing |
Election time, avg | Time between entering and leaving election. | Dependent item | zookeeper.avg_election_time Preprocessing |
Elections | Number of elections that have happened. | Dependent item | zookeeper.cnt_election_time Preprocessing |
Fsync time, avg | Time to fsync transaction log. | Dependent item | zookeeper.avg_fsynctime Preprocessing |
Fsync | Count of performed fsyncs. | Dependent item | zookeeper.cnt_fsynctime Preprocessing |
Snapshot write time, avg | Average time to write a snapshot. | Dependent item | zookeeper.avg_snapshottime Preprocessing |
Snapshot writes | Count of performed snapshot writes. | Dependent item | zookeeper.cnt_snapshottime Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zookeeper: Server mode has changed | Zookeeper node state has changed. Acknowledge to close the problem manually. | last(/Zookeeper by HTTP/zookeeper.server_state,#1)<>last(/Zookeeper by HTTP/zookeeper.server_state,#2) and length(last(/Zookeeper by HTTP/zookeeper.server_state))>0 | Info | Manual close: Yes |
Zookeeper: Failed to fetch info data | Zabbix has not received data for items for the last 10 minutes. | nodata(/Zookeeper by HTTP/zookeeper.uptime,10m)=1 | Warning | Manual close: Yes |
Zookeeper: Version has changed | Zookeeper version has changed. Acknowledge to close the problem manually. | last(/Zookeeper by HTTP/zookeeper.version,#1)<>last(/Zookeeper by HTTP/zookeeper.version,#2) and length(last(/Zookeeper by HTTP/zookeeper.version))>0 | Info | Manual close: Yes |
Zookeeper: Too many file descriptors used | Number of file descriptors used more than {$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN}% of the available number of file descriptors. | min(/Zookeeper by HTTP/zookeeper.open_file_descriptor_count,5m) * 100 / last(/Zookeeper by HTTP/zookeeper.max_file_descriptor_count) > {$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN} | Warning | |
Zookeeper: Too many queued requests | Number of queued requests in the server. This goes up when the server receives more requests than it can process. | min(/Zookeeper by HTTP/zookeeper.outstanding_requests,5m)>{$ZOOKEEPER.OUTSTANDING_REQ.MAX.WARN} | Average | Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Leader metrics discovery | Additional metrics for leader node. | Dependent item | zookeeper.metrics.leader Preprocessing |
Name | Description | Type | Key and additional info |
---|---|---|---|
Pending syncs{#SINGLETON} | Number of pending syncs to carry out to ZooKeeper ensemble followers. | Dependent item | zookeeper.pending_syncs[{#SINGLETON}] Preprocessing |
Quorum size{#SINGLETON} | | Dependent item | zookeeper.quorum_size[{#SINGLETON}] Preprocessing |
Synced followers{#SINGLETON} | Number of synced followers reported when a node server_state is leader. | Dependent item | zookeeper.synced_followers[{#SINGLETON}] Preprocessing |
Synced non-voting follower{#SINGLETON} | Number of synced non-voting followers reported when a node server_state is leader. | Dependent item | zookeeper.synced_non_voting_followers[{#SINGLETON}] Preprocessing |
Synced observers{#SINGLETON} | Number of synced observers. | Dependent item | zookeeper.synced_observers[{#SINGLETON}] Preprocessing |
Learners{#SINGLETON} | Number of learners. | Dependent item | zookeeper.learners[{#SINGLETON}] Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zookeeper: Too many pending syncs | | min(/Zookeeper by HTTP/zookeeper.pending_syncs[{#SINGLETON}],5m)>{$ZOOKEEPER.PENDING_SYNCS.MAX.WARN} | Average | Manual close: Yes |
Zookeeper: Too few active followers | The number of followers should equal the total size of your ZooKeeper ensemble, minus 1 (the leader is not included in the follower count). If the ensemble fails to maintain quorum, all automatic failover features are suspended. | last(/Zookeeper by HTTP/zookeeper.synced_followers[{#SINGLETON}]) < last(/Zookeeper by HTTP/zookeeper.quorum_size[{#SINGLETON}])-1 | Average | |
Name | Description | Type | Key and additional info |
---|---|---|---|
Clients discovery | Get list of client connections. Note, depending on the number of client connections this operation may be expensive (i.e. impact server performance). | HTTP agent | zookeeper.clients Preprocessing |
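For reference, the underlying data can be inspected manually with a request to the AdminServer `connections` command (the command name is an assumption; availability depends on the ZooKeeper version), keeping in mind the performance note above:

```bash
# Lists client connections as JSON; may be expensive with many connected clients
curl -s http://<zookeeper-host>:8080/commands/connections
```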
Name | Description | Type | Key and additional info |
---|---|---|---|
Zookeeper client {#TYPE} [{#CLIENT}]: Get client info | The item gets information about "{#CLIENT}" client of "{#TYPE}" type. | Dependent item | zookeeper.client_info[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Latency, max | The maximum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.max_latency[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Latency, min | The minimum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.min_latency[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Latency, avg | The average amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.avg_latency[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Packets sent per sec | The number of packets sent. | Dependent item | zookeeper.packets_sent[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Packets received per sec | The number of packets received. | Dependent item | zookeeper.packets_received[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Outstanding requests | The number of queued requests when the server is under load and is receiving more sustained requests than it can process. | Dependent item | zookeeper.outstanding_requests[{#TYPE},{#CLIENT}] Preprocessing |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor internal Zabbix metrics on the local Zabbix server.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Link this template to the local Zabbix server host.
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.LAST_SEEN.MAX} | The maximum number of seconds that Zabbix proxy has not been seen. | 600 |
{$PROXY.GROUP.AVAIL.PERCENT.MIN} | Minimum threshold for the proxy group availability percentage triggers. | 75 |
{$PROXY.GROUP.DISCOVERY.NAME.MATCHES} | Filter to include discovered proxy groups by their name. | .* |
{$PROXY.GROUP.DISCOVERY.NAME.NOT_MATCHES} | Filter to exclude discovered proxy groups by their name. | CHANGE_IF_NEEDED |
{$ZABBIX.SERVER.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). | 75 |
{$ZABBIX.SERVER.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). | 65 |
{$ZABBIX.SERVER.UTIL.MAX:"value cache"} | Maximum threshold for the value cache utilization trigger. | 95 |
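The utilization thresholds support user macro context, which the trigger expressions below rely on (for example, `{$ZABBIX.SERVER.UTIL.MAX:"alert manager"}`). To change the threshold for a single process type, define a context-specific macro on the host: as a purely hypothetical example, setting `{$ZABBIX.SERVER.UTIL.MAX:"history syncer"}` to `85` would affect only the history syncer triggers, while all other triggers keep the default of `75`.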
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix stats cluster | The master item of Zabbix cluster statistics. |
Zabbix internal | zabbix[cluster,discovery,nodes] |
Zabbix proxies stats | The master item of Zabbix proxies' statistics. |
Zabbix internal | zabbix[proxy,discovery] |
Zabbix proxy groups stats | The master item of Zabbix proxy groups' statistics. |
Zabbix internal | zabbix[proxy group,discovery] |
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix internal | zabbix[queue,10m] |
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix internal | zabbix[queue] |
Zabbix preprocessing | The master item of Zabbix server's preprocessing statistics. |
Zabbix internal | zabbix[preprocessing] |
Utilization of alert manager internal processes, in % | The average percentage of the time during which the alert manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,alert manager,avg,busy] |
Utilization of alert syncer internal processes, in % | The average percentage of the time during which the alert syncer processes have been busy for the last minute. |
Zabbix internal | zabbix[process,alert syncer,avg,busy] |
Utilization of alerter internal processes, in % | The average percentage of the time during which the alerter processes have been busy for the last minute. |
Zabbix internal | zabbix[process,alerter,avg,busy] |
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,availability manager,avg,busy] |
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Zabbix internal | zabbix[process,configuration syncer,avg,busy] |
Utilization of configuration syncer worker internal processes, in % | The average percentage of the time during which the configuration syncer worker processes have been busy for the last minute. |
Zabbix internal | zabbix[process,configuration syncer worker,avg,busy] |
Utilization of escalator internal processes, in % | The average percentage of the time during which the escalator processes have been busy for the last minute. |
Zabbix internal | zabbix[process,escalator,avg,busy] |
Utilization of history poller internal processes, in % | The average percentage of the time during which the history poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,history poller,avg,busy] |
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,odbc poller,avg,busy] |
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Zabbix internal | zabbix[process,history syncer,avg,busy] |
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Zabbix internal | zabbix[process,housekeeper,avg,busy] |
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,http poller,avg,busy] |
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Zabbix internal | zabbix[process,icmp pinger,avg,busy] |
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,ipmi manager,avg,busy] |
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,ipmi poller,avg,busy] |
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,java poller,avg,busy] |
Utilization of LLD manager internal processes, in % | The average percentage of the time during which the LLD manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,lld manager,avg,busy] |
Utilization of LLD worker internal processes, in % | The average percentage of the time during which the LLD worker processes have been busy for the last minute. |
Zabbix internal | zabbix[process,lld worker,avg,busy] |
Utilization of connector manager internal processes, in % | The average percentage of the time during which the connector manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,connector manager,avg,busy] |
Utilization of connector worker internal processes, in % | The average percentage of the time during which the connector worker processes have been busy for the last minute. |
Zabbix internal | zabbix[process,connector worker,avg,busy] |
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,discovery manager,avg,busy] |
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Zabbix internal | zabbix[process,discovery worker,avg,busy] |
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,poller,avg,busy] |
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Zabbix internal | zabbix[process,preprocessing worker,avg,busy] |
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,preprocessing manager,avg,busy] |
Utilization of proxy poller data collector processes, in % | The average percentage of the time during which the proxy poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,proxy poller,avg,busy] |
Utilization of proxy group manager internal processes, in % | The average percentage of the time during which the proxy group manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,proxy group manager,avg,busy] |
Utilization of report manager internal processes, in % | The average percentage of the time during which the report manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,report manager,avg,busy] |
Utilization of report writer internal processes, in % | The average percentage of the time during which the report writer processes have been busy for the last minute. |
Zabbix internal | zabbix[process,report writer,avg,busy] |
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Zabbix internal | zabbix[process,self-monitoring,avg,busy] |
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Zabbix internal | zabbix[process,snmp trapper,avg,busy] |
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,task manager,avg,busy] |
Utilization of timer internal processes, in % | The average percentage of the time during which the timer processes have been busy for the last minute. |
Zabbix internal | zabbix[process,timer,avg,busy] |
Utilization of service manager internal processes, in % | The average percentage of the time during which the service manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,service manager,avg,busy] |
Utilization of trigger housekeeper internal processes, in % | The average percentage of the time during which the trigger housekeeper processes have been busy for the last minute. |
Zabbix internal | zabbix[process,trigger housekeeper,avg,busy] |
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Zabbix internal | zabbix[process,trapper,avg,busy] |
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,unreachable poller,avg,busy] |
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Zabbix internal | zabbix[process,vmware collector,avg,busy] |
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,agent poller,avg,busy] |
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,http agent poller,avg,busy] |
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,snmp poller,avg,busy] |
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,internal poller,avg,busy] |
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,browser poller,avg,busy] |
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Zabbix internal | zabbix[rcache,buffer,pused] |
Trend function cache, % of unique requests | The effectiveness statistics of the Zabbix trend function cache. The percentage of cached items calculated from the sum of the cached items plus requests. A low percentage most likely means that the cache size can be reduced. |
Zabbix internal | zabbix[tcache,cache,pitems] |
Trend function cache, % of misses | The effectiveness statistics of the Zabbix trend function cache. The percentage of cache misses. |
Zabbix internal | zabbix[tcache,cache,pmisses] |
Value cache, % used | The availability statistics of Zabbix value cache. The percentage of used data buffer. |
Zabbix internal | zabbix[vcache,buffer,pused] |
Value cache hits | The effectiveness statistics of Zabbix value cache. The number of cache hits (history values taken from the cache). |
Zabbix internal | zabbix[vcache,cache,hits] Preprocessing
|
Value cache misses | The effectiveness statistics of Zabbix value cache. The number of cache misses (history values taken from the database). |
Zabbix internal | zabbix[vcache,cache,misses] Preprocessing
|
Value cache operating mode | The operating mode of the value cache. |
Zabbix internal | zabbix[vcache,cache,mode] |
Zabbix server check | Flag indicating whether it is a server or not. |
Zabbix internal | zabbix[triggers] Preprocessing
|
Version | The version of Zabbix server. |
Zabbix internal | zabbix[version] Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Zabbix internal | zabbix[vmware,buffer,pused] |
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Zabbix internal | zabbix[wcache,history,pused] |
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Zabbix internal | zabbix[wcache,index,pused] |
Trend write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used trend buffer. The trend cache stores the aggregate of all items that have received data for the current hour. |
Zabbix internal | zabbix[wcache,trend,pused] |
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Zabbix internal | zabbix[wcache,values] Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Zabbix internal | zabbix[wcache,values,uint] Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Zabbix internal | zabbix[wcache,values,float] Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Zabbix internal | zabbix[wcache,values,log] Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Zabbix internal | zabbix[wcache,values,not supported] Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Zabbix internal | zabbix[wcache,values,str] Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Zabbix internal | zabbix[wcache,values,text] Preprocessing
|
Number of values synchronized with the database per second | Average quantity of values written to the database, recalculated once per minute. |
Zabbix internal | zabbix[vps,written] Preprocessing
|
LLD queue | The number of values enqueued in the low-level discovery processing queue. |
Zabbix internal | zabbix[lld_queue] |
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Dependent item | zabbix[preprocessing_queue] Preprocessing
|
Preprocessing throughput | Reflects the throughput of the preprocessing. |
Dependent item | zabbix[preprocessing_throughput] Preprocessing
|
Connector queue | The count of values enqueued in the connector queue. |
Zabbix internal | zabbix[connector_queue] |
Discovery queue | The count of values enqueued in the discovery queue. |
Zabbix internal | zabbix[discovery_queue] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Zabbix server health/zabbix[queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix server: Utilization of alert manager processes is high | Indicates potential performance issues with the alert manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,alert manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alert syncer processes is high | Indicates potential performance issues with the alert syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,alert syncer,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alerter processes is high | Indicates potential performance issues with the alerter, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,alerter,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"alerter"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,availability manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,configuration syncer,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer worker processes is high | Indicates potential performance issues with the configuration syncer worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,configuration syncer worker,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of escalator processes is high | Indicates potential performance issues with the escalator, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,escalator,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"escalator"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history poller processes is high | Indicates potential performance issues with the history poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,history poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"history poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,odbc poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,history syncer,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,housekeeper,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,http poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,icmp pinger,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,ipmi manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,ipmi poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,java poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD manager processes is high | Indicates potential performance issues with the LLD manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,lld manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD worker processes is high | Indicates potential performance issues with the LLD worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,lld worker,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector manager processes is high | Indicates potential performance issues with the connector manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,connector manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector worker processes is high | Indicates potential performance issues with the connector worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,connector worker,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,discovery manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,discovery worker,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,preprocessing worker,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,preprocessing manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy poller processes is high | Indicates potential performance issues with the proxy poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,proxy poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy group manager processes is high | Indicates potential performance issues with the proxy group manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,proxy group manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy group manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report manager processes is high | Indicates potential performance issues with the report manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,report manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"report manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report writer processes is high | Indicates potential performance issues with the report writer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,report writer,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"report writer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,self-monitoring,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,snmp trapper,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,task manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of timer processes is high | Indicates potential performance issues with the timer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,timer,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"timer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of service manager processes is high | Indicates potential performance issues with the service manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,service manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"service manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trigger housekeeper processes is high | Indicates potential performance issues with the trigger housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,trigger housekeeper,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"trigger housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,trapper,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,unreachable poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,vmware collector,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,agent poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,http agent poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,snmp poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,internal poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,browser poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix server: Excessive configuration cache usage | Consider increasing |
max(/Zabbix server health/zabbix[rcache,buffer,pused],10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive value cache usage | Consider increasing |
max(/Zabbix server health/zabbix[vcache,buffer,pused],10m)>{$ZABBIX.SERVER.UTIL.MAX:"value cache"} |Average |
Manual close: Yes | |
Zabbix server: Zabbix value cache working in low-memory mode | Once low-memory mode has been switched on, the value cache will remain in this state for 24 hours, even if the problem that triggered this mode is resolved sooner. |
last(/Zabbix server health/zabbix[vcache,cache,mode])=1 |High |
Manual close: Yes | |
Zabbix server: Wrong template assigned | Check that the template has been selected correctly. |
last(/Zabbix server health/zabbix[triggers])=0 |Disaster |
Manual close: Yes | |
Zabbix server: Version has changed | Zabbix server version has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health/zabbix[version],#1)<>last(/Zabbix server health/zabbix[version],#2) and length(last(/Zabbix server health/zabbix[version]))>0 |Info |
Manual close: Yes | |
Zabbix server: Excessive vmware cache usage | Consider increasing |
max(/Zabbix server health/zabbix[vmware,buffer,pused],10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history cache usage | Consider increasing |
max(/Zabbix server health/zabbix[wcache,history,pused],10m)>{$ZABBIX.SERVER.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history index cache usage | Consider increasing |
max(/Zabbix server health/zabbix[wcache,index,pused],10m)>{$ZABBIX.SERVER.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive trends cache usage | Consider increasing |
max(/Zabbix server health/zabbix[wcache,trend,pused],10m)>{$ZABBIX.SERVER.UTIL.MAX:"trend cache"} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy discovery | LLD rule with item and trigger prototypes for proxy discovery. |
Dependent item | zabbix.proxy.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxy [{#PROXY.NAME}]: Stats | The statistics for the discovered proxy. |
Dependent item | zabbix.proxy.stats[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Mode | The mode of Zabbix proxy. |
Dependent item | zabbix.proxy.mode[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Unencrypted | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.unencrypted[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: PSK | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.psk[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Certificate | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.cert[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compression | The compression status of a proxy. |
Dependent item | zabbix.proxy.compression[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Item count | The number of enabled items on enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.items[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Host count | The number of enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.hosts[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Version | A version of Zabbix proxy. |
Dependent item | zabbix.proxy.version[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Last seen, in seconds | The time when a proxy was last seen by a server. |
Dependent item | zabbix.proxy.last_seen[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compatibility | Version of proxy compared to Zabbix server version. Possible values: 0 - Undefined; 1 - Current version (proxy and server have the same major version); 2 - Outdated version (proxy version is older than server version, but is partially supported); 3 - Unsupported version (proxy version is older than server previous LTS release version or server major version is older than proxy major version). |
Dependent item | zabbix.proxy.compatibility[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | zabbix.proxy.requiredperformance[{#PROXY.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy last seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)>{$ZABBIX.PROXY.LAST_SEEN.MAX} |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy never seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)=-1 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is outdated | Zabbix proxy version is older than server version, but is partially supported. Only data collection and remote execution is available. |
last(/Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=2 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is not supported | Zabbix proxy version is older than server previous LTS release version or server major version is older than proxy major version. |
last(/Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=3 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy groups discovery | LLD rule with item and trigger prototypes for proxy groups discovery. |
Dependent item | zabbix.proxy.groups.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxy group [{#PROXY.GROUP.NAME}]: Stats | The statistics for the discovered proxy group. |
Dependent item | zabbix.proxy.group.stats[{#PROXY.GROUP.NAME}] Preprocessing
|
Proxy group [{#PROXY.GROUP.NAME}]: State | State of the Zabbix proxy group. Possible values: 0 - unknown; 1 - offline; 2 - recovering; 3 - online; 4 - degrading. |
Dependent item | zabbix.proxy.group.state[{#PROXY.GROUP.NAME}] Preprocessing
|
Proxy group [{#PROXY.GROUP.NAME}]: Available proxies | Count of available proxies in the Zabbix proxy group. |
Dependent item | zabbix.proxy.group.avail.proxies[{#PROXY.GROUP.NAME}] Preprocessing
|
Proxy group [{#PROXY.GROUP.NAME}]: Available proxies, in % | Percentage of available proxies in the Zabbix proxy group. |
Dependent item | zabbix.proxy.group.avail.proxies.percent[{#PROXY.GROUP.NAME}] Preprocessing
|
Proxy group [{#PROXY.GROUP.NAME}]: Settings | The settings for the discovered proxy group. |
Dependent item | zabbix.proxy.group.settings[{#PROXY.GROUP.NAME}] Preprocessing
|
Proxy group [{#PROXY.GROUP.NAME}]: Failover period | Failover period in the Zabbix proxy group. |
Dependent item | zabbix.proxy.group.failover[{#PROXY.GROUP.NAME}] Preprocessing
|
Proxy group [{#PROXY.GROUP.NAME}]: Minimum number of proxies | Minimum number of proxies online in the Zabbix proxy group. |
Dependent item | zabbix.proxy.group.online.min[{#PROXY.GROUP.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Proxy group [{#PROXY.GROUP.NAME}]: Status "offline" | The state of the Zabbix proxy group is "offline". |
last(/Zabbix server health/zabbix.proxy.group.state[{#PROXY.GROUP.NAME}],#1)=1 |High |
||
Zabbix server: Proxy group [{#PROXY.GROUP.NAME}]: Status "degrading" | The state of the Zabbix proxy group is "degrading". |
last(/Zabbix server health/zabbix.proxy.group.state[{#PROXY.GROUP.NAME}],#1)=4 |Average |
Depends on:
|
|
Zabbix server: Proxy group [{#PROXY.GROUP.NAME}]: Status changed | The state of the Zabbix proxy group has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health/zabbix.proxy.group.state[{#PROXY.GROUP.NAME}],#1)<>last(/Zabbix server health/zabbix.proxy.group.state[{#PROXY.GROUP.NAME}],#2) and length(last(/Zabbix server health/zabbix.proxy.group.state[{#PROXY.GROUP.NAME}]))>0 |Info |
Manual close: Yes Depends on:
|
|
Zabbix server: Proxy group [{#PROXY.GROUP.NAME}]: Availability too low | The availability of proxies in a proxy group is below {$PROXY.GROUP.AVAIL.PERCENT.MIN}% for at least 3 minutes. |
max(/Zabbix server health/zabbix.proxy.group.avail.proxies.percent[{#PROXY.GROUP.NAME}],3m)<{$PROXY.GROUP.AVAIL.PERCENT.MIN} |Warning |
||
Zabbix server: Proxy group [{#PROXY.GROUP.NAME}]: Failover invalid value | Proxy group failover has an invalid value. |
last(/Zabbix server health/zabbix.proxy.group.failover[{#PROXY.GROUP.NAME}],#1)=-1 |Warning |
||
Zabbix server: Proxy group [{#PROXY.GROUP.NAME}]: Minimum number of proxies invalid value | Proxy group minimum number of proxies has an invalid value. |
last(/Zabbix server health/zabbix.proxy.group.online.min[{#PROXY.GROUP.NAME}],#1)=-1 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
High availability cluster node discovery | LLD rule with item and trigger prototypes for node discovery. |
Dependent item | zabbix.nodes.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster node [{#NODE.NAME}]: Stats | Provides the statistics of a node. |
Dependent item | zabbix.node.stats[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Address | The IPv4 address of a node. |
Dependent item | zabbix.node.address[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access time | Last access time. |
Dependent item | zabbix.node.lastaccess.time[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access age | The time between the database's |
Dependent item | zabbix.node.lastaccess.age[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Status | The status of a node. |
Dependent item | zabbix.node.status[{#NODE.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Cluster node [{#NODE.NAME}]: Status changed | The state of the node has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health/zabbix.node.status[{#NODE.ID}],#1)<>last(/Zabbix server health/zabbix.node.status[{#NODE.ID}],#2) |Info |
Manual close: Yes |
This template is designed to monitor internal Zabbix metrics on the remote Zabbix server.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Specify the address of the remote Zabbix server by changing the {$ZABBIX.SERVER.ADDRESS} and {$ZABBIX.SERVER.PORT} macros. Don't forget to adjust the StatsAllowedIP parameter in the remote server's configuration file to allow the collection of statistics.
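A minimal sketch of that change on the remote (monitored) Zabbix server; the configuration file path, the service name, and the collector IP address are examples and should be adjusted to your environment:

```bash
# On the remote Zabbix server: allow the collecting server (example IP) to query internal statistics,
# then restart the server so the new StatsAllowedIP value takes effect.
echo 'StatsAllowedIP=192.0.2.10' >> /etc/zabbix/zabbix_server.conf
systemctl restart zabbix-server
```

On the collecting side, set {$ZABBIX.SERVER.ADDRESS} and {$ZABBIX.SERVER.PORT} to the remote server's address and trapper port (10051 by default).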
Name | Description | Default |
---|---|---|
{$ZABBIX.SERVER.ADDRESS} | IP/DNS/network mask list of servers to be remotely queried (default is 127.0.0.1). | |
{$ZABBIX.SERVER.PORT} | Port of server to be remotely queried (default is 10051). | |
{$ZABBIX.PROXY.LAST_SEEN.MAX} | The maximum number of seconds that Zabbix proxy has not been seen. | 600 |
{$ZABBIX.SERVER.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expressions. | 5m |
{$ZABBIX.SERVER.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). | 75 |
{$ZABBIX.SERVER.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). | 65 |
{$ZABBIX.SERVER.UTIL.MAX:"value cache"} | Maximum threshold for value cache utilization triggers. | 95 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix stats | The master item of Zabbix server statistics. |
Zabbix internal | zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT}] |
Zabbix proxies stats | The master item of Zabbix proxies' statistics. |
Dependent item | zabbix.proxies.stats Preprocessing
|
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix internal | zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m] Preprocessing
|
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix internal | zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue] Preprocessing
|
Utilization of alert manager internal processes, in % | The average percentage of the time during which the alert manager processes have been busy for the last minute. |
Dependent item | process.alert_manager.avg.busy Preprocessing
|
Utilization of alert syncer internal processes, in % | The average percentage of the time during which the alert syncer processes have been busy for the last minute. |
Dependent item | process.alert_syncer.avg.busy Preprocessing
|
Utilization of alerter internal processes, in % | The average percentage of the time during which the alerter processes have been busy for the last minute. |
Dependent item | process.alerter.avg.busy Preprocessing
|
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Dependent item | process.availability_manager.avg.busy Preprocessing
|
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Dependent item | process.configuration_syncer.avg.busy Preprocessing
|
Utilization of configuration syncer worker internal processes, in % | The average percentage of the time during which the configuration syncer worker processes have been busy for the last minute. |
Dependent item | process.configuration_syncer_worker.avg.busy Preprocessing
|
Utilization of escalator internal processes, in % | The average percentage of the time during which the escalator processes have been busy for the last minute. |
Dependent item | process.escalator.avg.busy Preprocessing
|
Utilization of history poller internal processes, in % | The average percentage of the time during which the history poller processes have been busy for the last minute. |
Dependent item | process.history_poller.avg.busy Preprocessing
|
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Dependent item | process.odbc_poller.avg.busy Preprocessing
|
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Dependent item | process.history_syncer.avg.busy Preprocessing
|
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Dependent item | process.housekeeper.avg.busy Preprocessing
|
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Dependent item | process.http_poller.avg.busy Preprocessing
|
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Dependent item | process.icmp_pinger.avg.busy Preprocessing
|
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Dependent item | process.ipmi_manager.avg.busy Preprocessing
|
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Dependent item | process.ipmi_poller.avg.busy Preprocessing
|
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Dependent item | process.java_poller.avg.busy Preprocessing
|
Utilization of LLD manager internal processes, in % | The average percentage of the time during which the LLD manager processes have been busy for the last minute. |
Dependent item | process.lld_manager.avg.busy Preprocessing
|
Utilization of LLD worker internal processes, in % | The average percentage of the time during which the LLD worker processes have been busy for the last minute. |
Dependent item | process.lld_worker.avg.busy Preprocessing
|
Utilization of connector manager internal processes, in % | The average percentage of the time during which the connector manager processes have been busy for the last minute. |
Dependent item | process.connector_manager.avg.busy Preprocessing
|
Utilization of connector worker internal processes, in % | The average percentage of the time during which the connector worker processes have been busy for the last minute. |
Dependent item | process.connector_worker.avg.busy Preprocessing
|
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Dependent item | process.discovery_manager.avg.busy Preprocessing
|
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Dependent item | process.discovery_worker.avg.busy Preprocessing
|
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Dependent item | process.poller.avg.busy Preprocessing
|
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Dependent item | process.preprocessing_worker.avg.busy Preprocessing
|
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Dependent item | process.preprocessing_manager.avg.busy Preprocessing
|
Utilization of proxy poller data collector processes, in % | The average percentage of the time during which the proxy poller processes have been busy for the last minute. |
Dependent item | process.proxy_poller.avg.busy Preprocessing
|
Utilization of proxy group manager internal processes, in % | The average percentage of the time during which the proxy group manager processes have been busy for the last minute. |
Dependent item | process.proxy_group_manager.avg.busy Preprocessing
|
Utilization of report manager internal processes, in % | The average percentage of the time during which the report manager processes have been busy for the last minute. |
Dependent item | process.report_manager.avg.busy Preprocessing
|
Utilization of report writer internal processes, in % | The average percentage of the time during which the report writer processes have been busy for the last minute. |
Dependent item | process.report_writer.avg.busy Preprocessing
|
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Dependent item | process.self-monitoring.avg.busy Preprocessing
|
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Dependent item | process.snmp_trapper.avg.busy Preprocessing
|
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Dependent item | process.task_manager.avg.busy Preprocessing
|
Utilization of timer internal processes, in % | The average percentage of the time during which the timer processes have been busy for the last minute. |
Dependent item | process.timer.avg.busy Preprocessing
|
Utilization of service manager internal processes, in % | The average percentage of the time during which the service manager processes have been busy for the last minute. |
Dependent item | process.service_manager.avg.busy Preprocessing
|
Utilization of trigger housekeeper internal processes, in % | The average percentage of the time during which the trigger housekeeper processes have been busy for the last minute. |
Dependent item | process.trigger_housekeeper.avg.busy Preprocessing
|
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Dependent item | process.trapper.avg.busy Preprocessing
|
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Dependent item | process.unreachable_poller.avg.busy Preprocessing
|
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Dependent item | process.vmware_collector.avg.busy Preprocessing
|
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Dependent item | process.agent_poller.avg.busy Preprocessing
|
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Dependent item | process.http_agent_poller.avg.busy Preprocessing
|
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Dependent item | process.snmp_poller.avg.busy Preprocessing
|
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Dependent item | process.internal_poller.avg.busy Preprocessing
|
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Dependent item | process.browser_poller.avg.busy Preprocessing
|
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Dependent item | rcache.buffer.pused Preprocessing
|
Trend function cache, % of unique requests | The effectiveness statistics of the Zabbix trend function cache. The percentage of cached items calculated from the sum of the cached items plus requests. A low percentage most likely means that the cache size can be reduced. |
Dependent item | tcache.pitems Preprocessing
|
Trend function cache, % of misses | The effectiveness statistics of the Zabbix trend function cache. The percentage of cache misses. |
Dependent item | tcache.pmisses Preprocessing
|
Value cache, % used | The availability statistics of Zabbix value cache. The percentage of used data buffer. |
Dependent item | vcache.buffer.pused Preprocessing
|
Value cache hits | The effectiveness statistics of Zabbix value cache. The number of cache hits (history values taken from the cache). |
Dependent item | vcache.cache.hits Preprocessing
|
Value cache misses | The effectiveness statistics of Zabbix value cache. The number of cache misses (history values taken from the database). |
Dependent item | vcache.cache.misses Preprocessing
|
Value cache operating mode | The operating mode of the value cache. |
Dependent item | vcache.cache.mode Preprocessing
|
Zabbix server check | Flag indicating whether it is a server or not. |
Dependent item | server_check Preprocessing
|
Version | The version of Zabbix server. |
Dependent item | version Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Dependent item | vmware.buffer.pused Preprocessing
|
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Dependent item | wcache.history.pused Preprocessing
|
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Dependent item | wcache.index.pused Preprocessing
|
Trend write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used trend buffer. The trend cache stores the aggregate of all items that have received data for the current hour. |
Dependent item | wcache.trend.pused Preprocessing
|
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Dependent item | wcache.values Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Dependent item | wcache.values.uint Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Dependent item | wcache.values.float Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Dependent item | wcache.values.log Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Dependent item | wcache.values.not_supported Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Dependent item | wcache.values.str Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Dependent item | wcache.values.text Preprocessing
|
Number of values synchronized with the database per second | Average quantity of values written to the database, recalculated once per minute. |
Dependent item | vps.written Preprocessing
|
LLD queue | The number of values enqueued in the low-level discovery processing queue. |
Dependent item | lld_queue Preprocessing
|
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Dependent item | preprocessing_queue Preprocessing
|
Preprocessing throughput | Reflects the throughput of value preprocessing. |
Dependent item | preprocessing_throughput Preprocessing
|
Connector queue | The count of values enqueued in the connector queue. |
Dependent item | connector_queue Preprocessing
|
Discovery queue | The count of values enqueued in the discovery queue. |
Dependent item | discovery_queue Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Remote Zabbix server health/zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix server: Utilization of alert manager processes is high | Indicates potential performance issues with the alert manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.alert_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alert syncer processes is high | Indicates potential performance issues with the alert syncer, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.alert_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alerter processes is high | Indicates potential performance issues with the alerter, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.alerter.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alerter"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.availability_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer worker processes is high | Indicates potential performance issues with the configuration syncer worker, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.configuration_syncer_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of escalator processes is high | Indicates potential performance issues with the escalator, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.escalator.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"escalator"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history poller processes is high | Indicates potential performance issues with the history poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.history_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.odbc_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.history_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.housekeeper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.http_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.java_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD manager processes is high | Indicates potential performance issues with the LLD manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.lld_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD worker processes is high | Indicates potential performance issues with the LLD worker, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.lld_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector manager processes is high | Indicates potential performance issues with the connector manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.connector_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector worker processes is high | Indicates potential performance issues with the connector worker, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.connector_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.discovery_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.discovery_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy poller processes is high | Indicates potential performance issues with the proxy poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.proxy_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy group manager processes is high | Indicates potential performance issues with the proxy group manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.proxy_group_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy group manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report manager processes is high | Indicates potential performance issues with the report manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.report_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"report manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report writer processes is high | Indicates potential performance issues with the report writer, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.report_writer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"report writer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.self-monitoring.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.task_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of timer processes is high | Indicates potential performance issues with the timer, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.timer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"timer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of service manager processes is high | Indicates potential performance issues with the service manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.service_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"service manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trigger housekeeper processes is high | Indicates potential performance issues with the trigger housekeeper, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.trigger_housekeeper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trigger housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.trapper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.vmware_collector.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.agent_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.http_agent_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.snmp_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.internal_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.browser_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix server: Excessive configuration cache usage | Consider increasing CacheSize in the zabbix_server.conf configuration file. |
max(/Remote Zabbix server health/rcache.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix server: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.SERVER.NODATA_TIMEOUT}. |
nodata(/Remote Zabbix server health/rcache.buffer.pused,{$ZABBIX.SERVER.NODATA_TIMEOUT})=1 |Warning |
||
Zabbix server: Excessive value cache usage | Consider increasing ValueCacheSize in the zabbix_server.conf configuration file. |
max(/Remote Zabbix server health/vcache.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"value cache"} |Average |
Manual close: Yes | |
Zabbix server: Zabbix value cache working in low-memory mode | Once low-memory mode has been switched on, the value cache will remain in this state for 24 hours, even if the problem that triggered this mode is resolved sooner. |
last(/Remote Zabbix server health/vcache.cache.mode)=1 |High |
Manual close: Yes | |
Zabbix server: Wrong template assigned | Check that the template has been selected correctly. |
last(/Remote Zabbix server health/server_check)=0 |Disaster |
Manual close: Yes | |
Zabbix server: Version has changed | Zabbix server version has changed. Acknowledge to close the problem manually. |
last(/Remote Zabbix server health/version,#1)<>last(/Remote Zabbix server health/version,#2) and length(last(/Remote Zabbix server health/version))>0 |Info |
Manual close: Yes | |
Zabbix server: Excessive vmware cache usage | Consider increasing VMwareCacheSize in the zabbix_server.conf configuration file. |
max(/Remote Zabbix server health/vmware.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history cache usage | Consider increasing HistoryCacheSize in the zabbix_server.conf configuration file. |
max(/Remote Zabbix server health/wcache.history.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history index cache usage | Consider increasing HistoryIndexCacheSize in the zabbix_server.conf configuration file. |
max(/Remote Zabbix server health/wcache.index.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive trends cache usage | Consider increasing TrendCacheSize in the zabbix_server.conf configuration file. |
max(/Remote Zabbix server health/wcache.trend.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trend cache"} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy discovery | LLD rule with item and trigger prototypes for proxy discovery. |
Dependent item | zabbix.proxy.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxy [{#PROXY.NAME}]: Stats | The statistics for the discovered proxy. |
Dependent item | zabbix.proxy.stats[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Mode | The mode of Zabbix proxy. |
Dependent item | zabbix.proxy.mode[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Unencrypted | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.unencrypted[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: PSK | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.psk[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Certificate | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.cert[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compression | The compression status of a proxy. |
Dependent item | zabbix.proxy.compression[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Item count | The number of enabled items on enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.items[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Host count | The number of enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.hosts[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Version | The version of Zabbix proxy. |
Dependent item | zabbix.proxy.version[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Last seen, in seconds | The time when a proxy was last seen by a server. |
Dependent item | zabbix.proxy.last_seen[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compatibility | Version of proxy compared to Zabbix server version. Possible values: 0 - Undefined; 1 - Current version (proxy and server have the same major version); 2 - Outdated version (proxy version is older than server version, but is partially supported); 3 - Unsupported version (proxy version is older than server previous LTS release version or server major version is older than proxy major version). |
Dependent item | zabbix.proxy.compatibility[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | zabbix.proxy.requiredperformance[{#PROXY.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy last seen | Zabbix proxy is not updating the configuration data. |
last(/Remote Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)>{$ZABBIX.PROXY.LAST_SEEN.MAX} |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy never seen | Zabbix proxy is not updating the configuration data. |
last(/Remote Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)=-1 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is outdated | Zabbix proxy version is older than server version, but is partially supported. Only data collection and remote execution is available. |
last(/Remote Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=2 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is not supported | Zabbix proxy version is older than server previous LTS release version or server major version is older than proxy major version. |
last(/Remote Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=3 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
High availability cluster node discovery | LLD rule with item and trigger prototypes for node discovery. |
Dependent item | zabbix.nodes.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster node [{#NODE.NAME}]: Stats | Provides the statistics of a node. |
Dependent item | zabbix.node.stats[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Address | The IPv4 address of a node. |
Dependent item | zabbix.node.address[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access time | Last access time. |
Dependent item | zabbix.node.lastaccess.time[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access age | The time between the database's unix_timestamp() and the last access time. |
Dependent item | zabbix.node.lastaccess.age[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Status | The status of a node. |
Dependent item | zabbix.nodes.status[{#NODE.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Cluster node [{#NODE.NAME}]: Status changed | The state of the node has changed. Acknowledge to close the problem manually. |
last(/Remote Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#1)<>last(/Remote Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#2) |Info |
Manual close: Yes |
This template is designed to monitor Zabbix server metrics via the passive Zabbix agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Specify the address of the remote Zabbix server by changing the {$ZABBIX.SERVER.ADDRESS}
and {$ZABBIX.SERVER.PORT}
macros. Don't forget to adjust the StatsAllowedIP
parameter in the remote server's configuration file to allow the collection of statistics.
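Since this template relies on the passive Zabbix agent, the master item can be verified from the command line before the template is linked. A hedged example with illustrative addresses (the key parameters should match the {$ZABBIX.SERVER.ADDRESS} and {$ZABBIX.SERVER.PORT} macro values):

```
# Ask the agent on the Zabbix server host for the internal statistics
zabbix_get -s 192.0.2.20 -k 'zabbix.stats[127.0.0.1,10051]'
```

A successful call returns a JSON document with the internal statistics; an error usually indicates that StatsAllowedIP or the agent configuration still needs adjustment.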
Name | Description | Default |
---|---|---|
{$ZABBIX.SERVER.ADDRESS} | IP/DNS/network mask list of servers to be remotely queried (default is 127.0.0.1). |
|
{$ZABBIX.SERVER.PORT} | Port of server to be remotely queried (default is 10051). |
|
{$ZABBIX.PROXY.LAST_SEEN.MAX} | The maximum number of seconds that Zabbix proxy has not been seen. |
600 |
{$ZABBIX.SERVER.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expressions. |
5m |
{$ZABBIX.SERVER.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). |
75 |
{$ZABBIX.SERVER.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). |
65 |
{$ZABBIX.SERVER.UTIL.MAX:"value cache"} | Maximum threshold for value cache utilization triggers. |
95 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix stats | The master item of Zabbix server statistics. |
Zabbix agent | zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT}] |
Zabbix proxies stats | The master item of Zabbix proxies' statistics. |
Dependent item | zabbix.proxies.stats Preprocessing
|
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix agent | zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m] Preprocessing
|
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix agent | zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue] Preprocessing
|
Utilization of alert manager internal processes, in % | The average percentage of the time during which the alert manager processes have been busy for the last minute. |
Dependent item | process.alert_manager.avg.busy Preprocessing
|
Utilization of alert syncer internal processes, in % | The average percentage of the time during which the alert syncer processes have been busy for the last minute. |
Dependent item | process.alert_syncer.avg.busy Preprocessing
|
Utilization of alerter internal processes, in % | The average percentage of the time during which the alerter processes have been busy for the last minute. |
Dependent item | process.alerter.avg.busy Preprocessing
|
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Dependent item | process.availability_manager.avg.busy Preprocessing
|
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Dependent item | process.configuration_syncer.avg.busy Preprocessing
|
Utilization of configuration syncer worker internal processes, in % | The average percentage of the time during which the configuration syncer worker processes have been busy for the last minute. |
Dependent item | process.configuration_syncer_worker.avg.busy Preprocessing
|
Utilization of escalator internal processes, in % | The average percentage of the time during which the escalator processes have been busy for the last minute. |
Dependent item | process.escalator.avg.busy Preprocessing
|
Utilization of history poller internal processes, in % | The average percentage of the time during which the history poller processes have been busy for the last minute. |
Dependent item | process.history_poller.avg.busy Preprocessing
|
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Dependent item | process.odbc_poller.avg.busy Preprocessing
|
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Dependent item | process.history_syncer.avg.busy Preprocessing
|
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Dependent item | process.housekeeper.avg.busy Preprocessing
|
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Dependent item | process.http_poller.avg.busy Preprocessing
|
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Dependent item | process.icmp_pinger.avg.busy Preprocessing
|
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Dependent item | process.ipmi_manager.avg.busy Preprocessing
|
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Dependent item | process.ipmi_poller.avg.busy Preprocessing
|
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Dependent item | process.java_poller.avg.busy Preprocessing
|
Utilization of LLD manager internal processes, in % | The average percentage of the time during which the LLD manager processes have been busy for the last minute. |
Dependent item | process.lld_manager.avg.busy Preprocessing
|
Utilization of LLD worker internal processes, in % | The average percentage of the time during which the LLD worker processes have been busy for the last minute. |
Dependent item | process.lld_worker.avg.busy Preprocessing
|
Utilization of connector manager internal processes, in % | The average percentage of the time during which the connector manager processes have been busy for the last minute. |
Dependent item | process.connector_manager.avg.busy Preprocessing
|
Utilization of connector worker internal processes, in % | The average percentage of the time during which the connector worker processes have been busy for the last minute. |
Dependent item | process.connector_worker.avg.busy Preprocessing
|
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Dependent item | process.discovery_manager.avg.busy Preprocessing
|
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Dependent item | process.discovery_worker.avg.busy Preprocessing
|
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Dependent item | process.poller.avg.busy Preprocessing
|
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Dependent item | process.preprocessing_worker.avg.busy Preprocessing
|
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Dependent item | process.preprocessing_manager.avg.busy Preprocessing
|
Utilization of proxy poller data collector processes, in % | The average percentage of the time during which the proxy poller processes have been busy for the last minute. |
Dependent item | process.proxy_poller.avg.busy Preprocessing
|
Utilization of proxy group manager internal processes, in % | The average percentage of the time during which the proxy group manager processes have been busy for the last minute. |
Dependent item | process.proxy_group_manager.avg.busy Preprocessing
|
Utilization of report manager internal processes, in % | The average percentage of the time during which the report manager processes have been busy for the last minute. |
Dependent item | process.report_manager.avg.busy Preprocessing
|
Utilization of report writer internal processes, in % | The average percentage of the time during which the report writer processes have been busy for the last minute. |
Dependent item | process.report_writer.avg.busy Preprocessing
|
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Dependent item | process.self-monitoring.avg.busy Preprocessing
|
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Dependent item | process.snmp_trapper.avg.busy Preprocessing
|
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Dependent item | process.task_manager.avg.busy Preprocessing
|
Utilization of timer internal processes, in % | The average percentage of the time during which the timer processes have been busy for the last minute. |
Dependent item | process.timer.avg.busy Preprocessing
|
Utilization of service manager internal processes, in % | The average percentage of the time during which the service manager processes have been busy for the last minute. |
Dependent item | process.service_manager.avg.busy Preprocessing
|
Utilization of trigger housekeeper internal processes, in % | The average percentage of the time during which the trigger housekeeper processes have been busy for the last minute. |
Dependent item | process.trigger_housekeeper.avg.busy Preprocessing
|
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Dependent item | process.trapper.avg.busy Preprocessing
|
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Dependent item | process.unreachable_poller.avg.busy Preprocessing
|
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Dependent item | process.vmware_collector.avg.busy Preprocessing
|
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Dependent item | process.agent_poller.avg.busy Preprocessing
|
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Dependent item | process.http_agent_poller.avg.busy Preprocessing
|
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Dependent item | process.snmp_poller.avg.busy Preprocessing
|
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Dependent item | process.internal_poller.avg.busy Preprocessing
|
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Dependent item | process.browser_poller.avg.busy Preprocessing
|
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Dependent item | rcache.buffer.pused Preprocessing
|
Trend function cache, % of unique requests | The effectiveness statistics of the Zabbix trend function cache. The percentage of cached items calculated from the sum of the cached items plus requests. A low percentage most likely means that the cache size can be reduced. |
Dependent item | tcache.pitems Preprocessing
|
Trend function cache, % of misses | The effectiveness statistics of the Zabbix trend function cache. The percentage of cache misses. |
Dependent item | tcache.pmisses Preprocessing
|
Value cache, % used | The availability statistics of Zabbix value cache. The percentage of used data buffer. |
Dependent item | vcache.buffer.pused Preprocessing
|
Value cache hits | The effectiveness statistics of Zabbix value cache. The number of cache hits (history values taken from the cache). |
Dependent item | vcache.cache.hits Preprocessing
|
Value cache misses | The effectiveness statistics of Zabbix value cache. The number of cache misses (history values taken from the database). |
Dependent item | vcache.cache.misses Preprocessing
|
Value cache operating mode | The operating mode of the value cache. |
Dependent item | vcache.cache.mode Preprocessing
|
Zabbix server check | Flag indicating whether it is a server or not. |
Dependent item | server_check Preprocessing
|
Version | The version of Zabbix server. |
Dependent item | version Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Dependent item | vmware.buffer.pused Preprocessing
|
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Dependent item | wcache.history.pused Preprocessing
|
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Dependent item | wcache.index.pused Preprocessing
|
Trend write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used trend buffer. The trend cache stores the aggregate of all items that have received data for the current hour. |
Dependent item | wcache.trend.pused Preprocessing
|
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Dependent item | wcache.values Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Dependent item | wcache.values.uint Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Dependent item | wcache.values.float Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Dependent item | wcache.values.log Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Dependent item | wcache.values.not_supported Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Dependent item | wcache.values.str Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Dependent item | wcache.values.text Preprocessing
|
Number of values synchronized with the database per second | Average quantity of values written to the database, recalculated once per minute. |
Dependent item | vps.written Preprocessing
|
LLD queue | The number of values enqueued in the low-level discovery processing queue. |
Dependent item | lld_queue Preprocessing
|
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Dependent item | preprocessing_queue Preprocessing
|
Preprocessing throughput | Reflects the throughput of value preprocessing. |
Dependent item | preprocessing_throughput Preprocessing
|
Connector queue | The count of values enqueued in the connector queue. |
Dependent item | connector_queue Preprocessing
|
Discovery queue | The count of values enqueued in the discovery queue. |
Dependent item | discovery_queue Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Zabbix server health by Zabbix agent/zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix server: Utilization of alert manager processes is high | Indicates potential performance issues with the alert manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.alert_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alert syncer processes is high | Indicates potential performance issues with the alert syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.alert_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alerter processes is high | Indicates potential performance issues with the alerter, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.alerter.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alerter"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.availability_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer worker processes is high | Indicates potential performance issues with the configuration syncer worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.configuration_syncer_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of escalator processes is high | Indicates potential performance issues with the escalator, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.escalator.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"escalator"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history poller processes is high | Indicates potential performance issues with the history poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.history_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.odbc_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.history_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.housekeeper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.http_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.java_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD manager processes is high | Indicates potential performance issues with the LLD manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.lld_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD worker processes is high | Indicates potential performance issues with the LLD worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.lld_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector manager processes is high | Indicates potential performance issues with the connector manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.connector_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector worker processes is high | Indicates potential performance issues with the connector worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.connector_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.discovery_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.discovery_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy poller processes is high | Indicates potential performance issues with the proxy poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.proxy_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy group manager processes is high | Indicates potential performance issues with the proxy group manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.proxy_group_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy group manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report manager processes is high | Indicates potential performance issues with the report manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.report_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"report manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report writer processes is high | Indicates potential performance issues with the report writer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.report_writer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"report writer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.self-monitoring.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.task_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of timer processes is high | Indicates potential performance issues with the timer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.timer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"timer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of service manager processes is high | Indicates potential performance issues with the service manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.service_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"service manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trigger housekeeper processes is high | Indicates potential performance issues with the trigger housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.trigger_housekeeper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trigger housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.trapper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.vmware_collector.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.agent_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.http_agent_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.snmp_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.internal_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.browser_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix server: Excessive configuration cache usage | Consider increasing CacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent/rcache.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix server: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.SERVER.NODATA_TIMEOUT}. |
nodata(/Zabbix server health by Zabbix agent/rcache.buffer.pused,{$ZABBIX.SERVER.NODATA_TIMEOUT})=1 |Warning |
||
Zabbix server: Excessive value cache usage | Consider increasing ValueCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent/vcache.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"value cache"} |Average |
Manual close: Yes | |
Zabbix server: Zabbix value cache working in low-memory mode | Once low-memory mode has been switched on, the value cache will remain in this state for 24 hours, even if the problem that triggered this mode is resolved sooner. |
last(/Zabbix server health by Zabbix agent/vcache.cache.mode)=1 |High |
Manual close: Yes | |
Zabbix server: Wrong template assigned | Check that the template has been selected correctly. |
last(/Zabbix server health by Zabbix agent/server_check)=0 |Disaster |
Manual close: Yes | |
Zabbix server: Version has changed | Zabbix server version has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health by Zabbix agent/version,#1)<>last(/Zabbix server health by Zabbix agent/version,#2) and length(last(/Zabbix server health by Zabbix agent/version))>0 |Info |
Manual close: Yes | |
Zabbix server: Excessive vmware cache usage | Consider increasing VMwareCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent/vmware.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history cache usage | Consider increasing HistoryCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent/wcache.history.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history index cache usage | Consider increasing HistoryIndexCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent/wcache.index.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive trends cache usage | Consider increasing TrendCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent/wcache.trend.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trend cache"} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy discovery | LLD rule with item and trigger prototypes for proxy discovery. |
Dependent item | zabbix.proxy.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxy [{#PROXY.NAME}]: Stats | The statistics for the discovered proxy. |
Dependent item | zabbix.proxy.stats[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Mode | The mode of Zabbix proxy. |
Dependent item | zabbix.proxy.mode[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Unencrypted | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.unencrypted[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: PSK | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.psk[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Certificate | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.cert[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compression | The compression status of a proxy. |
Dependent item | zabbix.proxy.compression[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Item count | The number of enabled items on enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.items[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Host count | The number of enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.hosts[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Version | The version of Zabbix proxy. |
Dependent item | zabbix.proxy.version[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Last seen, in seconds | The number of seconds since the proxy was last seen by the server. |
Dependent item | zabbix.proxy.last_seen[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compatibility | Version of proxy compared to Zabbix server version. Possible values: 0 - Undefined; 1 - Current version (proxy and server have the same major version); 2 - Outdated version (proxy version is older than server version, but is partially supported); 3 - Unsupported version (proxy version is older than server previous LTS release version or server major version is older than proxy major version). |
Dependent item | zabbix.proxy.compatibility[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | zabbix.proxy.requiredperformance[{#PROXY.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy last seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health by Zabbix agent/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)>{$ZABBIX.PROXY.LAST_SEEN.MAX} |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy never seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health by Zabbix agent/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)=-1 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is outdated | Zabbix proxy version is older than server version, but is partially supported. Only data collection and remote execution is available. |
last(/Zabbix server health by Zabbix agent/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=2 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is not supported | Zabbix proxy version is older than server previous LTS release version or server major version is older than proxy major version. |
last(/Zabbix server health by Zabbix agent/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=3 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
High availability cluster node discovery | LLD rule with item and trigger prototypes for node discovery. |
Dependent item | zabbix.nodes.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster node [{#NODE.NAME}]: Stats | Provides the statistics of a node. |
Dependent item | zabbix.node.stats[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Address | The IPv4 address of a node. |
Dependent item | zabbix.node.address[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access time | Last access time. |
Dependent item | zabbix.node.lastaccess.time[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access age | The time between the database's unix_timestamp() and the last access time. |
Dependent item | zabbix.node.lastaccess.age[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Status | The status of a node. |
Dependent item | zabbix.nodes.status[{#NODE.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Cluster node [{#NODE.NAME}]: Status changed | The state of the node has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health by Zabbix agent/zabbix.nodes.status[{#NODE.ID}],#1)<>last(/Zabbix server health by Zabbix agent/zabbix.nodes.status[{#NODE.ID}],#2) |Info |
Manual close: Yes |
This template is designed to monitor Zabbix server metrics via the active Zabbix agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Specify the address of the remote Zabbix server by changing the {$ZABBIX.SERVER.ADDRESS}
and {$ZABBIX.SERVER.PORT}
macros. Don't forget to adjust the StatsAllowedIP
parameter in the remote server's configuration file to allow the collection of statistics.
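A minimal sketch of these changes, assuming the monitored Zabbix server runs at the placeholder address 192.0.2.5 and the host with the active agent at 192.0.2.10 (both addresses are examples only):

```
# On the host linked to this template, set the macros:
#   {$ZABBIX.SERVER.ADDRESS} = 192.0.2.5
#   {$ZABBIX.SERVER.PORT}    = 10051

# zabbix_server.conf on the monitored Zabbix server (192.0.2.5):
StatsAllowedIP=127.0.0.1,192.0.2.10   # allow the agent host to request internal statistics
```

If passive checks are enabled on the agent, the master item key can also be tested manually before linking the template:

```
zabbix_get -s 192.0.2.10 -k 'zabbix.stats[192.0.2.5,10051]'
```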
Name | Description | Default |
---|---|---|
{$ZABBIX.SERVER.ADDRESS} | IP/DNS/network mask list of servers to be remotely queried (default is 127.0.0.1). |
127.0.0.1 |
{$ZABBIX.SERVER.PORT} | Port of server to be remotely queried (default is 10051). |
10051 |
{$ZABBIX.PROXY.LAST_SEEN.MAX} | The maximum number of seconds that Zabbix proxy has not been seen. |
600 |
{$ZABBIX.SERVER.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expression. |
5m |
{$ZABBIX.SERVER.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). |
75 |
{$ZABBIX.SERVER.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). |
65 |
{$ZABBIX.SERVER.UTIL.MAX:"value cache"} | Maximum threshold for value cache utilization triggers. |
95 |
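The utilization and cache thresholds support user macro context, so a single process type or cache can be given its own limit while everything else keeps the default. An illustrative sketch (the value 90 is an arbitrary example, not a recommendation):

```
# Host-level user macros:
{$ZABBIX.SERVER.UTIL.MAX}                   = 75   # default used by the utilization and cache triggers
{$ZABBIX.SERVER.UTIL.MAX:"history syncer"}  = 90   # overrides only the history syncer trigger
{$ZABBIX.SERVER.UTIL.MAX:"value cache"}     = 95   # context value already provided by the template
```

With these values, the expression avg(.../process.history_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history syncer"} resolves against 90, while triggers without a matching context fall back to 75.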
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix stats | The master item of Zabbix server statistics. |
Zabbix agent (active) | zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT}] |
Zabbix proxies stats | The master item of Zabbix proxies' statistics. |
Dependent item | zabbix.proxies.stats Preprocessing
|
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix agent (active) | zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m] Preprocessing
|
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix agent (active) | zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue] Preprocessing
|
Utilization of alert manager internal processes, in % | The average percentage of the time during which the alert manager processes have been busy for the last minute. |
Dependent item | process.alert_manager.avg.busy Preprocessing
|
Utilization of alert syncer internal processes, in % | The average percentage of the time during which the alert syncer processes have been busy for the last minute. |
Dependent item | process.alert_syncer.avg.busy Preprocessing
|
Utilization of alerter internal processes, in % | The average percentage of the time during which the alerter processes have been busy for the last minute. |
Dependent item | process.alerter.avg.busy Preprocessing
|
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Dependent item | process.availability_manager.avg.busy Preprocessing
|
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Dependent item | process.configuration_syncer.avg.busy Preprocessing
|
Utilization of configuration syncer worker internal processes, in % | The average percentage of the time during which the configuration syncer worker processes have been busy for the last minute. |
Dependent item | process.configuration_syncer_worker.avg.busy Preprocessing
|
Utilization of escalator internal processes, in % | The average percentage of the time during which the escalator processes have been busy for the last minute. |
Dependent item | process.escalator.avg.busy Preprocessing
|
Utilization of history poller internal processes, in % | The average percentage of the time during which the history poller processes have been busy for the last minute. |
Dependent item | process.history_poller.avg.busy Preprocessing
|
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Dependent item | process.odbc_poller.avg.busy Preprocessing
|
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Dependent item | process.history_syncer.avg.busy Preprocessing
|
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Dependent item | process.housekeeper.avg.busy Preprocessing
|
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Dependent item | process.http_poller.avg.busy Preprocessing
|
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Dependent item | process.icmp_pinger.avg.busy Preprocessing
|
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Dependent item | process.ipmi_manager.avg.busy Preprocessing
|
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Dependent item | process.ipmi_poller.avg.busy Preprocessing
|
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Dependent item | process.java_poller.avg.busy Preprocessing
|
Utilization of LLD manager internal processes, in % | The average percentage of the time during which the LLD manager processes have been busy for the last minute. |
Dependent item | process.lld_manager.avg.busy Preprocessing
|
Utilization of LLD worker internal processes, in % | The average percentage of the time during which the LLD worker processes have been busy for the last minute. |
Dependent item | process.lld_worker.avg.busy Preprocessing
|
Utilization of connector manager internal processes, in % | The average percentage of the time during which the connector manager processes have been busy for the last minute. |
Dependent item | process.connector_manager.avg.busy Preprocessing
|
Utilization of connector worker internal processes, in % | The average percentage of the time during which the connector worker processes have been busy for the last minute. |
Dependent item | process.connector_worker.avg.busy Preprocessing
|
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Dependent item | process.discovery_manager.avg.busy Preprocessing
|
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Dependent item | process.discovery_worker.avg.busy Preprocessing
|
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Dependent item | process.poller.avg.busy Preprocessing
|
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Dependent item | process.preprocessing_worker.avg.busy Preprocessing
|
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Dependent item | process.preprocessing_manager.avg.busy Preprocessing
|
Utilization of proxy poller data collector processes, in % | The average percentage of the time during which the proxy poller processes have been busy for the last minute. |
Dependent item | process.proxy_poller.avg.busy Preprocessing
|
Utilization of proxy group manager internal processes, in % | The average percentage of the time during which the proxy group manager processes have been busy for the last minute. |
Dependent item | process.proxy_group_manager.avg.busy Preprocessing
|
Utilization of report manager internal processes, in % | The average percentage of the time during which the report manager processes have been busy for the last minute. |
Dependent item | process.report_manager.avg.busy Preprocessing
|
Utilization of report writer internal processes, in % | The average percentage of the time during which the report writer processes have been busy for the last minute. |
Dependent item | process.report_writer.avg.busy Preprocessing
|
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Dependent item | process.self-monitoring.avg.busy Preprocessing
|
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Dependent item | process.snmp_trapper.avg.busy Preprocessing
|
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Dependent item | process.task_manager.avg.busy Preprocessing
|
Utilization of timer internal processes, in % | The average percentage of the time during which the timer processes have been busy for the last minute. |
Dependent item | process.timer.avg.busy Preprocessing
|
Utilization of service manager internal processes, in % | The average percentage of the time during which the service manager processes have been busy for the last minute. |
Dependent item | process.service_manager.avg.busy Preprocessing
|
Utilization of trigger housekeeper internal processes, in % | The average percentage of the time during which the trigger housekeeper processes have been busy for the last minute. |
Dependent item | process.trigger_housekeeper.avg.busy Preprocessing
|
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Dependent item | process.trapper.avg.busy Preprocessing
|
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Dependent item | process.unreachable_poller.avg.busy Preprocessing
|
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Dependent item | process.vmware_collector.avg.busy Preprocessing
|
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Dependent item | process.agent_poller.avg.busy Preprocessing
|
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Dependent item | process.http_agent_poller.avg.busy Preprocessing
|
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Dependent item | process.snmp_poller.avg.busy Preprocessing
|
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Dependent item | process.internal_poller.avg.busy Preprocessing
|
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Dependent item | process.browser_poller.avg.busy Preprocessing
|
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Dependent item | rcache.buffer.pused Preprocessing
|
Trend function cache, % of unique requests | The effectiveness statistics of the Zabbix trend function cache. The percentage of cached items calculated from the sum of the cached items plus requests. A low percentage most likely means that the cache size can be reduced. |
Dependent item | tcache.pitems Preprocessing
|
Trend function cache, % of misses | The effectiveness statistics of the Zabbix trend function cache. The percentage of cache misses. |
Dependent item | tcache.pmisses Preprocessing
|
Value cache, % used | The availability statistics of Zabbix value cache. The percentage of used data buffer. |
Dependent item | vcache.buffer.pused Preprocessing
|
Value cache hits | The effectiveness statistics of Zabbix value cache. The number of cache hits (history values taken from the cache). |
Dependent item | vcache.cache.hits Preprocessing
|
Value cache misses | The effectiveness statistics of Zabbix value cache. The number of cache misses (history values taken from the database). |
Dependent item | vcache.cache.misses Preprocessing
|
Value cache operating mode | The operating mode of the value cache. |
Dependent item | vcache.cache.mode Preprocessing
|
Zabbix server check | Flag indicating whether the monitored instance is a Zabbix server. |
Dependent item | server_check Preprocessing
|
Version | The version of Zabbix server. |
Dependent item | version Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Dependent item | vmware.buffer.pused Preprocessing
|
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Dependent item | wcache.history.pused Preprocessing
|
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Dependent item | wcache.index.pused Preprocessing
|
Trend write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used trend buffer. The trend cache stores the aggregate of all items that have received data for the current hour. |
Dependent item | wcache.trend.pused Preprocessing
|
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Dependent item | wcache.values Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Dependent item | wcache.values.uint Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Dependent item | wcache.values.float Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Dependent item | wcache.values.log Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Dependent item | wcache.values.not_supported Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Dependent item | wcache.values.str Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Dependent item | wcache.values.text Preprocessing
|
Number of values synchronized with the database per second | Average quantity of values written to the database, recalculated once per minute. |
Dependent item | vps.written Preprocessing
|
LLD queue | The number of values enqueued in the low-level discovery processing queue. |
Dependent item | lld_queue Preprocessing
|
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Dependent item | preprocessing_queue Preprocessing
|
Preprocessing throughput | Reflects the throughput of the preprocessing. |
Dependent item | preprocessing_throughput Preprocessing
|
Connector queue | The count of values enqueued in the connector queue. |
Dependent item | connector_queue Preprocessing
|
Discovery queue | The count of values enqueued in the discovery queue. |
Dependent item | discovery_queue Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Zabbix server health by Zabbix agent active/zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix server: Utilization of alert manager processes is high | Indicates potential performance issues with the alert manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.alert_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alert syncer processes is high | Indicates potential performance issues with the alert syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.alert_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alerter processes is high | Indicates potential performance issues with the alerter, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.alerter.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alerter"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.availability_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer worker processes is high | Indicates potential performance issues with the configuration syncer worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.configuration_syncer_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of escalator processes is high | Indicates potential performance issues with the escalator, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.escalator.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"escalator"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history poller processes is high | Indicates potential performance issues with the history poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.history_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.odbc_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.history_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.housekeeper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.http_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.java_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD manager processes is high | Indicates potential performance issues with the LLD manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.lld_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD worker processes is high | Indicates potential performance issues with the LLD worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.lld_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector manager processes is high | Indicates potential performance issues with the connector manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.connector_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector worker processes is high | Indicates potential performance issues with the connector worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.connector_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.discovery_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.discovery_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy poller processes is high | Indicates potential performance issues with the proxy poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.proxy_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy group manager processes is high | Indicates potential performance issues with the proxy group manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.proxy_group_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy group manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report manager processes is high | Indicates potential performance issues with the report manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.report_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"report manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report writer processes is high | Indicates potential performance issues with the report writer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.report_writer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"report writer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.self-monitoring.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.task_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of timer processes is high | Indicates potential performance issues with the timer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.timer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"timer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of service manager processes is high | Indicates potential performance issues with the service manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.service_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"service manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trigger housekeeper processes is high | Indicates potential performance issues with the trigger housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.trigger_housekeeper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trigger housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.trapper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.vmware_collector.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.agent_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.http_agent_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.snmp_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.internal_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.browser_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix server: Excessive configuration cache usage | Consider increasing CacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent active/rcache.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix server: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.SERVER.NODATA_TIMEOUT}. |
nodata(/Zabbix server health by Zabbix agent active/rcache.buffer.pused,{$ZABBIX.SERVER.NODATA_TIMEOUT})=1 |Warning |
||
Zabbix server: Excessive value cache usage | Consider increasing ValueCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent active/vcache.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"value cache"} |Average |
Manual close: Yes | |
Zabbix server: Zabbix value cache working in low-memory mode | Once low-memory mode has been switched on, the value cache will remain in this state for 24 hours, even if the problem that triggered this mode is resolved sooner. |
last(/Zabbix server health by Zabbix agent active/vcache.cache.mode)=1 |High |
Manual close: Yes | |
Zabbix server: Wrong template assigned | Check that the template has been selected correctly. |
last(/Zabbix server health by Zabbix agent active/server_check)=0 |Disaster |
Manual close: Yes | |
Zabbix server: Version has changed | Zabbix server version has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health by Zabbix agent active/version,#1)<>last(/Zabbix server health by Zabbix agent active/version,#2) and length(last(/Zabbix server health by Zabbix agent active/version))>0 |Info |
Manual close: Yes | |
Zabbix server: Excessive vmware cache usage | Consider increasing VMwareCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent active/vmware.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history cache usage | Consider increasing HistoryCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent active/wcache.history.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history index cache usage | Consider increasing HistoryIndexCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent active/wcache.index.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive trends cache usage | Consider increasing TrendCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent active/wcache.trend.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trend cache"} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy discovery | LLD rule with item and trigger prototypes for proxy discovery. |
Dependent item | zabbix.proxy.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxy [{#PROXY.NAME}]: Stats | The statistics for the discovered proxy. |
Dependent item | zabbix.proxy.stats[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Mode | The mode of Zabbix proxy. |
Dependent item | zabbix.proxy.mode[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Unencrypted | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.unencrypted[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: PSK | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.psk[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Certificate | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.cert[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compression | The compression status of a proxy. |
Dependent item | zabbix.proxy.compression[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Item count | The number of enabled items on enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.items[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Host count | The number of enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.hosts[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Version | The version of Zabbix proxy. |
Dependent item | zabbix.proxy.version[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Last seen, in seconds | The number of seconds since the proxy was last seen by the server. |
Dependent item | zabbix.proxy.last_seen[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compatibility | Version of proxy compared to Zabbix server version. Possible values: 0 - Undefined; 1 - Current version (proxy and server have the same major version); 2 - Outdated version (proxy version is older than server version, but is partially supported); 3 - Unsupported version (proxy version is older than server previous LTS release version or server major version is older than proxy major version). |
Dependent item | zabbix.proxy.compatibility[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | zabbix.proxy.requiredperformance[{#PROXY.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy last seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health by Zabbix agent active/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)>{$ZABBIX.PROXY.LAST_SEEN.MAX} |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy never seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health by Zabbix agent active/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)=-1 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is outdated | Zabbix proxy version is older than server version, but is partially supported. Only data collection and remote execution is available. |
last(/Zabbix server health by Zabbix agent active/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=2 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is not supported | Zabbix proxy version is older than server previous LTS release version or server major version is older than proxy major version. |
last(/Zabbix server health by Zabbix agent active/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=3 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
High availability cluster node discovery | LLD rule with item and trigger prototypes for node discovery. |
Dependent item | zabbix.nodes.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster node [{#NODE.NAME}]: Stats | Provides the statistics of a node. |
Dependent item | zabbix.node.stats[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Address | The IPv4 address of a node. |
Dependent item | zabbix.node.address[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access time | Last access time. |
Dependent item | zabbix.node.lastaccess.time[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access age | The time between the database's unix_timestamp() and the last access time. |
Dependent item | zabbix.node.lastaccess.age[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Status | The status of a node. |
Dependent item | zabbix.nodes.status[{#NODE.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Cluster node [{#NODE.NAME}]: Status changed | The state of the node has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health by Zabbix agent active/zabbix.nodes.status[{#NODE.ID}],#1)<>last(/Zabbix server health by Zabbix agent active/zabbix.nodes.status[{#NODE.ID}],#2) |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor internal Zabbix metrics on the local Zabbix proxy.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Link this template to the local Zabbix proxy host.
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). |
75 |
{$ZABBIX.PROXY.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). |
65 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix internal | zabbix[queue,10m] |
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix internal | zabbix[queue] |
Utilization of data sender internal processes, in % | The average percentage of the time during which the data sender processes have been busy for the last minute. |
Zabbix internal | zabbix[process,data sender,avg,busy] |
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,availability manager,avg,busy] |
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Zabbix internal | zabbix[process,configuration syncer,avg,busy] |
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,discovery manager,avg,busy] |
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Zabbix internal | zabbix[process,discovery worker,avg,busy] |
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,odbc poller,avg,busy] |
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Zabbix internal | zabbix[process,history syncer,avg,busy] |
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Zabbix internal | zabbix[process,housekeeper,avg,busy] |
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,http poller,avg,busy] |
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Zabbix internal | zabbix[process,icmp pinger,avg,busy] |
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,ipmi manager,avg,busy] |
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,ipmi poller,avg,busy] |
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,java poller,avg,busy] |
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,poller,avg,busy] |
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Zabbix internal | zabbix[process,preprocessing worker,avg,busy] |
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,preprocessing manager,avg,busy] |
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Zabbix internal | zabbix[process,self-monitoring,avg,busy] |
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Zabbix internal | zabbix[process,snmp trapper,avg,busy] |
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,task manager,avg,busy] |
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Zabbix internal | zabbix[process,trapper,avg,busy] |
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,unreachable poller,avg,busy] |
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Zabbix internal | zabbix[process,vmware collector,avg,busy] |
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,agent poller,avg,busy] |
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,http agent poller,avg,busy] |
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,snmp poller,avg,busy] |
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,internal poller,avg,busy] |
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,browser poller,avg,busy] |
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Zabbix internal | zabbix[rcache,buffer,pused] |
Zabbix proxy check | Flag indicating whether it is a proxy or not. |
Zabbix internal | zabbix[triggers] Preprocessing
|
Version | The version of Zabbix proxy. |
Zabbix internal | zabbix[version] Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Zabbix internal | zabbix[vmware,buffer,pused] |
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Zabbix internal | zabbix[wcache,history,pused] |
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Zabbix internal | zabbix[wcache,index,pused] |
Proxy memory buffer, % used | Statistics and availability of the Zabbix proxy memory buffer. The percentage of the used proxy memory buffer. The proxy memory buffer is used to store new historical data and upload it to the server without accessing the database. |
Zabbix internal | zabbix[proxy_buffer,buffer,pused] |
Proxy buffer, state | The current working state of proxy buffer where the new data is being stored. Possible values: 0 - disk (also returned when memory buffer is disabled); 1 - memory. |
Zabbix internal | zabbix[proxy_buffer,state,current] Preprocessing
|
Proxy buffer, state changes | The number of state changes between disk/memory buffer modes since proxy start. |
Zabbix internal | zabbix[proxy_buffer,state,changes] Preprocessing
|
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Zabbix internal | zabbix[wcache,values] Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Zabbix internal | zabbix[wcache,values,uint] Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Zabbix internal | zabbix[wcache,values,float] Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Zabbix internal | zabbix[wcache,values,log] Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Zabbix internal | zabbix[wcache,values,not supported] Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Zabbix internal | zabbix[wcache,values,str] Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Zabbix internal | zabbix[wcache,values,text] Preprocessing
|
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Zabbix internal | zabbix[preprocessing_queue] |
Discovery queue | The number of values enqueued in the discovery queue. |
Zabbix internal | zabbix[discovery_queue] |
Values waiting to be sent | The number of values in the proxy history table waiting to be sent to the server. |
Zabbix internal | zabbix[proxy_history] |
Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Zabbix internal | zabbix[requiredperformance] Preprocessing
|
Uptime | Uptime of the Zabbix proxy process in seconds. |
Zabbix internal | zabbix[uptime] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix proxy: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Zabbix proxy health/zabbix[queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix proxy: Utilization of data sender processes is high | Indicates potential performance issues with the data sender, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,data sender,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"data sender"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,availability manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,configuration syncer,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,discovery manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,discovery worker,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,odbc poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,history syncer,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,housekeeper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,http poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,icmp pinger,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,ipmi manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,ipmi poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,java poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,preprocessing worker,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,preprocessing manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,self-monitoring,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,snmp trapper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,task manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,trapper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,unreachable poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,vmware collector,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,agent poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,http agent poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,snmp poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,internal poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,browser poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive configuration cache usage | Consider increasing CacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[rcache,buffer,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Wrong template assigned | Check that the template has been selected correctly. |
last(/Zabbix proxy health/zabbix[triggers])=1 |Disaster |
Manual close: Yes | |
Zabbix proxy: Version has changed | Zabbix proxy version has changed. Acknowledge to close the problem manually. |
last(/Zabbix proxy health/zabbix[version],#1)<>last(/Zabbix proxy health/zabbix[version],#2) and length(last(/Zabbix proxy health/zabbix[version]))>0 |Info |
Manual close: Yes | |
Zabbix proxy: Excessive vmware cache usage | Consider increasing VMwareCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[vmware,buffer,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history cache usage | Consider increasing HistoryCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[wcache,history,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history index cache usage | Consider increasing HistoryIndexCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[wcache,index,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive proxy memory buffer usage | Consider increasing ProxyMemoryBufferSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[proxy_buffer,buffer,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX:"proxy buffer"} |Average |
Manual close: Yes | |
Zabbix proxy: {HOST.NAME} has been restarted | Uptime is less than 10 minutes. |
last(/Zabbix proxy health/zabbix[uptime])<10m |Info |
Manual close: Yes |
This template is designed to monitor internal Zabbix metrics on the remote Zabbix proxy.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Specify the address of the remote Zabbix proxy by updating the {$ZABBIX.PROXY.ADDRESS}
and {$ZABBIX.PROXY.PORT}
macros. Don't forget to adjust the StatsAllowedIP
parameter in the remote proxy's configuration file to allow the collection of statistics.
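For reference, a minimal configuration sketch for the remote proxy is shown below; the address is a placeholder and should be replaced with the IP of the Zabbix server or proxy that will query the statistics.

```
# zabbix_proxy.conf on the remote proxy being monitored (path may differ).
# Allow the querying Zabbix server/proxy to request internal statistics;
# 192.0.2.10 is a placeholder address.
StatsAllowedIP=127.0.0.1,192.0.2.10

# Restart the proxy after changing the configuration, e.g.:
# systemctl restart zabbix-proxy
```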
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.ADDRESS} | IP/DNS/network mask list of proxies to be remotely queried (default is 127.0.0.1). |
|
{$ZABBIX.PROXY.PORT} | Port of proxy to be remotely queried (default is 10051). |
|
{$ZABBIX.PROXY.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). |
75 |
{$ZABBIX.PROXY.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). |
65 |
{$ZABBIX.PROXY.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expressions. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix stats | The master item of Zabbix proxy statistics. |
Zabbix internal | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT}] |
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix internal | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m] Preprocessing
|
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix internal | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue] Preprocessing
|
Utilization of data sender internal processes, in % | The average percentage of the time during which the data sender processes have been busy for the last minute. |
Dependent item | process.data_sender.avg.busy Preprocessing
|
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Dependent item | process.availability_manager.avg.busy Preprocessing
|
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Dependent item | process.configuration_syncer.avg.busy Preprocessing
|
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Dependent item | process.discovery_manager.avg.busy Preprocessing
|
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Dependent item | process.discovery_worker.avg.busy Preprocessing
|
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Dependent item | process.odbc_poller.avg.busy Preprocessing
|
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Dependent item | process.history_syncer.avg.busy Preprocessing
|
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Dependent item | process.housekeeper.avg.busy Preprocessing
|
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Dependent item | process.http_poller.avg.busy Preprocessing
|
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Dependent item | process.icmp_pinger.avg.busy Preprocessing
|
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Dependent item | process.ipmi_manager.avg.busy Preprocessing
|
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Dependent item | process.ipmi_poller.avg.busy Preprocessing
|
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Dependent item | process.java_poller.avg.busy Preprocessing
|
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Dependent item | process.poller.avg.busy Preprocessing
|
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Dependent item | process.preprocessing_worker.avg.busy Preprocessing
|
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Dependent item | process.preprocessing_manager.avg.busy Preprocessing
|
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Dependent item | process.self-monitoring.avg.busy Preprocessing
|
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Dependent item | process.snmp_trapper.avg.busy Preprocessing
|
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Dependent item | process.task_manager.avg.busy Preprocessing
|
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Dependent item | process.trapper.avg.busy Preprocessing
|
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Dependent item | process.unreachable_poller.avg.busy Preprocessing
|
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Dependent item | process.vmware_collector.avg.busy Preprocessing
|
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Dependent item | process.agent_poller.avg.busy Preprocessing
|
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Dependent item | process.http_agent_poller.avg.busy Preprocessing
|
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Dependent item | process.snmp_poller.avg.busy Preprocessing
|
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Dependent item | process.internal_poller.avg.busy Preprocessing
|
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Dependent item | process.browser_poller.avg.busy Preprocessing
|
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Dependent item | rcache.buffer.pused Preprocessing
|
Zabbix proxy check | Flag indicating whether it is a proxy or not. |
Dependent item | proxy_check Preprocessing
|
Version | The version of Zabbix proxy. |
Dependent item | version Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Dependent item | vmware.buffer.pused Preprocessing
|
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Dependent item | wcache.history.pused Preprocessing
|
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Dependent item | wcache.index.pused Preprocessing
|
Proxy memory buffer, % used | Statistics and availability of the Zabbix proxy memory buffer. The percentage of the used proxy memory buffer. The proxy memory buffer is used to store new historical data and upload it to the server without accessing the database. |
Dependent item | proxy_buffer.buffer.pused Preprocessing
|
Proxy buffer, state | The current working state of proxy buffer where the new data is being stored. Possible values: 0 - disk (also returned when memory buffer is disabled); 1 - memory. |
Dependent item | proxy_buffer.state.current Preprocessing
|
Proxy buffer, state changes | The number of state changes between disk/memory buffer modes since proxy start. |
Dependent item | proxy_buffer.state.changes Preprocessing
|
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Dependent item | wcache.values Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Dependent item | wcache.values.uint Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Dependent item | wcache.values.float Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Dependent item | wcache.values.log Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Dependent item | wcache.values.not_supported Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Dependent item | wcache.values.str Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Dependent item | wcache.values.text Preprocessing
|
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Dependent item | preprocessing_queue Preprocessing
|
Discovery queue | The number of values enqueued in the discovery queue. |
Dependent item | discovery_queue Preprocessing
|
Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | requiredperformance Preprocessing
|
Uptime | Uptime of the Zabbix proxy process in seconds. |
Dependent item | uptime Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix proxy: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Remote Zabbix proxy health/zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix proxy: Utilization of data sender processes is high | Indicates potential performance issues with the data sender, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.data_sender.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"data sender"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.availability_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.discovery_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.discovery_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.odbc_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.history_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.housekeeper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.http_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.java_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.self-monitoring.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.task_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.vmware_collector.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.agent_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.http_agent_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.snmp_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.internal_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.browser_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive configuration cache usage | Consider increasing CacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/rcache.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.PROXY.NODATA_TIMEOUT}. |
nodata(/Remote Zabbix proxy health/rcache.buffer.pused,{$ZABBIX.PROXY.NODATA_TIMEOUT})=1 |Warning |
||
Zabbix proxy: Wrong template assigned | Check that the template has been selected correctly. |
last(/Remote Zabbix proxy health/proxy_check)=1 |Disaster |
Manual close: Yes | |
Zabbix proxy: Version has changed | Zabbix proxy version has changed. Acknowledge to close the problem manually. |
last(/Remote Zabbix proxy health/version,#1)<>last(/Remote Zabbix proxy health/version,#2) and length(last(/Remote Zabbix proxy health/version))>0 |Info |
Manual close: Yes | |
Zabbix proxy: Excessive vmware cache usage | Consider increasing VMwareCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/vmware.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history cache usage | Consider increasing HistoryCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/wcache.history.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history index cache usage | Consider increasing HistoryIndexCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/wcache.index.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive proxy memory buffer usage | Consider increasing ProxyMemoryBufferSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/proxy_buffer.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"proxy buffer"} |Average |
Manual close: Yes | |
Zabbix proxy: {HOST.NAME} has been restarted | Uptime is less than 10 minutes. |
last(/Remote Zabbix proxy health/uptime)<10m |Info |
Manual close: Yes |
This template is designed to monitor internal Zabbix metrics on the remote Zabbix proxy via the passive Zabbix agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Specify the address of the remote Zabbix proxy by changing the {$ZABBIX.PROXY.ADDRESS}
and {$ZABBIX.PROXY.PORT}
macros. Don't forget to adjust the StatsAllowedIP
parameter in the remote proxy's configuration file to allow the collection of statistics.
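As a quick sanity check that the passive agent can retrieve the remote proxy's statistics, a hedged example using zabbix_get is shown below; all addresses and ports are placeholders.

```
# Run from the Zabbix server/proxy host (addresses are placeholders).
# 192.0.2.20 is the host running the passive Zabbix agent,
# 192.0.2.30:10051 is the remote proxy whose statistics are collected.
# The agent's IP must be allowed by StatsAllowedIP on the remote proxy.
zabbix_get -s 192.0.2.20 -p 10050 -k 'zabbix.stats[192.0.2.30,10051]'
```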
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.ADDRESS} | IP/DNS/network mask list of proxies to be remotely queried (default is 127.0.0.1). |
|
{$ZABBIX.PROXY.PORT} | Port of proxy to be remotely queried (default is 10051). |
|
{$ZABBIX.PROXY.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). |
75 |
{$ZABBIX.PROXY.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). |
65 |
{$ZABBIX.PROXY.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expressions. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix stats | The master item of Zabbix proxy statistics. |
Zabbix agent | zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT}] |
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix agent | zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m] Preprocessing
|
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix agent | zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue] Preprocessing
|
Utilization of data sender internal processes, in % | The average percentage of the time during which the data sender processes have been busy for the last minute. |
Dependent item | process.data_sender.avg.busy Preprocessing
|
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Dependent item | process.availability_manager.avg.busy Preprocessing
|
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Dependent item | process.configuration_syncer.avg.busy Preprocessing
|
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Dependent item | process.discovery_manager.avg.busy Preprocessing
|
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Dependent item | process.discovery_worker.avg.busy Preprocessing
|
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Dependent item | process.odbc_poller.avg.busy Preprocessing
|
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Dependent item | process.history_syncer.avg.busy Preprocessing
|
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Dependent item | process.housekeeper.avg.busy Preprocessing
|
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Dependent item | process.http_poller.avg.busy Preprocessing
|
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Dependent item | process.icmp_pinger.avg.busy Preprocessing
|
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Dependent item | process.ipmi_manager.avg.busy Preprocessing
|
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Dependent item | process.ipmi_poller.avg.busy Preprocessing
|
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Dependent item | process.java_poller.avg.busy Preprocessing
|
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Dependent item | process.poller.avg.busy Preprocessing
|
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Dependent item | process.preprocessing_worker.avg.busy Preprocessing
|
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Dependent item | process.preprocessing_manager.avg.busy Preprocessing
|
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Dependent item | process.self-monitoring.avg.busy Preprocessing
|
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Dependent item | process.snmp_trapper.avg.busy Preprocessing
|
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Dependent item | process.task_manager.avg.busy Preprocessing
|
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Dependent item | process.trapper.avg.busy Preprocessing
|
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Dependent item | process.unreachable_poller.avg.busy Preprocessing
|
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Dependent item | process.vmware_collector.avg.busy Preprocessing
|
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Dependent item | process.agent_poller.avg.busy Preprocessing
|
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Dependent item | process.http_agent_poller.avg.busy Preprocessing
|
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Dependent item | process.snmp_poller.avg.busy Preprocessing
|
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Dependent item | process.internal_poller.avg.busy Preprocessing
|
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Dependent item | process.browser_poller.avg.busy Preprocessing
|
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Dependent item | rcache.buffer.pused Preprocessing
|
Zabbix proxy check | Flag indicating whether it is a proxy or not. |
Dependent item | proxy_check Preprocessing
|
Version | The version of Zabbix proxy. |
Dependent item | version Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Dependent item | vmware.buffer.pused Preprocessing
|
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Dependent item | wcache.history.pused Preprocessing
|
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Dependent item | wcache.index.pused Preprocessing
|
Proxy memory buffer, % used | Statistics and availability of the Zabbix proxy memory buffer. The percentage of the used proxy memory buffer. The proxy memory buffer is used to store new historical data and upload it without accessing the database. |
Dependent item | proxy_buffer.buffer.pused Preprocessing
|
Proxy buffer, state | The current working state of proxy buffer where the new data is being stored. Possible values: 0 - disk (also returned when memory buffer is disabled); 1 - memory. |
Dependent item | proxy_buffer.state.current Preprocessing
|
Proxy buffer, state changes | The number of state changes between disk/memory buffer modes since proxy start. |
Dependent item | proxy_buffer.state.changes Preprocessing
|
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Dependent item | wcache.values Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Dependent item | wcache.values.uint Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Dependent item | wcache.values.float Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Dependent item | wcache.values.log Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Dependent item | wcache.values.not_supported Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Dependent item | wcache.values.str Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Dependent item | wcache.values.text Preprocessing
|
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Dependent item | preprocessing_queue Preprocessing
|
Discovery queue | The count of values enqueued in the discovery queue. |
Dependent item | discovery_queue Preprocessing
|
Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | requiredperformance Preprocessing
|
Uptime | Uptime of the Zabbix proxy process in seconds. |
Dependent item | uptime Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix proxy: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Zabbix proxy health by Zabbix agent/zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix proxy: Utilization of data sender processes is high | Indicates potential performance issues with the data sender, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.data_sender.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"data sender"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.availability_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.discovery_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.discovery_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.odbc_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.history_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.housekeeper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.http_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.java_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.self-monitoring.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.task_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.vmware_collector.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.agent_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.http_agent_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.snmp_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.internal_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.browser_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive configuration cache usage | Consider increasing CacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent/rcache.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.PROXY.NODATA_TIMEOUT}. |
nodata(/Zabbix proxy health by Zabbix agent/rcache.buffer.pused,{$ZABBIX.PROXY.NODATA_TIMEOUT})=1 |Warning |
||
Zabbix proxy: Wrong template assigned | Check that the template has been selected correctly. |
last(/Zabbix proxy health by Zabbix agent/proxy_check)=1 |Disaster |
Manual close: Yes | |
Zabbix proxy: Version has changed | Zabbix proxy version has changed. Acknowledge to close the problem manually. |
last(/Zabbix proxy health by Zabbix agent/version,#1)<>last(/Zabbix proxy health by Zabbix agent/version,#2) and length(last(/Zabbix proxy health by Zabbix agent/version))>0 |Info |
Manual close: Yes | |
Zabbix proxy: Excessive vmware cache usage | Consider increasing VMwareCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent/vmware.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history cache usage | Consider increasing HistoryCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent/wcache.history.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history index cache usage | Consider increasing HistoryIndexCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent/wcache.index.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive proxy memory buffer usage | Consider increasing ProxyMemoryBufferSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent/proxy_buffer.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"proxy buffer"} |Average |
Manual close: Yes | |
Zabbix proxy: {HOST.NAME} has been restarted | Uptime is less than 10 minutes. |
last(/Zabbix proxy health by Zabbix agent/uptime)<10m |Info |
Manual close: Yes |
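The cache-usage triggers above each point at a tuning parameter in the zabbix_proxy.conf configuration file. As a minimal illustration, the relevant parameters are listed below; the values are placeholders, not recommendations, and should be sized for your environment:

```
# zabbix_proxy.conf - cache and buffer sizes referenced by the triggers above
CacheSize=64M               # configuration cache
HistoryCacheSize=32M        # history write cache
HistoryIndexCacheSize=8M    # history index cache
VMwareCacheSize=16M         # VMware cache (only used when VMware collectors are started)
ProxyMemoryBufferSize=128M  # proxy memory buffer (0 disables the memory buffer)
```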
This template is designed to monitor internal Zabbix metrics on the remote Zabbix proxy via the active Zabbix agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Specify the address of the remote Zabbix proxy by changing the {$ZABBIX.PROXY.ADDRESS}
and {$ZABBIX.PROXY.PORT}
macros. Don't forget to adjust the StatsAllowedIP
parameter in the remote proxy's configuration file to allow the collection of statistics.
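As a hedged example, the statistics access list in the remote proxy's zabbix_proxy.conf might look like the snippet below; 192.0.2.10 stands in for the address of the Zabbix server or proxy that runs the active agent checks:

```
# zabbix_proxy.conf on the remote proxy
# Allow internal statistics (zabbix.stats) to be queried from these addresses
StatsAllowedIP=127.0.0.1,192.0.2.10
```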
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.ADDRESS} | IP/DNS/network mask list of proxies to be remotely queried (default is 127.0.0.1). |
|
{$ZABBIX.PROXY.PORT} | Port of proxy to be remotely queried (default is 10051). |
|
{$ZABBIX.PROXY.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). |
75 |
{$ZABBIX.PROXY.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). |
65 |
{$ZABBIX.PROXY.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expressions. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix stats | The master item of Zabbix proxy statistics. |
Zabbix agent (active) | zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT}] |
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix agent (active) | zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m] Preprocessing
|
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix agent (active) | zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue] Preprocessing
|
Utilization of data sender internal processes, in % | The average percentage of the time during which the data sender processes have been busy for the last minute. |
Dependent item | process.data_sender.avg.busy Preprocessing
|
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Dependent item | process.availability_manager.avg.busy Preprocessing
|
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Dependent item | process.configuration_syncer.avg.busy Preprocessing
|
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Dependent item | process.discovery_manager.avg.busy Preprocessing
|
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Dependent item | process.discovery_worker.avg.busy Preprocessing
|
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Dependent item | process.odbc_poller.avg.busy Preprocessing
|
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Dependent item | process.history_syncer.avg.busy Preprocessing
|
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Dependent item | process.housekeeper.avg.busy Preprocessing
|
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Dependent item | process.http_poller.avg.busy Preprocessing
|
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Dependent item | process.icmp_pinger.avg.busy Preprocessing
|
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Dependent item | process.ipmi_manager.avg.busy Preprocessing
|
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Dependent item | process.ipmi_poller.avg.busy Preprocessing
|
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Dependent item | process.java_poller.avg.busy Preprocessing
|
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Dependent item | process.poller.avg.busy Preprocessing
|
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Dependent item | process.preprocessing_worker.avg.busy Preprocessing
|
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Dependent item | process.preprocessing_manager.avg.busy Preprocessing
|
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Dependent item | process.self-monitoring.avg.busy Preprocessing
|
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Dependent item | process.snmp_trapper.avg.busy Preprocessing
|
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Dependent item | process.task_manager.avg.busy Preprocessing
|
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Dependent item | process.trapper.avg.busy Preprocessing
|
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Dependent item | process.unreachable_poller.avg.busy Preprocessing
|
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Dependent item | process.vmware_collector.avg.busy Preprocessing
|
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Dependent item | process.agent_poller.avg.busy Preprocessing
|
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Dependent item | process.http_agent_poller.avg.busy Preprocessing
|
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Dependent item | process.snmp_poller.avg.busy Preprocessing
|
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Dependent item | process.internal_poller.avg.busy Preprocessing
|
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Dependent item | process.browser_poller.avg.busy Preprocessing
|
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Dependent item | rcache.buffer.pused Preprocessing
|
Zabbix proxy check | Flag indicating whether it is a proxy or not. |
Dependent item | proxy_check Preprocessing
|
Version | The version of Zabbix proxy. |
Dependent item | version Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Dependent item | vmware.buffer.pused Preprocessing
|
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Dependent item | wcache.history.pused Preprocessing
|
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Dependent item | wcache.index.pused Preprocessing
|
Proxy memory buffer, % used | Statistics and availability of the Zabbix proxy memory buffer. The percentage of the used proxy memory buffer. The proxy memory buffer is used to store new historical data and upload it without accessing the database. |
Dependent item | proxy_buffer.buffer.pused Preprocessing
|
Proxy buffer, state | The current working state of proxy buffer where the new data is being stored. Possible values: 0 - disk (also returned when memory buffer is disabled); 1 - memory. |
Dependent item | proxy_buffer.state.current Preprocessing
|
Proxy buffer, state changes | The number of state changes between disk/memory buffer modes since proxy start. |
Dependent item | proxy_buffer.state.changes Preprocessing
|
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Dependent item | wcache.values Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Dependent item | wcache.values.uint Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Dependent item | wcache.values.float Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Dependent item | wcache.values.log Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Dependent item | wcache.values.not_supported Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Dependent item | wcache.values.str Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Dependent item | wcache.values.text Preprocessing
|
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Dependent item | preprocessing_queue Preprocessing
|
Discovery queue | The count of values enqueued in the discovery queue. |
Dependent item | discovery_queue Preprocessing
|
Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | requiredperformance Preprocessing
|
Uptime | Uptime of the Zabbix proxy process in seconds. |
Dependent item | uptime Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix proxy: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Zabbix proxy health by Zabbix agent active/zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix proxy: Utilization of data sender processes is high | Indicates potential performance issues with the data sender, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.data_sender.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"data sender"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.availability_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.discovery_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.discovery_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.odbc_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.history_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.housekeeper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.http_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.java_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.self-monitoring.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.task_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.vmware_collector.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.agent_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.http_agent_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.snmp_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.internal_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.browser_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive configuration cache usage | Consider increasing CacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent active/rcache.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.PROXY.NODATA_TIMEOUT}. |
nodata(/Zabbix proxy health by Zabbix agent active/rcache.buffer.pused,{$ZABBIX.PROXY.NODATA_TIMEOUT})=1 |Warning |
||
Zabbix proxy: Wrong template assigned | Check that the template has been selected correctly. |
last(/Zabbix proxy health by Zabbix agent active/proxy_check)=1 |Disaster |
Manual close: Yes | |
Zabbix proxy: Version has changed | Zabbix proxy version has changed. Acknowledge to close the problem manually. |
last(/Zabbix proxy health by Zabbix agent active/version,#1)<>last(/Zabbix proxy health by Zabbix agent active/version,#2) and length(last(/Zabbix proxy health by Zabbix agent active/version))>0 |Info |
Manual close: Yes | |
Zabbix proxy: Excessive vmware cache usage | Consider increasing VMwareCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent active/vmware.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history cache usage | Consider increasing HistoryCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent active/wcache.history.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history index cache usage | Consider increasing HistoryIndexCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent active/wcache.index.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive proxy memory buffer usage | Consider increasing ProxyMemoryBufferSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent active/proxy_buffer.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"proxy buffer"} |Average |
Manual close: Yes | |
Zabbix proxy: {HOST.NAME} has been restarted | Uptime is less than 10 minutes. |
last(/Zabbix proxy health by Zabbix agent active/uptime)<10m |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Name | Description | Default |
---|---|---|
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. Works only for agents reachable from Zabbix server/proxy (passive mode). |
3m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Version of Zabbix agent running | Zabbix agent | agent.version Preprocessing
|
|
Host name of Zabbix agent running | Zabbix agent | agent.hostname Preprocessing
|
|
Zabbix agent ping | The agent always returns "1" for this item. May be used in combination with nodata() for the availability check. |
Zabbix agent | agent.ping |
Zabbix agent availability | Used for monitoring the availability status of the agent. |
Zabbix internal | zabbix[host,agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix agent is not available | For passive agents only, host availability is used with {$AGENT.TIMEOUT} as the time threshold. |
max(/Zabbix agent/zabbix[host,agent,available],{$AGENT.TIMEOUT})=0 |Average |
Manual close: Yes |
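As a quick sanity check of a passive agent (independent of this template), the items above can be queried manually with zabbix_get from the Zabbix server or proxy host; the address and port below are placeholders:

```
zabbix_get -s 192.0.2.15 -p 10050 -k agent.ping     # expected output: 1
zabbix_get -s 192.0.2.15 -p 10050 -k agent.version  # e.g. 7.0.0
```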
Name | Description | Default |
---|---|---|
{$AGENT.NODATA_TIMEOUT} | No data timeout for active agents. Consider keeping it relatively high. |
30m |
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Version of Zabbix agent running | Zabbix agent (active) | agent.version Preprocessing
|
|
Host name of Zabbix agent running | Zabbix agent (active) | agent.hostname Preprocessing
|
|
Zabbix agent ping | The agent always returns "1" for this item. May be used in combination with nodata() for the availability check. |
Zabbix agent (active) | agent.ping |
Active agent availability | Availability of active checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - unknown; 1 - available; 2 - not available. |
Zabbix internal | zabbix[host,active_agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix agent is not available | For active agents, nodata() on agent.ping is used with {$AGENT.NODATA_TIMEOUT} as the time threshold. |
nodata(/Zabbix agent active/agent.ping,{$AGENT.NODATA_TIMEOUT})=1 |Average |
Manual close: Yes | |
Active checks are not available | Active checks are considered unavailable. Agent has not sent a heartbeat for a prolonged time. |
min(/Zabbix agent active/zabbix[host,active_agent,available],{$AGENT.TIMEOUT})=2 |High |
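The active-agent availability item above relies on the agent heartbeat, so the agent side must be configured for active checks. A minimal sketch of the relevant zabbix_agentd.conf lines follows; the server address and host name are assumptions:

```
# zabbix_agentd.conf (active checks)
ServerActive=192.0.2.1        # Zabbix server or proxy that receives active checks
Hostname=web01.example.com    # must match the host name configured in the frontend
HeartbeatFrequency=60         # heartbeat interval in seconds (0 disables the heartbeat)
```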
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official JMX Template for WildFly server.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX. This template works with standalone and domain instances.
Copy jboss-client.jar from /(wildfly,EAP,Jboss,AS)/bin/client into the /usr/share/zabbix-java-gateway/lib directory.
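A minimal sketch of that step, assuming a default WildFly installation under /opt/wildfly and the packaged Zabbix Java gateway paths (both paths are assumptions for your environment):

```
# Make the JBoss remoting client available to the Zabbix Java gateway
cp /opt/wildfly/bin/client/jboss-client.jar /usr/share/zabbix-java-gateway/lib/
systemctl restart zabbix-java-gateway
```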
Name | Description | Default |
---|---|---|
{$WILDFLY.USER} | zabbix |
|
{$WILDFLY.PASSWORD} | zabbix |
|
{$WILDFLY.JMX.PROTOCOL} | remote+http |
|
{$WILDFLY.DEPLOYMENT.MATCHES} | Filter of discoverable deployments |
.* |
{$WILDFLY.DEPLOYMENT.NOT_MATCHES} | Filter to exclude discovered deployments |
CHANGE_IF_NEEDED |
{$WILDFLY.CONN.USAGE.WARN.MAX} | The maximum connection usage percent for trigger expression. |
80 |
{$WILDFLY.CONN.WAIT.MAX.WARN} | The maximum number of waiting connections for trigger expression. |
300 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Launch type | The manner in which the server process was launched. Either "DOMAIN" for a domain mode server launched by a Host Controller, "STANDALONE" for a standalone server launched from the command line, or "EMBEDDED" for a standalone server launched as an embedded part of an application running in the same virtual machine. |
JMX agent | jmx["jboss.as:management-root=server","launchType"] Preprocessing
|
Name | For standalone mode: The name of this server. If not set, defaults to the runtime value of InetAddress.getLocalHost().getHostName(). For domain mode: The name given to this domain. |
JMX agent | jmx["jboss.as:management-root=server","name"] Preprocessing
|
Process type | The type of process represented by this root resource. |
JMX agent | jmx["jboss.as:management-root=server","processType"] Preprocessing
|
Runtime configuration state | The current persistent configuration state, one of starting, ok, reload-required, restart-required, stopping or stopped. |
JMX agent | jmx["jboss.as:management-root=server","runtimeConfigurationState"] Preprocessing
|
Server controller state | The current state of the server controller; either STARTING, RUNNING, RESTART_REQUIRED, RELOAD_REQUIRED or STOPPING. |
JMX agent | jmx["jboss.as:management-root=server","serverState"] Preprocessing
|
Version | The version of the WildFly Core based product release. |
JMX agent | jmx["jboss.as:management-root=server","productVersion"] Preprocessing
|
Uptime | WildFly server uptime. |
JMX agent | jmx["java.lang:type=Runtime","Uptime"] Preprocessing
|
Transactions: Total, rate | The total number of transactions (top-level and nested) created per second. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfTransactions"] Preprocessing
|
Transactions: Aborted, rate | The number of aborted (i.e. rolled back) transactions per second. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfAbortedTransactions"] Preprocessing
|
Transactions: Application rollbacks, rate | The number of transactions that have been rolled back by application request. This includes those that timeout, since the timeout behavior is considered an attribute of the application configuration. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfApplicationRollbacks"] Preprocessing
|
Transactions: Committed, rate | The number of committed transactions. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfCommittedTransactions"] Preprocessing
|
Transactions: Heuristics, rate | The number of transactions which have terminated with heuristic outcomes. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfHeuristics"] Preprocessing
|
Transactions: Current | The number of transactions that have begun but not yet terminated. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfInflightTransactions"] |
Transactions: Nested, rate | The total number of nested (sub) transactions created. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfNestedTransactions"] Preprocessing
|
Transactions: ResourceRollbacks, rate | The number of transactions that rolled back due to resource (participant) failure. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfResourceRollbacks"] Preprocessing
|
Transactions: System rollbacks, rate | The number of transactions that have been rolled back due to internal system errors. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfSystemRollbacks"] Preprocessing
|
Transactions: Timed out, rate | The number of transactions that have rolled back due to timeout. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfTimedOutTransactions"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly Server: Server needs to restart for configuration change. | find(/WildFly Server by JMX/jmx["jboss.as:management-root=server","runtimeConfigurationState"],,"like","ok")=0 |Warning |
|||
WildFly Server: Server controller is not in RUNNING state | find(/WildFly Server by JMX/jmx["jboss.as:management-root=server","serverState"],,"like","running")=0 |Warning |
Depends on:
|
||
WildFly Server: Version has changed | WildFly version has changed. Acknowledge to close the problem manually. |
last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"],#1)<>last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"],#2) and length(last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"]))>0 |Info |
Manual close: Yes | |
WildFly Server: Host has been restarted | Uptime is less than 10 minutes. |
last(/WildFly Server by JMX/jmx["java.lang:type=Runtime","Uptime"])<10m |Info |
Manual close: Yes | |
WildFly Server: Failed to fetch info data | Zabbix has not received data for items for the last 15 minutes |
nodata(/WildFly Server by JMX/jmx["java.lang:type=Runtime","Uptime"],15m)=1 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployments discovery | Discovery of deployment metrics. |
JMX agent | jmx.get[beans,"jboss.as.expr:deployment=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployment [{#DEPLOYMENT}]: Status | The current runtime status of a deployment. Possible status modes are OK, FAILED, and STOPPED. FAILED indicates a dependency is missing or a service could not start. STOPPED indicates that the deployment was not enabled or was manually stopped. |
JMX agent | jmx["{#JMXOBJ}",status] Preprocessing
|
Deployment [{#DEPLOYMENT}]: Enabled | Boolean indicating whether the deployment content is currently deployed in the runtime (or should be deployed in the runtime the next time the server starts). |
JMX agent | jmx["{#JMXOBJ}",enabled] Preprocessing
|
Deployment [{#DEPLOYMENT}]: Managed | Indicates if the deployment is managed (aka uses the ContentRepository). |
JMX agent | jmx["{#JMXOBJ}",managed] Preprocessing
|
Deployment [{#DEPLOYMENT}]: Persistent | Indicates whether the existence of the deployment should be recorded in the persistent server configuration. |
JMX agent | jmx["{#JMXOBJ}",persistent] Preprocessing
|
Deployment [{#DEPLOYMENT}]: Enabled time | The time when the deployment content was enabled in the runtime. |
JMX agent | jmx["{#JMXOBJ}",enabledTime] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly Server: Deployment [{#DEPLOYMENT}]: Deployment status has changed | Deployment status has changed. Acknowledge to close the problem manually. |
last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status],#1)<>last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status],#2) and length(last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status]))>0 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
JDBC metrics discovery | JMX agent | jmx.get[beans,"jboss.as:subsystem=datasources,data-source=*,statistics=jdbc"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXDATASOURCE}: Cache access, rate | The number of times that the statement cache was accessed per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheAccessCount] Preprocessing
|
{#JMXDATASOURCE}: Cache add, rate | The number of statements added to the statement cache per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheAddCount] Preprocessing
|
{#JMXDATASOURCE}: Cache current size | The number of prepared and callable statements currently cached in the statement cache. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheCurrentSize] |
{#JMXDATASOURCE}: Cache delete, rate | The number of statements discarded from the cache per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheDeleteCount] Preprocessing
|
{#JMXDATASOURCE}: Cache hit, rate | The number of times that statements from the cache were used per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheHitCount] Preprocessing
|
{#JMXDATASOURCE}: Cache miss, rate | The number of times that a statement request could not be satisfied with a statement from the cache per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheMissCount] Preprocessing
|
{#JMXDATASOURCE}: Statistics enabled | Defines whether runtime statistics are enabled or not. |
JMX agent | jmx["{#JMXOBJ}",statisticsEnabled, "JDBC"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly Server: {#JMXDATASOURCE}: JDBC monitoring statistic is not enabled | last(/WildFly Server by JMX/jmx["{#JMXOBJ}",statisticsEnabled, "JDBC"])=0 |Info |
Name | Description | Type | Key and additional info |
---|---|---|---|
Pools metrics discovery | JMX agent | jmx.get[beans,"jboss.as:subsystem=datasources,data-source=*,statistics=pool"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXDATASOURCE}: Connections: Active | The number of open connections. |
JMX agent | jmx["{#JMXOBJ}",ActiveCount] |
{#JMXDATASOURCE}: Connections: Available | The number of available connections. |
JMX agent | jmx["{#JMXOBJ}",AvailableCount] |
{#JMXDATASOURCE}: Blocking time, avg | The average blocking time for the pool. |
JMX agent | jmx["{#JMXOBJ}",AverageBlockingTime] |
{#JMXDATASOURCE}: Connections: Creating time, avg | The average time spent creating a physical connection. |
JMX agent | jmx["{#JMXOBJ}",AverageCreationTime] |
{#JMXDATASOURCE}: Connections: Get time, avg | The average time spent obtaining a physical connection. |
JMX agent | jmx["{#JMXOBJ}",AverageGetTime] |
{#JMXDATASOURCE}: Connections: Pool time, avg | The average time a physical connection spends in the pool. |
JMX agent | jmx["{#JMXOBJ}",AveragePoolTime] |
{#JMXDATASOURCE}: Connections: Usage time, avg | The average time spent using a physical connection |
JMX agent | jmx["{#JMXOBJ}",AverageUsageTime] |
{#JMXDATASOURCE}: Connections: Blocking failure, rate | The number of failures trying to obtain a physical connection per second. |
JMX agent | jmx["{#JMXOBJ}",BlockingFailureCount] Preprocessing
|
{#JMXDATASOURCE}: Connections: Created, rate | The number of physical connections created per second. |
JMX agent | jmx["{#JMXOBJ}",CreatedCount] Preprocessing
|
{#JMXDATASOURCE}: Connections: Destroyed, rate | The number of physical connections destroyed per second. |
JMX agent | jmx["{#JMXOBJ}",DestroyedCount] Preprocessing
|
{#JMXDATASOURCE}: Connections: Idle | The number of physical connections currently idle. |
JMX agent | jmx["{#JMXOBJ}",IdleCount] |
{#JMXDATASOURCE}: Connections: In use | The number of physical connections currently in use. |
JMX agent | jmx["{#JMXOBJ}",InUseCount] |
{#JMXDATASOURCE}: Connections: Used, max | The maximum number of connections used. |
JMX agent | jmx["{#JMXOBJ}",MaxUsedCount] |
{#JMXDATASOURCE}: Statistics enabled | Defines whether runtime statistics are enabled or not. |
JMX agent | jmx["{#JMXOBJ}",statisticsEnabled] Preprocessing
|
{#JMXDATASOURCE}: Connections: Timed out, rate | The number of connections that timed out per second. |
JMX agent | jmx["{#JMXOBJ}",TimedOut] Preprocessing
|
{#JMXDATASOURCE}: Connections: Wait | The number of requests that had to wait to obtain a physical connection. |
JMX agent | jmx["{#JMXOBJ}",WaitCount] |
{#JMXDATASOURCE}: XA: Commit time, avg | The average time for an XAResource commit invocation. |
JMX agent | jmx["{#JMXOBJ}",XACommitAverageTime] |
{#JMXDATASOURCE}: XA: Commit, rate | The number of XAResource commit invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XACommitCount] Preprocessing
|
{#JMXDATASOURCE}: XA: End time, avg | The average time for an XAResource end invocation. |
JMX agent | jmx["{#JMXOBJ}",XAEndAverageTime] |
{#JMXDATASOURCE}: XA: End, rate | The number of XAResource end invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAEndCount] Preprocessing
|
{#JMXDATASOURCE}: XA: Forget time, avg | The average time for an XAResource forget invocation. |
JMX agent | jmx["{#JMXOBJ}",XAForgetAverageTime] |
{#JMXDATASOURCE}: XA: Forget, rate | The number of XAResource forget invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAForgetCount] Preprocessing
|
{#JMXDATASOURCE}: XA: Prepare time, avg | The average time for an XAResource prepare invocation. |
JMX agent | jmx["{#JMXOBJ}",XAPrepareAverageTime] |
{#JMXDATASOURCE}: XA: Prepare, rate | The number of XAResource prepare invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAPrepareCount] Preprocessing
|
{#JMXDATASOURCE}: XA: Recover time, avg | The average time for an XAResource recover invocation. |
JMX agent | jmx["{#JMXOBJ}",XARecoverAverageTime] |
{#JMXDATASOURCE}: XA: Recover, rate | The number of XAResource recover invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XARecoverCount] Preprocessing
|
{#JMXDATASOURCE}: XA: Rollback time, avg | The average time for an XAResource rollback invocation. |
JMX agent | jmx["{#JMXOBJ}",XARollbackAverageTime] |
{#JMXDATASOURCE}: XA: Rollback, rate | The number of XAResource rollback invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XARollbackCount] Preprocessing
|
{#JMXDATASOURCE}: XA: Start time, avg | The average time for an XAResource start invocation. |
JMX agent | jmx["{#JMXOBJ}",XAStartAverageTime] |
{#JMXDATASOURCE}: XA: Start, rate | The number of XAResource start invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAStartCount] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly Server: {#JMXDATASOURCE}: There are no active connections for 5m | max(/WildFly Server by JMX/jmx["{#JMXOBJ}",ActiveCount],5m)=0 |Warning |
|||
WildFly Server: {#JMXDATASOURCE}: Connection usage is too high | min(/WildFly Server by JMX/jmx["{#JMXOBJ}",InUseCount],5m)/last(/WildFly Server by JMX/jmx["{#JMXOBJ}",AvailableCount])*100>{$WILDFLY.CONN.USAGE.WARN.MAX} |High |
|||
WildFly Server: {#JMXDATASOURCE}: Pools monitoring statistic is not enabled | Zabbix has not received data for items for the last 15 minutes |
last(/WildFly Server by JMX/jmx["{#JMXOBJ}",statisticsEnabled])=0 |Info |
||
WildFly Server: {#JMXDATASOURCE}: There are timeout connections | last(/WildFly Server by JMX/jmx["{#JMXOBJ}",TimedOut])>0 |Warning |
|||
WildFly Server: {#JMXDATASOURCE}: Too many waiting connections | min(/WildFly Server by JMX/jmx["{#JMXOBJ}",WaitCount],5m)>{$WILDFLY.CONN.WAIT.MAX.WARN} |Warning |
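The connection-usage trigger above is plain arithmetic: the lowest in-use count over the 5-minute window is divided by the last available count and compared, as a percentage, with {$WILDFLY.CONN.USAGE.WARN.MAX}. A minimal Python sketch of that evaluation (the sample values and the 80% threshold below are illustrative assumptions, not template defaults):

```python
def pool_usage_too_high(in_use_samples, available_count, warn_max_pct=80.0):
    """Mimics the trigger: min(InUseCount, 5m) / last(AvailableCount) * 100 > threshold."""
    if not in_use_samples or available_count <= 0:
        return False
    usage_pct = min(in_use_samples) / available_count * 100
    return usage_pct > warn_max_pct

# 18-20 connections in use out of 25 available -> 72%, below an 80% threshold.
print(pool_usage_too_high([20, 19, 18, 20], available_count=25))  # False
```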
Name | Description | Type | Key and additional info |
---|---|---|---|
Undertow metrics discovery | JMX agent | jmx.get[beans,"jboss.as:subsystem=undertow,server=*,http-listener=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Listener {#HTTP_LISTENER}: Errors, rate | The number of 500 responses that have been sent by this listener per second. |
JMX agent | jmx["{#JMXOBJ}",errorCount] Preprocessing
|
Listener {#HTTP_LISTENER}: Requests, rate | The number of requests this listener has served per second. |
JMX agent | jmx["{#JMXOBJ}",requestCount] Preprocessing
|
Listener {#HTTP_LISTENER}: Bytes sent, rate | The number of bytes that have been sent out on this listener per second. |
JMX agent | jmx["{#JMXOBJ}",bytesSent] Preprocessing
|
Listener {#HTTP_LISTENER}: Bytes received, rate | The number of bytes that have been received by this listener per second. |
JMX agent | jmx["{#JMXOBJ}",bytesReceived] Preprocessing
|
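The listener counters above (errorCount, requestCount, bytesSent, bytesReceived) are cumulative totals; the "Change per second" preprocessing step turns consecutive samples into a rate. A minimal sketch of that calculation (illustrative only, not Zabbix internals):

```python
def change_per_second(prev_value, prev_time, curr_value, curr_time):
    """Rate between two counter samples, as the 'Change per second' step computes it."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("the second sample must be newer than the first")
    return (curr_value - prev_value) / elapsed

# requestCount grew from 1200 to 1500 over 60 seconds -> 5 requests per second.
print(change_per_second(1200, 0, 1500, 60))  # 5.0
```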
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly Server: Listener {#HTTP_LISTENER}: There are 500 responses by this listener. | last(/WildFly Server by JMX/jmx["{#JMXOBJ}",errorCount])>0 |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official JMX Template for WildFly Domain Controller.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX. This template works with the Domain Controller.
Copy jboss-client.jar from /(wildfly,EAP,Jboss,AS)/bin/client into the /usr/share/zabbix-java-gateway/lib directory, then restart the Zabbix Java gateway.
Name | Description | Default |
---|---|---|
{$WILDFLY.USER} | |
zabbix |
{$WILDFLY.PASSWORD} | |
zabbix |
{$WILDFLY.JMX.PROTOCOL} | |
remote+http |
{$WILDFLY.DEPLOYMENT.MATCHES} | Filter of discoverable deployments |
.* |
{$WILDFLY.DEPLOYMENT.NOT_MATCHES} | Filter to exclude discovered deployments |
CHANGE_IF_NEEDED |
{$WILDFLY.SERVER.MATCHES} | Filter of discoverable servers |
.* |
{$WILDFLY.SERVER.NOT_MATCHES} | Filter to exclude discovered servers |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Launch type | The manner in which the server process was launched. Either "DOMAIN" for a domain mode server launched by a Host Controller, "STANDALONE" for a standalone server launched from the command line, or "EMBEDDED" for a standalone server launched as an embedded part of an application running in the same virtual machine. |
JMX agent | jmx["jboss.as:management-root=server","launchType"] Preprocessing
|
Name | For standalone mode: The name of this server. If not set, defaults to the runtime value of InetAddress.getLocalHost().getHostName(). For domain mode: The name given to this domain |
JMX agent | jmx["jboss.as:management-root=server","name"] Preprocessing
|
Process type | The type of process represented by this root resource. |
JMX agent | jmx["jboss.as:management-root=server","processType"] Preprocessing
|
Version | The version of the WildFly Core based product release. |
JMX agent | jmx["jboss.as:management-root=server","productVersion"] Preprocessing
|
Uptime | WildFly server uptime. |
JMX agent | jmx["java.lang:type=Runtime","Uptime"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly Domain: WildFly: Version has changed | WildFly version has changed. Acknowledge to close the problem manually. |
last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"],#1)<>last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"],#2) and length(last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"]))>0 |Info |
Manual close: Yes | |
WildFly Domain: WildFly: Host has been restarted | Uptime is less than 10 minutes. |
last(/WildFly Domain by JMX/jmx["java.lang:type=Runtime","Uptime"])<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployments discovery | Discovery deployments metrics. |
JMX agent | jmx.get[beans,"jboss.as.expr:deployment=,server-group="] |
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly deployment [{#DEPLOYMENT}]: Enabled | Boolean indicating whether the deployment content is currently deployed in the runtime (or should be deployed in the runtime the next time the server starts). |
JMX agent | jmx["{#JMXOBJ}",enabled] Preprocessing
|
WildFly deployment [{#DEPLOYMENT}]: Managed | Indicates if the deployment is managed (aka uses the ContentRepository). |
JMX agent | jmx["{#JMXOBJ}",managed] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Servers discovery | Discovery instances in domain. |
JMX agent | jmx.get[beans,"jboss.as:host=master,server-config=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server {#SERVER}: Autostart | Whether or not this server should be started when the Host Controller starts. |
JMX agent | jmx["{#JMXOBJ}",autoStart] Preprocessing
|
Server {#SERVER}: Status | The current status of the server. |
JMX agent | jmx["{#JMXOBJ}",status] Preprocessing
|
Server {#SERVER}: Server group | The name of a server group from the domain model. |
JMX agent | jmx["{#JMXOBJ}",group] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly Domain: Server {#SERVER}: Server status has changed | Server status has changed. Acknowledge to close the problem manually. |
last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status],#1)<>last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status],#2) and length(last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status]))>0 |Warning |
Manual close: Yes | |
WildFly Domain: Server {#SERVER}: Server group has changed | Server group has changed. Acknowledge to close the problem manually. |
last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group],#1)<>last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group],#2) and length(last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group]))>0 |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Install WebDriver (for more information, please refer to the Selenium WebDriver page) and run selenium-server. Then set the WebDriver interface HTTP[S] URL in the Zabbix server or proxy configuration file, for example: http://localhost:4444
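The Browser item of this template drives a real browser through the WebDriver endpoint configured above and reads the Navigation Timing counters from the loaded page. The Python/Selenium sketch below illustrates the general idea only; the endpoint URL, target site, and browser options are assumptions, and the template's own Browser-item script may differ:

```python
import json
from selenium import webdriver

# Assumed values; adjust to your selenium-server URL and monitored site.
WEBDRIVER_URL = "http://localhost:4444"
TARGET = "https://www.example.com/"

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1080")

driver = webdriver.Remote(command_executor=WEBDRIVER_URL, options=options)
try:
    driver.get(TARGET)
    # PerformanceNavigationTiming entry for the main document.
    nav = driver.execute_script(
        "return JSON.stringify(performance.getEntriesByType('navigation'))"
    )
    print(json.dumps(json.loads(nav), indent=2))
finally:
    driver.quit()
```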
Name | Description | Default |
---|---|---|
{$WEBSITE.BROWSER} | Browser to be used for data collection. |
chrome |
{$WEBSITE.DOMAIN} | The domain name. |
www.example.com |
{$WEBSITE.PATH} | The path to resource. |
|
{$WEBSITE.SCHEME} | The request scheme, which may be either HTTP or HTTPS. |
https |
{$WEBSITE.SCREEN.WIDTH} | Screen size width in pixels, used for screenshot. |
1920 |
{$WEBSITE.SCREEN.HEIGHT} | Screen size height in pixels, used for screenshot. |
1080 |
{$WEBSITE.RESOURCE.LOAD.MAX.WARN} | The maximum browser response time expressed in seconds for a trigger expression. |
5 |
{$WEBSITE.NAVIGATION.LOAD.MAX.WARN} | The maximum browser response time expressed in seconds for a trigger expression. |
5 |
{$WEBSITE.GET.DATA.INTERVAL} | Update interval for get raw data item. |
0s;m/15 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Website {$WEBSITE.DOMAIN} Get data | Returns the JSON with performance counters of the requested website. |
Browser | website.get.data Preprocessing
|
Get metrics check | Checks that the performance counters of the requested website have been received correctly. |
Dependent item | website.metrics.check Preprocessing
|
Website {$WEBSITE.DOMAIN} Screenshot | Website {$WEBSITE.DOMAIN} screenshot. |
Dependent item | website.screenshot Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation load event time | Measuring of load finished time (loadEventEnd). |
Dependent item | website.navigation.load_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation response time | Measuring of time spent on the response (responseEnd - responseStart). |
Dependent item | website.navigation.response_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation request time | Measuring of time spent on the request (responseStart - requestStart). |
Dependent item | website.navigation.request_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation resource fetch time | Measuring of time spent to fetch the resource (without redirects) (responseEnd - fetchStart). |
Dependent item | website.navigation.resource_fetch_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation service worker processing time | Measuring of the total time spent on the browser's service worker processing (fetchStart - workerStart). |
Dependent item | website.navigation.service_worker_processing_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation domContentLoaded time | Measuring of time spent on DOM content loading (domContentLoadedEventEnd - domContentLoadedEventStart). |
Dependent item | website.navigation.domcontentloaded_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation DNS lookup time | Measuring of time spent on DNS lookup (domainLookupEnd - domainLookupStart). |
Dependent item | website.navigation.dns_lookup_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation TCP handshake time | Measuring of time spent on TCP handshake (connectEnd - connectStart). |
Dependent item | website.navigation.tcp_handshake_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation TLS negotiation time | Measuring of time spent on TLS negotiation (requestStart - secureConnectionStart). |
Dependent item | website.navigation.tls_negotiation_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation encodedBody size | Measuring of encoded size (encodedBodySize). |
Dependent item | website.navigation.encoded_size Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation decodedBody size | Measuring of total size (decodedBodySize). |
Dependent item | website.navigation.total_size Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation transfer size | Measuring of transferred size (transferSize). |
Dependent item | website.navigation.transferred_size Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource load event time | Measuring of load finished time (loadEventEnd). |
Dependent item | website.resource.load_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource response time | Measuring of time spent on the response (responseEnd - responseStart). |
Dependent item | website.resource.response_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource request time | Measuring of time spent on the request (responseStart - requestStart). |
Dependent item | website.resource.request_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource fetch time | Measuring of time spent to fetch the resource (without redirects) (responseEnd - fetchStart). |
Dependent item | website.resource.fetch_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource service worker processing time | Measuring of the total time spent on the browser's service worker processing (fetchStart - workerStart). |
Dependent item | website.resource.service_worker_processing_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource domContentLoaded time | Measuring of time spent on DOM content loading (domContentLoadedEventEnd - domContentLoadedEventStart). |
Dependent item | website.resource.domcontentloaded_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource DNS lookup time | Measuring of time spent on DNS lookup (domainLookupEnd - domainLookupStart). |
Dependent item | website.resource.dns_lookup_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource TCP handshake time | Measuring of time spent on TCP handshake (connectEnd - connectStart). |
Dependent item | website.resource.tcp_handshake_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource TLS negotiation time | Measuring of time spent on TLS negotiation (requestStart - secureConnectionStart). |
Dependent item | website.resource.tls_negotiation_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource encodedBody size | Measuring of encoded size (encodedBodySize). |
Dependent item | website.resource.encoded_size Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource decodedBody size | Measuring of total size (decodedBodySize). |
Dependent item | website.resource.total_size Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource transfer size | Measuring of transferred size (transferSize). |
Dependent item | website.resource.transferred_size Preprocessing
|
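Each navigation and resource timing item above is a simple difference of two Navigation/Resource Timing attributes (the attributes are named in parentheses in the descriptions). A minimal sketch of those derivations, using invented sample values in milliseconds:

```python
def derive_timings(entry):
    """Differences used by the navigation/resource timing items above."""
    return {
        "load_time": entry["loadEventEnd"],
        "response_time": entry["responseEnd"] - entry["responseStart"],
        "request_time": entry["responseStart"] - entry["requestStart"],
        "fetch_time": entry["responseEnd"] - entry["fetchStart"],  # without redirects
        "dns_lookup_time": entry["domainLookupEnd"] - entry["domainLookupStart"],
        "tcp_handshake_time": entry["connectEnd"] - entry["connectStart"],
        "tls_negotiation_time": entry["requestStart"] - entry["secureConnectionStart"],
    }

sample = {
    "fetchStart": 10, "domainLookupStart": 12, "domainLookupEnd": 30,
    "connectStart": 30, "connectEnd": 75, "secureConnectionStart": 45,
    "requestStart": 76, "responseStart": 160, "responseEnd": 240,
    "loadEventEnd": 900,
}
print(derive_timings(sample))
```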
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Website by Browser: Failed to get metrics data | Failed to get JSON with performance counters of the requested website '{$WEBSITE.DOMAIN}'. |
length(last(/Website by Browser/website.metrics.check))>0 |High |
||
Website by Browser: Website navigation load event time is too slow | last(/Website by Browser/website.navigation.load_time)>{$WEBSITE.NAVIGATION.LOAD.MAX.WARN} |Warning |
Depends on:
|
||
Website by Browser: Website resource load event time is too slow | last(/Website by Browser/website.resource.load_time)>{$WEBSITE.RESOURCE.LOAD.MAX.WARN} |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template set is designed for the effortless deployment of VMware vCenter and ESX hypervisor monitoring and doesn't require any external scripts.
For additional information, please see Zabbix documentation on VM monitoring.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Compile Zabbix server with the required options (--with-libxml2 and --with-libcurl).
Set the StartVMwareCollectors option in the Zabbix server configuration file to "1" or more.
Make sure the VMware user account used for monitoring is a member of the SystemConfiguration.ReadOnly and vStatsGroup groups.
Set the host macros (on the host or template level) required for VMware authentication: {$VMWARE.URL}, {$VMWARE.USERNAME}, {$VMWARE.PASSWORD}.
Note: To enable discovery of hardware sensors of VMware hypervisors, set the macro {$VMWARE.HV.SENSOR.DISCOVERY} to the value true on the discovered host level.
Additional resources:
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service user password. |
|
{$VMWARE.PROXY} | Sets the HTTP proxy for script items. If this parameter is empty, then no proxy is used. |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set "true"/"false" to enable or disable monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to be allowed in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to be ignored in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.VM.POWERSTATE} | Possibility to filter out VMs by power state. |
poweredOn|poweredOff|suspended |
{$VMWARE.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get alarms | Get alarm status. |
Simple check | vmware.alarms.get[{$VMWARE.URL}] |
Event log | Collect VMware event log. See also: https://www.zabbix.com/documentation/7.0/manual/config/items/preprocessing/examples#filteringvmwareeventlogrecords |
Simple check | vmware.eventlog[{$VMWARE.URL},skip] |
Full name | VMware service full name. |
Simple check | vmware.fullname[{$VMWARE.URL}] Preprocessing
|
Version | VMware service version. |
Simple check | vmware.version[{$VMWARE.URL}] Preprocessing
|
Get Overall Health VC State | Gets overall health of the system. This item works only with VMware vCenter versions above 6.5. |
Script | vmware.health.get |
Overall Health VC State error check | Data collection error check. |
Dependent item | vmware.health.check Preprocessing
|
Overall Health VC State | VMware Overall health of system. One of the following: - Gray: No health data is available for this service. - Green: Service is healthy. - Yellow: The service is in a healthy state, but experiencing some level of problems. - Orange: The service health is degraded. The service might have serious problems. - Red: The service is unavailable, not functioning properly, or will stop functioning soon. - Not available: The health status is unavailable (not supported on the vCenter or ESXi side). |
Dependent item | vmware.health.state Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware FQDN: Failed to get Overall Health VC State | Failed to get data. Check debug log for more information. |
length(last(/VMware FQDN/vmware.health.check))>0 |Warning |
||
VMware FQDN: Overall Health VC State is not Green | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. |
last(/VMware FQDN/vmware.health.state)>0 and last(/VMware FQDN/vmware.health.state)<>6 |Average |
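The "not Green" trigger above also documents the numeric value map used by the dependent item: per its expression, 0 corresponds to Green and 6 to "Not available", and any other value raises the problem. A minimal sketch of that condition (the numeric codes of the remaining states are not spelled out here):

```python
def health_not_green(state: int) -> bool:
    """Mirrors the trigger: fire when the state is neither Green (0) nor 'Not available' (6)."""
    return state > 0 and state != 6

for value in (0, 2, 6):
    print(value, health_not_green(value))  # 0 -> False, 2 -> True, 6 -> False
```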
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware alarm discovery | Discovery of alarms. |
Dependent item | vmware.alarms.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
{#VMWARE.ALARMS.NAME} | VMware alarm status. |
Dependent item | vmware.alarms.status["{#VMWARE.ALARMS.KEY}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware FQDN: {#VMWARE.ALARMS.NAME} | {#VMWARE.ALARMS.DESC} |
last(/VMware FQDN/vmware.alarms.status["{#VMWARE.ALARMS.KEY}"])<>-1 |Not_classified |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware cluster discovery | Discovery of clusters. |
Simple check | vmware.cluster.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Status of [{#CLUSTER.NAME}] cluster | VMware cluster status. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Simple check | vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware FQDN: The [{#CLUSTER.NAME}] status is Red | A cluster enabled for DRS becomes invalid (red) when the tree is no longer internally consistent, that is, when resource constraints are not observed. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-C7417CAA-BD38-41D0-9529-9E7A5898BB12.html |
last(/VMware FQDN/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=3 |High |
||
VMware FQDN: The [{#CLUSTER.NAME}] status is Yellow | A cluster becomes overcommitted (yellow) when the tree of resource pools and virtual machines is internally consistent but the cluster does not have the capacity to support all the resources reserved by the child resource pools. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-ED8240A0-FB54-4A31-BD3D-F23FE740F10C.html |
last(/VMware FQDN/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=2 |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware datastore discovery | Discovery of VMware datastores. |
Simple check | vmware.datastore.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Average read IOPS of the datastore [{#DATASTORE}] | IOPS for a read operation from the datastore. |
Simple check | vmware.datastore.read[{$VMWARE.URL},{#DATASTORE.UUID},rps] |
Average write IOPS of the datastore [{#DATASTORE}] | IOPS for a write operation to the datastore. |
Simple check | vmware.datastore.write[{$VMWARE.URL},{#DATASTORE.UUID},rps] |
Average read latency of the datastore [{#DATASTORE}] | Amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.datastore.read[{$VMWARE.URL},{#DATASTORE.UUID},latency] |
Free space on datastore [{#DATASTORE}] (percentage) | VMware datastore free space (percentage from the total). |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID},pfree] |
Total size of datastore [{#DATASTORE}] | VMware datastore space in bytes. |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID}] Preprocessing
|
Average write latency of the datastore [{#DATASTORE}] | Amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.datastore.write[{$VMWARE.URL},{#DATASTORE.UUID},latency] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware FQDN: [{#DATASTORE}]: Free space is critically low | Datastore free space has fallen below the critical threshold. |
last(/VMware FQDN/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID},pfree])<{$VMWARE.DATASTORE.SPACE.CRIT} |High |
||
VMware FQDN: [{#DATASTORE}]: Free space is low | Datastore free space has fallen below the warning threshold. |
last(/VMware FQDN/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID},pfree])<{$VMWARE.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware hypervisor discovery | Discovery of hypervisors. |
Simple check | vmware.hv.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware VM FQDN discovery | Discovery of guest virtual machines. |
Simple check | vmware.vm.discovery[{$VMWARE.URL}] |
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service user password. |
|
{$VMWARE.VM.FS.PFREE.MIN.WARN} | VMware guest free space threshold for the warning trigger. |
20 |
{$VMWARE.VM.FS.PFREE.MIN.CRIT} | VMware guest free space threshold for the critical trigger. |
10 |
{$VMWARE.VM.FS.TRIGGER.USED} | VMware guest used free space trigger. Set to "1"/"0" to enable or disable the trigger. |
0 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Snapshot consolidation needed | Displays whether snapshot consolidation is needed or not. One of the following: - True; - False. |
Simple check | vmware.vm.consolidationneeded[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Snapshot count | Snapshot count of the guest VM. |
Dependent item | vmware.vm.snapshot.count Preprocessing
|
Get snapshots | Snapshots of the guest VM. |
Simple check | vmware.vm.snapshot.get[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Snapshot latest date | Latest snapshot date of the guest VM. |
Dependent item | vmware.vm.snapshot.latestdate Preprocessing
|
VM state | VMware virtual machine state. One of the following: - Not running; - Resetting; - Running; - Shutting down; - Standby; - Unknown. |
Simple check | vmware.vm.state[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware Tools status | Monitoring of VMware Tools. One of the following: - Guest tools executing scripts: VMware Tools is starting. - Guest tools not running: VMware Tools is not running. - Guest tools running: VMware Tools is running. |
Simple check | vmware.vm.tools[{$VMWARE.URL},{$VMWARE.VM.UUID},status] Preprocessing
|
VMware Tools version | Monitoring of the VMware Tools version. |
Simple check | vmware.vm.tools[{$VMWARE.URL},{$VMWARE.VM.UUID},version] Preprocessing
|
Cluster name | Cluster name of the guest VM. |
Simple check | vmware.vm.cluster.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Number of virtual CPUs | Number of virtual CPUs assigned to the guest. |
Simple check | vmware.vm.cpu.num[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
CPU ready | Time that the VM was ready, but unable to get scheduled to run on the physical CPU during the last measurement interval (VMware vCenter/ESXi Server performance counter sampling interval - 20 seconds). |
Simple check | vmware.vm.cpu.ready[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU usage | Current upper-bound on CPU usage. The upper-bound is based on the host the VM is currently running on, as well as limits configured on the VM itself or any parent resource pool. Valid while the VM is running. |
Simple check | vmware.vm.cpu.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Datacenter name | Datacenter name of the guest VM. |
Simple check | vmware.vm.datacenter.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Hypervisor name | Hypervisor name of the guest VM. |
Simple check | vmware.vm.hv.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. |
Simple check | vmware.vm.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Compressed memory | The amount of memory currently in the compression cache for this VM. |
Simple check | vmware.vm.memory.size.compressed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Private memory | Amount of memory backed by host memory and not being shared. |
Simple check | vmware.vm.memory.size.private[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Shared memory | The amount of guest physical memory shared through transparent page sharing. |
Simple check | vmware.vm.memory.size.shared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Swapped memory | The amount of guest physical memory swapped out to the VM's swap device by ESX. |
Simple check | vmware.vm.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Guest memory usage | The amount of guest physical memory that is being used by the VM. |
Simple check | vmware.vm.memory.size.usage.guest[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Host memory usage | The amount of host physical memory allocated to the VM, accounting for the amount saved from memory sharing with other VMs. |
Simple check | vmware.vm.memory.size.usage.host[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Memory size | Total size of configured memory. |
Simple check | vmware.vm.memory.size[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Power state | The current power state of the VM. One of the following: - Powered off; - Powered on; - Suspended. |
Simple check | vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Committed storage space | Total storage space, in bytes, committed to this VM across all datastores. |
Simple check | vmware.vm.storage.committed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Uncommitted storage space | Additional storage space, in bytes, potentially used by this VM on all datastores. |
Simple check | vmware.vm.storage.uncommitted[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Unshared storage space | Total storage space, in bytes, occupied by the VM across all datastores that is not shared with any other VM. |
Simple check | vmware.vm.storage.unshared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Uptime | System uptime. |
Simple check | vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Guest memory swapped | Amount of guest physical memory that is swapped out to the swap space. |
Simple check | vmware.vm.guest.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Host memory consumed | Amount of host physical memory consumed for backing guest physical memory pages. |
Simple check | vmware.vm.memory.size.consumed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Host memory usage in percent | Percentage of host physical memory that has been consumed. |
Simple check | vmware.vm.memory.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU usage in percent | CPU usage as a percentage during the interval. |
Simple check | vmware.vm.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU latency in percent | Percentage of time the VM is unable to run because it is contending for access to the physical CPU(s). |
Simple check | vmware.vm.cpu.latency[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU readiness latency in percent | Percentage of time that the virtual machine was ready, but was unable to get scheduled to run on the physical CPU. |
Simple check | vmware.vm.cpu.readiness[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU swap-in latency in percent | Percentage of CPU time spent waiting for a swap-in. |
Simple check | vmware.vm.cpu.swapwait[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Uptime of guest OS | Total time elapsed since the last operating system boot-up (in seconds). |
Simple check | vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Guest: Snapshot consolidation needed | Snapshot consolidation needed. |
last(/VMware Guest/vmware.vm.consolidationneeded[{$VMWARE.URL},{$VMWARE.VM.UUID}])=0 |Average |
Manual close: Yes | |
VMware Guest: VM is not running | VMware virtual machine is not running. |
last(/VMware Guest/vmware.vm.state[{$VMWARE.URL},{$VMWARE.VM.UUID}]) <> 2 |Average |
||
VMware Guest: VMware Tools is not running | VMware Tools is not running on the VM. |
last(/VMware Guest/vmware.vm.tools[{$VMWARE.URL},{$VMWARE.VM.UUID},status]) = 1 |Warning |
Depends on:
|
|
VMware Guest: VM has been restarted | Uptime is less than 10 minutes. |
(between(last(/VMware Guest/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1 or between(last(/VMware Guest/vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1) and last(/VMware Guest/vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}]) = 1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network device discovery | Discovery of all network devices. |
Simple check | vmware.vm.net.if.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Number of bytes received on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface input statistics (bytes per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
Number of packets received on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface input statistics (packets per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
Number of bytes transmitted on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface output statistics (bytes per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
Number of packets transmitted on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface output statistics (packets per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
Network utilization on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network utilization (combined transmit and receive rates) during the interval. |
Simple check | vmware.vm.net.if.usage[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk device discovery | Discovery of all disk devices. |
Simple check | vmware.vm.vfs.dev.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Average number of bytes read from the disk [{#DISKDESC}] | VMware virtual machine disk device read statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
Average number of reads from the disk [{#DISKDESC}] | VMware virtual machine disk device read statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
Average number of bytes written to the disk [{#DISKDESC}] | VMware virtual machine disk device write statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
Average number of writes to the disk [{#DISKDESC}] | VMware virtual machine disk device write statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
Average number of outstanding read requests to the disk [{#DISKDESC}] | Average number of outstanding read requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.readoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Average number of outstanding write requests to the disk [{#DISKDESC}] | Average number of outstanding write requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.writeoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Average write latency to the disk [{#DISKDESC}] | The average time a write to the virtual disk takes. |
Simple check | vmware.vm.storage.totalwritelatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Average read latency from the disk [{#DISKDESC}] | The average time a read from the virtual disk takes. |
Simple check | vmware.vm.storage.totalreadlatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mounted filesystem discovery | Discovery of all guest file systems. |
Simple check | vmware.vm.vfs.fs.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Free disk space on [{#FSNAME}] | VMware virtual machine file system statistics (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},free] |
Free disk space on [{#FSNAME}] (percentage) | VMware virtual machine file system statistics (percentage). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree] |
Total disk space on [{#FSNAME}] | VMware virtual machine total disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},total] Preprocessing
|
Used disk space on [{#FSNAME}] | VMware virtual machine used disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},used] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Guest: [{#FSNAME}]: Disk space is critically low | The disk free space on [{#FSNAME}] has been less than {$VMWARE.VM.FS.PFREE.MIN.CRIT:"{#FSNAME}"}% for the last 5 minutes. |
max(/VMware Guest/vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree],5m)<{$VMWARE.VM.FS.PFREE.MIN.CRIT:"{#FSNAME}"} and {$VMWARE.VM.FS.TRIGGER.USED:"{#FSNAME}"}=1 |Average |
Manual close: Yes | |
VMware Guest: [{#FSNAME}]: Disk space is low | The disk free space on [{#FSNAME}] has been less than {$VMWARE.VM.FS.PFREE.MIN.WARN:"{#FSNAME}"}% for the last 5 minutes. |
max(/VMware Guest/vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree],5m)<{$VMWARE.VM.FS.PFREE.MIN.WARN:"{#FSNAME}"} and {$VMWARE.VM.FS.TRIGGER.USED:"{#FSNAME}"}=1 |Warning |
Manual close: Yes Depends on:
|
This template is designed for the effortless deployment of VMware ESX hypervisor monitoring and doesn't require any external scripts.
This template can be used in discovery as well as manually linked to a host.
For additional information, please see Zabbix documentation on VM monitoring.
To use this template as manually linked to a host, attach it to the host and manually set the value of the {$VMWARE.HV.UUID}
macro.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
To use this template as manually linked to a host:
Compile Zabbix server with the required options (--with-libxml2 and --with-libcurl).
Set the StartVMwareCollectors option in the Zabbix server configuration file to "1" or more.
Set the host macros {$VMWARE.URL}, {$VMWARE.USERNAME}, {$VMWARE.PASSWORD}, and {$VMWARE.HV.UUID}.
The hypervisor UUID for {$VMWARE.HV.UUID} can be obtained, for example, by running the following command on the ESXi host:
vim-cmd hostsvc/hostsummary | grep uuid
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service user password. |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set to "true"/"false" to enable or disable the monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to be allowed in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to be ignored in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.HV.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.HV.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
{$VMWARE.HV.UUID} | UUID of hypervisor. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Connection state | VMware hypervisor connection state. One of the following: - Connected; - Disconnected; - Not responding. |
Simple check | vmware.hv.connectionstate[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
Number of errors received | VMware hypervisor network input statistics (errors). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},errors] |
Number of broadcasts received | VMware hypervisor network input statistics (broadcasts). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},broadcast] |
Number of dropped received packets | VMware hypervisor network input statistics (packets dropped). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},dropped] |
Number of broadcasts transmitted | VMware hypervisor network output statistics (broadcasts). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},broadcast] |
Number of dropped transmitted packets | VMware hypervisor network output statistics (packets dropped). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},dropped] |
Number of errors transmitted | VMware hypervisor network output statistics (errors). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},errors] |
Hypervisor ping | Checks if the hypervisor is running and accepting ICMP pings. One of the following: - Down; - Up. |
Simple check | icmpping[] Preprocessing
|
Cluster name | Cluster name of the hypervisor. |
Simple check | vmware.hv.cluster.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
CPU usage | Aggregated CPU usage across all cores on the host in Hz. This is only available if the host is connected. |
Simple check | vmware.hv.cpu.usage[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU usage in percent | CPU usage as a percentage during the interval. |
Simple check | vmware.hv.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU utilization | CPU utilization as a percentage during the interval; depends on power management or hyper-threading. |
Simple check | vmware.hv.cpu.utilization[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Power usage | Current power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Power usage maximum allowed | Maximum allowed power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID},max] Preprocessing
|
Datacenter name | Datacenter name of the hypervisor. |
Simple check | vmware.hv.datacenter.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
Full name | The complete product name, including the version information. |
Simple check | vmware.hv.fullname[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
CPU frequency | The speed of the CPU cores. This is an average value if there are multiple speeds. The product of CPU frequency and the number of cores is approximately equal to the sum of the MHz for all the individual cores on the host. |
Simple check | vmware.hv.hw.cpu.freq[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU model | The CPU model. |
Simple check | vmware.hv.hw.cpu.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU cores | Number of physical CPU cores on the host. Physical CPU cores are the processors contained by a CPU package. |
Simple check | vmware.hv.hw.cpu.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
CPU threads | Number of physical CPU threads on the host. |
Simple check | vmware.hv.hw.cpu.threads[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Total memory | The physical memory size. |
Simple check | vmware.hv.hw.memory[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Model | The system model identification. |
Simple check | vmware.hv.hw.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Bios UUID | The hardware BIOS identification. |
Simple check | vmware.hv.hw.uuid[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Vendor | The hardware vendor identification. |
Simple check | vmware.hv.hw.vendor[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. Sum of all guest VMs. |
Simple check | vmware.hv.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Used memory | Physical memory usage on the host. |
Simple check | vmware.hv.memory.used[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Number of bytes received | VMware hypervisor network input statistics (bytes per second). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
Number of bytes transmitted | VMware hypervisor network output statistics (bytes per second). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
Overall status | The overall alarm status of the host. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Simple check | vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Uptime | System uptime. |
Simple check | vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Version | Dot-separated version string. |
Simple check | vmware.hv.version[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Number of guest VMs | Number of guest virtual machines. |
Simple check | vmware.hv.vm.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Get sensors | Master item for sensor data. |
Simple check | vmware.hv.sensors.get[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: Hypervisor is down | The service is unavailable or is not accepting ICMP pings. |
last(/VMware Hypervisor/icmpping[])=0 |Average |
Manual close: Yes | |
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=3 |High |
||
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might soon become overloaded. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=2 |Average |
Depends on:
|
|
VMware Hypervisor: Hypervisor has been restarted | Uptime is less than 10 minutes. |
last(/VMware Hypervisor/vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}])<10m |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interface discovery | Discovery of VMware hypervisor network interfaces. |
Simple check | vmware.hv.net.if.discovery[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#IFNAME}] network interface speed | VMware hypervisor network interface speed. |
Simple check | vmware.hv.network.linkspeed[{$VMWARE.URL},{$VMWARE.HV.UUID},{#IFNAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Datastore discovery | Discovery of VMware datastores. |
Simple check | vmware.hv.datastore.discovery[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Average read IOPS of the datastore [{#DATASTORE}] | Average IOPS for a read operation from the datastore. |
Simple check | vmware.hv.datastore.read[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},rps] |
Average write IOPS of the datastore [{#DATASTORE}] | Average IOPS for a write operation to the datastore. |
Simple check | vmware.hv.datastore.write[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},rps] |
Average read latency of the datastore [{#DATASTORE}] | Average amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.hv.datastore.read[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},latency] |
Free space on datastore [{#DATASTORE}] (percentage) | VMware datastore free space (percentage from the total). |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},pfree] |
Total size of datastore [{#DATASTORE}] | VMware datastore space in bytes. |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}] Preprocessing
|
Average write latency of the datastore [{#DATASTORE}] | Average amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.hv.datastore.write[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},latency] |
Multipath count for datastore [{#DATASTORE}] | Number of available datastore paths. |
Simple check | vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: [{#DATASTORE}]: Free space is critically low | Datastore free space has fallen below the critical threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},pfree])<{$VMWARE.HV.DATASTORE.SPACE.CRIT} |High |
||
VMware Hypervisor: [{#DATASTORE}]: Free space is low | Datastore free space has fallen below the warning threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},pfree])<{$VMWARE.HV.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
|
VMware Hypervisor: The multipath count has been changed | The number of available datastore paths is less than registered ({#MULTIPATH.COUNT}). |
last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}],#1)<>last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}],#2) and last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}])<{#MULTIPATH.COUNT} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Serial number discovery | VMware hypervisor serial number discovery. This item works only with VMware hypervisor versions above 6.7. |
Dependent item | vmware.hv.serial.number.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Serial number | VMware hypervisor serial number. |
Simple check | vmware.hv.hw.serialnumber[{$VMWARE.URL},{#VMWARE.HV.UUID}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck discovery | VMware Rollup Health State sensor discovery. |
Dependent item | vmware.hv.healthcheck.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Health state rollup | The host's Rollup Health State sensor value. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Dependent item | vmware.hv.sensor.health.state[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])=3 |High |
Depends on:
|
|
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might soon become overloaded. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])=2 |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sensor discovery | VMware hardware sensor discovery. The data is retrieved from numeric sensor probes and provides information about the health of the physical system. |
Dependent item | vmware.hv.sensors.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sensor [{#NAME}] health state | VMware hardware sensor health state. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Dependent item | vmware.hv.sensor.state["{#NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: Sensor [{#NAME}] health state is Red | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=3 |High |
Depends on:
|
|
VMware Hypervisor: Sensor [{#NAME}] health state is Yellow | One or more components in the appliance might soon become overloaded. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=2 |Average |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template set is designed for the effortless deployment of VMware vCenter and ESX hypervisor monitoring and doesn't require any external scripts.
For additional information, please see Zabbix documentation on VM monitoring.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Compile Zabbix server with the required options (--with-libxml2 and --with-libcurl).
Set the StartVMwareCollectors option in the Zabbix server configuration file to "1" or more.
Make sure the VMware user account used for monitoring is a member of the SystemConfiguration.ReadOnly and vStatsGroup groups.
Set the host macros (on the host or template level) required for VMware authentication: {$VMWARE.URL}, {$VMWARE.USERNAME}, {$VMWARE.PASSWORD}.
Note: To enable discovery of hardware sensors of VMware hypervisors, set the macro {$VMWARE.HV.SENSOR.DISCOVERY} to the value true on the discovered host level.
Additional resources:
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service user password. |
|
{$VMWARE.PROXY} | Sets the HTTP proxy for script items. If this parameter is empty, then no proxy is used. |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set "true"/"false" to enable or disable monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to be allowed in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to be ignored in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.VM.POWERSTATE} | Possibility to filter out VMs by power state. |
poweredOn|poweredOff|suspended |
{$VMWARE.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get alarms | Get alarm status. |
Simple check | vmware.alarms.get[{$VMWARE.URL}] |
Event log | Collect VMware event log. See also: https://www.zabbix.com/documentation/7.0/manual/config/items/preprocessing/examples#filteringvmwareeventlogrecords |
Simple check | vmware.eventlog[{$VMWARE.URL},skip] |
Full name | VMware service full name. |
Simple check | vmware.fullname[{$VMWARE.URL}] Preprocessing
|
Version | VMware service version. |
Simple check | vmware.version[{$VMWARE.URL}] Preprocessing
|
Get Overall Health VC State | Gets overall health of the system. This item works only with VMware vCenter versions above 6.5. |
Script | vmware.health.get |
Overall Health VC State error check | Data collection error check. |
Dependent item | vmware.health.check Preprocessing
|
Overall Health VC State | VMware Overall health of system. One of the following: - Gray: No health data is available for this service. - Green: Service is healthy. - Yellow: The service is in a healthy state, but experiencing some level of problems. - Orange: The service health is degraded. The service might have serious problems. - Red: The service is unavailable, not functioning properly, or will stop functioning soon. - Not available: The health status is unavailable (not supported on the vCenter or ESXi side). |
Dependent item | vmware.health.state Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: Failed to get Overall Health VC State | Failed to get data. Check debug log for more information. |
length(last(/VMware/vmware.health.check))>0 |Warning |
||
VMware: Overall Health VC State is not Green | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. |
last(/VMware/vmware.health.state)>0 and last(/VMware/vmware.health.state)<>6 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware alarm discovery | Discovery of alarms. |
Dependent item | vmware.alarms.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
{#VMWARE.ALARMS.NAME} | VMware alarm status. |
Dependent item | vmware.alarms.status["{#VMWARE.ALARMS.KEY}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: {#VMWARE.ALARMS.NAME} | {#VMWARE.ALARMS.DESC} |
last(/VMware/vmware.alarms.status["{#VMWARE.ALARMS.KEY}"])<>-1 |Not_classified |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware cluster discovery | Discovery of clusters. |
Simple check | vmware.cluster.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Status of [{#CLUSTER.NAME}] cluster | VMware cluster status. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Simple check | vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: The [{#CLUSTER.NAME}] status is Red | A cluster enabled for DRS becomes invalid (red) when the tree is no longer internally consistent, that is, when resource constraints are not observed. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-C7417CAA-BD38-41D0-9529-9E7A5898BB12.html |
last(/VMware/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=3 |High |
||
VMware: The [{#CLUSTER.NAME}] status is Yellow | A cluster becomes overcommitted (yellow) when the tree of resource pools and virtual machines is internally consistent but the cluster does not have the capacity to support all the resources reserved by the child resource pools. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-ED8240A0-FB54-4A31-BD3D-F23FE740F10C.html |
last(/VMware/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=2 |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware datastore discovery | Discovery of VMware datastores. |
Simple check | vmware.datastore.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Average read IOPS of the datastore [{#DATASTORE}] | IOPS for a read operation from the datastore. |
Simple check | vmware.datastore.read[{$VMWARE.URL},{#DATASTORE.UUID},rps] |
Average write IOPS of the datastore [{#DATASTORE}] | IOPS for a write operation to the datastore. |
Simple check | vmware.datastore.write[{$VMWARE.URL},{#DATASTORE.UUID},rps] |
Average read latency of the datastore [{#DATASTORE}] | Amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.datastore.read[{$VMWARE.URL},{#DATASTORE.UUID},latency] |
Free space on datastore [{#DATASTORE}] (percentage) | VMware datastore free space (percentage from the total). |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID},pfree] |
Total size of datastore [{#DATASTORE}] | VMware datastore space in bytes. |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID}] Preprocessing
|
Average write latency of the datastore [{#DATASTORE}] | Amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.datastore.write[{$VMWARE.URL},{#DATASTORE.UUID},latency] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: [{#DATASTORE}]: Free space is critically low | Datastore free space has fallen below the critical threshold. |
last(/VMware/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID},pfree])<{$VMWARE.DATASTORE.SPACE.CRIT} |High |
||
VMware: [{#DATASTORE}]: Free space is low | Datastore free space has fallen below the warning threshold. |
last(/VMware/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID},pfree])<{$VMWARE.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware hypervisor discovery | Discovery of hypervisors. |
Simple check | vmware.hv.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware VM discovery | Discovery of guest virtual machines. |
Simple check | vmware.vm.discovery[{$VMWARE.URL}] |
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service user password. |
|
{$VMWARE.VM.FS.PFREE.MIN.WARN} | VMware guest free space threshold for the warning trigger. |
20 |
{$VMWARE.VM.FS.PFREE.MIN.CRIT} | VMware guest free space threshold for the critical trigger. |
10 |
{$VMWARE.VM.FS.TRIGGER.USED} | VMware guest used free space trigger. Set to "1"/"0" to enable or disable the trigger. |
0 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Snapshot consolidation needed | Displays whether snapshot consolidation is needed or not. One of the following: - True; - False. |
Simple check | vmware.vm.consolidationneeded[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Snapshot count | Snapshot count of the guest VM. |
Dependent item | vmware.vm.snapshot.count Preprocessing
|
Get snapshots | Snapshots of the guest VM. |
Simple check | vmware.vm.snapshot.get[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Snapshot latest date | Latest snapshot date of the guest VM. |
Dependent item | vmware.vm.snapshot.latestdate Preprocessing
|
VM state | VMware virtual machine state. One of the following: - Not running; - Resetting; - Running; - Shutting down; - Standby; - Unknown. |
Simple check | vmware.vm.state[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware Tools status | Monitoring of VMware Tools. One of the following: - Guest tools executing scripts: VMware Tools is starting. - Guest tools not running: VMware Tools is not running. - Guest tools running: VMware Tools is running. |
Simple check | vmware.vm.tools[{$VMWARE.URL},{$VMWARE.VM.UUID},status] Preprocessing
|
VMware Tools version | Monitoring of the VMware Tools version. |
Simple check | vmware.vm.tools[{$VMWARE.URL},{$VMWARE.VM.UUID},version] Preprocessing
|
Cluster name | Cluster name of the guest VM. |
Simple check | vmware.vm.cluster.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Number of virtual CPUs | Number of virtual CPUs assigned to the guest. |
Simple check | vmware.vm.cpu.num[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
CPU ready | Time that the VM was ready, but unable to get scheduled to run on the physical CPU during the last measurement interval (VMware vCenter/ESXi Server performance counter sampling interval - 20 seconds). |
Simple check | vmware.vm.cpu.ready[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU usage | Current upper-bound on CPU usage. The upper-bound is based on the host the VM is currently running on, as well as limits configured on the VM itself or any parent resource pool. Valid while the VM is running. |
Simple check | vmware.vm.cpu.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Datacenter name | Datacenter name of the guest VM. |
Simple check | vmware.vm.datacenter.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Hypervisor name | Hypervisor name of the guest VM. |
Simple check | vmware.vm.hv.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. |
Simple check | vmware.vm.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Compressed memory | The amount of memory currently in the compression cache for this VM. |
Simple check | vmware.vm.memory.size.compressed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Private memory | Amount of memory backed by host memory and not being shared. |
Simple check | vmware.vm.memory.size.private[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Shared memory | The amount of guest physical memory shared through transparent page sharing. |
Simple check | vmware.vm.memory.size.shared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Swapped memory | The amount of guest physical memory swapped out to the VM's swap device by ESX. |
Simple check | vmware.vm.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Guest memory usage | The amount of guest physical memory that is being used by the VM. |
Simple check | vmware.vm.memory.size.usage.guest[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Host memory usage | The amount of host physical memory allocated to the VM, accounting for the amount saved from memory sharing with other VMs. |
Simple check | vmware.vm.memory.size.usage.host[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Memory size | Total size of configured memory. |
Simple check | vmware.vm.memory.size[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Power state | The current power state of the VM. One of the following: - Powered off; - Powered on; - Suspended. |
Simple check | vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Committed storage space | Total storage space, in bytes, committed to this VM across all datastores. |
Simple check | vmware.vm.storage.committed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Uncommitted storage space | Additional storage space, in bytes, potentially used by this VM on all datastores. |
Simple check | vmware.vm.storage.uncommitted[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Unshared storage space | Total storage space, in bytes, occupied by the VM across all datastores that is not shared with any other VM. |
Simple check | vmware.vm.storage.unshared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Uptime | System uptime. |
Simple check | vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Guest memory swapped | Amount of guest physical memory that is swapped out to the swap space. |
Simple check | vmware.vm.guest.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Host memory consumed | Amount of host physical memory consumed for backing up guest physical memory pages. |
Simple check | vmware.vm.memory.size.consumed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Host memory usage in percent | Percentage of host physical memory that has been consumed. |
Simple check | vmware.vm.memory.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU usage in percent | CPU usage as a percentage during the interval. |
Simple check | vmware.vm.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU latency in percent | Percentage of time the VM is unable to run because it is contending for access to the physical CPU(s). |
Simple check | vmware.vm.cpu.latency[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU readiness latency in percent | Percentage of time that the virtual machine was ready, but was unable to get scheduled to run on the physical CPU. |
Simple check | vmware.vm.cpu.readiness[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU swap-in latency in percent | Percentage of CPU time spent waiting for a swap-in. |
Simple check | vmware.vm.cpu.swapwait[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Uptime of guest OS | Total time elapsed since the last operating system boot-up (in seconds). |
Simple check | vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Guest: Snapshot consolidation needed | Snapshot consolidation needed. |
last(/VMware Guest/vmware.vm.consolidationneeded[{$VMWARE.URL},{$VMWARE.VM.UUID}])=0 |Average |
Manual close: Yes | |
VMware Guest: VM is not running | VMware virtual machine is not running. |
last(/VMware Guest/vmware.vm.state[{$VMWARE.URL},{$VMWARE.VM.UUID}]) <> 2 |Average |
||
VMware Guest: VMware Tools is not running | VMware Tools is not running on the VM. |
last(/VMware Guest/vmware.vm.tools[{$VMWARE.URL},{$VMWARE.VM.UUID},status]) = 1 |Warning |
Depends on:
|
|
VMware Guest: VM has been restarted | Uptime is less than 10 minutes. |
(between(last(/VMware Guest/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1 or between(last(/VMware Guest/vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1) and last(/VMware Guest/vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}]) = 1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network device discovery | Discovery of all network devices. |
Simple check | vmware.vm.net.if.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Number of bytes received on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface input statistics (bytes per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
Number of packets received on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface input statistics (packets per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
Number of bytes transmitted on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface output statistics (bytes per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
Number of packets transmitted on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface output statistics (packets per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
Network utilization on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network utilization (combined transmit and receive rates) during the interval. |
Simple check | vmware.vm.net.if.usage[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk device discovery | Discovery of all disk devices. |
Simple check | vmware.vm.vfs.dev.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Average number of bytes read from the disk [{#DISKDESC}] | VMware virtual machine disk device read statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
Average number of reads from the disk [{#DISKDESC}] | VMware virtual machine disk device read statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
Average number of bytes written to the disk [{#DISKDESC}] | VMware virtual machine disk device write statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
Average number of writes to the disk [{#DISKDESC}] | VMware virtual machine disk device write statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
Average number of outstanding read requests to the disk [{#DISKDESC}] | Average number of outstanding read requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.readoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Average number of outstanding write requests to the disk [{#DISKDESC}] | Average number of outstanding write requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.writeoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Average write latency to the disk [{#DISKDESC}] | The average time a write to the virtual disk takes. |
Simple check | vmware.vm.storage.totalwritelatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Average read latency from the disk [{#DISKDESC}] | The average time a read from the virtual disk takes. |
Simple check | vmware.vm.storage.totalreadlatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mounted filesystem discovery | Discovery of all guest file systems. |
Simple check | vmware.vm.vfs.fs.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Free disk space on [{#FSNAME}] | VMware virtual machine file system statistics (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},free] |
Free disk space on [{#FSNAME}] (percentage) | VMware virtual machine file system statistics (percentage). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree] |
Total disk space on [{#FSNAME}] | VMware virtual machine total disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},total] Preprocessing
|
Used disk space on [{#FSNAME}] | VMware virtual machine used disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},used] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Guest: [{#FSNAME}]: Disk space is critically low | The disk free space on [{#FSNAME}] has been less than {$VMWARE.VM.FS.PFREE.MIN.CRIT:"{#FSNAME}"}% for the last 5 minutes. |
max(/VMware Guest/vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree],5m)<{$VMWARE.VM.FS.PFREE.MIN.CRIT:"{#FSNAME}"} and {$VMWARE.VM.FS.TRIGGER.USED:"{#FSNAME}"}=1 |Average |
Manual close: Yes | |
VMware Guest: [{#FSNAME}]: Disk space is low | The disk free space on [{#FSNAME}] has been less than {$VMWARE.VM.FS.PFREE.MIN.WARN:"{#FSNAME}"}% for the last 5 minutes. |
max(/VMware Guest/vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree],5m)<{$VMWARE.VM.FS.PFREE.MIN.WARN:"{#FSNAME}"} and {$VMWARE.VM.FS.TRIGGER.USED:"{#FSNAME}"}=1 |Warning |
Manual close: Yes Depends on:
|
This template is designed for the effortless deployment of VMware ESX hypervisor monitoring and doesn't require any external scripts.
This template can be used in discovery as well as manually linked to a host.
For additional information, please see Zabbix documentation on VM monitoring.
To use this template as manually linked to a host, attach it to the host and manually set the value of the {$VMWARE.HV.UUID}
macro.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
To use this template as manually linked to a host:
1. Compile the Zabbix server with the required options (--with-libxml2 and --with-libcurl).
2. Set the StartVMwareCollectors option in the Zabbix server configuration file to "1" or more.
3. Set the {$VMWARE.URL}, {$VMWARE.USERNAME}, {$VMWARE.PASSWORD}, and {$VMWARE.HV.UUID} macros.
To get the UUID of the hypervisor, run the following command on it:
vim-cmd hostsvc/hostsummary | grep uuid
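If the ESXi shell is not convenient, the same UUID can be looked up through the SDK. The following is a minimal sketch, not part of the template, assuming the pyVmomi Python library and placeholder host/credentials that correspond to the {$VMWARE.URL}, {$VMWARE.USERNAME}, and {$VMWARE.PASSWORD} macros:

```python
# Hedged sketch: list hypervisors and the UUID reported by summary.hardware.uuid,
# which should match the value expected in {$VMWARE.HV.UUID}.
# Assumptions: pyVmomi is installed; host, user, and password are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # skip certificate checks (lab use only)
si = SmartConnect(host="vcenter.example.com",   # host part of {$VMWARE.URL}
                  user="zabbix@vsphere.local",  # {$VMWARE.USERNAME}
                  pwd="secret",                 # {$VMWARE.PASSWORD}
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        # Print each hypervisor name together with its hardware UUID
        print(host.name, host.summary.hardware.uuid)
finally:
    Disconnect(si)
```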
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service user password. |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set to "true"/"false" to enable or disable the monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to be allowed in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to be ignored in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.HV.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.HV.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
{$VMWARE.HV.UUID} | UUID of hypervisor. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Connection state | VMware hypervisor connection state. One of the following: - Connected; - Disconnected; - Not responding. |
Simple check | vmware.hv.connectionstate[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
Number of errors received | VMware hypervisor network input statistics (errors). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},errors] |
Number of broadcasts received | VMware hypervisor network input statistics (broadcasts). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},broadcast] |
Number of dropped received packets | VMware hypervisor network input statistics (packets dropped). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},dropped] |
Number of broadcasts transmitted | VMware hypervisor network output statistics (broadcasts). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},broadcast] |
Number of dropped transmitted packets | VMware hypervisor network output statistics (packets dropped). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},dropped] |
Number of errors transmitted | VMware hypervisor network output statistics (errors). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},errors] |
Hypervisor ping | Checks if the hypervisor is running and accepting ICMP pings. One of the following: - Down; - Up. |
Simple check | icmpping[] Preprocessing
|
Cluster name | Cluster name of the hypervisor. |
Simple check | vmware.hv.cluster.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
CPU usage | Aggregated CPU usage across all cores on the host in Hz. This is only available if the host is connected. |
Simple check | vmware.hv.cpu.usage[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU usage in percent | CPU usage as a percentage during the interval. |
Simple check | vmware.hv.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU utilization | CPU utilization as a percentage during the interval; the value depends on power management or hyper-threading. |
Simple check | vmware.hv.cpu.utilization[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Power usage | Current power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Power usage maximum allowed | Maximum allowed power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID},max] Preprocessing
|
Datacenter name | Datacenter name of the hypervisor. |
Simple check | vmware.hv.datacenter.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
Full name | The complete product name, including the version information. |
Simple check | vmware.hv.fullname[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
CPU frequency | The speed of the CPU cores. This is an average value if there are multiple speeds. The product of CPU frequency and the number of cores is approximately equal to the sum of the MHz for all the individual cores on the host. |
Simple check | vmware.hv.hw.cpu.freq[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU model | The CPU model. |
Simple check | vmware.hv.hw.cpu.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU cores | Number of physical CPU cores on the host. Physical CPU cores are the processors contained by a CPU package. |
Simple check | vmware.hv.hw.cpu.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
CPU threads | Number of physical CPU threads on the host. |
Simple check | vmware.hv.hw.cpu.threads[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Total memory | The physical memory size. |
Simple check | vmware.hv.hw.memory[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Model | The system model identification. |
Simple check | vmware.hv.hw.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Bios UUID | The hardware BIOS identification. |
Simple check | vmware.hv.hw.uuid[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Vendor | The hardware vendor identification. |
Simple check | vmware.hv.hw.vendor[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. Sum of all guest VMs. |
Simple check | vmware.hv.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Used memory | Physical memory usage on the host. |
Simple check | vmware.hv.memory.used[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Number of bytes received | VMware hypervisor network input statistics (bytes per second). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
Number of bytes transmitted | VMware hypervisor network output statistics (bytes per second). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
Overall status | The overall alarm status of the host. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Simple check | vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Uptime | System uptime. |
Simple check | vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Version | Dot-separated version string. |
Simple check | vmware.hv.version[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Number of guest VMs | Number of guest virtual machines. |
Simple check | vmware.hv.vm.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Get sensors | Master item for sensor data. |
Simple check | vmware.hv.sensors.get[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: Hypervisor is down | The service is unavailable or is not accepting ICMP pings. |
last(/VMware Hypervisor/icmpping[])=0 |Average |
Manual close: Yes | |
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=3 |High |
||
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might soon become overloaded. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=2 |Average |
Depends on:
|
|
VMware Hypervisor: Hypervisor has been restarted | Uptime is less than 10 minutes. |
last(/VMware Hypervisor/vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}])<10m |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interface discovery | Discovery of VMware hypervisor network interfaces. |
Simple check | vmware.hv.net.if.discovery[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#IFNAME}] network interface speed | VMware hypervisor network interface speed. |
Simple check | vmware.hv.network.linkspeed[{$VMWARE.URL},{$VMWARE.HV.UUID},{#IFNAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Datastore discovery | Discovery of VMware datastores. |
Simple check | vmware.hv.datastore.discovery[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Average read IOPS of the datastore [{#DATASTORE}] | Average IOPS for a read operation from the datastore. |
Simple check | vmware.hv.datastore.read[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},rps] |
Average write IOPS of the datastore [{#DATASTORE}] | Average IOPS for a write operation to the datastore. |
Simple check | vmware.hv.datastore.write[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},rps] |
Average read latency of the datastore [{#DATASTORE}] | Average amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.hv.datastore.read[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},latency] |
Free space on datastore [{#DATASTORE}] (percentage) | VMware datastore free space (percentage from the total). |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},pfree] |
Total size of datastore [{#DATASTORE}] | VMware datastore space in bytes. |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}] Preprocessing
|
Average write latency of the datastore [{#DATASTORE}] | Average amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.hv.datastore.write[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},latency] |
Multipath count for datastore [{#DATASTORE}] | Number of available datastore paths. |
Simple check | vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: [{#DATASTORE}]: Free space is critically low | Datastore free space has fallen below the critical threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},pfree])<{$VMWARE.HV.DATASTORE.SPACE.CRIT} |High |
||
VMware Hypervisor: [{#DATASTORE}]: Free space is low | Datastore free space has fallen below the warning threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},pfree])<{$VMWARE.HV.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
|
VMware Hypervisor: The multipath count has been changed | The number of available datastore paths is less than registered ({#MULTIPATH.COUNT}). |
last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}],#1)<>last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}],#2) and last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}])<{#MULTIPATH.COUNT} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Serial number discovery | VMware hypervisor serial number discovery. This item works only with VMware hypervisor versions above 6.7. |
Dependent item | vmware.hv.serial.number.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Serial number | VMware hypervisor serial number. |
Simple check | vmware.hv.hw.serialnumber[{$VMWARE.URL},{#VMWARE.HV.UUID}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck discovery | VMware Rollup Health State sensor discovery. |
Dependent item | vmware.hv.healthcheck.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Health state rollup | The host's Rollup Health State sensor value. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Dependent item | vmware.hv.sensor.health.state[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])=3 |High |
Depends on:
|
|
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might soon become overloaded. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])=2 |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sensor discovery | VMware hardware sensor discovery. The data is retrieved from numeric sensor probes and provides information about the health of the physical system. |
Dependent item | vmware.hv.sensors.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sensor [{#NAME}] health state | VMware hardware sensor health state. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Dependent item | vmware.hv.sensor.state["{#NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: Sensor [{#NAME}] health state is Red | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=3 |High |
Depends on:
|
|
VMware Hypervisor: Sensor [{#NAME}] health state is Yellow | One or more components in the appliance might soon become overloaded. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=2 |Average |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor Veeam Backup Enterprise Manager. It works without any external scripts and uses the script item.
NOTE: The Veeam Backup Enterprise Manager REST API may not be available in some editions; the template will only work with the following editions of Veeam Backup and Replication:
See Veeam Data Platform Feature Comparison for more details.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See Zabbix template operation for basic instructions.
Create a user for API access with the Portal Administrator role.
> See Veeam Help Center for more details.
Set the {$VEEAM.MANAGER.API.URL}, {$VEEAM.MANAGER.USER}, and {$VEEAM.MANAGER.PASSWORD} macros.
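For reference, the request flow behind the script item can be reproduced manually. This is a hedged sketch only; the endpoint paths and the session header are assumptions based on the Enterprise Manager RESTful API documentation, and the URL/credentials are placeholders mirroring the macros above:

```python
# Hedged sketch of a session-based request to the Enterprise Manager RESTful API.
# Assumptions: the /api/sessionMgr/?v=latest and /api/reports/summary/job_statistics
# paths and the X-RestSvcSessionId header; verify=False only for self-signed labs.
import requests

api_url = "https://localhost:9398"     # {$VEEAM.MANAGER.API.URL}
user, password = "admin", "secret"     # {$VEEAM.MANAGER.USER} / {$VEEAM.MANAGER.PASSWORD}

# Log in with Basic auth; the session id is returned in a response header.
login = requests.post(f"{api_url}/api/sessionMgr/?v=latest",
                      auth=(user, password), verify=False, timeout=10)
login.raise_for_status()
session_id = login.headers["X-RestSvcSessionId"]

# Reuse the session id for subsequent requests, e.g. the job statistics summary.
headers = {"X-RestSvcSessionId": session_id, "Accept": "application/json"}
summary = requests.get(f"{api_url}/api/reports/summary/job_statistics",
                       headers=headers, verify=False, timeout=10)
print(summary.json())                  # running/scheduled/warning/failed job counters
```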
Name | Description | Default |
---|---|---|
{$VEEAM.MANAGER.API.URL} | Veeam Backup Enterprise Manager API endpoint is a URL in the format: |
https://localhost:9398 |
{$VEEAM.MANAGER.HTTP.PROXY} | Sets the HTTP proxy to |
|
{$VEEAM.MANAGER.PASSWORD} | The password of the Veeam Backup Enterprise Manager account. |
|
{$VEEAM.MANAGER.USER} | The user name of the Veeam Backup Enterprise Manager account. |
|
{$VEEAM.MANAGER.DATA.TIMEOUT} | A response timeout for API. |
10 |
{$BACKUP.TYPE.MATCHES} | This macro is used in backup discovery rule. |
.* |
{$BACKUP.TYPE.NOT_MATCHES} | This macro is used in backup discovery rule. |
CHANGE_IF_NEEDED |
{$BACKUP.NAME.MATCHES} | This macro is used in backup discovery rule. |
.* |
{$BACKUP.NAME.NOT_MATCHES} | This macro is used in backup discovery rule. |
CHANGE_IF_NEEDED |
{$VEEAM.MANAGER.JOB.MAX.WARN} | The maximum score of warning jobs (for a trigger expression). |
10 |
{$VEEAM.MANAGER.JOB.MAX.FAIL} | The maximum score of failed jobs (for a trigger expression). |
5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get metrics | The result of API requests is expressed in JSON. |
Script | veeam.manager.get.metrics |
Get errors | The errors from API requests. |
Dependent item | veeam.manager.get.errors Preprocessing
|
Running Jobs | Informs about the running jobs. |
Dependent item | veeam.manager.running.jobs Preprocessing
|
Scheduled Jobs | Informs about the scheduled jobs. |
Dependent item | veeam.manager.scheduled.jobs Preprocessing
|
Scheduled Backup Jobs | Informs about the scheduled backup jobs. |
Dependent item | veeam.manager.scheduled.backup.jobs Preprocessing
|
Scheduled Replica Jobs | Informs about the scheduled replica jobs. |
Dependent item | veeam.manager.scheduled.replica.jobs Preprocessing
|
Total Job Runs | Informs about the total job runs. |
Dependent item | veeam.manager.scheduled.total.jobs Preprocessing
|
Warnings Job Runs | Informs about the warning job runs. |
Dependent item | veeam.manager.warning.jobs Preprocessing
|
Failed Job Runs | Informs about the failed job runs. |
Dependent item | veeam.manager.failed.jobs Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam Backup: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.get.errors))>0 |Average |
||
Veeam Backup: Warning job runs is too high | last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.warning.jobs)>{$VEEAM.MANAGER.JOB.MAX.WARN} |Warning |
Manual close: Yes | ||
Veeam Backup: Failed job runs is too high | last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.failed.jobs)>{$VEEAM.MANAGER.JOB.MAX.FAIL} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Backup Files discovery | Discovery of all backup files created on, or imported to the backup servers that are connected to Veeam Backup Enterprise Manager. |
Dependent item | veeam.backup.files.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Backup Size [{#NAME}] | Gets the backup size with the name [{#NAME}]. |
Dependent item | veeam.backup.file.size[{#NAME}] Preprocessing
|
Data Size [{#NAME}] | Gets the data size with the name [{#NAME}]. |
Dependent item | veeam.backup.data.size[{#NAME}] Preprocessing
|
Compression ratio [{#NAME}] | Gets the data compression ratio with the name [{#NAME}]. |
Dependent item | veeam.backup.compress.ratio[{#NAME}] Preprocessing
|
Deduplication Ratio [{#NAME}] | Gets the data deduplication ratio with the name [{#NAME}]. |
Dependent item | veeam.backup.deduplication.ratio[{#NAME}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor Veeam Backup and Replication. It works without any external scripts and uses the script item.
NOTE: Since the RESTful API may not be available for some editions, the template will only work with the following editions of Veeam Backup and Replication:
See Veeam Data Platform Feature Comparison for more details.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the {$VEEAM.API.URL}, {$VEEAM.USER}, and {$VEEAM.PASSWORD} macros.
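As an illustration only, the token-based flow that the script item relies on could look like the hedged sketch below; the token endpoint, the x-api-version header value, and the /api/v1/jobs/states path are assumptions based on the Veeam Backup and Replication REST API documentation, and the URL/credentials are placeholders mirroring the macros above:

```python
# Hedged sketch: obtain an OAuth2 token and query job states from the
# Veeam Backup and Replication REST API. All names below are placeholders.
import requests

api_url = "https://localhost:9419"                 # {$VEEAM.API.URL}
auth = requests.post(f"{api_url}/api/oauth2/token",
                     data={"grant_type": "password",
                           "username": "admin",    # {$VEEAM.USER}
                           "password": "secret"},  # {$VEEAM.PASSWORD}
                     headers={"x-api-version": "1.0-rev1"},
                     verify=False, timeout=10)
auth.raise_for_status()
token = auth.json()["access_token"]

# Query job states, the same resource the "Jobs states discovery" rule relies on.
jobs = requests.get(f"{api_url}/api/v1/jobs/states",
                    headers={"Authorization": f"Bearer {token}",
                             "x-api-version": "1.0-rev1"},
                    verify=False, timeout=10)
print(jobs.json())
```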
Name | Description | Default |
---|---|---|
{$VEEAM.API.URL} | The Veeam API endpoint is a URL in the format |
https://localhost:9419 |
{$VEEAM.HTTP.PROXY} | Sets the HTTP proxy to |
|
{$VEEAM.PASSWORD} | The password of the Veeam Backup and Replication account. |
|
{$VEEAM.USER} | The user name of the Veeam Backup and Replication account. |
|
{$VEEAM.DATA.TIMEOUT} | A response timeout for the API. |
10 |
{$CREATED.AFTER} | Returns sessions that are created after chosen days. |
7 |
{$SESSION.NAME.MATCHES} | This macro is used in discovery rule to evaluate sessions. |
.* |
{$SESSION.NAME.NOT_MATCHES} | This macro is used in discovery rule to evaluate sessions. |
CHANGE_IF_NEEDED |
{$SESSION.TYPE.MATCHES} | This macro is used in discovery rule to evaluate sessions. |
.* |
{$SESSION.TYPE.NOT_MATCHES} | This macro is used in discovery rule to evaluate sessions. |
CHANGE_IF_NEEDED |
{$PROXIES.NAME.MATCHES} | This macro is used in proxies discovery rule. |
.* |
{$PROXIES.NAME.NOT_MATCHES} | This macro is used in proxies discovery rule. |
CHANGE_IF_NEEDED |
{$PROXIES.TYPE.MATCHES} | This macro is used in proxies discovery rule. |
.* |
{$PROXIES.TYPE.NOT_MATCHES} | This macro is used in proxies discovery rule. |
CHANGE_IF_NEEDED |
{$REPOSITORIES.NAME.MATCHES} | This macro is used in repositories discovery rule. |
.* |
{$REPOSITORIES.NAME.NOT_MATCHES} | This macro is used in repositories discovery rule. |
CHANGE_IF_NEEDED |
{$REPOSITORIES.TYPE.MATCHES} | This macro is used in repositories discovery rule. |
.* |
{$REPOSITORIES.TYPE.NOT_MATCHES} | This macro is used in repositories discovery rule. |
CHANGE_IF_NEEDED |
{$JOB.NAME.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.NAME.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
{$JOB.TYPE.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.TYPE.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
{$JOB.STATUS.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.STATUS.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get metrics | The result of API requests is expressed in JSON. |
Script | veeam.get.metrics |
Get errors | The errors from API requests. |
Dependent item | veeam.get.errors Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam Backup: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Veeam Backup and Replication by HTTP/veeam.get.errors))>0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxies discovery | Discovery of proxies. |
Dependent item | veeam.proxies.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Server [{#NAME}]: Get data | Gets raw data collected by the proxy server. |
Dependent item | veeam.proxy.server.raw[{#NAME}] Preprocessing
|
Proxy [{#NAME}] [{#TYPE}]: Get data | Gets raw data collected by the proxy with the name [{#NAME}]. |
Dependent item | veeam.proxy.raw[{#NAME}] Preprocessing
|
Proxy [{#NAME}] [{#TYPE}]: Max Task Count | The maximum number of concurrent tasks. |
Dependent item | veeam.proxy.maxtask[{#NAME}] Preprocessing
|
Proxy [{#NAME}] [{#TYPE}]: Host name | The name of the proxy server. |
Dependent item | veeam.proxy.server.name[{#NAME}] Preprocessing
|
Proxy [{#NAME}] [{#TYPE}]: Host type | The type of the proxy server. |
Dependent item | veeam.proxy.server.type[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Repositories discovery | Discovery of repositories. |
Dependent item | veeam.repositories.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Repository [{#NAME}] [{#TYPE}]: Get data | Gets raw data from the repository with the name [{#NAME}]. |
Dependent item | veeam.repositories.raw[{#NAME}] Preprocessing
|
Repository [{#NAME}] [{#TYPE}]: Used space [{#PATH}] | Used space by repositories expressed in gigabytes (GB). |
Dependent item | veeam.repository.capacity[{#NAME}] Preprocessing
|
Repository [{#NAME}] [{#TYPE}]: Free space [{#PATH}] | Free space of repositories expressed in gigabytes (GB). |
Dependent item | veeam.repository.free.space[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sessions discovery | Discovery of sessions. |
Dependent item | veeam.sessions.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Session [{#NAME}] [{#TYPE}]: Get data | Gets raw data from the session with the name [{#NAME}]. |
Dependent item | veeam.sessions.raw[{#ID}] Preprocessing
|
Session [{#NAME}] [{#TYPE}]: State | The state of the session. The enums used: |
Dependent item | veeam.sessions.state[{#ID}] Preprocessing
|
Session [{#NAME}] [{#TYPE}]: Result | The result of the session. The enums used: |
Dependent item | veeam.sessions.result[{#ID}] Preprocessing
|
Session [{#NAME}] [{#TYPE}]: Message | A message that explains the session result. |
Dependent item | veeam.sessions.message[{#ID}] Preprocessing
|
Session progress percent [{#NAME}] [{#TYPE}] | The progress of the session expressed as a percentage. |
Dependent item | veeam.sessions.progress.percent[{#ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam Backup: Last result session failed | find(/Veeam Backup and Replication by HTTP/veeam.sessions.result[{#ID}],,"like","Failed")=1 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jobs states discovery | Discovery of the jobs states. |
Dependent item | veeam.job.state.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job states [{#NAME}] [{#TYPE}]: Get data | Gets raw data from the job states with the name [{#NAME}]. |
Dependent item | veeam.jobs.states.raw[{#ID}] Preprocessing
|
Job states [{#NAME}] [{#TYPE}]: Status | The current status of the job. The enums used: |
Dependent item | veeam.jobs.status[{#ID}] Preprocessing
|
Job states [{#NAME}] [{#TYPE}]: Last result | The result of the session. The enums used: |
Dependent item | veeam.jobs.last.result[{#ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam Backup: Last result job failed | find(/Veeam Backup and Replication by HTTP/veeam.jobs.last.result[{#ID}],,"like","Failed")=1 |Average |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HashiCorp Vault by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Vault by HTTP
— collects metrics by HTTP agent from /sys/metrics
API endpoint.
See https://www.vaultproject.io/api-docs/system/metrics.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See Zabbix template operation for basic instructions.
Configure Vault API. See Vault Configuration.
Create a Vault service token and set it to the macro {$VAULT.TOKEN}.
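To verify the token and connectivity before linking the template, the two endpoints used by the HTTP agent items can be queried directly. A minimal sketch, assuming placeholder host, port, and token values that correspond to the {$VAULT.HOST}, {$VAULT.API.PORT}, and {$VAULT.TOKEN} macros:

```python
# Hedged sketch of the requests the HTTP agent items issue against Vault.
# The paths are the documented Vault system API endpoints; values are placeholders.
import requests

base = "http://vault.example.com:8200"        # {$VAULT.API.SCHEME}://{$VAULT.HOST}:{$VAULT.API.PORT}
token = "s.xxxxxxxx"                          # {$VAULT.TOKEN}

# Health endpoint: force 200 responses so sealed/standby states do not raise errors.
health = requests.get(f"{base}/v1/sys/health",
                      params={"standbyok": "true", "sealedcode": 200,
                              "uninitcode": 200, "standbycode": 200},
                      timeout=10)
print(health.json())                          # initialized, sealed, standby, version ...

# Telemetry in Prometheus text format, the source of most metric items.
metrics = requests.get(f"{base}/v1/sys/metrics",
                       params={"format": "prometheus"},
                       headers={"X-Vault-Token": token},
                       timeout=10)
print(metrics.text[:500])
```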
Name | Description | Default |
---|---|---|
{$VAULT.API.PORT} | Vault port. |
8200 |
{$VAULT.API.SCHEME} | Vault API scheme. |
http |
{$VAULT.HOST} | Vault host name. |
<PUT YOUR VAULT HOST> |
{$VAULT.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors for trigger expression. |
90 |
{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} | Maximum number of Vault leadership setup failed. |
5 |
{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} | Maximum number of Vault leadership losses. |
5 |
{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} | Maximum number of Vault leadership step downs. |
5 |
{$VAULT.LLD.FILTER.STORAGE.MATCHES} | Filter of discoverable storage backends. |
.+ |
{$VAULT.TOKEN} | Vault auth token. |
<PUT YOUR AUTH TOKEN> |
{$VAULT.TOKEN.ACCESSORS} | Vault accessors separated by spaces for monitoring token expiration time. |
|
{$VAULT.TOKEN.TTL.MIN.CRIT} | Token TTL critical threshold. |
3d |
{$VAULT.TOKEN.TTL.MIN.WARN} | Token TTL warning threshold. |
7d |
Name | Description | Type | Key and additional info | |||
---|---|---|---|---|---|---|
Get health | HTTP agent | vault.get_health Preprocessing
|
||||
Get leader | HTTP agent | vault.get_leader Preprocessing
|
||||
Get metrics | HTTP agent | vault.get_metrics Preprocessing
|
||||
Clear metrics | Dependent item | vault.clear_metrics Preprocessing
|
||||
Get tokens | Get information about tokens via their accessors. Accessors are defined in the macro "{$VAULT.TOKEN.ACCESSORS}". |
Script | vault.get_tokens | |||
Check WAL discovery | Dependent item | vault.checkwaldiscovery Preprocessing
|
||||
Check replication discovery | Dependent item | vault.checkreplicationdiscovery Preprocessing
|
||||
Check storage discovery | Dependent item | vault.checkstoragediscovery Preprocessing
|
Pattern: …(put | list | delete)_count$ (⛔️ Custom on fail: Discard value); JavaScript: The text is too long. Please see the template.; Discard unchanged with heartbeat: 15m
|
Check mountpoint discovery | Dependent item | vault.checkmountpointdiscovery Preprocessing
|
||||
Initialized | Initialization status. |
Dependent item | vault.health.initialized Preprocessing
|
|||
Sealed | Seal status. |
Dependent item | vault.health.sealed Preprocessing
|
|||
Standby | Standby status. |
Dependent item | vault.health.standby Preprocessing
|
|||
Performance standby | Performance standby status. |
Dependent item | vault.health.performance_standby Preprocessing
|
|||
Performance replication | Performance replication mode https://www.vaultproject.io/docs/enterprise/replication |
Dependent item | vault.health.replicationperformancemode Preprocessing
|
|||
Disaster Recovery replication | Disaster recovery replication mode https://www.vaultproject.io/docs/enterprise/replication |
Dependent item | vault.health.replicationdrmode Preprocessing
|
|||
Version | Server version. |
Dependent item | vault.health.version Preprocessing
|
|||
Healthcheck | Vault healthcheck. |
Dependent item | vault.health.check Preprocessing
|
|||
HA enabled | HA enabled status. |
Dependent item | vault.leader.ha_enabled Preprocessing
|
|||
Is leader | Leader status. |
Dependent item | vault.leader.is_self Preprocessing
|
|||
Get metrics error | Get metrics error. |
Dependent item | vault.get_metrics.error Preprocessing
|
|||
Process CPU seconds, total | Total user and system CPU time spent in seconds. |
Dependent item | vault.metrics.process.cpu.seconds.total Preprocessing
|
|||
Open file descriptors, max | Maximum number of open file descriptors. |
Dependent item | vault.metrics.process.max.fds Preprocessing
|
|||
Open file descriptors, current | Number of open file descriptors. |
Dependent item | vault.metrics.process.open.fds Preprocessing
|
|||
Process resident memory | Resident memory size in bytes. |
Dependent item | vault.metrics.process.resident_memory.bytes Preprocessing
|
|||
Uptime | Server uptime. |
Dependent item | vault.metrics.process.uptime Preprocessing
|
|||
Process virtual memory, current | Virtual memory size in bytes. |
Dependent item | vault.metrics.process.virtual_memory.bytes Preprocessing
|
|||
Process virtual memory, max | Maximum amount of virtual memory available in bytes. |
Dependent item | vault.metrics.process.virtual_memory.max.bytes Preprocessing
|
|||
Audit log requests, rate | Number of all audit log requests across all audit log devices. |
Dependent item | vault.metrics.audit.log.request.rate Preprocessing
|
|||
Audit log request failures, rate | Number of audit log request failures. |
Dependent item | vault.metrics.audit.log.request.failure.rate Preprocessing
|
|||
Audit log response, rate | Number of audit log responses across all audit log devices. |
Dependent item | vault.metrics.audit.log.response.rate Preprocessing
|
|||
Audit log response failures, rate | Number of audit log response failures. |
Dependent item | vault.metrics.audit.log.response.failure.rate Preprocessing
|
|||
Barrier DELETE ops, rate | Number of DELETE operations at the barrier. |
Dependent item | vault.metrics.barrier.delete.rate Preprocessing
|
|||
Barrier GET ops, rate | Number of GET operations at the barrier. |
Dependent item | vault.metrics.vault.barrier.get.rate Preprocessing
|
|||
Barrier LIST ops, rate | Number of LIST operations at the barrier. |
Dependent item | vault.metrics.barrier.list.rate Preprocessing
|
|||
Barrier PUT ops, rate | Number of PUT operations at the barrier. |
Dependent item | vault.metrics.barrier.put.rate Preprocessing
|
|||
Cache hit, rate | Number of times a value was retrieved from the LRU cache. |
Dependent item | vault.metrics.cache.hit.rate Preprocessing
|
|||
Cache miss, rate | Number of times a value was not in the LRU cache. This results in a read from the configured storage. |
Dependent item | vault.metrics.cache.miss.rate Preprocessing
|
|||
Cache write, rate | Number of times a value was written to the LRU cache. |
Dependent item | vault.metrics.cache.write.rate Preprocessing
|
|||
Check token, rate | Number of token checks handled by Vault core. |
Dependent item | vault.metrics.core.check.token.rate Preprocessing
|
|||
Fetch ACL and token, rate | Number of ACL and corresponding token entry fetches handled by Vault core. |
Dependent item | vault.metrics.core.fetch.aclandtoken Preprocessing
|
|||
Requests, rate | Number of requests handled by Vault core. |
Dependent item | vault.metrics.core.handle.request Preprocessing
|
|||
Leadership setup failed, counter | Cluster leadership setup failures which have occurred in a highly available Vault cluster. |
Dependent item | vault.metrics.core.leadership.setup_failed Preprocessing
|
|||
Leadership setup lost, counter | Cluster leadership losses which have occurred in a highly available Vault cluster. |
Dependent item | vault.metrics.core.leadership_lost Preprocessing
|
|||
Post-unseal ops, counter | Duration of time taken by post-unseal operations handled by Vault core. |
Dependent item | vault.metrics.core.post_unseal Preprocessing
|
|||
Pre-seal ops, counter | Duration of time taken by pre-seal operations. |
Dependent item | vault.metrics.core.pre_seal Preprocessing
|
|||
Requested seal ops, counter | Duration of time taken by requested seal operations. |
Dependent item | vault.metrics.core.sealwithrequest Preprocessing
|
|||
Seal ops, counter | Duration of time taken by seal operations. |
Dependent item | vault.metrics.core.seal Preprocessing
|
|||
Internal seal ops, counter | Duration of time taken by internal seal operations. |
Dependent item | vault.metrics.core.seal_internal Preprocessing
|
|||
Leadership step downs, counter | Cluster leadership step down. |
Dependent item | vault.metrics.core.step_down Preprocessing
|
|||
Unseal ops, counter | Duration of time taken by unseal operations. |
Dependent item | vault.metrics.core.unseal Preprocessing
|
|||
Fetch lease times, counter | Time taken to fetch lease times. |
Dependent item | vault.metrics.expire.fetch.lease.times Preprocessing
|
|||
Fetch lease times by token, counter | Time taken to fetch lease times by token. |
Dependent item | vault.metrics.expire.fetch.lease.times.by_token Preprocessing
|
|||
Number of expiring leases | Number of all leases which are eligible for eventual expiry. |
Dependent item | vault.metrics.expire.num_leases Preprocessing
|
|||
Expire revoke, count | Time taken to revoke a token. |
Dependent item | vault.metrics.expire.revoke Preprocessing
|
|||
Expire revoke force, count | Time taken to forcibly revoke a token. |
Dependent item | vault.metrics.expire.revoke.force Preprocessing
|
|||
Expire revoke prefix, count | Time taken to revoke tokens on a prefix. |
Dependent item | vault.metrics.expire.revoke.prefix Preprocessing
|
|||
Revoke secrets by token, count | Time taken to revoke all secrets issued with a given token. |
Dependent item | vault.metrics.expire.revoke.by_token Preprocessing
|
|||
Expire renew, count | Time taken to renew a lease. |
Dependent item | vault.metrics.expire.renew Preprocessing
|
|||
Renew token, count | Time taken to renew a token which does not need to invoke a logical backend. |
Dependent item | vault.metrics.expire.renew_token Preprocessing
|
|||
Register ops, count | Time taken for register operations. |
Dependent item | vault.metrics.expire.register Preprocessing
|
|||
Register auth ops, count | Time taken for register authentication operations which create lease entries without lease ID. |
Dependent item | vault.metrics.expire.register.auth Preprocessing
|
|||
Policy GET ops, rate | Number of operations to get a policy. |
Dependent item | vault.metrics.policy.get_policy.rate Preprocessing
|
|||
Policy LIST ops, rate | Number of operations to list policies. |
Dependent item | vault.metrics.policy.list_policies.rate Preprocessing
|
|||
Policy DELETE ops, rate | Number of operations to delete a policy. |
Dependent item | vault.metrics.policy.delete_policy.rate Preprocessing
|
|||
Policy SET ops, rate | Number of operations to set a policy. |
Dependent item | vault.metrics.policy.set_policy.rate Preprocessing
|
|||
Token create, count | The time taken to create a token. |
Dependent item | vault.metrics.token.create Preprocessing
|
|||
Token createAccessor, count | The time taken to create a token accessor. |
Dependent item | vault.metrics.token.createAccessor Preprocessing
|
|||
Token lookup, rate | Number of token lookups. |
Dependent item | vault.metrics.token.lookup.rate Preprocessing
|
|||
Token revoke, count | The time taken to revoke a token. |
Dependent item | vault.metrics.token.revoke Preprocessing
|
|||
Token revoke tree, count | Time taken to revoke a token tree. |
Dependent item | vault.metrics.token.revoke.tree Preprocessing
|
|||
Token store, count | Time taken to store an updated token entry without writing to the secondary index. |
Dependent item | vault.metrics.token.store Preprocessing
|
|||
Runtime allocated bytes | Number of bytes allocated by the Vault process. This could burst from time to time, but should return to a steady state value. |
Dependent item | vault.metrics.runtime.alloc.bytes Preprocessing
|
|||
Runtime freed objects | Number of freed objects. |
Dependent item | vault.metrics.runtime.free.count Preprocessing
|
|||
Runtime heap objects | Number of objects on the heap. This is a good general memory pressure indicator worth establishing a baseline and thresholds for alerting. |
Dependent item | vault.metrics.runtime.heap.objects Preprocessing
|
|||
Runtime malloc count | Cumulative count of allocated heap objects. |
Dependent item | vault.metrics.runtime.malloc.count Preprocessing
|
|||
Runtime num goroutines | Number of goroutines. This serves as a general system load indicator worth establishing a baseline and thresholds for alerting. |
Dependent item | vault.metrics.runtime.num_goroutines Preprocessing
|
|||
Runtime sys bytes | Number of bytes allocated to Vault. This includes what is being used by Vault's heap and what has been reclaimed but not given back to the operating system. |
Dependent item | vault.metrics.runtime.sys.bytes Preprocessing
|
|||
Runtime GC pause, total | The total garbage collector pause time since Vault was last started. |
Dependent item | vault.metrics.total.gc.pause Preprocessing
|
|||
Runtime GC runs, total | Total number of garbage collection runs since Vault was last started. |
Dependent item | vault.metrics.runtime.total.gc.runs Preprocessing
|
|||
Token count, total | Total number of service tokens available for use; counts all un-expired and un-revoked tokens in Vault's token store. This measurement is performed every 10 minutes. |
Dependent item | vault.metrics.token Preprocessing
|
|||
Token count by auth, total | Total number of service tokens that were created by an auth method. |
Dependent item | vault.metrics.token.by_auth Preprocessing
|
|||
Token count by policy, total | Total number of service tokens that have a policy attached. |
Dependent item | vault.metrics.token.by_policy Preprocessing
|
|||
Token count by ttl, total | Number of service tokens, grouped by the TTL range they were assigned at creation. |
Dependent item | vault.metrics.token.by_ttl Preprocessing
|
|||
Token creation, rate | Number of service or batch tokens created. |
Dependent item | vault.metrics.token.creation.rate Preprocessing
|
|||
Secret kv entries | Number of entries in each key-value secret engine. |
Dependent item | vault.metrics.secret.kv.count Preprocessing
|
|||
Token secret lease creation, rate | Counts the number of leases created by secret engines. |
Dependent item | vault.metrics.secret.lease.creation.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Vault: Vault server is sealed | https://www.vaultproject.io/docs/concepts/seal |
last(/HashiCorp Vault by HTTP/vault.health.sealed)=1 |Average |
||
HashiCorp Vault: Version has changed | Vault version has changed. Acknowledge to close the problem manually. |
last(/HashiCorp Vault by HTTP/vault.health.version,#1)<>last(/HashiCorp Vault by HTTP/vault.health.version,#2) and length(last(/HashiCorp Vault by HTTP/vault.health.version))>0 |Info |
Manual close: Yes | |
HashiCorp Vault: Vault server is not responding | last(/HashiCorp Vault by HTTP/vault.health.check)=0 |High |
|||
HashiCorp Vault: Failed to get metrics | length(last(/HashiCorp Vault by HTTP/vault.get_metrics.error))>0 |Warning |
Depends on:
|
||
HashiCorp Vault: Current number of open files is too high | min(/HashiCorp Vault by HTTP/vault.metrics.process.open.fds,5m)/last(/HashiCorp Vault by HTTP/vault.metrics.process.max.fds)*100>{$VAULT.OPEN.FDS.MAX.WARN} |Warning |
|||
HashiCorp Vault: Service has been restarted | Uptime is less than 10 minutes. |
last(/HashiCorp Vault by HTTP/vault.metrics.process.uptime)<10m |Info |
Manual close: Yes | |
HashiCorp Vault: High frequency of leadership setup failures | There have been more than {$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} Vault leadership setup failures in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h))>{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} |Average |
||
HashiCorp Vault: High frequency of leadership losses | There have been more than {$VAULT.LEADERSHIP.LOSSES.MAX.WARN} Vault leadership losses in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h))>{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} |Average |
||
HashiCorp Vault: High frequency of leadership step downs | There have been more than {$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} Vault leadership step downs in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h))>{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} |Average |
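The three leadership triggers above all use the same pattern: the change of a monotonically increasing counter over a one-hour window (max minus min) is compared against a macro threshold. A minimal sketch of that logic, with made-up sample values and a made-up threshold:

```python
# Illustration of the max(...) - min(...) window logic used by the leadership
# triggers above; the samples and the threshold value are made up.
def delta_over_window(samples):
    """Change of a monotonically increasing counter over the window."""
    return max(samples) - min(samples)

# e.g. vault.metrics.core.leadership.setup_failed sampled over the last hour
hourly_samples = [3, 3, 4, 7, 9]
VAULT_LEADERSHIP_SETUP_FAILED_MAX_WARN = 5  # placeholder for the macro value

if delta_over_window(hourly_samples) > VAULT_LEADERSHIP_SETUP_FAILED_MAX_WARN:
    print("High frequency of leadership setup failures")  # 9 - 3 = 6 > 5
```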
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage metrics discovery | Storage backend metrics discovery. |
Dependent item | vault.storage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage [{#STORAGE}] {#OPERATION} ops, rate | Number of {#OPERATION} operations against the {#STORAGE} storage backend. |
Dependent item | vault.metrics.storage.rate[{#STORAGE}, {#OPERATION}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Mountpoint metrics discovery | Mountpoint metrics discovery. |
Dependent item | vault.mountpoint.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Rollback attempt [{#MOUNTPOINT}] ops, rate | Number of operations to perform a rollback operation on the given mount point. |
Dependent item | vault.metrics.rollback.attempt.rate[{#MOUNTPOINT}] Preprocessing
|
Route rollback [{#MOUNTPOINT}] ops, rate | Number of operations to dispatch a rollback operation to a backend, and for that backend to process it. Rollback operations are automatically scheduled to clean up partial errors. |
Dependent item | vault.metrics.route.rollback.rate[{#MOUNTPOINT}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
WAL metrics discovery | Discovery for WAL metrics. |
Dependent item | vault.wal.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Delete WALs, count{#SINGLETON} | Time taken to delete a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.deletewals[{#SINGLETON}] Preprocessing
|
GC deleted WAL{#SINGLETON} | Number of Write Ahead Logs (WAL) deleted during each garbage collection run. |
Dependent item | vault.metrics.wal.gc.deleted[{#SINGLETON}] Preprocessing
|
WALs on disk, total{#SINGLETON} | Total Number of Write Ahead Logs (WAL) on disk. |
Dependent item | vault.metrics.wal.gc.total[{#SINGLETON}] Preprocessing
|
Load WALs, count{#SINGLETON} | Time taken to load a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.loadWAL[{#SINGLETON}] Preprocessing
|
Persist WALs, count{#SINGLETON} | Time taken to persist a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.persistwals[{#SINGLETON}] Preprocessing
|
Flush ready WAL, count{#SINGLETON} | Time taken to flush a ready Write Ahead Log (WAL) to storage. |
Dependent item | vault.metrics.wal.flushready[{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication metrics discovery | Discovery for replication metrics. |
Dependent item | vault.replication.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream WAL missing guard, count{#SINGLETON} | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is not matched/found. |
Dependent item | vault.metrics.logshipper.streamWALs.missing_guard[{#SINGLETON}] Preprocessing
|
Stream WAL guard found, count{#SINGLETON} | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is matched/found. |
Dependent item | vault.metrics.logshipper.streamWALs.guard_found[{#SINGLETON}] Preprocessing
|
Merkle commit index{#SINGLETON} | The last committed index in the Merkle Tree. |
Dependent item | vault.metrics.replication.merkle.commit_index[{#SINGLETON}] Preprocessing
|
Last WAL{#SINGLETON} | The index of the last WAL. |
Dependent item | vault.metrics.replication.wal.last_wal[{#SINGLETON}] Preprocessing
|
Last DR WAL{#SINGLETON} | The index of the last DR WAL. |
Dependent item | vault.metrics.replication.wal.lastdrwal[{#SINGLETON}] Preprocessing
|
Last performance WAL{#SINGLETON} | The index of the last Performance WAL. |
Dependent item | vault.metrics.replication.wal.lastperformancewal[{#SINGLETON}] Preprocessing
|
Last remote WAL{#SINGLETON} | The index of the last remote WAL. |
Dependent item | vault.metrics.replication.fsm.lastremotewal[{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Token metrics discovery | Tokens metrics discovery. |
Dependent item | vault.tokens.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Token [{#TOKEN_NAME}] error | Token lookup error text. |
Dependent item | vault.tokenviaaccessor.error["{#ACCESSOR}"] Preprocessing
|
Token [{#TOKEN_NAME}] has TTL | Indicates whether the token has a TTL. |
Dependent item | vault.tokenviaaccessor.has_ttl["{#ACCESSOR}"] Preprocessing
|
Token [{#TOKEN_NAME}] TTL | The TTL period of the token. |
Dependent item | vault.tokenviaaccessor.ttl["{#ACCESSOR}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Vault: Token [{#TOKEN_NAME}] lookup error occurred | length(last(/HashiCorp Vault by HTTP/vault.token_via_accessor.error["{#ACCESSOR}"]))>0 |Warning |
Depends on:
|
||
HashiCorp Vault: Token [{#TOKEN_NAME}] will expire soon | last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.CRIT} |Average |
|||
HashiCorp Vault: Token [{#TOKEN_NAME}] will expire soon | last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.WARN} |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Template for monitoring TrueNAS CORE by SNMP.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$CPU.UTIL.CRIT} | Threshold of CPU utilization for warning trigger in %. |
90 |
{$ICMP_LOSS_WARN} | Threshold of ICMP packet loss for warning trigger in %. |
20 |
{$ICMP_RESPONSE_TIME_WARN} | Threshold of average ICMP response time for warning trigger in seconds. |
0.15 |
{$IF.ERRORS.WARN} | Threshold of error packets rate for warning trigger. Can be used with interface name as context. |
2 |
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for warning trigger in %. Can be used with interface name as context. |
90 |
{$IFCONTROL} | Macro for operational state of the interface for link down trigger. Can be used with interface name as context. |
1 |
{$LOAD_AVG_PER_CPU.MAX.WARN} | Load per CPU considered sustainable. Tune if needed. |
1.5 |
{$MEMORY.AVAILABLE.MIN} | Threshold of available memory for trigger in bytes. |
20M |
{$MEMORY.UTIL.MAX} | Threshold of memory utilization for trigger in % |
90 |
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.* |
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status |
^2$ |
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.*$ |
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6) |
^6$ |
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$SNMP.TIMEOUT} | The time interval for SNMP availability trigger. |
5m |
{$SWAP.PFREE.MIN.WARN} | Threshold of free swap space for warning trigger in %. |
50 |
{$VFS.DEV.DEVNAME.MATCHES} | This macro is used in block devices discovery. Can be overridden on the host or linked template level |
.+ |
{$VFS.DEV.DEVNAME.NOT_MATCHES} | This macro is used in block devices discovery. Can be overridden on the host or linked template level |
Macro too long. Please see the template. |
{$DATASET.NAME.MATCHES} | This macro is used in datasets discovery. Can be overridden on the host or linked template level |
.+ |
{$DATASET.NAME.NOT_MATCHES} | This macro is used in datasets discovery. Can be overridden on the host or linked template level |
^(boot|.+\.system(.+)?$) |
{$ZPOOL.PUSED.MAX.WARN} | Threshold of used pool space for warning trigger in %. |
80 |
{$ZPOOL.FREE.MIN.WARN} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$ZPOOL.PUSED.MAX.CRIT} | Threshold of used pool space for average severity trigger in %. |
90 |
{$ZPOOL.FREE.MIN.CRIT} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$DATASET.PUSED.MAX.WARN} | Threshold of used dataset space for warning trigger in %. |
80 |
{$DATASET.FREE.MIN.WARN} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$DATASET.PUSED.MAX.CRIT} | Threshold of used dataset space for average severity trigger in %. |
90 |
{$DATASET.FREE.MIN.CRIT} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$TEMPERATURE.MAX.WARN} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
50 |
{$TEMPERATURE.MAX.CRIT} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
65 |
Name | Description | Type | Key and additional info |
---|---|---|---|
ICMP ping | Host accessibility by ICMP. 0 - ICMP ping fails. 1 - ICMP ping successful. |
Simple check | icmpping |
ICMP loss | Percentage of lost packets. |
Simple check | icmppingloss |
ICMP response time | ICMP ping response time (in seconds). |
Simple check | icmppingsec |
System contact details | MIB: SNMPv2-MIB The textual identification of the contact person for this managed node, together with information on how to contact this person. If no contact information is known, the value is the zero-length string. |
SNMP agent | system.contact Preprocessing
|
System description | MIB: SNMPv2-MIB System description of the host. |
SNMP agent | system.descr Preprocessing
|
System location | MIB: SNMPv2-MIB The physical location of this node. If the location is unknown, the value is the zero-length string. |
SNMP agent | system.location Preprocessing
|
System name | MIB: SNMPv2-MIB The host name of the system. |
SNMP agent | system.name Preprocessing
|
System object ID | MIB: SNMPv2-MIB The vendor's authoritative identification of the network management subsystem contained in the entity. This value is allocated within the SMI enterprises subtree (1.3.6.1.4.1) and provides an easy and unambiguous means for determining what kind of box is being managed. |
SNMP agent | system.objectid Preprocessing
|
Uptime | MIB: HOST-RESOURCES-MIB The amount of time since this host was last initialized. Note that this is different from sysUpTime in the SNMPv2-MIB [RFC1907] because sysUpTime is the uptime of the network management portion of the system. |
SNMP agent | system.uptime Preprocessing
|
SNMP traps (fallback) | The item is used to collect all SNMP traps unmatched by other snmptrap items. |
SNMP trap | snmptrap.fallback |
SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - not available 1 - available 2 - unknown |
Zabbix internal | zabbix[host,snmp,available] |
Interrupts per second | MIB: UCD-SNMP-MIB Number of interrupts processed. |
SNMP agent | system.cpu.intr Preprocessing
|
Context switches per second | MIB: UCD-SNMP-MIB Number of context switches. |
SNMP agent | system.cpu.switches Preprocessing
|
Load average (1m avg) | MIB: UCD-SNMP-MIB The 1 minute load averages. |
SNMP agent | system.cpu.load.avg1 |
Load average (5m avg) | MIB: UCD-SNMP-MIB The 5 minutes load averages. |
SNMP agent | system.cpu.load.avg5 |
Load average (15m avg) | MIB: UCD-SNMP-MIB The 15 minutes load averages. |
SNMP agent | system.cpu.load.avg15 |
Number of CPUs | MIB: HOST-RESOURCES-MIB The number of CPU cores, counted from the cores discovered in hrProcessorTable using LLD. |
SNMP agent | system.cpu.num Preprocessing
|
Free memory | MIB: UCD-SNMP-MIB The amount of real/physical memory currently unused or available. |
SNMP agent | vm.memory.free Preprocessing
|
Memory (buffers) | MIB: UCD-SNMP-MIB The total amount of real or virtual memory currently allocated for use as memory buffers. |
SNMP agent | vm.memory.buffers Preprocessing
|
Memory (cached) | MIB: UCD-SNMP-MIB The total amount of real or virtual memory currently allocated for use as cached memory. |
SNMP agent | vm.memory.cached Preprocessing
|
Total memory | MIB: UCD-SNMP-MIB The total memory expressed in bytes. |
SNMP agent | vm.memory.total Preprocessing
|
Available memory | Please note that memory utilization is a rough estimate, since memory available is calculated as free+buffers+cached, which is not 100% accurate, but the best we can get using SNMP. |
Calculated | vm.memory.available |
Memory utilization | Please note that memory utilization is a rough estimate, since memory available is calculated as free+buffers+cached, which is not 100% accurate, but the best we can get using SNMP. A worked sketch of this calculation follows the items table below. |
Calculated | vm.memory.util |
Total swap space | MIB: UCD-SNMP-MIB The total amount of swap space configured for this host. |
SNMP agent | system.swap.total Preprocessing
|
Free swap space | MIB: UCD-SNMP-MIB The amount of swap space currently unused or available. |
SNMP agent | system.swap.free Preprocessing
|
Free swap space in % | The free space of the swap volume/file expressed in %. |
Calculated | system.swap.pfree Preprocessing
|
ARC size | MIB: FREENAS-MIB ARC size in bytes. |
SNMP agent | truenas.zfs.arc.size Preprocessing
|
ARC metadata size | MIB: FREENAS-MIB ARC metadata size used in bytes. |
SNMP agent | truenas.zfs.arc.meta Preprocessing
|
ARC data size | MIB: FREENAS-MIB ARC data size used in bytes. |
SNMP agent | truenas.zfs.arc.data Preprocessing
|
ARC hits | MIB: FREENAS-MIB Total amount of cache hits in the ARC per second. |
SNMP agent | truenas.zfs.arc.hits Preprocessing
|
ARC misses | MIB: FREENAS-MIB Total amount of cache misses in the ARC per second. |
SNMP agent | truenas.zfs.arc.misses Preprocessing
|
ARC target size of cache | MIB: FREENAS-MIB ARC target size of cache in bytes. |
SNMP agent | truenas.zfs.arc.c Preprocessing
|
ARC target size of MRU | MIB: FREENAS-MIB ARC target size of MRU in bytes. |
SNMP agent | truenas.zfs.arc.p Preprocessing
|
ARC cache hit ratio | MIB: FREENAS-MIB ARC cache hit ratio percentage. |
SNMP agent | truenas.zfs.arc.hit.ratio |
ARC cache miss ratio | MIB: FREENAS-MIB ARC cache miss ratio percentage. |
SNMP agent | truenas.zfs.arc.miss.ratio |
L2ARC hits | MIB: FREENAS-MIB Hits to the L2 cache per second. |
SNMP agent | truenas.zfs.l2arc.hits Preprocessing
|
L2ARC misses | MIB: FREENAS-MIB Misses to the L2 cache per second. |
SNMP agent | truenas.zfs.l2arc.misses Preprocessing
|
L2ARC read rate | MIB: FREENAS-MIB Read rate from L2 cache in bytes per second. |
SNMP agent | truenas.zfs.l2arc.read Preprocessing
|
L2ARC write rate | MIB: FREENAS-MIB Write rate from L2 cache in bytes per second. |
SNMP agent | truenas.zfs.l2arc.write Preprocessing
|
L2ARC size | MIB: FREENAS-MIB L2ARC size in bytes. |
SNMP agent | truenas.zfs.l2arc.size Preprocessing
|
ZIL operations 1 second | MIB: FREENAS-MIB The ops column parsed from the command zilstat 1 1. |
SNMP agent | truenas.zfs.zil.ops1 |
ZIL operations 5 seconds | MIB: FREENAS-MIB The ops column parsed from the command zilstat 5 1. |
SNMP agent | truenas.zfs.zil.ops5 |
ZIL operations 10 seconds | MIB: FREENAS-MIB The ops column parsed from the command zilstat 10 1. |
SNMP agent | truenas.zfs.zil.ops10 |
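As noted in the memory items above, available memory is estimated as free+buffers+cached, and memory utilization and free swap space are derived from it. A minimal sketch of those calculated items, assuming the usual (total - available) / total and free / total definitions; all byte values below are made up:

```python
# Sketch of the calculated memory and swap items above; values are made up.
mem_total = 16 * 1024**3
mem_free = 2 * 1024**3
mem_buffers = 1 * 1024**3
mem_cached = 5 * 1024**3

mem_available = mem_free + mem_buffers + mem_cached       # vm.memory.available
mem_util = (mem_total - mem_available) / mem_total * 100  # vm.memory.util, %

swap_total = 4 * 1024**3
swap_free = 3 * 1024**3
swap_pfree = swap_free / swap_total * 100                 # system.swap.pfree, %

print(f"available={mem_available} B, util={mem_util:.1f}%, swap free={swap_pfree:.1f}%")
```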
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Unavailable by ICMP ping | Last three attempts returned timeout. Please check device connectivity. |
max(/TrueNAS CORE by SNMP/icmpping,#3)=0 |High |
||
TrueNAS CORE: High ICMP ping loss | ICMP packets loss detected. |
min(/TrueNAS CORE by SNMP/icmppingloss,5m)>{$ICMP_LOSS_WARN} and min(/TrueNAS CORE by SNMP/icmppingloss,5m)<100 |Warning |
Depends on:
|
|
TrueNAS CORE: High ICMP ping response time | Average ICMP response time is too high. |
avg(/TrueNAS CORE by SNMP/icmppingsec,5m)>{$ICMP_RESPONSE_TIME_WARN} |Warning |
Depends on:
|
|
TrueNAS CORE: System name has changed | The name of the system has changed. Acknowledge to close the problem manually. |
last(/TrueNAS CORE by SNMP/system.name,#1)<>last(/TrueNAS CORE by SNMP/system.name,#2) and length(last(/TrueNAS CORE by SNMP/system.name))>0 |Info |
Manual close: Yes | |
TrueNAS CORE: Host has been restarted | Uptime is less than 10 minutes. |
last(/TrueNAS CORE by SNMP/system.uptime)<10m |Info |
Manual close: Yes | |
TrueNAS CORE: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/TrueNAS CORE by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |Warning |
Depends on:
|
|
TrueNAS CORE: Load average is too high | The load average per CPU is too high. The system may be slow to respond. |
min(/TrueNAS CORE by SNMP/system.cpu.load.avg1,5m)/last(/TrueNAS CORE by SNMP/system.cpu.num)>{$LOAD_AVG_PER_CPU.MAX.WARN} and last(/TrueNAS CORE by SNMP/system.cpu.load.avg5)>0 and last(/TrueNAS CORE by SNMP/system.cpu.load.avg15)>0 |Average |
||
TrueNAS CORE: Lack of available memory | The system is running out of memory. |
min(/TrueNAS CORE by SNMP/vm.memory.available,5m)<{$MEMORY.AVAILABLE.MIN} and last(/TrueNAS CORE by SNMP/vm.memory.total)>0 |Average |
||
TrueNAS CORE: High memory utilization | The system is running out of free memory. |
min(/TrueNAS CORE by SNMP/vm.memory.util,5m)>{$MEMORY.UTIL.MAX} |Average |
Depends on:
|
|
TrueNAS CORE: High swap space usage | If there is no swap configured, this trigger is ignored. |
min(/TrueNAS CORE by SNMP/system.swap.pfree,5m)<{$SWAP.PFREE.MIN.WARN} and last(/TrueNAS CORE by SNMP/system.swap.total)>0 |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
CPU discovery | This discovery will create a set of per-core CPU metrics from UCD-SNMP-MIB, using {#CPU.COUNT} in preprocessing. That's the only reason LLD is used. |
Dependent item | cpu.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
CPU idle time | MIB: UCD-SNMP-MIB The time the CPU has spent doing nothing. |
SNMP agent | system.cpu.idle[{#SNMPINDEX}] |
CPU system time | MIB: UCD-SNMP-MIB The time the CPU has spent running the kernel and its processes. |
SNMP agent | system.cpu.system[{#SNMPINDEX}] Preprocessing
|
CPU user time | MIB: UCD-SNMP-MIB The time the CPU has spent running users' processes that are not niced. |
SNMP agent | system.cpu.user[{#SNMPINDEX}] Preprocessing
|
CPU nice time | MIB: UCD-SNMP-MIB The time the CPU has spent running users' processes that have been niced. |
SNMP agent | system.cpu.nice[{#SNMPINDEX}] Preprocessing
|
CPU iowait time | MIB: UCD-SNMP-MIB The amount of time the CPU has been waiting for I/O to complete. |
SNMP agent | system.cpu.iowait[{#SNMPINDEX}] Preprocessing
|
CPU interrupt time | MIB: UCD-SNMP-MIB The amount of time the CPU has been servicing hardware interrupts. |
SNMP agent | system.cpu.interrupt[{#SNMPINDEX}] Preprocessing
|
CPU utilization | The CPU utilization expressed in %. |
Dependent item | system.cpu.util[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. |
min(/TrueNAS CORE by SNMP/system.cpu.util[{#SNMPINDEX}],5m)>{$CPU.UTIL.CRIT} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Block devices discovery | Block devices are discovered from UCD-DISKIO-MIB::diskIOTable (http://net-snmp.sourceforge.net/docs/mibs/ucdDiskIOMIB.html#diskIOTable). |
SNMP agent | vfs.dev.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: [{#DEVNAME}]: Disk read rate | MIB: UCD-DISKIO-MIB The number of read accesses from this device since boot. |
SNMP agent | vfs.dev.read.rate[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: [{#DEVNAME}]: Disk write rate | MIB: UCD-DISKIO-MIB The number of write accesses from this device since boot. |
SNMP agent | vfs.dev.write.rate[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: [{#DEVNAME}]: Disk utilization | MIB: UCD-DISKIO-MIB The 1 minute average load of disk (%). |
SNMP agent | vfs.dev.util[{#SNMPINDEX}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP agent | net.if.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.discards[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.errors[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.discards[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.errors[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets.Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of |
SNMP agent | net.if.speed[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP agent | net.if.status[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP agent | net.if.type[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | It recovers when it is below 80% of the |
min(/TrueNAS CORE by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/TrueNAS CORE by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | It recovers when it is below 80% of the |
min(/TrueNAS CORE by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/TrueNAS CORE by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually. |
change(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/TrueNAS CORE by SNMP/net.if.status[{#SNMPINDEX}])<>2) |Info |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/TrueNAS CORE by SNMP/net.if.status[{#SNMPINDEX}])=2) |Average |
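The bandwidth-usage triggers above fire when the 15-minute average throughput exceeds {$IF.UTIL.MAX} percent of the reported interface speed (and the speed is known). A worked sketch of that comparison with made-up figures:

```python
# Worked example of the bandwidth-usage trigger condition above; the traffic
# and speed figures are made up.
if_util_max_pct = 90                  # {$IF.UTIL.MAX}
if_speed_bps = 1_000_000_000          # net.if.speed: 1 Gbps
avg_in_bps = 950_000_000              # avg(net.if.in, 15m), already in bits/s

threshold_bps = if_util_max_pct / 100 * if_speed_bps
if avg_in_bps > threshold_bps and if_speed_bps > 0:
    print("High inbound bandwidth usage")   # 950 Mbps > 900 Mbps
```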
Name | Description | Type | Key and additional info |
---|---|---|---|
ZFS pools discovery | ZFS pools discovery from FREENAS-MIB. |
SNMP agent | truenas.zfs.pools.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pool [{#POOLNAME}]: Total space | MIB: FREENAS-MIB The size of the storage pool in bytes. |
SNMP agent | truenas.zpool.size.total[{#POOLNAME}] Preprocessing
|
Pool [{#POOLNAME}]: Used space | MIB: FREENAS-MIB The used size of the storage pool in bytes. |
SNMP agent | truenas.zpool.used[{#POOLNAME}] Preprocessing
|
Pool [{#POOLNAME}]: Available space | MIB: FREENAS-MIB The available size of the storage pool in bytes. |
SNMP agent | truenas.zpool.avail[{#POOLNAME}] Preprocessing
|
Pool [{#POOLNAME}]: Usage in % | The used size of the storage pool in %. |
Calculated | truenas.zpool.pused[{#POOLNAME}] |
Pool [{#POOLNAME}]: Health | MIB: FREENAS-MIB The current health of the containing pool, as reported by zpool status. |
SNMP agent | truenas.zpool.health[{#POOLNAME}] Preprocessing
|
Pool [{#POOLNAME}]: Read operations rate | MIB: FREENAS-MIB The number of read I/O operations sent to the pool or device, including metadata requests (averaged since system booted). |
SNMP agent | truenas.zpool.read.ops[{#POOLNAME}] Preprocessing
|
Pool [{#POOLNAME}]: Write operations rate | MIB: FREENAS-MIB The number of write I/O operations sent to the pool or device (averaged since system booted). |
SNMP agent | truenas.zpool.write.ops[{#POOLNAME}] Preprocessing
|
Pool [{#POOLNAME}]: Read rate | MIB: FREENAS-MIB The bandwidth of all read operations (including metadata), expressed as units per second (averaged since system booted). |
SNMP agent | truenas.zpool.read.bytes[{#POOLNAME}] Preprocessing
|
Pool [{#POOLNAME}]: Write rate | MIB: FREENAS-MIB The bandwidth of all write operations, expressed as units per second (averaged since system booted). |
SNMP agent | truenas.zpool.write.bytes[{#POOLNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Pool [{#POOLNAME}]: Very high space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.zpool.pused[{#POOLNAME}],5m) > {$ZPOOL.PUSED.MAX.CRIT:"{#POOLNAME}"} and last(/TrueNAS CORE by SNMP/truenas.zpool.avail[{#POOLNAME}]) < {$ZPOOL.FREE.MIN.CRIT:"{#POOLNAME}"} |Average |
||
TrueNAS CORE: Pool [{#POOLNAME}]: High space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.zpool.pused[{#POOLNAME}],5m) > {$ZPOOL.PUSED.MAX.WARN:"{#POOLNAME}"} and last(/TrueNAS CORE by SNMP/truenas.zpool.avail[{#POOLNAME}]) < {$ZPOOL.FREE.MIN.WARN:"{#POOLNAME}"} |Warning |
Depends on:
|
|
TrueNAS CORE: Pool [{#POOLNAME}]: Status is not online | Please check pool status. |
last(/TrueNAS CORE by SNMP/truenas.zpool.health[{#POOLNAME}]) <> 0 |Average |
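Both pool-space triggers above require two conditions to match at once: the used percentage must exceed its threshold and the absolute free space must drop below its floor. A worked sketch with made-up sizes, assuming the usage percentage is simply used divided by total:

```python
# Worked example of the two-condition pool space triggers above; sizes and
# thresholds are made up.
GIB = 1024**3
zpool_size = 10 * 1024 * GIB           # truenas.zpool.size.total
zpool_used = 9_500 * GIB               # truenas.zpool.used
zpool_avail = zpool_size - zpool_used  # truenas.zpool.avail

pused = zpool_used / zpool_size * 100  # truenas.zpool.pused, %
ZPOOL_PUSED_MAX_CRIT = 90              # {$ZPOOL.PUSED.MAX.CRIT}
ZPOOL_FREE_MIN_CRIT = 5 * GIB          # {$ZPOOL.FREE.MIN.CRIT}

if pused > ZPOOL_PUSED_MAX_CRIT and zpool_avail < ZPOOL_FREE_MIN_CRIT:
    print("Very high space usage")
else:
    # Only the percentage condition is met here, so the trigger stays silent.
    print(f"OK: {pused:.1f}% used, {zpool_avail / GIB:.0f} GiB free")
```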
Name | Description | Type | Key and additional info |
---|---|---|---|
ZFS datasets discovery | ZFS datasets discovery from FREENAS-MIB. |
SNMP agent | truenas.zfs.dataset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Dataset [{#DATASET_NAME}]: Total space | MIB: FREENAS-MIB The size of the dataset in bytes. |
SNMP agent | truenas.dataset.size.total[{#DATASET_NAME}] Preprocessing
|
Dataset [{#DATASET_NAME}]: Used space | MIB: FREENAS-MIB The used size of the dataset in bytes. |
SNMP agent | truenas.dataset.used[{#DATASET_NAME}] Preprocessing
|
Dataset [{#DATASET_NAME}]: Available space | MIB: FREENAS-MIB The available size of the dataset in bytes. |
SNMP agent | truenas.dataset.avail[{#DATASET_NAME}] Preprocessing
|
Dataset [{#DATASET_NAME}]: Usage in % | The used size of the dataset in %. |
Calculated | truenas.dataset.pused[{#DATASET_NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Dataset [{#DATASET_NAME}]: Very high space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.dataset.pused[{#DATASET_NAME}],5m) > {$DATASET.PUSED.MAX.CRIT:"{#DATASET_NAME}"} and last(/TrueNAS CORE by SNMP/truenas.dataset.avail[{#DATASET_NAME}]) < {$DATASET.FREE.MIN.CRIT:"{#POOLNAME}"} |Average |
||
TrueNAS CORE: Dataset [{#DATASET_NAME}]: High space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.dataset.pused[{#DATASET_NAME}],5m) > {$DATASET.PUSED.MAX.WARN:"{#DATASET_NAME}"} and last(/TrueNAS CORE by SNMP/truenas.dataset.avail[{#DATASET_NAME}]) < {$DATASET.FREE.MIN.WARN:"{#POOLNAME}"} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
ZFS volumes discovery | ZFS volumes discovery from FREENAS-MIB. |
SNMP agent | truenas.zfs.zvols.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: ZFS volume [{#ZVOL_NAME}]: Total space | MIB: FREENAS-MIB The size of the ZFS volume in bytes. |
SNMP agent | truenas.zvol.size.total[{#ZVOL_NAME}] Preprocessing
|
TrueNAS CORE: ZFS volume [{#ZVOL_NAME}]: Used space | MIB: FREENAS-MIB The used size of the ZFS volume in bytes. |
SNMP agent | truenas.zvol.used[{#ZVOL_NAME}] Preprocessing
|
TrueNAS CORE: ZFS volume [{#ZVOL_NAME}]: Available space | MIB: FREENAS-MIB The available size of the ZFS volume in bytes. |
SNMP agent | truenas.zvol.avail[{#ZVOL_NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Disks temperature discovery | Disks temperature discovery from FREENAS-MIB. |
SNMP agent | truenas.disk.temp.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk [{#DISK_NAME}]: Temperature | MIB: FREENAS-MIB The temperature of this HDD in mC. |
SNMP agent | truenas.disk.temp[{#DISK_NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Disk [{#DISK_NAME}]: Average disk temperature is too high | Disk temperature is high. |
avg(/TrueNAS CORE by SNMP/truenas.disk.temp[{#DISK_NAME}],5m) > {$TEMPERATURE.MAX.CRIT:"{#DISK_NAME}"} |Average |
||
TrueNAS CORE: Disk [{#DISK_NAME}]: Average disk temperature is too high | Disk temperature is high. |
avg(/TrueNAS CORE by SNMP/truenas.disk.temp[{#DISK_NAME}],5m) > {$TEMPERATURE.MAX.WARN:"{#DISK_NAME}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Travis CI by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You must set the {$TRAVIS.API.TOKEN} and {$TRAVIS.API.URL} macros. {$TRAVIS.API.TOKEN} is a Travis API authentication token located in User -> Settings -> API authentication. {$TRAVIS.API.URL} can come in two different variations (a quick verification sketch follows the macros table below):
Name | Description | Default |
---|---|---|
{$TRAVIS.API.TOKEN} | Travis API Token |
|
{$TRAVIS.API.URL} | Travis API URL |
api.travis-ci.com |
{$TRAVIS.BUILDS.SUCCESS.PERCENT} | Percent of successful builds in the repo (for trigger expression) |
80 |
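Before linking the template, it can be useful to confirm that the token and URL you intend to put into the macros actually work against the Travis API. A minimal sketch, assuming the Travis API v3 conventions (an `Authorization: token ...` header and `Travis-API-Version: 3`); the token value is a placeholder:

```python
# Minimal sketch (not part of the template): verify the values you plan to put
# into {$TRAVIS.API.TOKEN} and {$TRAVIS.API.URL}.
import json
import urllib.request

TRAVIS_API_URL = "api.travis-ci.com"   # value of {$TRAVIS.API.URL}
TRAVIS_API_TOKEN = "<your-token>"      # value of {$TRAVIS.API.TOKEN}

req = urllib.request.Request(
    f"https://{TRAVIS_API_URL}/repos",
    headers={
        "Authorization": f"token {TRAVIS_API_TOKEN}",
        "Travis-API-Version": "3",
    },
)
with urllib.request.urlopen(req) as resp:
    data = json.load(resp)

# The v3 /repos response carries the accessible repositories; print their slugs.
for repo in data.get("repositories", []):
    print(repo.get("slug"))
```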
Name | Description | Type | Key and additional info |
---|---|---|---|
Get repos | Getting repos using Travis API. |
HTTP agent | travis.get_repos |
Get builds | Getting builds using Travis API. |
HTTP agent | travis.get_builds |
Get jobs | Getting jobs using Travis API. |
HTTP agent | travis.get_jobs |
Get health | Getting home JSON using Travis API. |
HTTP agent | travis.get_health Preprocessing
|
Jobs passed | Total count of passed jobs in all repos. |
Dependent item | travis.jobs.total Preprocessing
|
Jobs active | Active jobs in all repos. |
Dependent item | travis.jobs.active Preprocessing
|
Jobs in queue | Jobs in queue in all repos. |
Dependent item | travis.jobs.queue Preprocessing
|
Builds | Total count of builds in all repos. |
Dependent item | travis.builds.total Preprocessing
|
Builds duration | Sum of all builds durations in all repos. |
Dependent item | travis.builds.duration Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Travis CI: Service is unavailable | Travis API is unavailable. Please check if the correct macros are set. |
last(/Travis CI by HTTP/travis.get_health)=0 |High |
Manual close: Yes | |
Travis CI: Failed to fetch home page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Travis CI by HTTP/travis.get_health,30m)=1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Repos metrics discovery | Metrics for Repos statistics. |
Dependent item | travis.repos.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Repo [{#SLUG}]: Get builds | Getting builds of {#SLUG} using Travis API. |
HTTP agent | travis.repo.get_builds[{#SLUG}] |
Repo [{#SLUG}]: Get caches | Getting caches of {#SLUG} using Travis API. |
HTTP agent | travis.repo.get_caches[{#SLUG}] |
Repo [{#SLUG}]: Cache files | Count of cache files in {#SLUG} repo. |
Dependent item | travis.repo.caches.files[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Cache size | Total size of cache files in {#SLUG} repo. |
Dependent item | travis.repo.caches.size[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Builds passed | Count of all passed builds in {#SLUG} repo. |
Dependent item | travis.repo.builds.passed[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Builds failed | Count of all failed builds in {#SLUG} repo. |
Dependent item | travis.repo.builds.failed[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Builds total | Count of total builds in {#SLUG} repo. |
Dependent item | travis.repo.builds.total[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Builds passed, % | Percent of passed builds in {#SLUG} repo. |
Calculated | travis.repo.builds.passed.pct[{#SLUG}] |
Repo [{#SLUG}]: Description | Description of Travis repo (git project description). |
Dependent item | travis.repo.description[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Last build duration | Last build duration in {#SLUG} repo. |
Dependent item | travis.repo.last_build.duration[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Last build state | Last build state in {#SLUG} repo. |
Dependent item | travis.repo.last_build.state[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Last build number | Last build number in {#SLUG} repo. |
Dependent item | travis.repo.last_build.number[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Last build id | Last build id in {#SLUG} repo. |
Dependent item | travis.repo.last_build.id[{#SLUG}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Travis CI: Repo [{#SLUG}]: Percent of successful builds | Low successful builds rate. |
last(/Travis CI by HTTP/travis.repo.builds.passed.pct[{#SLUG}])<{$TRAVIS.BUILDS.SUCCESS.PERCENT} |Warning |
Manual close: Yes | |
Travis CI: Repo [{#SLUG}]: Last build status is 'errored' | Last build status is errored. |
find(/Travis CI by HTTP/travis.repo.last_build.state[{#SLUG}],,"like","errored")=1 |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache Tomcat monitoring by Zabbix via JMX and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
Name | Description | Default |
---|---|---|
{$TOMCAT.USER} | User for JMX |
|
{$TOMCAT.PASSWORD} | Password for JMX |
|
{$TOMCAT.LLD.FILTER.REQUEST_PROCESSOR.MATCHES} | Filter for discoverable global request processors. |
.* |
{$TOMCAT.LLD.FILTER.REQUEST_PROCESSOR.NOT_MATCHES} | Filter to exclude global request processors. |
CHANGE_IF_NEEDED |
{$TOMCAT.LLD.FILTER.MANAGER.MATCHES} | Filter for discoverable managers. |
.* |
{$TOMCAT.LLD.FILTER.MANAGER.NOT_MATCHES} | Filter to exclude managers. |
CHANGE_IF_NEEDED |
{$TOMCAT.LLD.FILTER.THREAD_POOL.MATCHES} | Filter for discoverable thread pools. |
.* |
{$TOMCAT.LLD.FILTER.THREAD_POOL.NOT_MATCHES} | Filter to exclude thread pools. |
CHANGE_IF_NEEDED |
{$TOMCAT.THREADS.MAX.PCT} | Threshold for busy worker threads trigger. Can be used with {#JMXNAME} as context. |
75 |
{$TOMCAT.THREADS.MAX.TIME} | The time during which the number of busy threads can exceed the threshold. Can be used with {#JMXNAME} as context. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Version | The version of Tomcat. |
JMX agent | jmx["Catalina:type=Server",serverInfo] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache Tomcat: Version has been changed | The Tomcat version has changed. Acknowledge to close the problem manually. |
last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo],#1)<>last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo],#2) and length(last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo]))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Global request processors discovery | Discovery for GlobalRequestProcessor |
JMX agent | jmx.discovery[beans,"Catalina:type=GlobalRequestProcessor,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXNAME}: Bytes received per second | Bytes received rate by processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},bytesReceived] Preprocessing
|
{#JMXNAME}: Bytes sent per second | Bytes sent rate by processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},bytesSent] Preprocessing
|
{#JMXNAME}: Errors per second | Error rate of request processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},errorCount] Preprocessing
|
{#JMXNAME}: Requests per second | Rate of requests served by request processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},requestCount] Preprocessing
|
{#JMXNAME}: Requests processing time | The total time to process all incoming requests of request processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},processingTime] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Protocol handlers discovery | Discovery for ProtocolHandler |
JMX agent | jmx.discovery[attributes,"Catalina:type=ProtocolHandler,port=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXVALUE}: Gzip compression status | Gzip compression status on {#JMXNAME}. Enabling gzip compression may save server bandwidth. |
JMX agent | jmx[{#JMXOBJ},compression] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache Tomcat: {#JMXVALUE}: Gzip compression is disabled | gzip compression is disabled for connector {#JMXVALUE}. |
find(/Apache Tomcat by JMX/jmx[{#JMXOBJ},compression],,"like","off") = 1 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Thread pools discovery | Discovery for ThreadPool |
JMX agent | jmx.discovery[beans,"Catalina:type=ThreadPool,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXNAME}: Threads count | Amount of threads the thread pool has right now, both busy and free. |
JMX agent | jmx[{#JMXOBJ},currentThreadCount] Preprocessing
|
{#JMXNAME}: Threads limit | Limit of the threads count. When the currentThreadsBusy counter reaches the maxThreads limit, no more requests can be handled, and the application chokes. |
JMX agent | jmx[{#JMXOBJ},maxThreads] Preprocessing
|
{#JMXNAME}: Threads busy | Number of the requests that are being currently handled. |
JMX agent | jmx[{#JMXOBJ},currentThreadsBusy] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache Tomcat: {#JMXNAME}: Busy worker threads count is high | When the busy threads counter reaches the limit, no more requests can be handled, and the application chokes. |
min(/Apache Tomcat by JMX/jmx[{#JMXOBJ},currentThreadsBusy],{$TOMCAT.THREADS.MAX.TIME:"{#JMXNAME}"})>last(/Apache Tomcat by JMX/jmx[{#JMXOBJ},maxThreads])*{$TOMCAT.THREADS.MAX.PCT:"{#JMXNAME}"}/100 |High |
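In other words, the trigger above fires when the busy thread count has stayed above {$TOMCAT.THREADS.MAX.PCT} percent of maxThreads for the whole {$TOMCAT.THREADS.MAX.TIME} window. A worked sketch with made-up counts:

```python
# Worked example of the busy worker threads trigger above; the thread counts
# and threshold are made up.
max_threads = 200                 # jmx[{#JMXOBJ},maxThreads]
busy_threads_window_min = 160     # min(currentThreadsBusy, {$TOMCAT.THREADS.MAX.TIME})
TOMCAT_THREADS_MAX_PCT = 75       # {$TOMCAT.THREADS.MAX.PCT}

if busy_threads_window_min > max_threads * TOMCAT_THREADS_MAX_PCT / 100:
    print("Busy worker threads count is high")   # 160 > 150
```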
Name | Description | Type | Key and additional info |
---|---|---|---|
Contexts discovery | Discovery for contexts |
JMX agent | jmx.discovery[beans,"Catalina:type=Manager,host=,context="] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXHOST}{#JMXCONTEXT}: Sessions active | Active sessions of the application. |
JMX agent | jmx[{#JMXOBJ},activeSessions] |
{#JMXHOST}{#JMXCONTEXT}: Sessions active maximum so far | Maximum number of active sessions so far. |
JMX agent | jmx[{#JMXOBJ},maxActive] |
{#JMXHOST}{#JMXCONTEXT}: Sessions created per second | Rate of sessions created by this application per second. |
JMX agent | jmx[{#JMXOBJ},sessionCounter] Preprocessing
|
{#JMXHOST}{#JMXCONTEXT}: Sessions rejected per second | Rate of sessions we rejected due to maxActive being reached. |
JMX agent | jmx[{#JMXOBJ},rejectedSessions] Preprocessing
|
{#JMXHOST}{#JMXCONTEXT}: Sessions allowed maximum | The maximum number of active Sessions allowed, or -1 for no limit. |
JMX agent | jmx[{#JMXOBJ},maxActiveSessions] |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Systemd monitoring by Zabbix via Zabbix agent 2 and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$SYSTEMD.NAME.SOCKET.MATCHES} | Filter of systemd socket units by name. |
.+ |
{$SYSTEMD.NAME.SOCKET.NOT_MATCHES} | Filter of systemd socket units by name. |
CHANGE_IF_NEEDED |
{$SYSTEMD.ACTIVESTATE.SOCKET.MATCHES} | Filter of systemd socket units by active state. |
.+ |
{$SYSTEMD.ACTIVESTATE.SOCKET.NOT_MATCHES} | Filter of systemd socket units by active state. |
^inactive$ |
{$SYSTEMD.UNITFILESTATE.SOCKET.MATCHES} | Filter of systemd socket units by unit file state. |
^enabled$ |
{$SYSTEMD.UNITFILESTATE.SOCKET.NOT_MATCHES} | Filter of systemd socket units by unit file state. |
CHANGE_IF_NEEDED |
{$SYSTEMD.NAME.SERVICE.MATCHES} | Filter of systemd service units by name. |
.+ |
{$SYSTEMD.NAME.SERVICE.NOT_MATCHES} | Filter of systemd service units by name. |
CHANGE_IF_NEEDED |
{$SYSTEMD.ACTIVESTATE.SERVICE.MATCHES} | Filter of systemd service units by active state. |
.+ |
{$SYSTEMD.ACTIVESTATE.SERVICE.NOT_MATCHES} | Filter of systemd service units by active state. |
^inactive$ |
{$SYSTEMD.UNITFILESTATE.SERVICE.MATCHES} | Filter of systemd service units by unit file state. |
^enabled$ |
{$SYSTEMD.UNITFILESTATE.SERVICE.NOT_MATCHES} | Filter of systemd service units by unit file state. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Service units discovery | Discover systemd service units and their details. |
Zabbix agent | systemd.unit.discovery[service] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#UNIT.NAME}: Get unit info | Returns all properties of a systemd service unit. Unit description: {#UNIT.DESCRIPTION}. |
Zabbix agent | systemd.unit.get["{#UNIT.NAME}"] |
{#UNIT.NAME}: Active state | State value that reflects whether the unit is currently active or not. The following states are currently defined: "active", "reloading", "inactive", "failed", "activating", and "deactivating". |
Dependent item | systemd.service.active_state["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Load state | State value that reflects whether the configuration file of this unit has been loaded. The following states are currently defined: "loaded", "error", and "masked". |
Dependent item | systemd.service.load_state["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Unit file state | Encodes the install state of the unit file of FragmentPath. It currently knows the following states: "enabled", "enabled-runtime", "linked", "linked-runtime", "masked", "masked-runtime", "static", "disabled", and "invalid". |
Dependent item | systemd.service.unitfile_state["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Active time | Number of seconds since unit entered the active state. |
Dependent item | systemd.service.uptime["{#UNIT.NAME}"] Preprocessing
|
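The items above are filled in by the Zabbix agent 2 systemd plugin. For a quick manual cross-check of the Active state value on the monitored host, you can read the same property with systemctl; a rough sketch (the unit name is a placeholder):

```python
# Rough manual cross-check of the Active state item above, reading the same
# systemd property via systemctl; "sshd.service" is only a placeholder.
import subprocess

unit = "sshd.service"
out = subprocess.run(
    ["systemctl", "show", unit, "--property=ActiveState"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(out)   # e.g. "ActiveState=active"
```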
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Systemd: {#UNIT.NAME}: Service is not running | last(/Systemd by Zabbix agent 2/systemd.service.active_state["{#UNIT.NAME}"])<>1 |Warning |
Manual close: Yes | ||
Systemd: {#UNIT.NAME} has been restarted | Uptime is less than 10 minutes. |
last(/Systemd by Zabbix agent 2/systemd.service.uptime["{#UNIT.NAME}"])<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Socket units discovery | Discover systemd socket units and their details. |
Zabbix agent | systemd.unit.discovery[socket] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#UNIT.NAME}: Get unit info | Returns all properties of a systemd socket unit. Unit description: {#UNIT.DESCRIPTION}. |
Zabbix agent | systemd.unit.get["{#UNIT.NAME}",Socket] |
{#UNIT.NAME}: Connections accepted per sec | The number of accepted socket connections (NAccepted) per second. |
Dependent item | systemd.socket.conn_accepted.rate["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Connections connected | The current number of socket connections (NConnections). |
Dependent item | systemd.socket.conn_count["{#UNIT.NAME}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Squid monitoring by Zabbix via SNMP and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable SNMP support following official documentation. Required parameters in squid.conf:
snmp_port <port_number>
acl <zbx_acl_name> snmp_community <community_name>
snmp_access allow <zbx_acl_name> <zabbix_server_ip>
1. Import the template template_app_squid_snmp.yaml into Zabbix.
2. Set values for {$SQUID.SNMP.COMMUNITY}, {$SQUID.SNMP.PORT} and {$SQUID.HTTP.PORT} as configured in squid.conf.
3. Link the imported template to a host with Squid.
4. Add SNMPv2 interface to Squid host. Set Port as {$SQUID.SNMP.PORT} and SNMP community as {$SQUID.SNMP.COMMUNITY}.
Name | Description | Default |
---|---|---|
{$SQUID.SNMP.PORT} | snmp_port configured in squid.conf (Default: 3401) |
3401 |
{$SQUID.HTTP.PORT} | http_port configured in squid.conf (Default: 3128) |
3128 |
{$SQUID.SNMP.COMMUNITY} | SNMP community allowed by ACL in squid.conf |
public |
{$SQUID.FILE.DESC.WARN.MIN} | The threshold for minimum number of available file descriptors |
100 |
{$SQUID.PAGE.FAULT.WARN} | The threshold for sys page faults rate in percent of received HTTP requests |
90 |
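{$SQUID.PAGE.FAULT.WARN} is expressed as a percentage of received HTTP requests, i.e. the page-fault rate is compared against the HTTP request rate. A worked sketch of that comparison with made-up per-second rates:

```python
# Worked example of the page-fault threshold above; the per-second rates are made up.
page_faults_per_sec = 48       # squid[cacheSysPageFaults], rate
http_requests_per_sec = 50     # squid[cacheProtoClientHttpRequests], rate
SQUID_PAGE_FAULT_WARN = 90     # {$SQUID.PAGE.FAULT.WARN}, % of HTTP requests

fault_pct_of_requests = page_faults_per_sec / http_requests_per_sec * 100
if fault_pct_of_requests > SQUID_PAGE_FAULT_WARN:
    print(f"Page faults are {fault_pct_of_requests:.0f}% of HTTP requests")  # 96% > 90%
```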
Name | Description | Type | Key and additional info |
---|---|---|---|
Service ping | Simple check | net.tcp.service[tcp,,{$SQUID.HTTP.PORT}] Preprocessing
|
|
Uptime | The Uptime of the cache in timeticks (in hundredths of a second) with preprocessing |
SNMP agent | squid[cacheUptime] Preprocessing
|
Version | Cache Software Version |
SNMP agent | squid[cacheVersionId] Preprocessing
|
CPU usage | The percentage use of the CPU |
SNMP agent | squid[cacheCpuUsage] |
Memory maximum resident size | Maximum Resident Size |
SNMP agent | squid[cacheMaxResSize] Preprocessing
|
Memory maximum cache size | The value of the cache_mem parameter |
SNMP agent | squid[cacheMemMaxSize] Preprocessing
|
Memory cache usage | Total accounted memory |
SNMP agent | squid[cacheMemUsage] Preprocessing
|
Cache swap low water mark | Cache Swap Low Water Mark |
SNMP agent | squid[cacheSwapLowWM] |
Cache swap high water mark | Cache Swap High Water Mark |
SNMP agent | squid[cacheSwapHighWM] |
Cache swap directory size | The total of the cache_dir space allocated |
SNMP agent | squid[cacheSwapMaxSize] Preprocessing
|
Cache swap current size | Storage Swap Size |
SNMP agent | squid[cacheCurrentSwapSize] |
File descriptor count - current used | Number of file descriptors in use |
SNMP agent | squid[cacheCurrentFileDescrCnt] |
File descriptor count - current maximum | Highest number of file descriptors in use |
SNMP agent | squid[cacheCurrentFileDescrMax] |
File descriptor count - current reserved | Reserved number of file descriptors |
SNMP agent | squid[cacheCurrentResFileDescrCnt] |
File descriptor count - current available | Available number of file descriptors |
SNMP agent | squid[cacheCurrentUnusedFDescrCnt] |
Byte hit ratio per 1 minute | Byte Hit Ratios |
SNMP agent | squid[cacheRequestByteRatio.1] |
Byte hit ratio per 5 minutes | Byte Hit Ratios |
SNMP agent | squid[cacheRequestByteRatio.5] |
Byte hit ratio per 1 hour | Byte Hit Ratios |
SNMP agent | squid[cacheRequestByteRatio.60] |
Request hit ratio per 1 minute | Request Hit Ratios |
SNMP agent | squid[cacheRequestHitRatio.1] |
Request hit ratio per 5 minutes | Request Hit Ratios |
SNMP agent | squid[cacheRequestHitRatio.5] |
Request hit ratio per 1 hour | Request Hit Ratios |
SNMP agent | squid[cacheRequestHitRatio.60] |
Sys page faults per second | Page faults with physical I/O |
SNMP agent | squid[cacheSysPageFaults] Preprocessing
|
HTTP requests received per second | Number of HTTP requests received |
SNMP agent | squid[cacheProtoClientHttpRequests] Preprocessing
|
HTTP traffic received per second | Amount of HTTP traffic received from clients |
SNMP agent | squid[cacheHttpInKb] Preprocessing
|
HTTP traffic sent per second | Amount of HTTP traffic sent to clients |
SNMP agent | squid[cacheHttpOutKb] Preprocessing
|
HTTP Hits sent from cache per second | Number of HTTP Hits sent to clients from cache |
SNMP agent | squid[cacheHttpHits] Preprocessing
|
HTTP Errors sent per second | Number of HTTP Errors sent to clients |
SNMP agent | squid[cacheHttpErrors] Preprocessing
|
ICP messages sent per second | Number of ICP messages sent |
SNMP agent | squid[cacheIcpPktsSent] Preprocessing
|
ICP messages received per second | Number of ICP messages received |
SNMP agent | squid[cacheIcpPktsRecv] Preprocessing
|
ICP traffic transmitted per second | Amount of ICP traffic transmitted |
SNMP agent | squid[cacheIcpKbSent] Preprocessing
|
ICP traffic received per second | Amount of ICP traffic received |
SNMP agent | squid[cacheIcpKbRecv] Preprocessing
|
DNS server requests per second | Number of external DNS server requests |
SNMP agent | squid[cacheDnsRequests] Preprocessing
|
DNS server replies per second | Number of external DNS server replies |
SNMP agent | squid[cacheDnsReplies] Preprocessing
|
FQDN cache requests per second | Number of FQDN Cache requests |
SNMP agent | squid[cacheFqdnRequests] Preprocessing
|
FQDN cache hits per second | Number of FQDN Cache hits |
SNMP agent | squid[cacheFqdnHits] Preprocessing
|
FQDN cache misses per second | Number of FQDN Cache misses |
SNMP agent | squid[cacheFqdnMisses] Preprocessing
|
IP cache requests per second | Number of IP Cache requests |
SNMP agent | squid[cacheIpRequests] Preprocessing
|
IP cache hits per second | Number of IP Cache hits |
SNMP agent | squid[cacheIpHits] Preprocessing
|
IP cache misses per second | Number of IP Cache misses |
SNMP agent | squid[cacheIpMisses] Preprocessing
|
Objects count | Number of objects stored by the cache |
SNMP agent | squid[cacheNumObjCount] |
Objects LRU expiration age | Storage LRU Expiration Age |
SNMP agent | squid[cacheCurrentLRUExpiration] Preprocessing
|
Objects unlinkd requests | Requests given to unlinkd |
SNMP agent | squid[cacheCurrentUnlinkRequests] |
HTTP all service time per 5 minutes | HTTP all service time per 5 minutes |
SNMP agent | squid[cacheHttpAllSvcTime.5] Preprocessing
|
HTTP all service time per hour | HTTP all service time per hour |
SNMP agent | squid[cacheHttpAllSvcTime.60] Preprocessing
|
HTTP miss service time per 5 minutes | HTTP miss service time per 5 minutes |
SNMP agent | squid[cacheHttpMissSvcTime.5] Preprocessing
|
HTTP miss service time per hour | HTTP miss service time per hour |
SNMP agent | squid[cacheHttpMissSvcTime.60] Preprocessing
|
HTTP hit service time per 5 minutes | HTTP hit service time per 5 minutes |
SNMP agent | squid[cacheHttpHitSvcTime.5] Preprocessing
|
HTTP hit service time per hour | HTTP hit service time per hour |
SNMP agent | squid[cacheHttpHitSvcTime.60] Preprocessing
|
ICP query service time per 5 minutes | ICP query service time per 5 minutes |
SNMP agent | squid[cacheIcpQuerySvcTime.5] Preprocessing
|
ICP query service time per hour | ICP query service time per hour |
SNMP agent | squid[cacheIcpQuerySvcTime.60] Preprocessing
|
ICP reply service time per 5 minutes | ICP reply service time per 5 minutes |
SNMP agent | squid[cacheIcpReplySvcTime.5] Preprocessing
|
ICP reply service time per hour | ICP reply service time per hour |
SNMP agent | squid[cacheIcpReplySvcTime.60] Preprocessing
|
DNS service time per 5 minutes | DNS service time per 5 minutes |
SNMP agent | squid[cacheDnsSvcTime.5] Preprocessing
|
DNS service time per hour | DNS service time per hour |
SNMP agent | squid[cacheDnsSvcTime.60] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Squid: Port {$SQUID.HTTP.PORT} is down | last(/Squid by SNMP/net.tcp.service[tcp,,{$SQUID.HTTP.PORT}])=0 |Average |
Manual close: Yes | ||
Squid: Squid has been restarted | Uptime is less than 10 minutes. |
last(/Squid by SNMP/squid[cacheUptime])<10m |Info |
Manual close: Yes | |
Squid: Squid version has been changed | Squid version has changed. Acknowledge to close the problem manually. |
last(/Squid by SNMP/squid[cacheVersionId],#1)<>last(/Squid by SNMP/squid[cacheVersionId],#2) and length(last(/Squid by SNMP/squid[cacheVersionId]))>0 |Info |
Manual close: Yes | |
Squid: Swap usage is more than low watermark | last(/Squid by SNMP/squid[cacheCurrentSwapSize])>last(/Squid by SNMP/squid[cacheSwapLowWM])*last(/Squid by SNMP/squid[cacheSwapMaxSize])/100 |Warning |
|||
Squid: Swap usage is more than high watermark | last(/Squid by SNMP/squid[cacheCurrentSwapSize])>last(/Squid by SNMP/squid[cacheSwapHighWM])*last(/Squid by SNMP/squid[cacheSwapMaxSize])/100 |High |
|||
Squid: Squid is running out of file descriptors | last(/Squid by SNMP/squid[cacheCurrentUnusedFDescrCnt])<{$SQUID.FILE.DESC.WARN.MIN} |Warning |
|||
Squid: High sys page faults rate | avg(/Squid by SNMP/squid[cacheSysPageFaults],5m)>avg(/Squid by SNMP/squid[cacheProtoClientHttpRequests],5m)/100*{$SQUID.PAGE.FAULT.WARN} |Warning |
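For orientation, a worked example with Squid's usual defaults (cache_swap_low = 90, cache_swap_high = 95; illustrative numbers, not template values): if squid[cacheSwapMaxSize] reports 100 GB of allocated cache_dir space, the low-watermark trigger above fires once the current swap size exceeds 90*100/100 = 90 GB, and the high-watermark trigger once it exceeds 95*100/100 = 95 GB. Likewise, with the default {$SQUID.PAGE.FAULT.WARN} of 90, the page-fault trigger fires when the 5-minute average page-fault rate exceeds 90% of the 5-minute average rate of received HTTP requests.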
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Microsoft SharePoint monitoring by Zabbix via HTTP and doesn't require any external scripts.
SharePoint includes a Representational State Transfer (REST) service. Developers can perform read operations from their SharePoint Add-ins, solutions, and client applications using REST web technologies and standard Open Data Protocol (OData) syntax. For details, see https://docs.microsoft.com/ru-ru/sharepoint/dev/sp-add-ins/get-to-know-the-sharepoint-rest-service?tabs=csom
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create a new host. Define macros according to your SharePoint web portal. It is recommended to fill in the values of the filter macros to avoid collecting redundant data.
Name | Description | Default |
---|---|---|
{$SHAREPOINT.USER} | ||
{$SHAREPOINT.PASSWORD} | ||
{$SHAREPOINT.URL} | Portal page URL. For example http://sharepoint.companyname.local/ |
|
{$SHAREPOINT.LLD.FILTER.NAME.MATCHES} | Filter of discoverable directories by name. |
.* |
{$SHAREPOINT.LLD.FILTER.FULL_PATH.MATCHES} | Filter of discoverable directories by full path. |
^/ |
{$SHAREPOINT.LLD.FILTER.TYPE.MATCHES} | Filter of discoverable types. |
FOLDER |
{$SHAREPOINT.LLD.FILTER.NAME.NOT_MATCHES} | Filter to exclude discovered directories by name. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.LLD.FILTER.FULL_PATH.NOT_MATCHES} | Filter to exclude discovered directories by full path. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.LLD.FILTER.TYPE.NOT_MATCHES} | Filter to exclude discovered types. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.ROOT} | /Shared Documents |
|
{$SHAREPOINT.LLD_INTERVAL} | 3h |
|
{$SHAREPOINT.GET_INTERVAL} | 1m |
|
{$SHAREPOINT.MAX_HEALTH_SCORE} | Must be in the range from 0 to 10. Details: https://docs.microsoft.com/en-us/openspecs/sharepoint_protocols/ms-wsshp/c60ddeb6-4113-4a73-9e97-26b5c3907d33 |
5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get directory structure | Used to get directory structure information |
Script | sharepoint.get_dir Preprocessing
|
Get directory structure: Status | HTTP response (status) code. Indicates whether the HTTP request was successfully completed. Additional information is available in the server log file. |
Dependent item | sharepoint.get_dir.status Preprocessing
|
Get directory structure: Exec time | The time taken to execute the script for obtaining the data structure (in ms). Less is better. |
Dependent item | sharepoint.get_dir.time Preprocessing
|
Health score | This item specifies a value between 0 and 10, where 0 represents a low load and a high ability to process requests and 10 represents a high load and that the server is throttling requests to maintain adequate throughput. |
HTTP agent | sharepoint.health_score Preprocessing
|
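As a manual cross-check of the Health score item above, the same value can usually be read from the X-SharePointHealthScore header that SharePoint front ends add to their responses. A minimal sketch, assuming NTLM authentication, the example portal URL from the {$SHAREPOINT.URL} description, and a hypothetical DOMAIN\zbx_monitor account (any REST endpoint should do):
curl -s --ntlm -u 'DOMAIN\zbx_monitor:<PASSWORD>' -D - -o /dev/null 'http://sharepoint.companyname.local/_api/web' | grep -i 'X-SharePointHealthScore'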
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS SharePoint: Error getting directory structure. | Error getting directory structure. Check the Zabbix server log for more details. |
last(/Microsoft SharePoint by HTTP/sharepoint.get_dir.status)<>200 |Warning |
Manual close: Yes | |
MS SharePoint: Server responds slowly to API request | last(/Microsoft SharePoint by HTTP/sharepoint.get_dir.time)>2000 |Warning |
Manual close: Yes | ||
MS SharePoint: Bad health score | last(/Microsoft SharePoint by HTTP/sharepoint.health_score)>"{$SHAREPOINT.MAX_HEALTH_SCORE}" |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Directory discovery | Script | sharepoint.directory.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Size ({#SHAREPOINT.LLD.FULL_PATH}) | Size of: {#SHAREPOINT.LLD.FULL_PATH} |
Dependent item | sharepoint.size["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing
|
Modified ({#SHAREPOINT.LLD.FULL_PATH}) | Date of change: {#SHAREPOINT.LLD.FULL_PATH} |
Dependent item | sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing
|
Created ({#SHAREPOINT.LLD.FULL_PATH}) | Date of creation: {#SHAREPOINT.LLD.FULL_PATH} |
Dependent item | sharepoint.created["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS SharePoint: Sharepoint object is changed | The modification date of the folder/file has been updated. |
last(/Microsoft SharePoint by HTTP/sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"],#1)<>last(/Microsoft SharePoint by HTTP/sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"],#2) |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the RabbitMQ messaging broker cluster with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by remotely polling the RabbitMQ management plugin with the HTTP agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See the RabbitMQ documentation
for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
Set the hostname or IP address of the RabbitMQ cluster host in the {$RABBITMQ.API.CLUSTER_HOST}
macro. You can also change the port in the {$RABBITMQ.API.PORT}
macro and the scheme in the {$RABBITMQ.API.SCHEME}
macro if necessary.
Set the user name and password in the macros {$RABBITMQ.API.USER}
and {$RABBITMQ.API.PASSWORD}
.
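In the set_permissions command above, the three arguments ("" "" ".*") are the configure, write, and read permission patterns, so the monitoring user only gets read access. Before linking the template, you can verify the credentials and the management plugin by querying the same endpoint the template polls; a minimal sketch with placeholder host and password:
curl -s -u zbx_monitor:<PASSWORD> "http://<CLUSTER HOST>:15672/api/overview"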
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.API.CLUSTER_HOST} | The hostname or IP of the API endpoint for the RabbitMQ cluster. |
<SET CLUSTER API HOST> |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.LLD.FILTER.EXCHANGE.MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.EXCHANGE.NOT_MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
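For example, to skip RabbitMQ's built-in exchanges during discovery, {$RABBITMQ.LLD.FILTER.EXCHANGE.NOT_MATCHES} could be overridden at the host level with a regular expression such as ^amq\. (an illustrative value, not a template default).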
Name | Description | Type | Key and additional info |
---|---|---|---|
Get overview | The HTTP API endpoint that returns cluster-wide metrics. |
HTTP agent | rabbitmq.get_overview |
Get exchanges | The HTTP API endpoint that returns exchanges metrics. |
HTTP agent | rabbitmq.get_exchanges |
Connections total | The total number of connections. |
Dependent item | rabbitmq.overview.object_totals.connections Preprocessing
|
Channels total | The total number of channels. |
Dependent item | rabbitmq.overview.object_totals.channels Preprocessing
|
Queues total | The total number of queues. |
Dependent item | rabbitmq.overview.object_totals.queues Preprocessing
|
Consumers total | The total number of consumers. |
Dependent item | rabbitmq.overview.object_totals.consumers Preprocessing
|
Exchanges total | The total number of exchanges. |
Dependent item | rabbitmq.overview.object_totals.exchanges Preprocessing
|
Messages total | The total number of messages (ready, plus unacknowledged). |
Dependent item | rabbitmq.overview.queue_totals.messages Preprocessing
|
Messages ready for delivery | The number of messages ready for delivery. |
Dependent item | rabbitmq.overview.queue_totals.messages.ready Preprocessing
|
Messages unacknowledged | The number of unacknowledged messages. |
Dependent item | rabbitmq.overview.queue_totals.messages.unacknowledged Preprocessing
|
Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack Preprocessing
|
Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack.rate Preprocessing
|
Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.overview.messages.confirm Preprocessing
|
Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.overview.messages.confirm.rate Preprocessing
|
Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.overview.messages.deliver_get Preprocessing
|
Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.overview.messages.deliver_get.rate Preprocessing
|
Messages published | The count of published messages. |
Dependent item | rabbitmq.overview.messages.publish Preprocessing
|
Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.overview.messages.publish.rate Preprocessing
|
Messages publish_in | The count of messages published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in Preprocessing
|
Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in.rate Preprocessing
|
Messages publish_out | The count of messages published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out Preprocessing
|
Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out.rate Preprocessing
|
Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable Preprocessing
|
Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable.rate Preprocessing
|
Messages returned redeliver | The count of subset of messages in the |
Dependent item | rabbitmq.overview.messages.redeliver Preprocessing
|
Messages returned redeliver per second | The rate of subset of messages (per second) in the |
Dependent item | rabbitmq.overview.messages.redeliver.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ cluster: Failed to fetch overview data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ cluster by HTTP/rabbitmq.get_overview,30m)=1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck: alarms in effect in the cluster{#SINGLETON} | Responds with a 200 OK if there are no alarms in effect in the cluster; otherwise, responds with a 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.alarms[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ cluster: There are active alarms in the cluster | This is the default API endpoint path: http://{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ cluster by HTTP/rabbitmq.healthcheck.alarms[{#SINGLETON}])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Exchanges discovery | The metrics for an individual exchange. |
Dependent item | rabbitmq.exchanges.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#EXCHANGE}][{#TYPE}] exchanges metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.exchange.messages.confirm["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.exchange.messages.confirm.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.exchange.messages.deliver_get["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.exchange.messages.deliver_get.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.exchange.messages.publish["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.exchange.messages.publish.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in | The count of messages published from the channels into this overview. |
Dependent item | rabbitmq.exchange.messages.publish_in["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
Dependent item | rabbitmq.exchange.messages.publish_in.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out | The count of messages published from this overview into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered | The count of subset of messages in the |
Dependent item | rabbitmq.exchange.messages.redeliver["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered per second | The rate of subset of messages (per second) in the |
Dependent item | rabbitmq.exchange.messages.redeliver.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
This template is developed to monitor a RabbitMQ messaging broker node with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by remotely polling the RabbitMQ management plugin with the HTTP agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See the RabbitMQ documentation
for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
Set the hostname or IP address of the RabbitMQ node host in the {$RABBITMQ.API.HOST}
macro. You can also change the port in the {$RABBITMQ.API.PORT}
macro and the scheme in the {$RABBITMQ.API.SCHEME}
macro if necessary.
Set the user name and password in the macros {$RABBITMQ.API.USER}
and {$RABBITMQ.API.PASSWORD}
.
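To confirm that the monitoring user can reach the node-level health-check endpoints used by this template, you can query one of them directly; a minimal sketch with placeholder host and password:
curl -s -u zbx_monitor:<PASSWORD> "http://<NODE HOST>:15672/api/health/checks/local-alarms"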
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.CLUSTER.NAME} | The name of the RabbitMQ cluster. |
rabbit |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.API.HOST} | The hostname or IP of the API endpoint for the RabbitMQ. |
<SET NODE API HOST> |
{$RABBITMQ.LLD.FILTER.QUEUE.MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.QUEUE.NOT_MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
{$RABBITMQ.RESPONSE_TIME.MAX.WARN} | The maximum response time by the RabbitMQ expressed in seconds for a trigger expression. |
10 |
{$RABBITMQ.MESSAGES.MAX.WARN} | The maximum number of messages in the queue for a trigger expression. |
1000 |
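The queue threshold supports user macro context, so it can be tuned per queue: for example, defining {$RABBITMQ.MESSAGES.MAX.WARN:"orders"} = 50000 on the host raises the limit only for a hypothetical queue named orders, while all other discovered queues keep the default of 1000.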
Name | Description | Type | Key and additional info |
---|---|---|---|
Service ping | Simple check | net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] Preprocessing
|
|
Get node overview | The HTTP API endpoint that returns cluster-wide metrics. |
HTTP agent | rabbitmq.get_node_overview Preprocessing
|
Get nodes | The HTTP API endpoint that returns metrics of the nodes. |
HTTP agent | rabbitmq.get_nodes |
Get queues | The HTTP API endpoint that returns metrics of the queues. |
HTTP agent | rabbitmq.get_queues |
Management plugin version | The version of the management plugin in use. |
Dependent item | rabbitmq.node.overview.management_version Preprocessing
|
RabbitMQ version | The version of the RabbitMQ on the node, which processed this request. |
Dependent item | rabbitmq.node.overview.rabbitmq_version Preprocessing
|
Used file descriptors | The number of file descriptors currently in use. |
Dependent item | rabbitmq.node.fd_used Preprocessing
|
Free disk space | The current free disk space. |
Dependent item | rabbitmq.node.disk_free Preprocessing
|
Disk free limit | The free space limit of a disk expressed in bytes. |
Dependent item | rabbitmq.node.disk_free_limit Preprocessing
|
Memory used | The memory usage expressed in bytes. |
Dependent item | rabbitmq.node.mem_used Preprocessing
|
Memory limit | The memory limit (high watermark), expressed in bytes. |
Dependent item | rabbitmq.node.mem_limit Preprocessing
|
Runtime run queue | The average number of Erlang processes waiting to run. |
Dependent item | rabbitmq.node.run_queue Preprocessing
|
Sockets used | The number of file descriptors used as sockets. |
Dependent item | rabbitmq.node.sockets_used Preprocessing
|
Sockets available | The file descriptors available for use as sockets. |
Dependent item | rabbitmq.node.sockets_total Preprocessing
|
Number of network partitions | The number of network partitions, which this node "sees". |
Dependent item | rabbitmq.node.partitions Preprocessing
|
Is running | Whether the node is running or not. |
Dependent item | rabbitmq.node.running Preprocessing
|
Memory alarm | It checks whether the host has a memory alarm or not. |
Dependent item | rabbitmq.node.mem_alarm Preprocessing
|
Disk free alarm | It checks whether the node has a disk alarm or not. |
Dependent item | rabbitmq.node.disk_free_alarm Preprocessing
|
Uptime | Uptime expressed in milliseconds. |
Dependent item | rabbitmq.node.uptime Preprocessing
|
Service response time | Simple check | net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: Service is down | last(/RabbitMQ node by HTTP/net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"])=0 |Average |
Manual close: Yes | ||
RabbitMQ node: Failed to fetch nodes data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ node by HTTP/rabbitmq.get_nodes,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
RabbitMQ node: Version has changed | RabbitMQ version has changed. Acknowledge to close the problem manually. |
last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version,#1)<>last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version,#2) and length(last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version))>0 |Info |
Manual close: Yes | |
RabbitMQ node: Number of network partitions is too high | For more details see Detecting Network Partitions. |
min(/RabbitMQ node by HTTP/rabbitmq.node.partitions,5m)>0 |Warning |
||
RabbitMQ node: Node is not running | RabbitMQ node is not running. |
max(/RabbitMQ node by HTTP/rabbitmq.node.running,5m)=0 |Average |
Depends on:
|
|
RabbitMQ node: Memory alarm | For more details see Memory Alarms. |
last(/RabbitMQ node by HTTP/rabbitmq.node.mem_alarm)=1 |Average |
||
RabbitMQ node: Free disk space alarm | For more details see Free Disk Space Alarms. |
last(/RabbitMQ node by HTTP/rabbitmq.node.disk_free_alarm)=1 |Average |
||
RabbitMQ node: Host has been restarted | Uptime is less than 10 minutes. |
last(/RabbitMQ node by HTTP/rabbitmq.node.uptime)<10m |Info |
Manual close: Yes | |
RabbitMQ node: Service response time is too high | min(/RabbitMQ node by HTTP/net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"],5m)>{$RABBITMQ.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck: local alarms in effect on this node{#SINGLETON} | It responds with a status code 200 OK if there are no local alarms in effect on the target node. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.local_alarms[{#SINGLETON}] Preprocessing
|
Healthcheck: expiration date on the certificates{#SINGLETON} | It checks the expiration date on the certificates for every listener configured to use the Transport Layer Security (TLS). It responds with a status code 200 OK if none of the certificates are due to expire within the checked period. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.certificate_expiration[{#SINGLETON}] Preprocessing
|
Healthcheck: virtual hosts on this node{#SINGLETON} | It checks whether all virtual hosts are running on the target node. It responds with a status code 200 OK if they are. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.virtual_hosts[{#SINGLETON}] Preprocessing
|
Healthcheck: classic mirrored queues without synchronized mirrors online{#SINGLETON} | It checks if there are classic mirrored queues without synchronized mirrors online (queues that would potentially lose data if the target node is shut down). It responds with a status code 200 OK if there are no such queues. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.mirror_sync[{#SINGLETON}] Preprocessing
|
Healthcheck: queues with minimum online quorum{#SINGLETON} | It checks if there are quorum queues with minimum online quorum (queues that would lose their quorum and availability if the target node is shut down). It responds with a status code 200 OK if there are no such queues. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.quorum[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: There are active alarms in the node | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.local_alarms[{#SINGLETON}])=0 |Average |
||
RabbitMQ node: There are valid TLS certificates expiring in the next month | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.certificate_expiration[{#SINGLETON}])=0 |Average |
||
RabbitMQ node: There are not running virtual hosts | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.virtual_hosts[{#SINGLETON}])=0 |Average |
||
RabbitMQ node: There are queues that could potentially lose data if this node goes offline. | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.mirror_sync[{#SINGLETON}])=0 |Average |
||
RabbitMQ node: There are queues that would lose their quorum and availability if this node is shut down. | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.quorum[{#SINGLETON}])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.9- discovery | Specific metrics for versions up to and including 3.8.9. |
Dependent item | rabbitmq.healthcheck.v389.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck{#SINGLETON} | It checks whether the RabbitMQ application is running, whether channels and queues can be listed successfully, and that no alarms are in effect. |
HTTP agent | rabbitmq.healthcheck[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: Node healthcheck failed | For more details see Health Checks. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck[{#SINGLETON}])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Queues discovery | The metrics for an individual queue. |
Dependent item | rabbitmq.queues.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Queue [{#VHOST}][{#QUEUE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#QUEUE}] queue metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages total | The count of total messages in the queue. |
Dependent item | rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages per second | The count of total messages per second in the queue. |
Dependent item | rabbitmq.queue.messages.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Consumers | The number of consumers. |
Dependent item | rabbitmq.queue.consumers["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Memory | The bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures. |
Dependent item | rabbitmq.queue.memory["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages ready | The number of messages ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages ready per second | The number of messages per second ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged | The number of messages delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged per second | The number of messages per second delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged per second | The number of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages delivered | The count of messages delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages delivered per second | The count of messages (per second) delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.queue.messages.deliver_get["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered per second | The rate of delivery per second. The sum of messages delivered (per second) to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to |
Dependent item | rabbitmq.queue.messages.deliver_get.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.queue.messages.publish["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages published per second | The rate of published messages per second. |
Dependent item | rabbitmq.queue.messages.publish.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages redelivered | The count of subset of messages in the |
Dependent item | rabbitmq.queue.messages.redeliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages redelivered per second | The rate of messages redelivered per second. |
Dependent item | rabbitmq.queue.messages.redeliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: Too many messages in queue [{#VHOST}][{#QUEUE}] | min(/RabbitMQ node by HTTP/rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"],5m)>{$RABBITMQ.MESSAGES.MAX.WARN:"{#QUEUE}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the RabbitMQ messaging broker with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template RabbitMQ Cluster
— collects metrics by polling the RabbitMQ management plugin with the Zabbix agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See RabbitMQ documentation for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
A login name and password are also supported in macros.
If your cluster consists of several nodes, it is recommended to assign the cluster
template to a separate balancing host.
In the case of a single-node installation, you can assign the cluster
template to one host with a node
template.
If you use another API endpoint, don't forget to change the {$RABBITMQ.API.CLUSTER_HOST} macro.
Install and set up the Zabbix agent.
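Once the agent is in place, the exact key used by the template can be tested with zabbix_get; a minimal sketch assuming the default macro values and an agent running locally on the API host:
zabbix_get -s 127.0.0.1 -k 'web.page.get["http://zbx_monitor:zabbix@127.0.0.1:15672/api/overview"]'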
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.API.CLUSTER_HOST} | The hostname or IP of the API endpoint for the RabbitMQ cluster. |
127.0.0.1 |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.LLD.FILTER.EXCHANGE.MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.EXCHANGE.NOT_MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get overview | The HTTP API endpoint that returns cluster-wide metrics. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/overview"] Preprocessing
|
Get exchanges | The HTTP API endpoint that returns exchanges metrics. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/exchanges"] Preprocessing
|
Connections total | The total number of connections. |
Dependent item | rabbitmq.overview.object_totals.connections Preprocessing
|
Channels total | The total number of channels. |
Dependent item | rabbitmq.overview.object_totals.channels Preprocessing
|
Queues total | The total number of queues. |
Dependent item | rabbitmq.overview.object_totals.queues Preprocessing
|
Consumers total | The total number of consumers. |
Dependent item | rabbitmq.overview.object_totals.consumers Preprocessing
|
Exchanges total | The total number of exchanges. |
Dependent item | rabbitmq.overview.object_totals.exchanges Preprocessing
|
Messages total | The total number of messages (ready, plus unacknowledged). |
Dependent item | rabbitmq.overview.queue_totals.messages Preprocessing
|
Messages ready for delivery | The number of messages ready for delivery. |
Dependent item | rabbitmq.overview.queue_totals.messages.ready Preprocessing
|
Messages unacknowledged | The number of unacknowledged messages. |
Dependent item | rabbitmq.overview.queue_totals.messages.unacknowledged Preprocessing
|
Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack Preprocessing
|
Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack.rate Preprocessing
|
Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.overview.messages.confirm Preprocessing
|
Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.overview.messages.confirm.rate Preprocessing
|
Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.overview.messages.deliver_get Preprocessing
|
Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.overview.messages.deliver_get.rate Preprocessing
|
Messages published | The count of published messages. |
Dependent item | rabbitmq.overview.messages.publish Preprocessing
|
Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.overview.messages.publish.rate Preprocessing
|
Messages publish_in | The count of messages published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in Preprocessing
|
Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in.rate Preprocessing
|
Messages publish_out | The count of messages published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out Preprocessing
|
Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out.rate Preprocessing
|
Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable Preprocessing
|
Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable.rate Preprocessing
|
Messages returned redeliver | The count of subset of messages in the |
Dependent item | rabbitmq.overview.messages.redeliver Preprocessing
|
Messages returned redeliver per second | The rate of subset of messages (per second) in the |
Dependent item | rabbitmq.overview.messages.redeliver.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ cluster: Failed to fetch overview data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ cluster by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/overview"],30m)=1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck: alarms in effect in the cluster{#SINGLETON} | It responds with a status code 200 OK if there are no alarms in effect in the cluster. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/health/checks/alarms{#SINGLETON}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ cluster: There are active alarms in the cluster | This is the default API endpoint path: http://{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ cluster by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/health/checks/alarms{#SINGLETON}"])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Exchanges discovery | The metrics for an individual exchange. |
Dependent item | rabbitmq.exchanges.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#EXCHANGE}][{#TYPE}] exchanges metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.exchange.messages.confirm["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.exchange.messages.confirm.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.exchange.messages.deliver_get["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.exchange.messages.deliver_get.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.exchange.messages.publish["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.exchange.messages.publish.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in | The count of messages published from the channels into this overview. |
Dependent item | rabbitmq.exchange.messages.publish_in["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
Dependent item | rabbitmq.exchange.messages.publish_in.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out | The count of messages published from this overview into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered | The count of subset of messages in the |
Dependent item | rabbitmq.exchange.messages.redeliver["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered per second | The rate of subset of messages (per second) in the |
Dependent item | rabbitmq.exchange.messages.redeliver.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
This template is developed to monitor RabbitMQ with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template RabbitMQ Node
— (Zabbix version >= 4.2) collects metrics by polling the RabbitMQ management plugin with the Zabbix agent.
It also uses the Zabbix agent to collect RabbitMQ Linux process statistics, such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See RabbitMQ documentation for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
A login name and password are also supported in macros.
If you use another API endpoint, don't forget to change the {$RABBITMQ.API.HOST} macro.
Install and set up the Zabbix agent.
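Besides the management API keys, this template relies on agent process and service checks; these can be verified with zabbix_get before linking the template. A minimal sketch with illustrative values (the Erlang VM process is typically named beam.smp, matching the {$RABBITMQ.PROCESS_NAME} default):
zabbix_get -s 127.0.0.1 -k 'proc.get[beam.smp,,,summary]'
zabbix_get -s 127.0.0.1 -k 'net.tcp.service[http,127.0.0.1,15672]'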
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.CLUSTER.NAME} | The name of the RabbitMQ cluster. |
rabbit |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.API.HOST} | The hostname or IP of the API endpoint for the RabbitMQ. |
127.0.0.1 |
{$RABBITMQ.PROCESS_NAME} | The process name filter for the RabbitMQ process discovery. |
beam.smp |
{$RABBITMQ.PROCESS.NAME.PARAMETER} | The process name of the RabbitMQ server used in the item key |
|
{$RABBITMQ.LLD.FILTER.QUEUE.MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.QUEUE.NOT_MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
{$RABBITMQ.RESPONSE_TIME.MAX.WARN} | The maximum response time by the RabbitMQ expressed in seconds for a trigger expression. |
10 |
{$RABBITMQ.MESSAGES.MAX.WARN} | The maximum number of messages in the queue for a trigger expression. |
1000 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Service ping | Zabbix agent | net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] Preprocessing
|
|
Get node overview | The HTTP API endpoint that returns cluster-wide metrics. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/overview"] Preprocessing
|
Get nodes | The HTTP API endpoint that returns metrics of the nodes. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/nodes/{$RABBITMQ.CLUSTER.NAME}@{HOST.NAME}?memory=true"] Preprocessing
|
Get queues | The HTTP API endpoint that returns metrics of the queues. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/queues"] Preprocessing
|
Management plugin version | The version of the management plugin in use. |
Dependent item | rabbitmq.node.overview.management_version Preprocessing
|
RabbitMQ version | The version of the RabbitMQ on the node, which processed this request. |
Dependent item | rabbitmq.node.overview.rabbitmq_version Preprocessing
|
Used file descriptors | The number of file descriptors currently in use. |
Dependent item | rabbitmq.node.fd_used Preprocessing
|
Free disk space | The current free disk space. |
Dependent item | rabbitmq.node.disk_free Preprocessing
|
Memory used | The memory usage expressed in bytes. |
Dependent item | rabbitmq.node.mem_used Preprocessing
|
Memory limit | The memory limit (high watermark), expressed in bytes. |
Dependent item | rabbitmq.node.mem_limit Preprocessing
|
Disk free limit | The free space limit of a disk expressed in bytes. |
Dependent item | rabbitmq.node.disk_free_limit Preprocessing
|
Runtime run queue | The average number of Erlang processes waiting to run. |
Dependent item | rabbitmq.node.run_queue Preprocessing
|
Sockets used | The number of file descriptors used as sockets. |
Dependent item | rabbitmq.node.sockets_used Preprocessing
|
Sockets available | The file descriptors available for use as sockets. |
Dependent item | rabbitmq.node.sockets_total Preprocessing
|
Number of network partitions | The number of network partitions, which this node "sees". |
Dependent item | rabbitmq.node.partitions Preprocessing
|
Is running | Whether the node is running or not. |
Dependent item | rabbitmq.node.running Preprocessing
|
Memory alarm | It checks whether the host has a memory alarm or not. |
Dependent item | rabbitmq.node.mem_alarm Preprocessing
|
Disk free alarm | It checks whether the node has a disk alarm or not. |
Dependent item | rabbitmq.node.disk_free_alarm Preprocessing
|
Uptime | Uptime expressed in milliseconds. |
Dependent item | rabbitmq.node.uptime Preprocessing
|
Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$RABBITMQ.PROCESS.NAME.PARAMETER},,,summary] |
Service response time | Zabbix agent | net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: Version has changed | RabbitMQ version has changed. Acknowledge to close the problem manually. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version,#1)<>last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version,#2) and length(last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version))>0 |Info |
Manual close: Yes | |
RabbitMQ node: Number of network partitions is too high | For more details see Detecting Network Partitions. |
min(/RabbitMQ node by Zabbix agent/rabbitmq.node.partitions,5m)>0 |Warning |
||
RabbitMQ node: Memory alarm | For more details see Memory Alarms. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.mem_alarm)=1 |Average |
||
RabbitMQ node: Free disk space alarm | For more details see Free Disk Space Alarms. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.disk_free_alarm)=1 |Average |
||
RabbitMQ node: Host has been restarted | Uptime is less than 10 minutes. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.uptime)<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ process discovery | The discovery of the RabbitMQ summary processes. |
Dependent item | rabbitmq.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get process data | The summary metrics aggregated by a process {#RABBITMQ.NAME}. |
Dependent item | rabbitmq.proc.get[{#RABBITMQ.NAME}] Preprocessing
|
Number of running processes | The number of running processes {#RABBITMQ.NAME}. |
Dependent item | rabbitmq.proc.num[{#RABBITMQ.NAME}] Preprocessing
|
Memory usage (rss) | The summary of resident set size memory used by a process {#RABBITMQ.NAME} expressed in bytes. |
Dependent item | rabbitmq.proc.rss[{#RABBITMQ.NAME}] Preprocessing
|
Memory usage (vsize) | The summary of virtual memory used by a process {#RABBITMQ.NAME} expressed in bytes. |
Dependent item | rabbitmq.proc.vmem[{#RABBITMQ.NAME}] Preprocessing
|
Memory usage, % | The percentage of real memory used by a process {#RABBITMQ.NAME}. |
Dependent item | rabbitmq.proc.pmem[{#RABBITMQ.NAME}] Preprocessing
|
CPU utilization | The percentage of the CPU utilization by a process {#RABBITMQ.NAME}. |
Zabbix agent | proc.cpu.util[{#RABBITMQ.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: Process is not running | last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])=0 |High |
|||
RabbitMQ node: Failed to fetch nodes data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/nodes/{$RABBITMQ.CLUSTER.NAME}@{HOST.NAME}?memory=true"],30m)=1 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
|
RabbitMQ node: Service is down | last(/RabbitMQ node by Zabbix agent/net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"])=0 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Average |
Manual close: Yes | ||
RabbitMQ node: Node is not running | RabbitMQ node is not running. |
max(/RabbitMQ node by Zabbix agent/rabbitmq.node.running,5m)=0 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Average |
Depends on:
|
|
RabbitMQ node: Service response time is too high | min(/RabbitMQ node by Zabbix agent/net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"],5m)>{$RABBITMQ.RESPONSE_TIME.MAX.WARN} and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck: local alarms in effect on this node{#SINGLETON} | Checks whether there are local alarms in effect on the target node. It responds with a status code 200 OK if there are no alarms in effect. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/local-alarms{#SINGLETON}"] Preprocessing
|
Healthcheck: expiration date on the certificates{#SINGLETON} | It checks the expiration date on the certificates for every listener configured to use the Transport Layer Security (TLS). It responds with a status code 200 OK if all certificates are valid (not expired). Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/certificate-expiration/1/months{#SINGLETON}"] Preprocessing
|
Healthcheck: virtual hosts on this node{#SINGLETON} | Checks whether all virtual hosts are running on the target node. It responds with a status code 200 OK if they are. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/virtual-hosts{#SINGLETON}"] Preprocessing
|
Healthcheck: classic mirrored queues without synchronized mirrors online{#SINGLETON} | It checks if there are classic mirrored queues without synchronized mirrors online (queues that would potentially lose data if the target node is shut down). It responds with a status code 200 OK if there are no such queues. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-mirror-sync-critical{#SINGLETON}"] Preprocessing
|
Healthcheck: queues with minimum online quorum{#SINGLETON} | It checks if there are quorum queues with minimum online quorum (queues that would lose their quorum and availability if the target node is shut down). It responds with a status code 200 OK if there are no such queues. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-quorum-critical{#SINGLETON}"] Preprocessing
|
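For reference, the health-check endpoints polled by the items above can be queried manually to confirm credentials and reachability. A hedged example, assuming the management plugin listens on 127.0.0.1:15672 and the default guest credentials; adjust the user, password, host, and port to match your macros.
# Prints 200 if no local alarms are in effect, 503 otherwise
$ curl -s -o /dev/null -w '%{http_code}\n' -u guest:guest http://127.0.0.1:15672/api/health/checks/local-alarms
# The same approach works for the other checks, for example virtual hosts
$ curl -s -u guest:guest http://127.0.0.1:15672/api/health/checks/virtual-hosts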
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: There are active alarms in the node | It checks the active alarms in the nodes via API. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/local-alarms{#SINGLETON}"])=0 |Average |
||
RabbitMQ node: There are valid TLS certificates expiring in the next month | It checks if there are valid TLS certificates expiring in the next month. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/certificate-expiration/1/months{#SINGLETON}"])=0 |Average |
||
RabbitMQ node: There are not running virtual hosts | It checks if there are not running virtual hosts via API. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/virtual-hosts{#SINGLETON}"])=0 |Average |
||
RabbitMQ node: There are queues that could potentially lose data if this node goes offline. | It checks whether there are queues that could potentially lose data if this node goes offline via API. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-mirror-sync-critical{#SINGLETON}"])=0 |Average |
||
RabbitMQ node: There are queues that would lose their quorum and availability if this node is shut down. | It checks via API whether there are quorum queues that would lose their quorum and availability if this node is shut down. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-quorum-critical{#SINGLETON}"])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.9- discovery | Specific metrics for versions up to and including 3.8.9. |
Dependent item | rabbitmq.healthcheck.v389.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck{#SINGLETON} | It checks whether the RabbitMQ application is running, whether the channels and queues can be listed successfully, and that no alarms are in effect. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/healthchecks/node{#SINGLETON}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: Node healthcheck failed | For more details see Health Checks. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/healthchecks/node{#SINGLETON}"])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Queues discovery | The metrics for an individual queue. |
Dependent item | rabbitmq.queues.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Queue [{#VHOST}][{#QUEUE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#QUEUE}] queue metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages total | The count of total messages in the queue. |
Dependent item | rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages per second | The count of total messages per second in the queue. |
Dependent item | rabbitmq.queue.messages.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Consumers | The number of consumers. |
Dependent item | rabbitmq.queue.consumers["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Memory | The bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures. |
Dependent item | rabbitmq.queue.memory["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages ready | The number of messages ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages ready per second | The number of messages per second ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged | The number of messages delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged per second | The number of messages per second delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged per second | The number of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages delivered | The count of messages delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages delivered per second | The count of messages (per second) delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.queue.messages.deliver_get["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered per second | The rate of delivery per second. The sum of messages delivered (per second) to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to |
Dependent item | rabbitmq.queue.messages.deliver_get.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.queue.messages.publish["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages published per second | The rate of published messages per second. |
Dependent item | rabbitmq.queue.messages.publish.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages redelivered | The count of subset of messages in the |
Dependent item | rabbitmq.queue.messages.redeliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages redelivered per second | The rate of messages redelivered per second. |
Dependent item | rabbitmq.queue.messages.redeliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: Too many messages in queue [{#VHOST}][{#QUEUE}] | min(/RabbitMQ node by Zabbix agent/rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"],5m)>{$RABBITMQ.MESSAGES.MAX.WARN:"{#QUEUE}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Proxmox VE monitoring by Zabbix via HTTP and doesn't require any external scripts.
Proxmox VE uses a REST like API. The concept is described in Resource Oriented Architecture (ROA).
Check the API documentation
for details.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Please provide the necessary access levels for both the User and the Token:
Copy the resulting Token ID and Secret into the host macros {$PVE.TOKEN.ID}
and {$PVE.TOKEN.SECRET}
.
Set the hostname or IP address of the Proxmox API VE host in the {$PVE.URL.HOST}
macro. You can also change the API port in the {$PVE.URL.PORT}
macro if necessary.
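As an illustration only (the user, realm, and token names below are hypothetical), a read-only monitoring token can be created on a Proxmox VE node roughly as follows, and the resulting credentials can be verified against the API before filling in the macros.
# Create a monitoring user and grant it a read-only role (PVEAuditor is usually sufficient)
$ pveum user add zabbix@pve --comment "Zabbix monitoring"
$ pveum acl modify / --users zabbix@pve --roles PVEAuditor
# Create an API token without privilege separation so it inherits the user's permissions
$ pveum user token add zabbix@pve monitoring --privsep 0
# Verify the token: a working setup returns HTTP 200
$ curl -k -s -o /dev/null -w '%{http_code}\n' -H 'Authorization: PVEAPIToken=zabbix@pve!monitoring=<TOKEN SECRET>' https://<PVE HOST>:8006/api2/json/version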
Name | Description | Default |
---|---|---|
{$PVE.URL.HOST} | The hostname or IP address of the Proxmox VE API host. |
<SET PVE HOST> |
{$PVE.URL.PORT} | The API uses the HTTPS protocol and the server listens to port 8006 by default. |
8006 |
{$PVE.TOKEN.ID} | API tokens allow stateless access to most parts of the REST API by another system, software or API client. |
USER@REALM!TOKENID |
{$PVE.TOKEN.SECRET} | Secret key. |
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
{$PVE.ROOT.PUSE.MAX.WARN} | Maximum used root space in percentage. |
90 |
{$PVE.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
{$PVE.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.SWAP.PUSE.MAX.WARN} | Maximum used swap space in percentage. |
90 |
{$PVE.VM.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
{$PVE.VM.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.LXC.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
{$PVE.LXC.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.LXC.DISK.PUSE.MAX.WARN} | Maximum used disk in percentage. |
90 |
{$PVE.STORAGE.PUSE.MAX.WARN} | Maximum used storage space in percentage. |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get cluster resources | Resources index. |
HTTP agent | proxmox.cluster.resources Preprocessing
|
Get cluster status | Get cluster status information. |
HTTP agent | proxmox.cluster.status Preprocessing
|
API service status | Get API service status. |
Script | proxmox.api.available Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox VE: API service not available | The API service is not available. Check your network and authorization settings. |
last(/Proxmox VE by HTTP/proxmox.api.available) <> 200 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster discovery | Dependent item | proxmox.cluster.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster [{#RESOURCE.NAME}]: Quorate | Indicates if there is a majority of nodes online to make decisions. |
Dependent item | proxmox.cluster.quorate[{#RESOURCE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox VE: Cluster [{#RESOURCE.NAME}] not quorum | Proxmox VE uses a quorum-based technique to provide a consistent state among all cluster nodes. |
last(/Proxmox VE by HTTP/proxmox.cluster.quorate[{#RESOURCE.NAME}]) <> 1 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node discovery | Dependent item | proxmox.node.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node [{#NODE.NAME}]: Status | Indicates if the node is online or offline. |
Dependent item | proxmox.node.online[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Status | Read node status. |
HTTP agent | proxmox.node.status[{#NODE.NAME}] |
Node [{#NODE.NAME}]: RRD statistics | Read node RRD statistics. |
HTTP agent | proxmox.node.rrd[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Time | Read server time and time zone settings. |
HTTP agent | proxmox.node.time[{#NODE.NAME}] |
Node [{#NODE.NAME}]: Uptime | The system uptime expressed in the following format: "N days, hh:mm:ss". |
Dependent item | proxmox.node.uptime[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: PVE version | PVE manager version. |
Dependent item | proxmox.node.pveversion[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Kernel version | Kernel version info. |
Dependent item | proxmox.node.kernelversion[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Root filesystem, used | Root filesystem usage. |
Dependent item | proxmox.node.rootused[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Root filesystem, total | Root filesystem total. |
Dependent item | proxmox.node.roottotal[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Memory, used | Memory usage. |
Dependent item | proxmox.node.memused[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Memory, total | Memory total. |
Dependent item | proxmox.node.memtotal[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: CPU, usage | CPU usage. |
Dependent item | proxmox.node.cpu[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Outgoing data, rate | Network usage. |
Dependent item | proxmox.node.netout[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Incoming data, rate | Network usage. |
Dependent item | proxmox.node.netin[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: CPU, loadavg | CPU average load. |
Dependent item | proxmox.node.loadavg[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: CPU, iowait | CPU iowait time. |
Dependent item | proxmox.node.iowait[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Swap filesystem, total | Swap total. |
Dependent item | proxmox.node.swaptotal[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Swap filesystem, used | Swap used. |
Dependent item | proxmox.node.swapused[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Time zone | Time zone. |
Dependent item | proxmox.node.timezone[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Localtime | Seconds since 1970-01-01 00:00:00 (local time). |
Dependent item | proxmox.node.localtime[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Time | Seconds since 1970-01-01 00:00:00 UTC. |
Dependent item | proxmox.node.utctime[{#NODE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox VE: Node [{#NODE.NAME}] offline | Node offline. |
last(/Proxmox VE by HTTP/proxmox.node.online[{#NODE.NAME}]) <> 1 |High |
||
Proxmox VE: Node [{#NODE.NAME}] has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.node.uptime[{#NODE.NAME}])<10m |Info |
Manual close: Yes Depends on:
|
|
Proxmox VE: Node [{#NODE.NAME}]: PVE manager has changed | The PVE manager version has changed. Acknowledge to close the problem manually. |
last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}],#1)<>last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}],#2) and length(last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}]))>0 |Info |
Manual close: Yes | |
Proxmox VE: Node [{#NODE.NAME}]: Kernel version has changed | The kernel version has changed. Acknowledge to close the problem manually. |
last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}],#1)<>last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}],#2) and length(last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}]))>0 |Info |
Manual close: Yes | |
Proxmox VE: Node [{#NODE.NAME}] high root filesystem space usage | Root filesystem space usage. |
min(/Proxmox VE by HTTP/proxmox.node.rootused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.roottotal[{#NODE.NAME}]) * 100 >{$PVE.ROOT.PUSE.MAX.WARN:"{#NODE.NAME}"} |Warning |
||
Proxmox VE: Node [{#NODE.NAME}] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.node.memused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.memtotal[{#NODE.NAME}]) * 100 >{$PVE.MEMORY.PUSE.MAX.WARN:"{#NODE.NAME}"} |Warning |
||
Proxmox VE: Node [{#NODE.NAME}] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.node.cpu[{#NODE.NAME}],5m) > {$PVE.CPU.PUSE.MAX.WARN:"{#NODE.NAME}"} |Warning |
||
Proxmox VE: Node [{#NODE.NAME}] high swap space usage | If there is no swap configured, this trigger is ignored. |
min(/Proxmox VE by HTTP/proxmox.node.swapused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.swaptotal[{#NODE.NAME}]) * 100 > {$PVE.SWAP.PUSE.MAX.WARN:"{#NODE.NAME}"} and last(/Proxmox VE by HTTP/proxmox.node.swaptotal[{#NODE.NAME}]) > 0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage discovery | Dependent item | proxmox.storage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Type | More specific type, if available. |
Dependent item | proxmox.node.plugintype[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Size | Storage size in bytes. |
Dependent item | proxmox.node.maxdisk[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Content | Allowed storage content types. |
Dependent item | proxmox.node.content[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Used | Used disk space in bytes. |
Dependent item | proxmox.node.disk[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox VE: Storage [{#NODE.NAME}/{#STORAGE.NAME}] high filesystem space usage | Storage space usage is high. |
min(/Proxmox VE by HTTP/proxmox.node.disk[{#NODE.NAME},{#STORAGE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.maxdisk[{#NODE.NAME},{#STORAGE.NAME}]) * 100 >{$PVE.STORAGE.PUSE.MAX.WARN:"{#NODE.NAME}/{#STORAGE.NAME}"} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
QEMU discovery | Dependent item | proxmox.qemu.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Disk write, rate | Disk write. |
Dependent item | proxmox.qemu.diskwrite[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Disk read, rate | Disk read. |
Dependent item | proxmox.qemu.diskread[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Memory usage | Used memory in bytes. |
Dependent item | proxmox.qemu.mem[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Memory total | The total memory expressed in bytes. |
Dependent item | proxmox.qemu.maxmem[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Incoming data, rate | Incoming data rate. |
Dependent item | proxmox.qemu.netin[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Outgoing data, rate | Outgoing data rate. |
Dependent item | proxmox.qemu.netout[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: CPU usage | CPU load. |
Dependent item | proxmox.qemu.cpu[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME}]: Get data | Get VM status data. |
HTTP agent | proxmox.qemu.get.data[{#QEMU.ID}] |
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Uptime | The system uptime expressed in the following format: "N days, hh:mm:ss". |
Dependent item | proxmox.qemu.uptime[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Status | Status of Virtual Machine. |
Dependent item | proxmox.qemu.vmstatus[{#QEMU.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox VE: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.qemu.mem[{#QEMU.ID}],5m) / last(/Proxmox VE by HTTP/proxmox.qemu.maxmem[{#QEMU.ID}]) * 100 >{$PVE.VM.MEMORY.PUSE.MAX.WARN:"{#QEMU.ID}"} |Warning |
||
Proxmox VE: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.qemu.cpu[{#QEMU.ID}],5m) > {$PVE.VM.CPU.PUSE.MAX.WARN:"{#QEMU.ID}"} |Warning |
||
Proxmox VE: VM [{#NODE.NAME}/{#QEMU.NAME}] has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.qemu.uptime[{#QEMU.ID}])<10m |Info |
Manual close: Yes Depends on:
|
|
Proxmox VE: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Not running | VM state is not "running". |
last(/Proxmox VE by HTTP/proxmox.qemu.vmstatus[{#QEMU.ID}])<>"running" |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
LXC discovery | Dependent item | proxmox.lxc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
LXC [{#NODE.NAME}/{#LXC.NAME}]: Get data | Get LXC status data. |
HTTP agent | proxmox.lxc.get.data[{#LXC.ID}] |
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Uptime | The system uptime expressed in the following format: "N days, hh:mm:ss". |
Dependent item | proxmox.lxc.uptime[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Status | Status of LXC container. |
Dependent item | proxmox.lxc.vmstatus[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Disk write, rate | Disk write. |
Dependent item | proxmox.lxc.diskwrite[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Disk read, rate | Disk read. |
Dependent item | proxmox.lxc.diskread[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Disk space total | Total disk space. |
Dependent item | proxmox.lxc.maxdisk[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Disk space usage | Disk space usage. |
Dependent item | proxmox.lxc.disk[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Memory usage | Used memory in bytes. |
Dependent item | proxmox.lxc.mem[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Memory total | The total memory expressed in bytes. |
Dependent item | proxmox.lxc.maxmem[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Incoming data, rate | Incoming data rate. |
Dependent item | proxmox.lxc.netin[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Outgoing data, rate | Outgoing data rate. |
Dependent item | proxmox.lxc.netout[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: CPU usage | CPU load. |
Dependent item | proxmox.lxc.cpu[{#LXC.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox VE: LXC [{#NODE.NAME}/{#LXC.NAME}] has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.lxc.uptime[{#LXC.ID}])<10m |Info |
Manual close: Yes Depends on:
|
|
Proxmox VE: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Not running | LXC state is not "running". |
last(/Proxmox VE by HTTP/proxmox.lxc.vmstatus[{#LXC.ID}])<>"running" |Average |
||
Proxmox VE: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: high disk space usage | Disk space usage. |
min(/Proxmox VE by HTTP/proxmox.lxc.disk[{#LXC.ID}],5m) / last(/Proxmox VE by HTTP/proxmox.lxc.maxdisk[{#LXC.ID}]) * 100 > {$PVE.LXC.DISK.PUSE.MAX.WARN:"{#LXC.ID}"} |Warning |
||
Proxmox VE: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.lxc.mem[{#LXC.ID}],5m) / last(/Proxmox VE by HTTP/proxmox.lxc.maxmem[{#LXC.ID}]) * 100 >{$PVE.LXC.MEMORY.PUSE.MAX.WARN:"{#LXC.ID}"} |Warning |
||
Proxmox VE: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.lxc.cpu[{#LXC.ID}],5m) > {$PVE.LXC.CPU.PUSE.MAX.WARN:"{#LXC.ID}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor processes with Zabbix agent and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. For example, by specifying "zabbix" as the macro value, you can monitor all Zabbix processes.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Install and setup Zabbix agent.
Custom processes set in macros:
Name | Description | Default |
---|---|---|
{$PROC.NAME.MATCHES} | This macro is used in the discovery of processes. It can be overridden on a host-level or on a linked template-level. |
<CHANGE VALUE> |
{$PROC.NAME.NOT_MATCHES} | This macro is used in the discovery of processes. It can be overridden on a host-level or on a linked template-level. |
<CHANGE VALUE> |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get process summary | The summary of data metrics for all processes. |
Zabbix agent | proc.get[,,,summary] |
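To see what the agent actually returns for this item, you can query it directly; a minimal check, assuming zabbix_get is installed and the agent accepts passive checks from this host.
# Aggregated summary of all processes, returned as JSON
$ zabbix_get -s 127.0.0.1 -k 'proc.get[,,,summary]'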
Name | Description | Type | Key and additional info |
---|---|---|---|
Processes discovery | Discovery of OS summary processes. |
Dependent item | custom.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Process [{#NAME}]: Get data | The summary of metrics collected for the process {#NAME}. |
Dependent item | custom.proc.get[{#NAME}] Preprocessing
|
Process [{#NAME}]: Memory usage (rss) | The summary of Resident Set Size (RSS) memory used by the process {#NAME} in bytes. |
Dependent item | custom.proc.rss[{#NAME}] Preprocessing
|
Process [{#NAME}]: Memory usage (vsize) | The summary of virtual memory used by process {#NAME} in bytes. |
Dependent item | custom.proc.vmem[{#NAME}] Preprocessing
|
Process [{#NAME}]: Memory usage, % | The percentage of real memory used by the process {#NAME}. |
Dependent item | custom.proc.pmem[{#NAME}] Preprocessing
|
Process [{#NAME}]: Number of running processes | The number of running processes {#NAME}. |
Dependent item | custom.proc.num[{#NAME}] Preprocessing
|
Process [{#NAME}]: Number of threads | The number of threads {#NAME}. |
Dependent item | custom.proc.thread[{#NAME}] Preprocessing
|
Process [{#NAME}]: Number of page faults | The number of page faults {#NAME}. |
Dependent item | custom.proc.page[{#NAME}] Preprocessing
|
Process [{#NAME}]: Size of locked memory | The size of locked memory {#NAME}. |
Dependent item | custom.proc.mem.locked[{#NAME}] Preprocessing
|
Process [{#NAME}]: Swap space used | The swap space used by {#NAME}. |
Dependent item | custom.proc.swap[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OS: Process [{#NAME}]: Process is not running | last(/OS processes by Zabbix agent/custom.proc.num[{#NAME}])=0 |High |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the FastCGI Process Manager (PHP-FPM) by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template PHP-FPM by HTTP
- collects metrics by polling the PHP-FPM status page with HTTP agent remotely.
Note that this solution supports HTTPS and redirects.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Note that depending on your OS distribution, the PHP-FPM executable/service name can vary. RHEL-like distributions usually name both process and service as php-fpm
, while for Debian/Ubuntu based distributions it may include the version, for example: executable name - php-fpm8.2
, systemd service name - php8.2-fpm
. Adjust the following instructions accordingly if needed.
Open the PHP-FPM configuration file and enable the status page as shown.
pm.status_path = /status
ping.path = /ping
Validate the syntax to ensure it is correct before you reload the service. Replace the <version>
in the command if needed.
$ php-fpm -t
or
$ php-fpm<version> -t
Reload the php-fpm
service to make the change active. Replace the <version>
in the command if needed.
$ systemctl reload php-fpm
or
$ systemctl reload php<version>-fpm
Next, edit the configuration of your web server.
If you use Nginx, edit the configuration file of your Nginx server block (virtual host) and add the location block below it.
# Enable php-fpm status page
location ~ ^/(status|ping)$ {
## disable access logging for request if you prefer
access_log off;
## Only allow trusted IPs for security, deny everyone else
# allow 127.0.0.1;
# allow 1.2.3.4; # your IP here
# deny all;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_index index.php;
include fastcgi_params;
## Now the port or socket of the php-fpm pool we want the status of
fastcgi_pass 127.0.0.1:9000;
# fastcgi_pass unix:/run/php-fpm/your_socket.sock;
}
If you use Apache, edit the configuration file of the virtual host and add the following location blocks.
<LocationMatch "/status">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/status"
</LocationMatch>
<LocationMatch "/ping">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/ping"
</LocationMatch>
Validate the web server configuration to ensure it is correct.
$ nginx -t
or
$ httpd -t
or
$ apachectl configtest
Reload the web server configuration. The command may vary depending on the OS distribution and web server.
$ systemctl reload nginx
or
$ systemctl reload httpd
or
$ systemctl reload apache2
Verify that the pages are available with these commands.
curl -L 127.0.0.1/status
curl -L 127.0.0.1/ping
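The status item of this template typically requests the machine-readable (JSON) form of the status page. Assuming the default paths and port, you can preview that output with:
$ curl -L "127.0.0.1/status?json"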
If you use another location of the status/ping pages, don't forget to change the {$PHP_FPM.STATUS.PAGE}/{$PHP_FPM.PING.PAGE}
macro.
If you use another web server port or scheme for the location of the PHP-FPM status/ping pages, don't forget to change the macros {$PHP_FPM.SCHEME}
and {$PHP_FPM.PORT}
.
Name | Description | Default |
---|---|---|
{$PHP_FPM.PORT} | The port of the PHP-FPM status host or container. |
80 |
{$PHP_FPM.SCHEME} | Request scheme which may be http or https |
http |
{$PHP_FPM.HOST} | The hostname or IP address of the PHP-FPM status for a host or container. |
localhost |
{$PHP_FPM.STATUS.PAGE} | The path of the PHP-FPM status page. |
status |
{$PHP_FPM.PING.PAGE} | The path of the PHP-FPM ping page. |
ping |
{$PHP_FPM.PING.REPLY} | The expected reply to the ping. |
pong |
{$PHP_FPM.QUEUE.WARN.MAX} | The maximum percent of the PHP-FPM queue usage for a trigger expression. |
80 |
Name | Description | Type | Key and additional info | |
---|---|---|---|---|
Get ping page | HTTP agent | php-fpm.get_ping | ||
Get status page | HTTP agent | php-fpm.get_status | ||
Ping | Dependent item | php-fpm.ping Preprocessing
|
Regular expression: {$PHP_FPM.PING.REPLY}($|\r?\n) → 1 ⛔️ Custom on fail: Set value to: 0 |
|
Processes, active | The total number of active processes. |
Dependent item | php-fpm.processes_active Preprocessing
|
|
Version | The current version of PHP. You can get it from the HTTP header "X-Powered-By"; it may not work if you have changed the default HTTP headers. |
Dependent item | php-fpm.version Preprocessing
|
|
Pool name | The name of the current pool. |
Dependent item | php-fpm.name Preprocessing
|
|
Uptime | Indicates how long this pool has been running. |
Dependent item | php-fpm.uptime Preprocessing
|
|
Start time | The time when this pool was started. |
Dependent item | php-fpm.start_time Preprocessing
|
|
Processes, total | The total number of server processes currently running. |
Dependent item | php-fpm.processes_total Preprocessing
|
|
Processes, idle | The total number of idle processes. |
Dependent item | php-fpm.processes_idle Preprocessing
|
|
Process manager | The method used by the process manager to control the number of child processes for this pool. |
Dependent item | php-fpm.process_manager Preprocessing
|
|
Processes, max active | The highest value of "active processes" since the PHP-FPM server was started. |
Dependent item | php-fpm.processes_max_active Preprocessing
|
|
Accepted connections per second | The number of accepted requests per second. |
Dependent item | php-fpm.conn_accepted.rate Preprocessing
|
|
Slow requests | The number of requests that have exceeded the request_slowlog_timeout limit. |
Dependent item | php-fpm.slow_requests Preprocessing
|
|
Listen queue | The current number of connections that have been initiated but not yet accepted. |
Dependent item | php-fpm.listen_queue Preprocessing
|
|
Listen queue, max | The maximum number of requests in the queue of pending connections since this FPM pool was started. |
Dependent item | php-fpm.listen_queue_max Preprocessing
|
|
Listen queue, len | The size of the socket queue of pending connections. |
Dependent item | php-fpm.listen_queue_len Preprocessing
|
|
Queue usage | The utilization of the queue. |
Calculated | php-fpm.listen_queue_usage | |
Max children reached | The number of times that the process limit (pm.max_children) has been reached. |
Dependent item | php-fpm.max_children Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Service is down | last(/PHP-FPM by HTTP/php-fpm.ping)=0 or nodata(/PHP-FPM by HTTP/php-fpm.ping,3m)=1 |High |
Manual close: Yes | ||
PHP-FPM: Version has changed | The PHP-FPM version has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by HTTP/php-fpm.version,#1)<>last(/PHP-FPM by HTTP/php-fpm.version,#2) and length(last(/PHP-FPM by HTTP/php-fpm.version))>0 |Info |
Manual close: Yes | |
PHP-FPM: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/PHP-FPM by HTTP/php-fpm.uptime,30m)=1 |Info |
Manual close: Yes Depends on:
|
|
PHP-FPM: Pool has been restarted | Uptime is less than 10 minutes. |
last(/PHP-FPM by HTTP/php-fpm.uptime)<10m |Info |
Manual close: Yes | |
PHP-FPM: Manager changed | The PHP-FPM manager has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by HTTP/php-fpm.process_manager,#1)<>last(/PHP-FPM by HTTP/php-fpm.process_manager,#2) |Info |
Manual close: Yes | |
PHP-FPM: Detected slow requests | The PHP-FPM has detected a slow request. |
min(/PHP-FPM by HTTP/php-fpm.slow_requests,#3)>0 |Warning |
||
PHP-FPM: Queue utilization is high | The queue for this pool has reached {$PHP_FPM.QUEUE.WARN.MAX}% of its maximum capacity. |
min(/PHP-FPM by HTTP/php-fpm.listen_queue_usage,15m) > {$PHP_FPM.QUEUE.WARN.MAX} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the FastCGI Process Manager (PHP-FPM) by Zabbix agent that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template PHP-FPM by Zabbix agent active
- collects metrics by polling the PHP-FPM status-page locally with Zabbix agent.
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get
).
It also uses Zabbix agent to collect php-fpm
Linux process statistics, such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Note that depending on your OS distribution, the PHP-FPM executable/service name can vary. RHEL-like distributions usually name both process and service as php-fpm
, while for Debian/Ubuntu based distributions it may include the version, for example: executable name - php-fpm8.2
, systemd service name - php8.2-fpm
. Adjust the following instructions accordingly if needed.
Open the PHP-FPM configuration file and enable the status page as shown.
pm.status_path = /status
ping.path = /ping
Validate the syntax to ensure it is correct before you reload the service. Replace the <version>
in the command if needed.
$ php-fpm -t
or
$ php-fpm<version> -t
Reload the php-fpm
service to make the change active. Replace the <version>
in the command if needed.
$ systemctl reload php-fpm
or
$ systemctl reload php<version>-fpm
Next, edit the configuration of your web server.
If you use Nginx, edit the configuration file of your Nginx server block (virtual host) and add the location block below it.
# Enable php-fpm status page
location ~ ^/(status|ping)$ {
## disable access logging for request if you prefer
access_log off;
## Only allow trusted IPs for security, deny everyone else
# allow 127.0.0.1;
# allow 1.2.3.4; # your IP here
# deny all;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_index index.php;
include fastcgi_params;
## Now the port or socket of the php-fpm pool we want the status of
fastcgi_pass 127.0.0.1:9000;
# fastcgi_pass unix:/run/php-fpm/your_socket.sock;
}
If you use Apache, edit the configuration file of the virtual host and add the following location blocks.
<LocationMatch "/status">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/status"
</LocationMatch>
<LocationMatch "/ping">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/ping"
</LocationMatch>
Validate the web server configuration to ensure it is correct.
$ nginx -t
or
$ httpd -t
or
$ apachectl configtest
Reload the web server configuration. The command may vary depending on the OS distribution and web server.
$ systemctl reload nginx
or
$ systemctl reload httpd
or
$ systemctl reload apache2
Verify that the pages are available with these commands.
curl -L 127.0.0.1/status
curl -L 127.0.0.1/ping
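Optionally, you can preview what the agent item itself would return. Note that this template uses active checks, so zabbix_get only works if the agent also accepts passive checks; the host, path, and port below are the macro defaults.
$ zabbix_get -s 127.0.0.1 -k 'web.page.get[localhost,"status?json",80]'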
Depending on your OS distribution, the PHP-FPM process name may vary as well. Please check the actual name in the "Name" line of the /proc/<pid>/status file (https://www.zabbix.com/documentation/7.0/manual/appendix/items/procmemnumnotes) and change the {$PHP_FPM.PROCESS.NAME.PARAMETER} macro if needed.
If you use another location of the status/ping pages, don't forget to change the {$PHP_FPM.STATUS.PAGE}/{$PHP_FPM.PING.PAGE}
macro.
If you use another web server port for the location of the PHP-FPM status/ping pages, don't forget to change the macro {$PHP_FPM.PORT}
.
Name | Description | Default |
---|---|---|
{$PHP_FPM.PORT} | The port of the PHP-FPM status host or container. |
80 |
{$PHP_FPM.HOST} | The hostname or IP address of the PHP-FPM status for a host or container. |
localhost |
{$PHP_FPM.STATUS.PAGE} | The path of the PHP-FPM status page. |
status |
{$PHP_FPM.PING.PAGE} | The path of the PHP-FPM ping page. |
ping |
{$PHP_FPM.PING.REPLY} | The expected reply to the ping. |
pong |
{$PHP_FPM.QUEUE.WARN.MAX} | The maximum percent of the PHP-FPM queue usage for a trigger expression. |
80 |
{$PHP_FPM.PROCESS_NAME} | The process name filter for the PHP-FPM process discovery. May vary depending on your OS distribution. |
php-fpm |
{$PHP_FPM.PROCESS.NAME.PARAMETER} | The process name of the PHP-FPM used in the item key |
Name | Description | Type | Key and additional info | |
---|---|---|---|---|
Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent (active) | proc.get[{$PHP_FPM.PROCESS.NAME.PARAMETER},,,summary] | |
php-fpm_ping | Zabbix agent (active) | web.page.get["{$PHP_FPM.HOST}","{$PHP_FPM.PING.PAGE}","{$PHP_FPM.PORT}"] | ||
Get status page | Zabbix agent (active) | web.page.get["{$PHP_FPM.HOST}","{$PHP_FPM.STATUS.PAGE}?json","{$PHP_FPM.PORT}"] Preprocessing
|
||
Ping | Dependent item | php-fpm.ping Preprocessing
|
Regular expression: {$PHP_FPM.PING.REPLY}($|\r?\n) → 1 ⛔️ Custom on fail: Set value to: 0 |
|
Processes, active | The total number of active processes. |
Dependent item | php-fpm.processes_active Preprocessing
|
|
Version | The current version of PHP. You can get it from the HTTP header "X-Powered-By"; it may not work if you have changed the default HTTP headers. |
Dependent item | php-fpm.version Preprocessing
|
|
Pool name | The name of the current pool. |
Dependent item | php-fpm.name Preprocessing
|
|
Uptime | Indicates how long this pool has been running. |
Dependent item | php-fpm.uptime Preprocessing
|
|
Start time | The time when this pool was started. |
Dependent item | php-fpm.start_time Preprocessing
|
|
Processes, total | The total number of server processes currently running. |
Dependent item | php-fpm.processes_total Preprocessing
|
|
Processes, idle | The total number of idle processes. |
Dependent item | php-fpm.processes_idle Preprocessing
|
|
Queue usage | The utilization of the queue. |
Calculated | php-fpm.listen_queue_usage | |
Process manager | The method used by the process manager to control the number of child processes for this pool. |
Dependent item | php-fpm.process_manager Preprocessing
|
|
Processes, max active | The highest value of "active processes" since the PHP-FPM server was started. |
Dependent item | php-fpm.processes_max_active Preprocessing
|
|
Accepted connections per second | The number of accepted requests per second. |
Dependent item | php-fpm.conn_accepted.rate Preprocessing
|
|
Slow requests | The number of requests that have exceeded the request_slowlog_timeout limit. |
Dependent item | php-fpm.slow_requests Preprocessing
|
|
Listen queue | The current number of connections that have been initiated but not yet accepted. |
Dependent item | php-fpm.listen_queue Preprocessing
|
|
Listen queue, max | The maximum number of requests in the queue of pending connections since this FPM pool was started. |
Dependent item | php-fpm.listen_queue_max Preprocessing
|
|
Listen queue, len | The size of the socket queue of pending connections. |
Dependent item | php-fpm.listen_queue_len Preprocessing
|
|
Max children reached | The number of times that the process limit (pm.max_children) has been reached. |
Dependent item | php-fpm.max_children Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Version has changed | The PHP-FPM version has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by Zabbix agent active/php-fpm.version,#1)<>last(/PHP-FPM by Zabbix agent active/php-fpm.version,#2) and length(last(/PHP-FPM by Zabbix agent active/php-fpm.version))>0 |Info |
Manual close: Yes | |
PHP-FPM: Pool has been restarted | Uptime is less than 10 minutes. |
last(/PHP-FPM by Zabbix agent active/php-fpm.uptime)<10m |Info |
Manual close: Yes | |
PHP-FPM: Queue utilization is high | The queue for this pool has reached {$PHP_FPM.QUEUE.WARN.MAX}% of its maximum capacity. |
min(/PHP-FPM by Zabbix agent active/php-fpm.listen_queue_usage,15m) > {$PHP_FPM.QUEUE.WARN.MAX} |Warning |
||
PHP-FPM: Manager changed | The PHP-FPM manager has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by Zabbix agent active/php-fpm.process_manager,#1)<>last(/PHP-FPM by Zabbix agent active/php-fpm.process_manager,#2) |Info |
Manual close: Yes | |
PHP-FPM: Detected slow requests | The PHP-FPM has detected a slow request. |
min(/PHP-FPM by Zabbix agent active/php-fpm.slow_requests,#3)>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PHP-FPM process discovery | The discovery of the PHP-FPM summary processes. |
Dependent item | php-fpm.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get process data | The summary metrics aggregated by a process |
Dependent item | php-fpm.proc.get[{#PHP_FPM.NAME}] Preprocessing
|
Memory usage (rss) | The summary of resident set size memory used by a process |
Dependent item | php-fpm.proc.rss[{#PHP_FPM.NAME}] Preprocessing
|
Memory usage (vsize) | The summary of virtual memory used by a process |
Dependent item | php-fpm.proc.vmem[{#PHP_FPM.NAME}] Preprocessing
|
Memory usage, % | The percentage of real memory used by a process |
Dependent item | php-fpm.proc.pmem[{#PHP_FPM.NAME}] Preprocessing
|
Number of running processes | The number of running processes |
Dependent item | php-fpm.proc.num[{#PHP_FPM.NAME}] Preprocessing
|
CPU utilization | The percentage of the CPU utilization by a process |
Zabbix agent (active) | proc.cpu.util[{#PHP_FPM.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Process is not running | last(/PHP-FPM by Zabbix agent active/php-fpm.proc.num[{#PHP_FPM.NAME}])=0 |High |
|||
PHP-FPM: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/PHP-FPM by Zabbix agent active/php-fpm.uptime,30m)=1 and last(/PHP-FPM by Zabbix agent active/php-fpm.proc.num[{#PHP_FPM.NAME}])>0 |Info |
Manual close: Yes | |
PHP-FPM: Service is down | (last(/PHP-FPM by Zabbix agent active/php-fpm.ping)=0 or nodata(/PHP-FPM by Zabbix agent active/php-fpm.ping,3m)=1) and last(/PHP-FPM by Zabbix agent active/php-fpm.proc.num[{#PHP_FPM.NAME}])>0 |High |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the FastCGI Process Manager (PHP-FPM) by Zabbix agent that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template PHP-FPM by Zabbix agent
- collects metrics by polling the PHP-FPM status-page locally with Zabbix agent.
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get
).
It also uses Zabbix agent to collect php-fpm
Linux process statistics, such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Note that depending on your OS distribution, the PHP-FPM executable/service name can vary. RHEL-like distributions usually name both process and service as php-fpm
, while for Debian/Ubuntu based distributions it may include the version, for example: executable name - php-fpm8.2
, systemd service name - php8.2-fpm
. Adjust the following instructions accordingly if needed.
Open the PHP-FPM configuration file and enable the status page as shown.
pm.status_path = /status
ping.path = /ping
Validate the syntax to ensure it is correct before you reload the service. Replace the <version>
in the command if needed.
$ php-fpm -t
or
$ php-fpm<version> -t
Reload the php-fpm
service to make the change active. Replace the <version>
in the command if needed.
$ systemctl reload php-fpm
or
$ systemctl reload php<version>-fpm
Next, edit the configuration of your web server.
If you use Nginx, edit the configuration file of your Nginx server block (virtual host) and add the location block below it.
# Enable php-fpm status page
location ~ ^/(status|ping)$ {
## disable access logging for request if you prefer
access_log off;
## Only allow trusted IPs for security, deny everyone else
# allow 127.0.0.1;
# allow 1.2.3.4; # your IP here
# deny all;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_index index.php;
include fastcgi_params;
## Now the port or socket of the php-fpm pool we want the status of
fastcgi_pass 127.0.0.1:9000;
# fastcgi_pass unix:/run/php-fpm/your_socket.sock;
}
If you use Apache, edit the configuration file of the virtual host and add the following location blocks.
<LocationMatch "/status">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/status"
</LocationMatch>
<LocationMatch "/ping">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/ping"
</LocationMatch>
Validate the web server configuration to ensure it is correct.
$ nginx -t
or
$ httpd -t
or
$ apachectl configtest
Reload the web server configuration. The command may vary depending on the OS distribution and web server.
$ systemctl reload nginx
or
$ systemctl reload httpd
or
$ systemctl reload apache2
Verify that the pages are available with these commands.
curl -L 127.0.0.1/status
curl -L 127.0.0.1/ping
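The process discovery in this template relies on the agent's process summary. To check it manually (assuming the default php-fpm process name and that zabbix_get can reach the agent):
$ zabbix_get -s 127.0.0.1 -k 'proc.get[php-fpm,,,summary]'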
Depending on your OS distribution, the PHP-FPM process name may vary as well. Please check the actual name in the "Name" line of the /proc/<pid>/status file (https://www.zabbix.com/documentation/7.0/manual/appendix/items/procmemnumnotes) and change the {$PHP_FPM.PROCESS.NAME.PARAMETER} macro if needed.
If you use another location of the status/ping pages, don't forget to change the {$PHP_FPM.STATUS.PAGE}/{$PHP_FPM.PING.PAGE}
macro.
If you use another web server port for the location of the PHP-FPM status/ping pages, don't forget to change the macro {$PHP_FPM.PORT}
.
Name | Description | Default |
---|---|---|
{$PHP_FPM.PORT} | The port of the PHP-FPM status host or container. |
80 |
{$PHP_FPM.HOST} | The hostname or IP address of the PHP-FPM status for a host or container. |
localhost |
{$PHP_FPM.STATUS.PAGE} | The path of the PHP-FPM status page. |
status |
{$PHP_FPM.PING.PAGE} | The path of the PHP-FPM ping page. |
ping |
{$PHP_FPM.PING.REPLY} | The expected reply to the ping. |
pong |
{$PHP_FPM.QUEUE.WARN.MAX} | The maximum percent of the PHP-FPM queue usage for a trigger expression. |
80 |
{$PHP_FPM.PROCESS_NAME} | The process name filter for the PHP-FPM process discovery. May vary depending on your OS distribution. |
php-fpm |
{$PHP_FPM.PROCESS.NAME.PARAMETER} | The process name of the PHP-FPM used in the item key |
Name | Description | Type | Key and additional info | |
---|---|---|---|---|
Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$PHP_FPM.PROCESS.NAME.PARAMETER},,,summary] | |
php-fpm_ping | Zabbix agent | web.page.get["{$PHP_FPM.HOST}","{$PHP_FPM.PING.PAGE}","{$PHP_FPM.PORT}"] | ||
Get status page | Zabbix agent | web.page.get["{$PHP_FPM.HOST}","{$PHP_FPM.STATUS.PAGE}?json","{$PHP_FPM.PORT}"] Preprocessing
|
||
Ping | Dependent item | php-fpm.ping Preprocessing
|
Regular expression: {$PHP_FPM.PING.REPLY}($|\r?\n) → 1 ⛔️ Custom on fail: Set value to: 0 |
|
Processes, active | The total number of active processes. |
Dependent item | php-fpm.processes_active Preprocessing
|
|
Version | The current version of PHP. You can get it from the HTTP header "X-Powered-By"; it may not work if you have changed the default HTTP headers. |
Dependent item | php-fpm.version Preprocessing
|
|
Pool name | The name of the current pool. |
Dependent item | php-fpm.name Preprocessing
|
|
Uptime | Indicates how long this pool has been running. |
Dependent item | php-fpm.uptime Preprocessing
|
|
Start time | The time when this pool was started. |
Dependent item | php-fpm.start_time Preprocessing
|
|
Processes, total | The total number of server processes currently running. |
Dependent item | php-fpm.processes_total Preprocessing
|
|
Processes, idle | The total number of idle processes. |
Dependent item | php-fpm.processes_idle Preprocessing
|
|
Queue usage | The utilization of the queue. |
Calculated | php-fpm.listen_queue_usage | |
Process manager | The method used by the process manager to control the number of child processes for this pool. |
Dependent item | php-fpm.process_manager Preprocessing
|
|
Processes, max active | The highest value of "active processes" since the PHP-FPM server was started. |
Dependent item | php-fpm.processes_max_active Preprocessing
|
|
Accepted connections per second | The number of accepted requests per second. |
Dependent item | php-fpm.conn_accepted.rate Preprocessing
|
|
Slow requests | The number of requests that have exceeded the request_slowlog_timeout limit. |
Dependent item | php-fpm.slow_requests Preprocessing
|
|
Listen queue | The current number of connections that have been initiated but not yet accepted. |
Dependent item | php-fpm.listen_queue Preprocessing
|
|
Listen queue, max | The maximum number of requests in the queue of pending connections since this FPM pool was started. |
Dependent item | php-fpm.listen_queue_max Preprocessing
|
|
Listen queue, len | The size of the socket queue of pending connections. |
Dependent item | php-fpm.listen_queue_len Preprocessing
|
|
Max children reached | The number of times that the process limit (pm.max_children) has been reached. |
Dependent item | php-fpm.max_children Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Version has changed | The PHP-FPM version has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by Zabbix agent/php-fpm.version,#1)<>last(/PHP-FPM by Zabbix agent/php-fpm.version,#2) and length(last(/PHP-FPM by Zabbix agent/php-fpm.version))>0 |Info |
Manual close: Yes | |
PHP-FPM: Pool has been restarted | Uptime is less than 10 minutes. |
last(/PHP-FPM by Zabbix agent/php-fpm.uptime)<10m |Info |
Manual close: Yes | |
PHP-FPM: Queue utilization is high | The queue for this pool has reached |
min(/PHP-FPM by Zabbix agent/php-fpm.listen_queue_usage,15m) > {$PHP_FPM.QUEUE.WARN.MAX} |Warning |
||
PHP-FPM: Manager changed | The PHP-FPM manager has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by Zabbix agent/php-fpm.process_manager,#1)<>last(/PHP-FPM by Zabbix agent/php-fpm.process_manager,#2) |Info |
Manual close: Yes | |
PHP-FPM: Detected slow requests | The PHP-FPM has detected a slow request. |
min(/PHP-FPM by Zabbix agent/php-fpm.slow_requests,#3)>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PHP-FPM process discovery | The discovery of the PHP-FPM summary processes. |
Dependent item | php-fpm.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get process data | The summary metrics aggregated by a process |
Dependent item | php-fpm.proc.get[{#PHP_FPM.NAME}] Preprocessing
|
Memory usage (rss) | The summary of resident set size memory used by a process |
Dependent item | php-fpm.proc.rss[{#PHP_FPM.NAME}] Preprocessing
|
Memory usage (vsize) | The summary of virtual memory used by a process |
Dependent item | php-fpm.proc.vmem[{#PHP_FPM.NAME}] Preprocessing
|
Memory usage, % | The percentage of real memory used by a process |
Dependent item | php-fpm.proc.pmem[{#PHP_FPM.NAME}] Preprocessing
|
Number of running processes | The number of running processes |
Dependent item | php-fpm.proc.num[{#PHP_FPM.NAME}] Preprocessing
|
CPU utilization | The percentage of the CPU utilization by a process |
Zabbix agent | proc.cpu.util[{#PHP_FPM.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Process is not running | last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#PHP_FPM.NAME}])=0 |High |
|||
PHP-FPM: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/PHP-FPM by Zabbix agent/php-fpm.uptime,30m)=1 and last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#PHP_FPM.NAME}])>0 |Info |
Manual close: Yes | |
PHP-FPM: Service is down | (last(/PHP-FPM by Zabbix agent/php-fpm.ping)=0 or nodata(/PHP-FPM by Zabbix agent/php-fpm.ping,3m)=1) and last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#PHP_FPM.NAME}])>0 |High |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Template for monitoring pfSense by SNMP
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$IF.ERRORS.WARN} | Threshold of the error packet rate for the warning trigger. Can be used with the interface name as context (see the example after this table). |
2 |
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for warning trigger in %. Can be used with interface name as context. |
90 |
{$IFCONTROL} | Macro for operational state of the interface for link down trigger. Can be used with interface name as context. |
1 |
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.* |
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status. |
^2$ |
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
(^pflog[0-9.]*$|^pfsync[0-9.]*$) |
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.*$ |
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6). |
^6$ |
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$SNMP.TIMEOUT} | The time interval for SNMP availability trigger. |
5m |
{$STATE.TABLE.UTIL.MAX} | Threshold of state table utilization trigger in %. |
90 |
{$SOURCE.TRACKING.TABLE.UTIL.MAX} | Threshold of source tracking table utilization trigger in %. |
90 |
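Several of the threshold macros above support Zabbix user macro context, so a value can be overridden for a single interface without changing the global default. The values and the interface name `em0` below are illustrative only:

```
{$IF.ERRORS.WARN}       = 2     (global default for all interfaces)
{$IF.ERRORS.WARN:"em0"} = 10    (override for the interface named "em0", example name)
{$IF.UTIL.MAX:"em0"}    = 95    (per-interface bandwidth threshold override, example value)
```

The trigger expressions later in this section reference the macros as {$IF.ERRORS.WARN:"{#IFNAME}"}, so a context value takes precedence whenever it is defined for the discovered interface.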
Name | Description | Type | Key and additional info |
---|---|---|---|
SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - not available, 1 - available, 2 - unknown. |
Zabbix internal | zabbix[host,snmp,available] |
Packet filter running status | MIB: BEGEMOT-PF-MIB True if packet filter is currently enabled. |
SNMP agent | pfsense.pf.status |
States table current | MIB: BEGEMOT-PF-MIB Number of entries in the state table. |
SNMP agent | pfsense.state.table.count |
States table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'keep state' rules in the ruleset. |
SNMP agent | pfsense.state.table.limit |
States table utilization in % | Utilization of the state table in % (see the formula sketch after this table). |
Calculated | pfsense.state.table.pused |
Source tracking table current | MIB: BEGEMOT-PF-MIB Number of entries in the source tracking table. |
SNMP agent | pfsense.source.tracking.table.count |
Source tracking table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'sticky-address' or 'source-track' rules in the ruleset. |
SNMP agent | pfsense.source.tracking.table.limit |
Source tracking table utilization in % | Utilization of source tracking table in %. |
Calculated | pfsense.source.tracking.table.pused |
DHCP server status | MIB: HOST-RESOURCES-MIB The status of DHCP server process. |
SNMP agent | pfsense.dhcpd.status Preprocessing
|
DNS server status | MIB: HOST-RESOURCES-MIB The status of DNS server process. |
SNMP agent | pfsense.dns.status Preprocessing
|
State of nginx process | MIB: HOST-RESOURCES-MIB The status of nginx process. |
SNMP agent | pfsense.nginx.status Preprocessing
|
Packets matched a filter rule | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.match Preprocessing
|
Packets with bad offset | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.bad.offset Preprocessing
|
Fragmented packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.fragment Preprocessing
|
Short packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.short Preprocessing
|
Normalized packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.normalize Preprocessing
|
Packets dropped due to memory limitation | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.mem.drop Preprocessing
|
Firewall rules count | MIB: BEGEMOT-PF-MIB The number of labeled filter rules on this system. |
SNMP agent | pfsense.rules.count |
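The two utilization items above (`pfsense.state.table.pused` and `pfsense.source.tracking.table.pused`) are Calculated items built from the corresponding count and limit items. A plausible sketch of the formulas, assuming the item keys listed above (the exact expressions in the template may differ):

```
Key:     pfsense.state.table.pused
Formula: 100 * last(//pfsense.state.table.count) / last(//pfsense.state.table.limit)

Key:     pfsense.source.tracking.table.pused
Formula: 100 * last(//pfsense.source.tracking.table.count) / last(//pfsense.source.tracking.table.limit)
```

These percentages are what the {$STATE.TABLE.UTIL.MAX} and {$SOURCE.TRACKING.TABLE.UTIL.MAX} triggers below evaluate.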
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PFSense: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/PFSense by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |Warning |
||
PFSense: Packet filter is not running | Please check PF status. |
last(/PFSense by SNMP/pfsense.pf.status)<>1 |High |
||
PFSense: State table usage is high | Please check the number of connections https://docs.netgate.com/pfsense/en/latest/config/advanced-firewall-nat.html#config-advanced-firewall-maxstates |
min(/PFSense by SNMP/pfsense.state.table.pused,#3)>{$STATE.TABLE.UTIL.MAX} |Warning |
||
PFSense: Source tracking table usage is high | Please check the number of sticky connections https://docs.netgate.com/pfsense/en/latest/monitoring/status/firewall-states-sources.html |
min(/PFSense by SNMP/pfsense.source.tracking.table.pused,#3)>{$SOURCE.TRACKING.TABLE.UTIL.MAX} |Warning |
||
PFSense: DHCP server is not running | Please check DHCP server settings https://docs.netgate.com/pfsense/en/latest/services/dhcp/index.html |
last(/PFSense by SNMP/pfsense.dhcpd.status)=0 |Average |
||
PFSense: DNS server is not running | Please check DNS server settings https://docs.netgate.com/pfsense/en/latest/services/dns/index.html |
last(/PFSense by SNMP/pfsense.dns.status)=0 |Average |
||
PFSense: Web server is not running | Please check nginx service status. |
last(/PFSense by SNMP/pfsense.nginx.status)=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP agent | pfsense.net.if.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.discards[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.errors[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.discards[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.errors[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets.Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of |
SNMP agent | net.if.speed[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP agent | net.if.status[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP agent | net.if.type[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Rules references count | MIB: BEGEMOT-PF-MIB The number of rules referencing this interface. |
SNMP agent | net.if.rules.refs[{#SNMPINDEX}] |
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | It recovers when it is below 80% of the |
min(/PFSense by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/PFSense by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | It recovers when it is below 80% of the |
min(/PFSense by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/PFSense by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually. |
change(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/PFSense by SNMP/net.if.status[{#SNMPINDEX}])<>2) |Info |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/PFSense by SNMP/net.if.status[{#SNMPINDEX}])=2) |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Template for monitoring OPNsense by SNMP
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$IF.ERRORS.WARN} | Threshold of error packets rate for warning trigger. Can be used with interface name as context. |
2 |
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for warning trigger in %. Can be used with interface name as context. |
90 |
{$IFCONTROL} | Macro for operational state of the interface for link down trigger. Can be used with interface name as context. |
1 |
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.* |
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status. |
^2$ |
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
(^pflog[0-9.]*$|^pfsync[0-9.]*$) |
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.*$ |
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6). |
^6$ |
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$SNMP.TIMEOUT} | The time interval for SNMP availability trigger. |
5m |
{$STATE.TABLE.UTIL.MAX} | Threshold of state table utilization trigger in %. |
90 |
{$SOURCE.TRACKING.TABLE.UTIL.MAX} | Threshold of source tracking table utilization trigger in %. |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - not available, 1 - available, 2 - unknown. |
Zabbix internal | zabbix[host,snmp,available] |
Packet filter running status | MIB: BEGEMOT-PF-MIB True if packet filter is currently enabled. |
SNMP agent | opnsense.pf.status |
States table current | MIB: BEGEMOT-PF-MIB Number of entries in the state table. |
SNMP agent | opnsense.state.table.count |
States table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'keep state' rules in the ruleset. |
SNMP agent | opnsense.state.table.limit |
States table utilization in % | Utilization of the state table in % (see the formula sketch after this table). |
Calculated | opnsense.state.table.pused |
Source tracking table current | MIB: BEGEMOT-PF-MIB Number of entries in the source tracking table. |
SNMP agent | opnsense.source.tracking.table.count |
Source tracking table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'sticky-address' or 'source-track' rules in the ruleset. |
SNMP agent | opnsense.source.tracking.table.limit |
Source tracking table utilization in % | Utilization of source tracking table in %. |
Calculated | opnsense.source.tracking.table.pused |
DHCP server status | MIB: HOST-RESOURCES-MIB The status of DHCP server process. |
SNMP agent | opnsense.dhcpd.status Preprocessing
|
DNS server status | MIB: HOST-RESOURCES-MIB The status of DNS server process. |
SNMP agent | opnsense.dns.status Preprocessing
|
Web server status | MIB: HOST-RESOURCES-MIB The status of lighttpd process. |
SNMP agent | opnsense.lighttpd.status Preprocessing
|
Packets matched a filter rule | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.match Preprocessing
|
Packets with bad offset | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.bad.offset Preprocessing
|
Fragmented packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.fragment Preprocessing
|
Short packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.short Preprocessing
|
Normalized packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.normalize Preprocessing
|
Packets dropped due to memory limitation | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.mem.drop Preprocessing
|
Firewall rules count | MIB: BEGEMOT-PF-MIB The number of labeled filter rules on this system. |
SNMP agent | opnsense.rules.count |
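As in the pfSense template above, the two utilization items (`opnsense.state.table.pused` and `opnsense.source.tracking.table.pused`) are Calculated items derived from the count and limit items. A hedged sketch of the formulas, assuming the item keys listed above:

```
Key:     opnsense.state.table.pused
Formula: 100 * last(//opnsense.state.table.count) / last(//opnsense.state.table.limit)

Key:     opnsense.source.tracking.table.pused
Formula: 100 * last(//opnsense.source.tracking.table.count) / last(//opnsense.source.tracking.table.limit)
```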
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OPNsense: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/OPNsense by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |Warning |
||
OPNsense: Packet filter is not running | Please check PF status. |
last(/OPNsense by SNMP/opnsense.pf.status)<>1 |High |
||
OPNsense: State table usage is high | Please check the number of connections. |
min(/OPNsense by SNMP/opnsense.state.table.pused,#3)>{$STATE.TABLE.UTIL.MAX} |Warning |
||
OPNsense: Source tracking table usage is high | Please check the number of sticky connections. |
min(/OPNsense by SNMP/opnsense.source.tracking.table.pused,#3)>{$SOURCE.TRACKING.TABLE.UTIL.MAX} |Warning |
||
OPNsense: DHCP server is not running | Please check DHCP server settings. |
last(/OPNsense by SNMP/opnsense.dhcpd.status)=0 |Average |
||
OPNsense: DNS server is not running | Please check DNS server settings. |
last(/OPNsense by SNMP/opnsense.dns.status)=0 |Average |
||
OPNsense: Web server is not running | Please check lighttpd service status. |
last(/OPNsense by SNMP/opnsense.lighttpd.status)=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP agent | opnsense.net.if.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.discards[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.errors[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.discards[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.errors[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets.Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of |
SNMP agent | net.if.speed[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP agent | net.if.status[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP agent | net.if.type[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Rules references count | MIB: BEGEMOT-PF-MIB The number of rules referencing this interface. |
SNMP agent | net.if.rules.refs[{#SNMPINDEX}] |
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | It recovers when it is below 80% of the |
min(/OPNsense by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/OPNsense by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | It recovers when it is below 80% of the |
min(/OPNsense by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/OPNsense by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually. |
change(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/OPNsense by SNMP/net.if.status[{#SNMPINDEX}])<>2) |Info |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/OPNsense by SNMP/net.if.status[{#SNMPINDEX}])=2) |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of OpenWeatherMap monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create a host.
Link the template to the host.
Customize the values of {$OPENWEATHERMAP.API.TOKEN} and {$LOCATION} macros.
OpenWeatherMap API Tokens are available in your OpenWeatherMap account https://home.openweathermap.org/api_keys.
Locations can be set in a few ways: by geo coordinates, by location name, by location ID, or by zip/post code with a country code (see the {$LOCATION} macro description below). Several locations can be added to the macro at the same time, separated by the "|" delimiter.
For example: 43.81821,7.76115|Riga|2643743|94040,us.
Please note that API requests by city name, zip code, and city ID will be deprecated soon. The language and units macros can also be customized if necessary. List of available languages: https://openweathermap.org/current#multi. Available units of measurement are: standard, metric and imperial https://openweathermap.org/current#data.
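For illustration, the template's Script item combines the macros above into requests against the endpoint defined in {$OPENWEATHERMAP.API.ENDPOINT}. The lines below are only a sketch of what such requests look like for the different location formats; the actual query construction happens inside the Script item, and `<API key>` stands for the value of {$OPENWEATHERMAP.API.TOKEN}.

```
By city ID:          https://api.openweathermap.org/data/2.5/weather?id=2643743&units=metric&lang=en&appid=<API key>
By geo coordinates:  https://api.openweathermap.org/data/2.5/weather?lat=56.95&lon=24.0833&units=metric&lang=en&appid=<API key>
By zip/post code:    https://api.openweathermap.org/data/2.5/weather?zip=94040,us&units=metric&lang=en&appid=<API key>
```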
Name | Description | Default |
---|---|---|
{$OPENWEATHERMAP.API.TOKEN} | Specify the OpenWeatherMap API key. |
|
{$LANG} | List of available languages https://openweathermap.org/current#multi. |
en |
{$LOCATION} | Locations can be set in a few ways: 1. by geo coordinates (for example: 56.95,24.0833) 2. by location name (for example: Riga) 3. by location ID (list of city IDs: http://bulk.openweathermap.org/sample/city.list.json.gz) 4. by zip/post code with a country code (for example: 94040,us). Several locations can be added to the macro at the same time, separated by the "|" delimiter. For example: 43.81821,7.76115|Riga|2643743|94040,us. Please note that API requests by city name, zip code, and city ID will be deprecated soon. |
Riga |
{$OPENWEATHERMAP.API.ENDPOINT} | OpenWeatherMap API endpoint. |
api.openweathermap.org/data/2.5/weather? |
{$UNITS} | Available units of measurement are standard, metric and imperial https://openweathermap.org/current#data. |
metric |
{$TEMP.CRIT.HIGH} | Threshold for high temperature trigger. |
30 |
{$TEMP.CRIT.LOW} | Threshold for low temperature trigger. |
-20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get data | JSON array with result of OpenWeatherMap API requests. |
Script | openweathermap.get.data |
Get data collection errors | Errors from get data requests by script item. |
Dependent item | openweathermap.get.errors Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OpenWeatherMap: There are errors in requests to OpenWeatherMap API | Zabbix has received errors in requests to OpenWeatherMap API. |
length(last(/OpenWeatherMap by HTTP/openweathermap.get.errors))>0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Locations discovery | Weather metrics discovery by location. |
Dependent item | openweathermap.locations.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#LOCATION}, {#COUNTRY}]: Data | JSON with result of OpenWeatherMap API request by location. |
Dependent item | openweathermap.location.data[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Atmospheric pressure | Atmospheric pressure in Pa. |
Dependent item | openweathermap.pressure[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Cloudiness | Cloudiness in %. |
Dependent item | openweathermap.clouds[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Humidity | Humidity in %. |
Dependent item | openweathermap.humidity[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Rain volume for the last one hour | Rain volume for the last one hour in m. |
Dependent item | openweathermap.rain[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Short weather status | Short weather status description. |
Dependent item | openweathermap.description[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Snow volume for the last one hour | Snow volume for the last one hour in m. |
Dependent item | openweathermap.snow[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Temperature | Atmospheric temperature value. |
Dependent item | openweathermap.temp[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Visibility | Visibility in m. |
Dependent item | openweathermap.visibility[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Wind direction | Wind direction in degrees. |
Dependent item | openweathermap.wind.direction[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Wind speed | Wind speed value. |
Dependent item | openweathermap.wind.speed[{#ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OpenWeatherMap: [{#LOCATION}, {#COUNTRY}]: Temperature is too high | Temperature value is too high. |
min(/OpenWeatherMap by HTTP/openweathermap.temp[{#ID}],#3)>{$TEMP.CRIT.HIGH} |Average |
Manual close: Yes | |
OpenWeatherMap: [{#LOCATION}, {#COUNTRY}]: Temperature is too low | Temperature value is too low. |
max(/OpenWeatherMap by HTTP/openweathermap.temp[{#ID}],#3)<{$TEMP.CRIT.LOW} |Average |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Nutanix Prism Element monitoring and doesn't require any external scripts.
The templates "Nutanix Host Prism Element by HTTP" and "Nutanix Cluster Prism Element by HTTP" can be used in discovery, as well as manually linked to a host.
More details can be found in the official documentation:
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
{$NUTANIX.PRISM.ELEMENT.IP}
{$NUTANIX.PRISM.ELEMENT.PORT}
{$NUTANIX.USER}
{$NUTANIX.PASSWORD}
Name | Description | Default |
---|---|---|
{$NUTANIX.PRISM.ELEMENT.IP} | Set the Nutanix API IP here. |
<Put your IP here> |
{$NUTANIX.PRISM.ELEMENT.PORT} | Set the Nutanix API port here. |
9440 |
{$NUTANIX.USER} | Nutanix API username. |
<Put your API username here> |
{$NUTANIX.PASSWORD} | Nutanix API password. |
<Put your API password here> |
{$NUTANIX.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. |
|
{$NUTANIX.CLUSTER.DISCOVERY.NAME.MATCHES} | Filter of discoverable Nutanix clusters by name. |
.* |
{$NUTANIX.CLUSTER.DISCOVERY.NAME.NOT_MATCHES} | Filter to exclude discovered Nutanix clusters by name. |
CHANGE_IF_NEEDED |
{$NUTANIX.HOST.DISCOVERY.NAME.MATCHES} | Filter of discoverable Nutanix hosts by name. |
.* |
{$NUTANIX.HOST.DISCOVERY.NAME.NOT_MATCHES} | Filter to exclude discovered Nutanix hosts by name. |
CHANGE_IF_NEEDED |
{$NUTANIX.STORAGE.CONTAINER.DISCOVERY.NAME.MATCHES} | Filter of discoverable storage containers by name. |
.* |
{$NUTANIX.STORAGE.CONTAINER.DISCOVERY.NAME.NOT_MATCHES} | Filter to exclude discovered storage containers by name. |
CHANGE_IF_NEEDED |
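The *.MATCHES / *.NOT_MATCHES macros above are regular expressions applied to the names of discovered objects. The values below are purely illustrative examples of narrowing discovery; the naming patterns are hypothetical:

```
{$NUTANIX.CLUSTER.DISCOVERY.NAME.MATCHES}  = ^prod-.*    (example: discover only clusters whose name starts with "prod-")
{$NUTANIX.HOST.DISCOVERY.NAME.NOT_MATCHES} = .*-lab-.*   (example: skip hosts whose name contains "-lab-")
```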
Name | Description | Type | Key and additional info |
---|---|---|---|
Get cluster | Get the available clusters. |
Script | nutanix.cluster.get |
Get cluster check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.cluster.get.check Preprocessing
|
Get host | Get the available hosts. |
Script | nutanix.host.get |
Get host check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.host.get.check Preprocessing
|
Get storage container | Get the available storage containers. |
Script | nutanix.storage.container.get |
Get storage container check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.storage.container.get.check Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nutanix: Failed to get cluster data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Prism Element by HTTP/nutanix.cluster.get.check))>0 |High |
||
Nutanix: Failed to get host data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Prism Element by HTTP/nutanix.host.get.check))>0 |High |
||
Nutanix: Failed to get storage container data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Prism Element by HTTP/nutanix.storage.container.get.check))>0 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster discovery | Discovery of all clusters. |
Dependent item | nutanix.cluster.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Host discovery | Discovery of all hosts. |
Dependent item | nutanix.host.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage container discovery | Discovery of all storage containers. |
Dependent item | nutanix.storage.container.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Container [{#STORAGE.CONTAINER.NAME}]: Space: Total, bytes | The total space of the storage container. |
Dependent item | nutanix.storage.container.capacity.bytes["{#STORAGE.CONTAINER.UUID}"] Preprocessing
|
Container [{#STORAGE.CONTAINER.NAME}]: Space: Free, bytes | The free space of the storage container. |
Dependent item | nutanix.storage.container.free.bytes["{#STORAGE.CONTAINER.UUID}"] Preprocessing
|
Container [{#STORAGE.CONTAINER.NAME}]: Replication factor | The replication factor of the storage container. |
Dependent item | nutanix.storage.container.replication.factor["{#STORAGE.CONTAINER.UUID}"] Preprocessing
|
Container [{#STORAGE.CONTAINER.NAME}]: Space: Used, bytes | The used space of the storage container. |
Dependent item | nutanix.storage.container.usage.bytes["{#STORAGE.CONTAINER.UUID}"] Preprocessing
|
This template is designed for the effortless deployment of Nutanix Cluster Prism Element monitoring and doesn't require any external scripts.
This template can be used in discovery, as well as manually linked to a host - to do so, attach it to the host and manually set the value of the {$NUTANIX.CLUSTER.UUID}
macro.
More details can be found in the official documentation:
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
{$NUTANIX.PRISM.ELEMENT.IP}
{$NUTANIX.PRISM.ELEMENT.PORT}
{$NUTANIX.USER}
{$NUTANIX.PASSWORD}
{$NUTANIX.CLUSTER.UUID}
Name | Description | Default |
---|---|---|
{$NUTANIX.PRISM.ELEMENT.IP} | Set the Nutanix API IP here. |
<Put your IP here> |
{$NUTANIX.PRISM.ELEMENT.PORT} | Set the Nutanix API port here. |
9440 |
{$NUTANIX.USER} | Nutanix API username. |
<Put your API username here> |
{$NUTANIX.PASSWORD} | Nutanix API password. |
<Put your API password here> |
{$NUTANIX.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. |
|
{$NUTANIX.CLUSTER.UUID} | UUID of the cluster. |
|
{$NUTANIX.TIMEOUT} | API response timeout. |
10s |
{$NUTANIX.ALERT.DISCOVERY.NAME.MATCHES} | Filter of discoverable Nutanix alerts by name. |
.* |
{$NUTANIX.ALERT.DISCOVERY.NAME.NOT_MATCHES} | Filter to exclude discovered Nutanix alerts by name. |
CHANGE_IF_NEEDED |
{$NUTANIX.ALERT.DISCOVERY.STATE.MATCHES} | Filter of discoverable Nutanix alerts by state. Set "1" to discover only problem alerts or "0" for resolved ones. |
.* |
{$NUTANIX.ALERT.DISCOVERY.SEVERITY.MATCHES} | Filter of discoverable Nutanix alerts by severity. Set the severities to discover in the range 0-2: "0" - Info, "1" - Warning, "2" - Critical. |
.* |
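Taken together, the alert discovery macros above control which Nutanix alerts become discovered objects in Zabbix. As an illustration only (the values are examples, not template defaults), discovering only unresolved Warning and Critical alerts could look like this:

```
{$NUTANIX.ALERT.DISCOVERY.STATE.MATCHES}    = 1      (only alerts still in the problem state)
{$NUTANIX.ALERT.DISCOVERY.SEVERITY.MATCHES} = [12]   (only Warning and Critical severities)
```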
Name | Description | Type | Key and additional info |
---|---|---|---|
Get metric | Get data about basic metrics. |
Script | nutanix.cluster.metric.get |
Get metric check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.cluster.metric.get.check Preprocessing
|
Get alert | Get data about alerts. |
Script | nutanix.cluster.alert.get |
Get alert check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.cluster.alert.get.check Preprocessing
|
Content Cache: Hit rate, % | Content cache hits over all lookups. |
Dependent item | nutanix.cluster.content.cache.hit.percent Preprocessing
|
Content Cache: Logical memory usage, bytes | Logical memory used to cache data without deduplication in bytes. |
Dependent item | nutanix.cluster.content.cache.logical.memory.usage.bytes Preprocessing
|
Content Cache: Logical saved memory usage, bytes | Memory saved due to content cache deduplication in bytes. |
Dependent item | nutanix.cluster.content.cache.saved.memory.usage.bytes Preprocessing
|
Content Cache: Logical SSD usage, bytes | Logical SSD memory used to cache data without deduplication in bytes. |
Dependent item | nutanix.cluster.content.cache.logical.ssd.usage.bytes Preprocessing
|
Content Cache: Number of lookups | Number of lookups on the content cache. |
Dependent item | nutanix.cluster.content.cache.lookups.num Preprocessing
|
Content Cache: Physical memory usage, bytes | Real memory used to cache data via the content cache in bytes. |
Dependent item | nutanix.cluster.content.cache.physical.memory.usage.bytes Preprocessing
|
Content Cache: Physical SSD usage, bytes | Real SSD usage used to cache data via the content cache in bytes. |
Dependent item | nutanix.cluster.content.cache.physical.ssd.usage.bytes Preprocessing
|
Content Cache: References | Average number of content cache references. |
Dependent item | nutanix.cluster.content.cache.dedup.ref.num Preprocessing
|
Content Cache: Saved SSD usage, bytes | SSD usage saved due to content cache deduplication in bytes. |
Dependent item | nutanix.cluster.content.cache.saved.ssd.usage.bytes Preprocessing
|
Controller: Random IO | The number of random Input/Output operations from the controller. |
Dependent item | nutanix.cluster.controller.io.random Preprocessing
|
Controller: Random IO, % | The percentage of random Input/Output from the controller. |
Dependent item | nutanix.cluster.controller.io.random.percent Preprocessing
|
Controller: Sequence IO | The number of sequential Input/Output operations from the controller. |
Dependent item | nutanix.cluster.controller.io.sequence Preprocessing
|
Controller: Sequence IO, % | The percentage of sequential Input/Output from the controller. |
Dependent item | nutanix.cluster.controller.io.sequence.percent Preprocessing
|
Storage Controller: Timespan, sec | Controller timespan. |
Dependent item | nutanix.cluster.storage.controller.timespan.sec Preprocessing
|
Storage Controller: IO total, bytes | Total controller Input/Output size. |
Dependent item | nutanix.cluster.storage.controller.io.total.bytes Preprocessing
|
Storage Controller: IO total, sec | Total controller Input/Output time. |
Dependent item | nutanix.cluster.storage.controller.io.total.sec Preprocessing
|
Storage Controller: IO total read, bytes | Total controller read Input/Output size. |
Dependent item | nutanix.cluster.storage.controller.io.read.total.bytes Preprocessing
|
Storage Controller: IO total read, sec | Total controller read Input/Output time. |
Dependent item | nutanix.cluster.storage.controller.io.read.total.sec Preprocessing
|
General: Cluster operation mode | The cluster operation mode. One of the following: - NORMAL; - OVERRIDE; - READONLY; - STANDALONE; - SWITCH_TO_TWO_NODE; - UNKNOWN. |
Dependent item | nutanix.cluster.cluster.operation.mode Preprocessing
|
General: Current redundancy factor | Current value of the redundancy factor on the cluster. |
Dependent item | nutanix.cluster.redundancy.factor.current Preprocessing
|
General: Desired redundancy factor | The desired value of the redundancy factor on the cluster. |
Dependent item | nutanix.cluster.redundancy.factor.desired Preprocessing
|
General: IO | The number of Input/Output operations from the disk. |
Dependent item | nutanix.cluster.general.io Preprocessing
|
General: IOPS | Input/Output operations per second from the disk. |
Dependent item | nutanix.cluster.general.iops Preprocessing
|
General: IO, bandwidth | Data transferred in B/sec from the disk. |
Dependent item | nutanix.cluster.general.io.bandwidth Preprocessing
|
General: IO, latency | Input/Output latency from the disk. |
Dependent item | nutanix.cluster.general.io.latency Preprocessing
|
General: Random IO | The number of random Input/Output operations. |
Dependent item | nutanix.cluster.general.io.random Preprocessing
|
General: Random IO, % | The percentage of random Input/Output operations. |
Dependent item | nutanix.cluster.general.io.random.percent Preprocessing
|
General: Read IO | Total number of Input/Output read operations. |
Dependent item | nutanix.cluster.general.io.read Preprocessing
|
General: Read IOPS | Input/Output read operations per second from the disk. |
Dependent item | nutanix.cluster.general.iops.read Preprocessing
|
General: Read IO, % | The total percentage of Input/Output operations that are reads. |
Dependent item | nutanix.cluster.general.io.read.percent Preprocessing
|
General: Read IO, bandwidth | Read data transferred in B/sec from the disk. |
Dependent item | nutanix.cluster.general.io.read.bandwidth Preprocessing
|
General: Read IO, latency | Average Input/Output read latency. |
Dependent item | nutanix.cluster.general.io.read.latency Preprocessing
|
General: Sequence IO | The number of sequential Input/Output operations. |
Dependent item | nutanix.cluster.general.io.sequence Preprocessing
|
General: Sequence IO, % | The percentage of sequential Input/Output. |
Dependent item | nutanix.cluster.general.io.sequence.percent Preprocessing
|
General: Storage capacity, bytes | Total size of the datastores used by this system in bytes. |
Dependent item | nutanix.cluster.general.storage.capacity.bytes Preprocessing
|
General: Storage free, bytes | Total free space of the datastores used by this system in bytes. |
Dependent item | nutanix.cluster.general.storage.free.bytes Preprocessing
|
General: Storage logical usage, bytes | Total logical space used by the datastores of this system in bytes. |
Dependent item | nutanix.cluster.general.storage.logical.usage.bytes Preprocessing
|
General: Storage usage, bytes | Total physical datastore space used by this host and all its snapshots on the datastores. |
Dependent item | nutanix.cluster.general.storage.usage.bytes Preprocessing
|
General: Timespan, sec | Cluster timespan. |
Dependent item | nutanix.cluster.general.timespan.sec Preprocessing
|
General: IO total, sec | Total time of Input/Output operations. |
Dependent item | nutanix.cluster.general.io.total.sec Preprocessing
|
General: IO total, bytes | Total size of Input/Output operations. |
Dependent item | nutanix.cluster.general.io.total.bytes Preprocessing
|
General: IO total read, sec | Total time of Input/Output read operations. |
Dependent item | nutanix.cluster.general.io.read.total.sec Preprocessing
|
General: IO total read, bytes | Total size of Input/Output read operations. |
Dependent item | nutanix.cluster.general.io.read.total.bytes Preprocessing
|
General: Total transformed usage, bytes | Actual usage of storage. |
Dependent item | nutanix.cluster.general.transformed.usage.total.bytes Preprocessing
|
General: Total untransformed usage, bytes | Logical usage of storage (physical usage divided by the replication factor). |
Dependent item | nutanix.cluster.general.untransformed.usage.total.bytes Preprocessing
|
General: Upgrade progress | Indicates whether the cluster is currently in an update state. |
Dependent item | nutanix.cluster.general.upgrade.progress Preprocessing
|
General: Version | Current software version in the cluster. |
Dependent item | nutanix.cluster.general.upgrade.version Preprocessing
|
General: Write IO | Input/Output write operations from the disk. |
Dependent item | nutanix.cluster.general.io.write Preprocessing
|
General: Write IOPS | Total number of Input/Output write operations per second. |
Dependent item | nutanix.cluster.general.iops.write Preprocessing
|
General: Write IO, % | Total percentage of Input/Output operations that are writes. |
Dependent item | nutanix.cluster.general.io.write.percent Preprocessing
|
General: Write IO, bandwidth | Write data transferred in B/sec from the disk. |
Dependent item | nutanix.cluster.general.io.write.bandwidth Preprocessing
|
General: Write IO, latency | Average Input/Output write operation latency. |
Dependent item | nutanix.cluster.general.io.write.latency Preprocessing
|
Hypervisor: CPU usage, % | Percentage of CPU used by the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.cpu.usage.percent Preprocessing
|
Hypervisor: IOPS | Input/Output operations per second from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.iops Preprocessing
|
Hypervisor: IO, bandwidth | Data transferred in B/sec from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.bandwidth Preprocessing
|
Hypervisor: IO, latency | Input/Output operation latency from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.latency Preprocessing
|
Hypervisor: Memory usage, % | Percentage of memory used by the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.memory.usage.percent Preprocessing
|
Hypervisor: IO | The number of Input/Output operations from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io Preprocessing
|
Hypervisor: Read IO | The number of Input/Output read operations from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.read Preprocessing
|
Hypervisor: Read IOPS | Input/Output read operations per second from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.iops.read Preprocessing
|
Hypervisor: Read IO, bandwidth | Read data transferred in B/sec from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.read.bandwidth Preprocessing
|
Hypervisor: Read IO, latency | Input/Output read latency from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.read.latency Preprocessing
|
Hypervisor: Timespan, sec | Hypervisor timespan. |
Dependent item | nutanix.cluster.hypervisor.timespan.sec Preprocessing
|
Hypervisor: IO total, sec | Total Input/Output operation time from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.total.sec Preprocessing
|
Hypervisor: IO total, bytes | Total Input/Output operation size from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.total.bytes Preprocessing
|
Hypervisor: IO total read, bytes | Total Input/Output read operation size from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.read.total.bytes Preprocessing
|
Hypervisor: IO total read, sec | Total Input/Output read operation time from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.read.total.sec Preprocessing
|
Hypervisor: Write IOPS | Input/Output write operations per second from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.iops.write Preprocessing
|
Hypervisor: Write IO | Input/Output write operations from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.write Preprocessing
|
Hypervisor: Write IO, bandwidth | Write data transferred in B/sec from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.write.bandwidth Preprocessing
|
Hypervisor: Write IO, latency | Input/Output write latency from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.write.latency Preprocessing
|
Storage Controller: IOPS | Input/Output operations per second from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.iops Preprocessing
|
Storage Controller: IO | Input/Output operations from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io Preprocessing
|
Storage Controller: IO, bandwidth | Data transferred in B/sec from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.bandwidth Preprocessing
|
Storage Controller: IO, latency | Input/Output latency from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.latency Preprocessing
|
Storage Controller: Read IOPS | Input/Output read operations per second from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.iops.read Preprocessing
|
Storage Controller: Read IO | Input/Output read operations from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.read Preprocessing
|
Storage Controller: Read IO, % | Percentage of Input/Output operations from the Storage Controller that are reads. |
Dependent item | nutanix.cluster.storage.controller.io.read.percent Preprocessing
|
Storage Controller: Read IO, bandwidth | Read data transferred in B/sec from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.read.bandwidth Preprocessing
|
Storage Controller: Read IO, latency | Input/Output read latency from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.read.latency Preprocessing
|
Storage Controller: Read IO, bytes | Storage controller average read Input/Output in bytes. |
Dependent item | nutanix.cluster.storage.controller.io.read.bytes Preprocessing
|
Storage Controller: Total transformed usage, bytes | Actual usage of the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.transformed.usage.total.bytes Preprocessing
|
Storage Controller: Write IO | Input/Output write operations to the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.write Preprocessing
|
Storage Controller: Write IOPS | Input/Output write operations per second to the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.iops.write Preprocessing
|
Storage Controller: Write IO, % | Percentage of Input/Output operations to the Storage Controller that are writes. |
Dependent item | nutanix.cluster.storage.controller.io.write.percent Preprocessing
|
Storage Controller: Write IO, bandwidth | Write data transferred in B/sec to the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.write.bandwidth Preprocessing
|
Storage Controller: Write IO, latency | Input/Output write latency to the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.write.latency Preprocessing
|
Storage Controller: Write IO, bytes | Storage Controller average write Input/Output in bytes. |
Dependent item | nutanix.cluster.storage.controller.io.write.bytes Preprocessing
|
Storage Tier: Das-sata capacity, bytes | The total capacity of Das-sata in bytes. |
Dependent item | nutanix.cluster.storage.controller.tier.das_sata.capacity.bytes Preprocessing
|
Storage Tier: Das-sata free, bytes | The free space of Das-sata in bytes. |
Dependent item | nutanix.cluster.storage.controller.tier.das_sata.free.bytes Preprocessing
|
Storage Tier: Das-sata usage, bytes | The used space of Das-sata in bytes. |
Dependent item | nutanix.cluster.storage.controller.tier.das_sata.usage.bytes Preprocessing
|
Storage Tier: SSD capacity, bytes | The total capacity of SSD in bytes. |
Dependent item | nutanix.cluster.storage.controller.tier.ssd.capacity.bytes Preprocessing
|
Storage Tier: SSD free, bytes | The free space of SSD in bytes. |
Dependent item | nutanix.cluster.storage.controller.tier.ssd.free.bytes Preprocessing
|
Storage Tier: SSD usage, bytes | The used space of SSD in bytes. |
Dependent item | nutanix.cluster.storage.controller.tier.ssd.usage.bytes Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nutanix: Failed to get metric data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Cluster Prism Element by HTTP/nutanix.cluster.metric.get.check))>0 |High |
||
Nutanix: Failed to get alert data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Cluster Prism Element by HTTP/nutanix.cluster.alert.get.check))>0 |High |
||
Nutanix: Redundancy factor mismatched | Current redundancy factor does not match the desired redundancy factor. |
last(/Nutanix Cluster Prism Element by HTTP/nutanix.cluster.redundancy.factor.current)<>last(/Nutanix Cluster Prism Element by HTTP/nutanix.cluster.redundancy.factor.desired) |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Alert discovery | Discovery of all alerts. Alerts will be grouped by title. For each alert, in addition to the basic information, the number of activations and the last alert ID will be available. |
Dependent item | nutanix.cluster.alert.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Alert [{#ALERT.NAME}]: Full title | The full title of the alert. |
Dependent item | nutanix.cluster.alert.title["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Create datetime | The alert creation date and time. |
Dependent item | nutanix.cluster.alert.created["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Severity | Alert severity. One of the following: - Info; - Warning; - Critical; - Unknown. |
Dependent item | nutanix.cluster.alert.severity["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: State | Alert state. One of the following: - OK; - Problem. |
Dependent item | nutanix.cluster.alert.state["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Detailed message | Detailed information about the current alert. |
Dependent item | nutanix.cluster.alert.message["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Last alert ID | Latest ID of the alert. |
Dependent item | nutanix.cluster.alert.last_id["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Count alerts | The number of times this alert was triggered. |
Dependent item | nutanix.cluster.alert.count["{#ALERT.KEY}"] Preprocessing
|
This template is designed for the effortless deployment of Nutanix Host Prism Element monitoring and doesn't require any external scripts.
This template can be used with host discovery or linked to a host manually - in the latter case, attach it to the host and manually set the value of the {$NUTANIX.HOST.UUID}
macro.
More details can be found in the official documentation:
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the {$NUTANIX.PRISM.ELEMENT.IP}, {$NUTANIX.PRISM.ELEMENT.PORT}, {$NUTANIX.USER}, {$NUTANIX.PASSWORD}, and {$NUTANIX.HOST.UUID} macro values.
Name | Description | Default |
---|---|---|
{$NUTANIX.PRISM.ELEMENT.IP} | Set the Nutanix API IP here. |
<Put your IP here> |
{$NUTANIX.PRISM.ELEMENT.PORT} | Set the Nutanix API port here. |
9440 |
{$NUTANIX.USER} | Nutanix API username. |
<Put your API username here> |
{$NUTANIX.PASSWORD} | Nutanix API password. |
<Put your API password here> |
{$NUTANIX.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. |
|
{$NUTANIX.HOST.UUID} | UUID of the host. |
|
{$NUTANIX.TIMEOUT} | API response timeout. |
10s |
{$NUTANIX.ALERT.DISCOVERY.NAME.MATCHES} | Filter of discoverable Nutanix alerts by name. |
.* |
{$NUTANIX.ALERT.DISCOVERY.NAME.NOT_MATCHES} | Filter to exclude discovered Nutanix alerts by name. |
CHANGE_IF_NEEDED |
{$NUTANIX.ALERT.DISCOVERY.STATE.MATCHES} | Filter of discoverable Nutanix alerts by state. Set "1" for filtering only problem alerts or "0" for resolved ones. |
.* |
{$NUTANIX.ALERT.DISCOVERY.SEVERITY.MATCHES} | Filter of discoverable Nutanix alerts by severity. Set all possible severities for filtering in the range 0-2. "0" - Info, "1" - Warning, "2" - Critical. |
.* |
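With the macros above defined, you can sanity-check API access outside Zabbix with a small script. This is only a sketch, not part of the template: the URL path follows the commonly documented Prism Element v2.0 REST layout, and every value below is a placeholder.

```python
# Minimal Prism Element API check (sketch; adjust the path to your Prism version).
import requests

PRISM_IP = "192.0.2.50"              # {$NUTANIX.PRISM.ELEMENT.IP}
PRISM_PORT = 9440                    # {$NUTANIX.PRISM.ELEMENT.PORT}
HOST_UUID = "<host-uuid>"            # {$NUTANIX.HOST.UUID}
AUTH = ("api-user", "api-password")  # {$NUTANIX.USER} / {$NUTANIX.PASSWORD}

url = f"https://{PRISM_IP}:{PRISM_PORT}/PrismGateway/services/rest/v2.0/hosts/{HOST_UUID}"
resp = requests.get(url, auth=AUTH, timeout=10, verify=False)  # verify=False only for self-signed lab certificates
resp.raise_for_status()
host = resp.json()
print(host.get("name"), host.get("state"))  # field names assumed from the "Host state" item below
```

A successful response with host JSON means the macro values are usable; anything else points to the credentials, the UUID, or network reachability.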
Name | Description | Type | Key and additional info |
---|---|---|---|
Get metric | Get data about basic metrics. |
Script | nutanix.host.metric.get |
Get metric check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.host.metric.get.check Preprocessing
|
Get disk | Get data about installed disks. |
Script | nutanix.host.disk.get |
Get disk check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.host.disk.get.check Preprocessing
|
Get alert | Get data about alerts. |
Script | nutanix.host.alert.get |
Get alert check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.host.alert.get.check Preprocessing
|
Content Cache: Hit rate, % | Content cache hits over all lookups. |
Dependent item | nutanix.host.content.cache.hit.percent Preprocessing
|
Content Cache: Logical memory usage, bytes | Logical memory used to cache data without deduplication in bytes. |
Dependent item | nutanix.host.content.cache.logical.memory.usage.bytes Preprocessing
|
Content Cache: Logical saved memory usage, bytes | Memory saved due to content cache deduplication in bytes. |
Dependent item | nutanix.host.content.cache.saved.memory.usage.bytes Preprocessing
|
Content Cache: Logical SSD usage, bytes | Logical SSD memory used to cache data without deduplication in bytes. |
Dependent item | nutanix.host.content.cache.logical.ssd.usage.bytes Preprocessing
|
Content Cache: Number of lookups | Number of lookups on the content cache. |
Dependent item | nutanix.host.content.cache.lookups.num Preprocessing
|
Content Cache: Physical memory usage, bytes | Real memory used to cache data via the content cache in bytes. |
Dependent item | nutanix.host.content.cache.physical.memory.usage.bytes Preprocessing
|
Content Cache: Physical SSD usage, bytes | Real SSD usage used to cache data via the content cache in bytes. |
Dependent item | nutanix.host.content.cache.physical.ssd.usage.bytes Preprocessing
|
Content Cache: References | Average number of content cache references. |
Dependent item | nutanix.host.content.cache.dedup.ref.num Preprocessing
|
Content Cache: Saved SSD usage, bytes | SSD usage saved due to content cache deduplication in bytes. |
Dependent item | nutanix.host.content.cache.saved.ssd.usage.bytes Preprocessing
|
Controller: Random IO | The number of random Input/Output operations from the controller. |
Dependent item | nutanix.host.controller.io.random Preprocessing
|
Controller: Random IO, % | The percentage of random Input/Output from the controller. |
Dependent item | nutanix.host.controller.io.random.percent Preprocessing
|
Controller: Sequence IO | The number of sequential Input/Output operations from the controller. |
Dependent item | nutanix.host.controller.io.sequence Preprocessing
|
Controller: Sequence IO, % | The percentage of sequential Input/Output from the controller. |
Dependent item | nutanix.host.controller.io.sequence.percent Preprocessing
|
Storage Controller: Timespan, sec | Controller timespan. |
Dependent item | nutanix.host.storage.controller.timespan.sec Preprocessing
|
Storage Controller: IO total, bytes | Total controller Input/Output size. |
Dependent item | nutanix.host.storage.controller.io.total.bytes Preprocessing
|
Storage Controller: IO total, sec | Total controller Input/Output time. |
Dependent item | nutanix.host.storage.controller.io.total.sec Preprocessing
|
Storage Controller: IO total read, bytes | Total controller read Input/Output size. |
Dependent item | nutanix.host.storage.controller.io.read.total.bytes Preprocessing
|
Storage Controller: IO total read, sec | Total controller read Input/Output time. |
Dependent item | nutanix.host.storage.controller.io.read.total.sec Preprocessing
|
General: Boot time | The last host boot time. |
Dependent item | nutanix.host.general.boot.time Preprocessing
|
General: CPU frequency | The processor frequency. |
Dependent item | nutanix.host.general.cpu.frequency Preprocessing
|
General: CPU model | The processor model. |
Dependent item | nutanix.host.general.cpu.model Preprocessing
|
General: Host state | Displays the host state. One of the following: - NEW; - NORMAL; - MARKED_FOR_REMOVAL_BUT_NOT_DETACHABLE; - DETACHABLE. |
Dependent item | nutanix.host.general.state Preprocessing
|
General: Host type | Displays the host type. One of the following: - HYPER_CONVERGED; - COMPUTE_ONLY. |
Dependent item | nutanix.host.general.type Preprocessing
|
General: IOPS | Input/Output operations per second from the disk. |
Dependent item | nutanix.host.general.iops Preprocessing
|
General: IO | The number of Input/Output operations from the disk. |
Dependent item | nutanix.host.general.io Preprocessing
|
General: IO, bandwidth | Data transferred in B/sec from the disk. |
Dependent item | nutanix.host.general.io.bandwidth Preprocessing
|
General: IO, latency | Input/Output latency from the disk. |
Dependent item | nutanix.host.general.io.latency Preprocessing
|
General: Degrade status | Indicates whether the host is in a degraded state. One of the following: - Normal; - Degraded; - Unknown. |
Dependent item | nutanix.host.general.degraded Preprocessing
|
General: Maintenance mode | Indicates whether the host is in maintenance mode. One of the following: - Normal; - Maintenance; - Unknown. |
Dependent item | nutanix.host.general.maintenance Preprocessing
|
General: Number of virtual machines | Number of virtual machines running on this host. |
Dependent item | nutanix.host.general.vms.num Preprocessing
|
General: Random IO | The number of random Input/Output operations. |
Dependent item | nutanix.host.general.io.random Preprocessing
|
General: Random IO, % | The percentage of random Input/Output. |
Dependent item | nutanix.host.general.io.random.percent Preprocessing
|
General: Read IO | Input/Output read operations from the disk. |
Dependent item | nutanix.host.general.io.read Preprocessing
|
General: Read IOPS | Total number of Input/Output read operations per second. |
Dependent item | nutanix.host.general.iops.read Preprocessing
|
General: Read IO, % | The total percentage of Input/Output operations that are reads. |
Dependent item | nutanix.host.general.io.read.percent Preprocessing
|
General: Read IO, bandwidth | Read data transferred in B/sec from the disk. |
Dependent item | nutanix.host.general.io.read.bandwidth Preprocessing
|
General: Read IO, latency | Average Input/Output read latency. |
Dependent item | nutanix.host.general.io.read.latency Preprocessing
|
General: Reboot pending | Indicates whether a reboot of the host is pending. |
Dependent item | nutanix.host.general.reboot Preprocessing
|
General: Sequence IO | The number of sequential Input/Output operations. |
Dependent item | nutanix.host.general.io.sequence Preprocessing
|
General: Sequence IO, % | The percentage of sequential Input/Output. |
Dependent item | nutanix.host.general.io.sequence.percent Preprocessing
|
General: Storage capacity, bytes | Total size of the datastores used by this system in bytes. |
Dependent item | nutanix.host.general.storage.capacity.bytes Preprocessing
|
General: Storage free, bytes | Total free space of all the datastores used by this system in bytes. |
Dependent item | nutanix.host.general.storage.free.bytes Preprocessing
|
General: Storage logical usage, bytes | Total logical space used by the datastores of this system in bytes. |
Dependent item | nutanix.host.general.storage.logical.usage.bytes Preprocessing
|
General: Storage usage, bytes | Total physical datastore space used by this host and all its snapshots on the datastores. |
Dependent item | nutanix.host.general.storage.usage.bytes Preprocessing
|
General: Timespan, sec | Host timespan. |
Dependent item | nutanix.host.general.timespan.sec Preprocessing
|
General: Total CPU capacity | Total host CPU capacity in Hz. |
Dependent item | nutanix.host.general.cpu.capacity.hz Preprocessing
|
General: IO total, sec | Total time of Input/Output operations. |
Dependent item | nutanix.host.general.io.total.sec Preprocessing
|
General: IO total, bytes | Total size of Input/Output operations. |
Dependent item | nutanix.host.general.io.total.bytes Preprocessing
|
General: Total memory, bytes | Total host memory in bytes. |
Dependent item | nutanix.host.general.memory.total.bytes Preprocessing
|
General: IO total read, sec | Total time of Input/Output read operations. |
Dependent item | nutanix.host.general.io.read.total.sec Preprocessing
|
General: IO total read, bytes | Total size of Input/Output read operations. |
Dependent item | nutanix.host.general.io.read.total.bytes Preprocessing
|
General: Total transformed usage, bytes | Actual usage of storage. |
Dependent item | nutanix.host.general.transformed.usage.total.bytes Preprocessing
|
General: Total untransformed usage, bytes | Logical usage of storage (physical usage divided by the replication factor). |
Dependent item | nutanix.host.general.untransformed.usage.total.bytes Preprocessing
|
General: Write IO | Total number of Input/Output write operations. |
Dependent item | nutanix.host.general.io.write Preprocessing
|
General: Write IOPS | Total number of Input/Output write operations per second. |
Dependent item | nutanix.host.general.iops.write Preprocessing
|
General: Write IO, % | Total percentage of Input/Output operations that are writes. |
Dependent item | nutanix.host.general.io.write.percent Preprocessing
|
General: Write IO, bandwidth | Write data transferred in B/sec from the disk. |
Dependent item | nutanix.host.general.io.write.bandwidth Preprocessing
|
General: Write IO, latency | Average Input/Output write operation latency. |
Dependent item | nutanix.host.general.io.write.latency Preprocessing
|
Hypervisor: CPU usage, % | Percentage of CPU used by the Hypervisor. |
Dependent item | nutanix.host.hypervisor.cpu.usage.percent Preprocessing
|
Hypervisor: Full name | Full name of the Hypervisor running on the host. |
Dependent item | nutanix.host.hypervisor.name Preprocessing
|
Hypervisor: IOPS | Input/Output operations per second from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.iops Preprocessing
|
Hypervisor: IO, bandwidth | Data transferred in B/sec from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.bandwidth Preprocessing
|
Hypervisor: IO, latency | Input/Output operation latency from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.latency Preprocessing
|
Hypervisor: Memory usage, % | Percentage of memory used by the Hypervisor. |
Dependent item | nutanix.host.hypervisor.memory.usage.percent Preprocessing
|
Hypervisor: IO | The number of Input/Output operations from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io Preprocessing
|
Hypervisor: Read IOPS | Input/Output read operations per second from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.iops.read Preprocessing
|
Hypervisor: Read IO | The number of Input/Output read operations from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.read Preprocessing
|
Hypervisor: Read IO, bandwidth | Read data transferred in B/sec from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.read.bandwidth Preprocessing
|
Hypervisor: Read IO, latency | Input/Output read latency from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.read.latency Preprocessing
|
Hypervisor: Received, bytes | Bytes received over the network reported by the Hypervisor. |
Dependent item | nutanix.host.hypervisor.received.bytes Preprocessing
|
Hypervisor: Timespan, sec | Hypervisor timespan. |
Dependent item | nutanix.host.hypervisor.timespan.sec Preprocessing
|
Hypervisor: IO total, sec | Total Input/Output operation time from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.total.sec Preprocessing
|
Hypervisor: IO total, bytes | Total Input/Output operation size from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.total.bytes Preprocessing
|
Hypervisor: IO total read, bytes | Total size of Input/Output read operations from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.read.total.bytes Preprocessing
|
Hypervisor: IO total read, sec | Total time of Input/Output read operations from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.read.total.sec Preprocessing
|
Hypervisor: Transmitted, bytes | Bytes transmitted over the network reported by the Hypervisor. |
Dependent item | nutanix.host.hypervisor.transmitted.bytes Preprocessing
|
Hypervisor: Write IOPS | Input/Output write operations per second from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.iops.write Preprocessing
|
Hypervisor: Write IO | Input/Output write operations from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.write Preprocessing
|
Hypervisor: Write IO, bandwidth | Write data transferred in B/sec from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.write.bandwidth Preprocessing
|
Hypervisor: Write IO, latency | Input/Output write latency from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.write.latency Preprocessing
|
Hypervisor: Number of CPU cores | The number of CPU cores. |
Dependent item | nutanix.host.hypervisor.cpu.cores.num Preprocessing
|
Hypervisor: Number of CPU sockets | The number of CPU sockets. |
Dependent item | nutanix.host.hypervisor.cpu.sockets.num Preprocessing
|
Hypervisor: Number of CPU threads | The number of CPU threads. |
Dependent item | nutanix.host.hypervisor.cpu.threads.num Preprocessing
|
Storage Controller: IOPS | Input/Output operations per second from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.iops Preprocessing
|
Storage Controller: IO | Input/Output operations from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io Preprocessing
|
Storage Controller: IO, bandwidth | Data transferred in B/sec from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.bandwidth Preprocessing
|
Storage Controller: IO, latency | Input/Output latency from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.latency Preprocessing
|
Storage Controller: Read IOPS | Input/Output read operations per second from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.iops.read Preprocessing
|
Storage Controller: Read IO | Input/Output read operations from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.read Preprocessing
|
Storage Controller: Read IO, % | Percentage of Input/Output operations from the Storage Controller that are reads. |
Dependent item | nutanix.host.storage.controller.io.read.percent Preprocessing
|
Storage Controller: Read IO, bandwidth | Read data transferred in B/sec from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.read.bandwidth Preprocessing
|
Storage Controller: Read IO, latency | Input/Output read latency from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.read.latency Preprocessing
|
Storage Controller: Read IO, bytes | Storage Controller average read Input/Output in bytes. |
Dependent item | nutanix.host.storage.controller.io.read.bytes Preprocessing
|
Storage Controller: Total transformed usage, bytes | Actual usage of the Storage Controller. |
Dependent item | nutanix.host.storage.controller.transformed.usage.total.bytes Preprocessing
|
Storage Controller: Write IO | Input/Output write operations to the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.write Preprocessing
|
Storage Controller: Write IOPS | Input/Output write operations per second to the Storage Controller. |
Dependent item | nutanix.host.storage.controller.iops.write Preprocessing
|
Storage Controller: Write IO, % | Percentage of Input/Output operations to the Storage Controller that are writes. |
Dependent item | nutanix.host.storage.controller.io.write.percent Preprocessing
|
Storage Controller: Write IO, bandwidth | Write data transferred in B/sec to the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.write.bandwidth Preprocessing
|
Storage Controller: Write IO, latency | Input/Output write latency to the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.write.latency Preprocessing
|
Storage Controller: Write IO, bytes | Storage Controller average write Input/Output in bytes. |
Dependent item | nutanix.host.storage.controller.io.write.bytes Preprocessing
|
Storage Tier: Das-sata capacity, bytes | The total capacity of Das-sata in bytes. |
Dependent item | nutanix.host.storage.controller.tier.das_sata.capacity.bytes Preprocessing
|
Storage Tier: Das-sata free, bytes | The free space of Das-sata in bytes. |
Dependent item | nutanix.host.storage.controller.tier.das_sata.free.bytes Preprocessing
|
Storage Tier: Das-sata usage, bytes | The used space of Das-sata in bytes. |
Dependent item | nutanix.host.storage.controller.tier.das_sata.usage.bytes Preprocessing
|
Storage Tier: SSD capacity, bytes | The total capacity of SSD in bytes. |
Dependent item | nutanix.host.storage.controller.tier.ssd.capacity.bytes Preprocessing
|
Storage Tier: SSD free, bytes | The free space of SSD in bytes. |
Dependent item | nutanix.host.storage.controller.tier.ssd.free.bytes Preprocessing
|
Storage Tier: SSD usage, bytes | The used space of SSD in bytes. |
Dependent item | nutanix.host.storage.controller.tier.ssd.usage.bytes Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nutanix: Failed to get metric data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Host Prism Element by HTTP/nutanix.host.metric.get.check))>0 |High |
||
Nutanix: Failed to get disk data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Host Prism Element by HTTP/nutanix.host.disk.get.check))>0 |High |
||
Nutanix: Failed to get alert data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Host Prism Element by HTTP/nutanix.host.alert.get.check))>0 |High |
||
Nutanix: Host is in degraded status | Host is in a degraded status. The host may soon become unavailable. |
last(/Nutanix Host Prism Element by HTTP/nutanix.host.general.degraded)=1 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk discovery | Discovery of all disks. |
Dependent item | nutanix.host.disk.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk [{#DISK.SERIAL}]: Bandwidth | Bandwidth of the disk in B/sec. |
Dependent item | nutanix.host.disk.io.bandwidth["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: Space: Total, bytes | The total disk space in bytes. |
Dependent item | nutanix.host.disk.capacity.bytes["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: Space: Free, bytes | The free disk space in bytes. |
Dependent item | nutanix.host.disk.free.bytes["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: IOPS | The number of Input/Output operations from the disk. |
Dependent item | nutanix.host.disk.iops["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: IO, latency | The average Input/Output operation latency. |
Dependent item | nutanix.host.disk.io.avg.latency["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: Space: Logical usage, bytes | The logical used disk space in bytes. |
Dependent item | nutanix.host.disk.logical.usage.bytes["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: Online | Indicates whether the disk is online. |
Dependent item | nutanix.host.disk.online["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: Status | Current disk status. One of the following: - NORMAL; - DATA_MIGRATION_INITIATED; - MARKED_FOR_REMOVAL_BUT_NOT_DETACHABLE; - DETACHABLE. |
Dependent item | nutanix.host.disk.status["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: Space: Used, bytes | The used disk space in bytes. |
Dependent item | nutanix.host.disk.usage.bytes["{#DISK.SERIAL}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Alert discovery | Discovery of all alerts. Alerts will be grouped by title. For each alert, in addition to the basic information, the number of activations and the last alert ID will be available. |
Dependent item | nutanix.host.alert.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Alert [{#ALERT.NAME}]: Full title | The full title of the alert. |
Dependent item | nutanix.host.alert.title["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Create datetime | The alert creation date and time. |
Dependent item | nutanix.host.alert.created["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Severity | Alert severity. One of the following: - Info; - Warning; - Critical; - Unknown. |
Dependent item | nutanix.host.alert.severity["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: State | Alert state. One of the following: - OK; - Problem. |
Dependent item | nutanix.host.alert.state["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Detailed message | Detailed information about the current alert. |
Dependent item | nutanix.host.alert.message["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Last alert ID | Latest ID of the alert. |
Dependent item | nutanix.host.alert.last_id["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Count alerts | The number of times this alert was triggered. |
Dependent item | nutanix.host.alert.count["{#ALERT.KEY}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor HashiCorp Nomad by Zabbix. It works without any external scripts. Currently, the template supports discovery of Nomad servers and clients.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Define the {$NOMAD.ENDPOINT.API.URL} macro value with the correct web protocol, host, and port.
Prepare an ACL token with the node:read, namespace:read-job, agent:read, and management permissions applied, and define the {$NOMAD.TOKEN} macro value.
> Refer to the vendor documentation about Nomad native ACL or Nomad Vault-generated tokens if you have the HashiCorp Vault integration configured.
Additional information:
Useful links
Name | Description | Default |
---|---|---|
{$NOMAD.ENDPOINT.API.URL} | API endpoint URL for one of the Nomad cluster members. |
http://localhost:4646 |
{$NOMAD.TOKEN} | Nomad authentication token. |
<PUT YOUR AUTH TOKEN> |
{$NOMAD.DATA.TIMEOUT} | Response timeout for an API. |
15s |
{$NOMAD.HTTP.PROXY} | Sets the HTTP proxy for script and HTTP agent items. If this parameter is empty, then no proxy is used. |
|
{$NOMAD.API.RESPONSE.SUCCESS} | HTTP API successful response code. Availability triggers threshold. Change, if needed. |
200 |
{$NOMAD.SERVER.NAME.MATCHES} | The filter to include HashiCorp Nomad servers by name. |
.* |
{$NOMAD.SERVER.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad servers by name. |
CHANGE_IF_NEEDED |
{$NOMAD.SERVER.DC.MATCHES} | The filter to include HashiCorp Nomad servers by datacenter belonging. |
.* |
{$NOMAD.SERVER.DC.NOT_MATCHES} | The filter to exclude HashiCorp Nomad servers by datacenter belonging. |
CHANGE_IF_NEEDED |
{$NOMAD.CLIENT.NAME.MATCHES} | The filter to include HashiCorp Nomad clients by name. |
.* |
{$NOMAD.CLIENT.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad clients by name. |
CHANGE_IF_NEEDED |
{$NOMAD.CLIENT.DC.MATCHES} | The filter to include HashiCorp Nomad clients by datacenter belonging. |
.* |
{$NOMAD.CLIENT.DC.NOT_MATCHES} | The filter to exclude HashiCorp Nomad clients by datacenter belonging. |
CHANGE_IF_NEEDED |
{$NOMAD.CLIENT.SCHEDULE.ELIGIBILITY.MATCHES} | The filter to include HashiCorp Nomad clients by scheduling eligibility. |
.* |
{$NOMAD.CLIENT.SCHEDULE.ELIGIBILITY.NOT_MATCHES} | The filter to exclude HashiCorp Nomad clients by scheduling eligibility. |
CHANGE_IF_NEEDED |
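With the macros above defined, a quick way to check the endpoint and token before linking the template is a short script. This is a sketch, not part of the template; the paths are standard Nomad HTTP API endpoints, and whether they exactly match the requests made by the discovery items is an assumption.

```python
# Verify the Nomad API endpoint and ACL token (sketch).
import requests

API_URL = "http://localhost:4646"                # {$NOMAD.ENDPOINT.API.URL}
HEADERS = {"X-Nomad-Token": "<your ACL token>"}  # {$NOMAD.TOKEN}; omit the header if ACLs are disabled

# Server and client inventories similar to what the discovery items rely on.
for path in ("/v1/agent/members", "/v1/nodes"):
    resp = requests.get(API_URL + path, headers=HEADERS, timeout=15)
    print(path, resp.status_code)                # expect {$NOMAD.API.RESPONSE.SUCCESS} (200 by default)
```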
Name | Description | Type | Key and additional info |
---|---|---|---|
Nomad clients get | Nomad clients data in raw format. |
HTTP agent | nomad.client.nodes.get Preprocessing
|
Client nodes API response | Client nodes API response message. |
Dependent item | nomad.client.nodes.api.response Preprocessing
|
Nomad servers get | Nomad servers data in raw format. |
Script | nomad.server.nodes.get |
Server-related APIs response | Server-related ( |
Dependent item | nomad.server.api.response Preprocessing
|
Region | Current cluster region. |
Dependent item | nomad.region Preprocessing
|
Nomad servers count | Nomad servers count. |
Dependent item | nomad.servers.count Preprocessing
|
Nomad clients count | Nomad clients count. |
Dependent item | nomad.clients.count Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad: Client nodes API connection has failed | Client nodes API connection has failed. |
find(/HashiCorp Nomad by HTTP/nomad.client.nodes.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes | |
HashiCorp Nomad: Server-related API connection has failed | Server-related API connection has failed. |
find(/HashiCorp Nomad by HTTP/nomad.server.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Clients discovery | Client nodes discovery. |
Dependent item | nomad.clients.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Servers discovery | Server nodes discovery. |
Dependent item | nomad.servers.discovery Preprocessing
|
This template is designed to monitor HashiCorp Nomad clients by Zabbix. It works without any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up Nomad according to the vendor documentation.
Prepare an ACL token with the node:read and namespace:read-job permissions applied, and define the {$NOMAD.TOKEN} macro value.
> Refer to the vendor documentation about Nomad native ACL or Nomad Vault-generated tokens if you're using integration with HashiCorp Vault.
Set the {$NOMAD.CLIENT.API.SCHEME} and {$NOMAD.CLIENT.API.PORT} macros to define the common Nomad API web scheme and connection port.
Additional information:
You have to prepare an additional ACL token only if you wish to monitor Nomad clients as separate entities. If you're using clients discovery, the token will be inherited from the master host linked to the HashiCorp Nomad by HTTP template.
If you're not using ACLs, skip the second setup step.
The Nomad clients use the default web scheme (HTTP) and the default API port (4646). If you're using clients discovery and need to redefine macros for a particular host created from a prototype, use context macros such as {$NOMAD.CLIENT.API.SCHEME:NECESSARY.IP} and/or {$NOMAD.CLIENT.API.PORT:NECESSARY.IP} at the master host or template level.
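For illustration only (the IP and values below are placeholders, not defaults of this template), such a context macro override defined at the master host or template level could look like:

```
{$NOMAD.CLIENT.API.SCHEME:"192.0.2.10"} = https
{$NOMAD.CLIENT.API.PORT:"192.0.2.10"}   = 4746
```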
Useful links:
Name | Description | Default |
---|---|---|
{$NOMAD.CLIENT.API.SCHEME} | Nomad client API scheme. |
http |
{$NOMAD.CLIENT.API.PORT} | Nomad client API port. |
4646 |
{$NOMAD.TOKEN} | Nomad authentication token. |
<PUT YOUR AUTH TOKEN> |
{$NOMAD.DATA.TIMEOUT} | Response timeout for an API. |
15s |
{$NOMAD.HTTP.PROXY} | Sets the HTTP proxy for HTTP agent item. If this parameter is empty, then no proxy is used. |
|
{$NOMAD.API.RESPONSE.SUCCESS} | HTTP API successful response code. Availability triggers threshold. Change, if needed. |
200 |
{$NOMAD.CLIENT.RPC.PORT} | Nomad RPC service port. |
4647 |
{$NOMAD.CLIENT.SERF.PORT} | Nomad serf service port. |
4648 |
{$NOMAD.CLIENT.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. |
90 |
{$NOMAD.DISK.NAME.MATCHES} | The filter to include HashiCorp Nomad client disks by name. |
.* |
{$NOMAD.DISK.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client disks by name. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.NAME.MATCHES} | The filter to include HashiCorp Nomad client jobs by name. |
.* |
{$NOMAD.JOB.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by name. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.NAMESPACE.MATCHES} | The filter to include HashiCorp Nomad client jobs by namespace. |
.* |
{$NOMAD.JOB.NAMESPACE.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by namespace. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.TYPE.MATCHES} | The filter to include HashiCorp Nomad client jobs by type. |
.* |
{$NOMAD.JOB.TYPE.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by type. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.TASK.GROUP.MATCHES} | The filter to include HashiCorp Nomad client jobs by task group belonging. |
.* |
{$NOMAD.JOB.TASK.GROUP.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by task group belonging. |
CHANGE_IF_NEEDED |
{$NOMAD.DRIVER.NAME.MATCHES} | The filter to include HashiCorp Nomad client drivers by name. |
.* |
{$NOMAD.DRIVER.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client drivers by name. |
CHANGE_IF_NEEDED |
{$NOMAD.DRIVER.DETECT.MATCHES} | The filter to include HashiCorp Nomad client drivers by detection state. Possible filtering values: |
.* |
{$NOMAD.DRIVER.DETECT.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client drivers by detection state. Possible filtering values: |
CHANGE_IF_NEEDED |
{$NOMAD.CPU.UTIL.MIN} | CPU utilization threshold. Measured as a percentage. |
90 |
{$NOMAD.RAM.AVAIL.MIN} | Minimum available RAM threshold. Measured as a percentage. |
5 |
{$NOMAD.INODES.FREE.MIN.WARN} | Warning threshold of the filesystem metadata utilization. Measured as a percentage. |
20 |
{$NOMAD.INODES.FREE.MIN.CRIT} | Critical threshold of the filesystem metadata utilization. Measured as a percentage. |
10 |
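To see the raw data behind the "Telemetry get" HTTP agent item, you can query the client's metrics endpoint directly. This is a sketch, not part of the template; the address is a placeholder and the gauge name is an assumption inferred from the item keys below.

```python
# Fetch Nomad client telemetry and read one gauge (sketch).
import requests

BASE = "http://192.0.2.20:4646"                  # {$NOMAD.CLIENT.API.SCHEME}://<client IP>:{$NOMAD.CLIENT.API.PORT}
HEADERS = {"X-Nomad-Token": "<your ACL token>"}  # omit the header if ACLs are disabled

metrics = requests.get(f"{BASE}/v1/metrics", headers=HEADERS, timeout=15).json()
running = [g["Value"] for g in metrics.get("Gauges", [])
           if g.get("Name") == "nomad.client.allocations.running"]
print("allocations running:", running[0] if running else "metric not present yet")
```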
Name | Description | Type | Key and additional info |
---|---|---|---|
Telemetry get | Telemetry data in raw format. |
HTTP agent | nomad.client.data.get Preprocessing
|
Metrics | Nomad client metrics in raw format. |
Dependent item | nomad.client.metrics.get Preprocessing
|
Monitoring API response | Monitoring API response message. |
Dependent item | nomad.client.data.api.response Preprocessing
|
Service [rpc] state | Current [rpc] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.CLIENT.RPC.PORT}] Preprocessing
|
Service [serf] state | Current [serf] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.CLIENT.SERF.PORT}] Preprocessing
|
CPU allocated | Total amount of CPU shares the scheduler has allocated to tasks. |
Dependent item | nomad.client.allocated.cpu Preprocessing
|
CPU unallocated | Total amount of CPU shares free for the scheduler to allocate to tasks. |
Dependent item | nomad.client.unallocated.cpu Preprocessing
|
Memory allocated | Total amount of memory the scheduler has allocated to tasks. |
Dependent item | nomad.client.allocated.memory Preprocessing
|
Memory unallocated | Total amount of memory free for the scheduler to allocate to tasks. |
Dependent item | nomad.client.unallocated.memory Preprocessing
|
Disk allocated | Total amount of disk space the scheduler has allocated to tasks. |
Dependent item | nomad.client.allocated.disk Preprocessing
|
Disk unallocated | Total amount of disk space free for the scheduler to allocate to tasks. |
Dependent item | nomad.client.unallocated.disk Preprocessing
|
Allocations blocked | Number of allocations waiting for previous versions. |
Dependent item | nomad.client.allocations.blocked Preprocessing
|
Allocations migrating | Number of allocations migrating data from previous versions. |
Dependent item | nomad.client.allocations.migrating Preprocessing
|
Allocations pending | Number of allocations pending (received by the client but not yet running). |
Dependent item | nomad.client.allocations.pending Preprocessing
|
Allocations starting | Number of allocations starting. |
Dependent item | nomad.client.allocations.start Preprocessing
|
Allocations running | Number of allocations running. |
Dependent item | nomad.client.allocations.running Preprocessing
|
Allocations terminal | Number of allocations terminal. |
Dependent item | nomad.client.allocations.terminal Preprocessing
|
Allocations failed, rate | Number of allocations failed. |
Dependent item | nomad.client.allocations.failed Preprocessing
|
Allocations completed, rate | Number of allocations completed. |
Dependent item | nomad.client.allocations.complete Preprocessing
|
Allocations restarted, rate | Number of allocations restarted. |
Dependent item | nomad.client.allocations.restart Preprocessing
|
Allocations OOM killed | Number of allocations OOM killed. |
Dependent item | nomad.client.allocations.oom_killed Preprocessing
|
CPU idle utilization | CPU utilization in idle state. |
Dependent item | nomad.client.cpu.idle Preprocessing
|
CPU system utilization | CPU utilization in system space. |
Dependent item | nomad.client.cpu.system Preprocessing
|
CPU total utilization | Total CPU utilization. |
Dependent item | nomad.client.cpu.total Preprocessing
|
CPU user utilization | CPU utilization in user space. |
Dependent item | nomad.client.cpu.user Preprocessing
|
Memory available | Total amount of memory available to processes which includes free and cached memory. |
Dependent item | nomad.client.memory.available Preprocessing
|
Memory free | Amount of memory which is free. |
Dependent item | nomad.client.memory.free Preprocessing
|
Memory size | Total amount of physical memory on the node. |
Dependent item | nomad.client.memory.total Preprocessing
|
Memory used | Amount of memory used by processes. |
Dependent item | nomad.client.memory.used Preprocessing
|
Uptime | Uptime of the host running the Nomad client. |
Dependent item | nomad.client.uptime Preprocessing
|
Node info get | Node info data in raw format. |
HTTP agent | nomad.client.node.info.get Preprocessing
|
Nomad client version | Nomad client version. |
Dependent item | nomad.client.version Preprocessing
|
Nodes API response | Nodes API response message. |
Dependent item | nomad.client.node.info.api.response Preprocessing
|
Allocated jobs get | Allocated jobs data in raw format. |
HTTP agent | nomad.client.job.allocs.get Preprocessing
|
Allocations API response | Allocations API response message. |
Dependent item | nomad.client.job.allocs.api.response Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Client: Monitoring API connection has failed | Monitoring API connection has failed. |
find(/HashiCorp Nomad Client by HTTP/nomad.client.data.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes | |
HashiCorp Nomad Client: Service [rpc] is down | Cannot establish the connection to [rpc] service port {$NOMAD.CLIENT.RPC.PORT}. |
last(/HashiCorp Nomad Client by HTTP/net.tcp.service[tcp,,{$NOMAD.CLIENT.RPC.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Client: Service [serf] is down | Cannot establish the connection to [serf] service port {$NOMAD.CLIENT.SERF.PORT}. |
last(/HashiCorp Nomad Client by HTTP/net.tcp.service[tcp,,{$NOMAD.CLIENT.SERF.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Client: OOM killed allocations found | OOM killed allocations found. |
last(/HashiCorp Nomad Client by HTTP/nomad.client.allocations.oom_killed) > 0 |Warning |
Manual close: Yes | |
HashiCorp Nomad Client: High CPU utilization | CPU utilization is too high. The system might be slow to respond. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.cpu.total, 10m) >= {$NOMAD.CPU.UTIL.MIN} |Average |
||
HashiCorp Nomad Client: High memory utilization | RAM utilization is too high. The system might be slow to respond. |
(min(/HashiCorp Nomad Client by HTTP/nomad.client.memory.available, 10m) / last(/HashiCorp Nomad Client by HTTP/nomad.client.memory.total))*100 <= {$NOMAD.RAM.AVAIL.MIN} |Average |
||
HashiCorp Nomad Client: The host has been restarted | The host uptime is less than 10 minutes. |
last(/HashiCorp Nomad Client by HTTP/nomad.client.uptime) < 10m |Warning |
Manual close: Yes | |
HashiCorp Nomad Client: Nomad client version has changed | Nomad client version has changed. |
change(/HashiCorp Nomad Client by HTTP/nomad.client.version)<>0 |Info |
Manual close: Yes | |
HashiCorp Nomad Client: Nodes API connection has failed | Nodes API connection has failed. |
find(/HashiCorp Nomad Client by HTTP/nomad.client.node.info.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Client: Allocations API connection has failed | Allocations API connection has failed. |
find(/HashiCorp Nomad Client by HTTP/nomad.client.job.allocs.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Drivers discovery | Client drivers discovery. |
Dependent item | nomad.client.drivers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Driver [{#DRIVER.NAME}] state | Driver [{#DRIVER.NAME}] state. |
Dependent item | nomad.client.driver.state["{#DRIVER.NAME}"] Preprocessing
|
Driver [{#DRIVER.NAME}] detection state | Driver [{#DRIVER.NAME}] detection state. |
Dependent item | nomad.client.driver.detected["{#DRIVER.NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] is in unhealthy state | The [{#DRIVER.NAME}] driver detected, but its state is unhealthy. |
last(/HashiCorp Nomad Client by HTTP/nomad.client.driver.state["{#DRIVER.NAME}"]) = 0 and last(/HashiCorp Nomad Client by HTTP/nomad.client.driver.detected["{#DRIVER.NAME}"]) = 1 |Warning |
Manual close: Yes | |
HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] detection state has changed | The [{#DRIVER.NAME}] driver detection state has changed. |
change(/HashiCorp Nomad Client by HTTP/nomad.client.driver.detected["{#DRIVER.NAME}"]) <> 0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Physical disks discovery | Physical disks discovery. |
Dependent item | nomad.client.disk.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk ["{#DEV.NAME}"] space available | Amount of space which is available on ["{#DEV.NAME}"] disk. |
Dependent item | nomad.client.disk.available["{#DEV.NAME}"] Preprocessing
|
Disk ["{#DEV.NAME}"] inodes utilization | Disk space consumed by the inodes on ["{#DEV.NAME}"] disk. |
Dependent item | nomad.client.disk.inodes_percent["{#DEV.NAME}"] Preprocessing
|
Disk ["{#DEV.NAME}"] size | Total size of the ["{#DEV.NAME}"] device. |
Dependent item | nomad.client.disk.size["{#DEV.NAME}"] Preprocessing
|
Disk ["{#DEV.NAME}"] space utilization | Percentage of disk ["{#DEV.NAME}"] space used. |
Dependent item | nomad.client.disk.used_percent["{#DEV.NAME}"] Preprocessing
|
Disk ["{#DEV.NAME}"] space used | Amount of disk ["{#DEV.NAME}"] space which has been used. |
Dependent item | nomad.client.disk.used["{#DEV.NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Client: Running out of free inodes on [{#DEV.NAME}] device | It may become impossible to write to a disk if there are no index nodes left. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.inodes_percent["{#DEV.NAME}"],5m) >= {$NOMAD.INODES.FREE.MIN.WARN:"{#DEV.NAME}"} |Warning |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Client: Running out of free inodes on [{#DEV.NAME}] device | It may become impossible to write to a disk if there are no index nodes left. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.inodes_percent["{#DEV.NAME}"],5m) >= {$NOMAD.INODES.FREE.MIN.CRIT:"{#DEV.NAME}"} |Average |
Manual close: Yes | |
HashiCorp Nomad Client: High disk [{#DEV.NAME}] utilization | High disk [{#DEV.NAME}] utilization. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.used_percent["{#DEV.NAME}"],5m) >= {$NOMAD.DISK.UTIL.MIN.WARN:"{#DEV.NAME}"} |Warning |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Client: High disk [{#DEV.NAME}] utilization | High disk [{#DEV.NAME}] utilization. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.used_percent["{#DEV.NAME}"],5m) >= {$NOMAD.DISK.UTIL.MIN.CRIT:"{#DEV.NAME}"} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Allocated jobs discovery | Allocated jobs discovery. |
Dependent item | nomad.client.alloc.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job ["{#JOB.NAME}"] CPU allocated | Total CPU resources allocated by the ["{#JOB.NAME}"] job across all cores. |
Dependent item | nomad.client.allocs.cpu.allocated["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] CPU system utilization | Total CPU resources consumed by the ["{#JOB.NAME}"] job in system space. |
Dependent item | nomad.client.allocs.cpu.system["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] CPU user utilization | Total CPU resources consumed by the ["{#JOB.NAME}"] job in user space. |
Dependent item | nomad.client.allocs.cpu.user["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] CPU total utilization | Total CPU resources consumed by the ["{#JOB.NAME}"] job across all cores. |
Dependent item | nomad.client.allocs.cpu.total_percent["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] CPU throttled periods time | Total number of CPU periods that the ["{#JOB.NAME}"] job was throttled. |
Dependent item | nomad.client.allocs.cpu.throttled_periods["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] CPU throttled time | Total time that the ["{#JOB.NAME}"] job was throttled. |
Dependent item | nomad.client.allocs.cpu.throttled_time["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] CPU ticks | CPU ticks consumed by the process for the ["{#JOB.NAME}"] job in the last collection interval. |
Dependent item | nomad.client.allocs.cpu.total_ticks["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] Memory allocated | Amount of memory allocated by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.allocated["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] Memory cached | Amount of memory cached by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.cache["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] Memory used | Total amount of memory used by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.usage["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] Memory swapped | Amount of memory swapped by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.swap["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
This template is designed to monitor HashiCorp Nomad servers by Zabbix. It works without any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Refer to the vendor documentation. Use the {$NOMAD.SERVER.API.SCHEME} and {$NOMAD.SERVER.API.PORT} macros to define the common Nomad API web scheme and connection port.
Additional information: the default API scheme is HTTP and the default API port is 4646. If you're using servers discovery and you need to re-define macros for a particular host created from a prototype, use context macros like {$NOMAD.SERVER.API.SCHEME:NECESSARY.IP} and/or {$NOMAD.SERVER.API.PORT:NECESSARY.IP} on the master host or template level. Adjust the {$NOMAD.REDUNDANCY.MIN} macro value, based on your cluster node count, to configure the failure tolerance triggers correctly.
Useful links:
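To check manually that the Nomad HTTP API is reachable with the values you put into the macros, you can query it directly. This is only a hedged sketch: the host name and token are placeholders, and the exact endpoints polled by the "Telemetry get" and "Internal stats get" items are assumed here to be the standard /v1/metrics and /v1/agent/self API paths.
curl -s -H "X-Nomad-Token: <PUT YOUR AUTH TOKEN>" "http://nomad-server.example.com:4646/v1/metrics"
curl -s -H "X-Nomad-Token: <PUT YOUR AUTH TOKEN>" "http://nomad-server.example.com:4646/v1/agent/self"
Both calls should return JSON; a non-200 response usually points to a wrong port, scheme, or ACL token.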
Name | Description | Default |
---|---|---|
{$NOMAD.SERVER.API.SCHEME} | Nomad SERVER API scheme. |
http |
{$NOMAD.SERVER.API.PORT} | Nomad SERVER API port. |
4646 |
{$NOMAD.TOKEN} | Nomad authentication token. |
<PUT YOUR AUTH TOKEN> |
{$NOMAD.DATA.TIMEOUT} | Response timeout for an API. |
15s |
{$NOMAD.HTTP.PROXY} | Sets the HTTP proxy for HTTP agent item. If this parameter is empty, then no proxy is used. |
|
{$NOMAD.API.RESPONSE.SUCCESS} | HTTP API successful response code. Availability triggers threshold. Change, if needed. |
200 |
{$NOMAD.SERVER.RPC.PORT} | Nomad RPC service port. |
4647 |
{$NOMAD.SERVER.SERF.PORT} | Nomad serf service port. |
4648 |
{$NOMAD.REDUNDANCY.MIN} | The number of redundant servers needed to keep the cluster safe. The default value is '1' for a 3-node cluster. Change if needed. |
1 |
{$NOMAD.OPEN.FDS.MAX} | Maximum percentage of used file descriptors. |
90 |
{$NOMAD.SERVER.LEADER.LATENCY} | Leader last contact latency threshold. |
0.3s |
Name | Description | Type | Key and additional info |
---|---|---|---|
Telemetry get | Telemetry data in raw format. |
HTTP agent | nomad.server.data.get Preprocessing
|
Metrics | Nomad server metrics in raw format. |
Dependent item | nomad.server.metrics.get Preprocessing
|
Monitoring API response | Monitoring API response message. |
Dependent item | nomad.server.data.api.response Preprocessing
|
Internal stats get | Internal stats data in raw format. |
HTTP agent | nomad.server.stats.get Preprocessing
|
Internal stats API response | Internal stats API response message. |
Dependent item | nomad.server.stats.api.response Preprocessing
|
Nomad server version | Nomad server version. |
Dependent item | nomad.server.version Preprocessing
|
Nomad raft version | Nomad raft version. |
Dependent item | nomad.raft.version Preprocessing
|
Raft peers | Current cluster raft peers amount. |
Dependent item | nomad.server.raft.peers Preprocessing
|
Cluster role | Current role in the cluster. |
Dependent item | nomad.server.raft.cluster_role Preprocessing
|
CPU time, rate | Total user and system CPU time spent in seconds. |
Dependent item | nomad.server.cpu.time Preprocessing
|
Memory used | Memory utilization in bytes. |
Dependent item | nomad.server.runtime.alloc_bytes Preprocessing
|
Virtual memory size | Virtual memory size in bytes. |
Dependent item | nomad.server.virtualmemorybytes Preprocessing
|
Resident memory size | Resident memory size in bytes. |
Dependent item | nomad.server.residentmemorybytes Preprocessing
|
Heap objects | Number of objects on the heap. General memory pressure indicator. |
Dependent item | nomad.server.runtime.heap_objects Preprocessing
|
Open file descriptors | Number of open file descriptors. |
Dependent item | nomad.server.process_open_fds Preprocessing
|
Open file descriptors, max | Maximum number of open file descriptors. |
Dependent item | nomad.server.process_max_fds Preprocessing
|
Goroutines | Number of goroutines and general load pressure indicator. |
Dependent item | nomad.server.runtime.num_goroutines Preprocessing
|
Evaluations pending | Evaluations that are pending until an existing evaluation for the same job completes. |
Dependent item | nomad.server.broker.total_pending Preprocessing
|
Evaluations ready | Number of evaluations ready to be processed. |
Dependent item | nomad.server.broker.total_ready Preprocessing
|
Evaluations unacked | Evaluations dispatched for processing but incomplete. |
Dependent item | nomad.server.broker.total_unacked Preprocessing
|
CPU shares for blocked evaluations | Amount of CPU shares requested by blocked evals. |
Dependent item | nomad.server.blocked_evals.cpu Preprocessing
|
Memory shares by blocked evaluations | Amount of memory requested by blocked evals. |
Dependent item | nomad.server.blocked_evals.memory Preprocessing
|
CPU shares for blocked job evaluations | Amount of CPU shares requested by blocked evals of a job. |
Dependent item | nomad.server.blocked_evals.job.cpu Preprocessing
|
Memory shares for blocked job evaluations | Amount of memory requested by blocked evals of a job. |
Dependent item | nomad.server.blocked_evals.job.memory Preprocessing
|
Evaluations blocked | Count of evals in the blocked state for any reason (cluster resource exhaustion or quota limits). |
Dependent item | nomad.server.blockedevals.totalblocked Preprocessing
|
Evaluations escaped | Count of evals that have escaped computed node classes. This indicates a scheduler optimization was skipped and is not usually a source of concern. |
Dependent item | nomad.server.blockedevals.totalescaped Preprocessing
|
Evaluations waiting | Count of evals waiting to be enqueued. |
Dependent item | nomad.server.broker.total_waiting Preprocessing
|
Evaluations blocked due to quota limit | Count of blocked evals due to quota limits (the resources for these jobs are not counted in other blockedevals metrics, except for totalblocked). |
Dependent item | nomad.server.blockedevals.totalquota_limit Preprocessing
|
Evaluations enqueue time | Average time elapsed with evaluations waiting to be enqueued. |
Dependent item | nomad.server.broker.eval_waiting Preprocessing
|
RPC evaluation acknowledgement time | Time elapsed for Eval.Ack RPC call. |
Dependent item | nomad.server.eval.ack Preprocessing
|
RPC job summary time | Time elapsed for Job.Summary RPC call. |
Dependent item | nomad.server.jobsummary.getjob_summary Preprocessing
|
Heartbeats active | Number of active heartbeat timers. Each timer represents a Nomad client connection. |
Dependent item | nomad.server.heartbeat.active Preprocessing
|
RPC requests, rate | Number of RPC requests being handled. |
Dependent item | nomad.server.rpc.request Preprocessing
|
RPC error requests, rate | Number of RPC requests being handled that result in an error. |
Dependent item | nomad.server.rpc.request_error Preprocessing
|
RPC queries, rate | Number of RPC queries. |
Dependent item | nomad.server.rpc.query Preprocessing
|
RPC job allocations time | Time elapsed for Job.Allocations RPC call. |
Dependent item | nomad.server.job.allocations Preprocessing
|
RPC job evaluations time | Time elapsed for Job.Evaluations RPC call. |
Dependent item | nomad.server.job.evaluations Preprocessing
|
RPC get job time | Time elapsed for Job.GetJob RPC call. |
Dependent item | nomad.server.job.get_job Preprocessing
|
Plan apply time | Time elapsed to apply a plan. |
Dependent item | nomad.server.plan.apply Preprocessing
|
Plan evaluate time | Time elapsed to evaluate a plan. |
Dependent item | nomad.server.plan.evaluate Preprocessing
|
RPC plan submit time | Time elapsed for Plan.Submit RPC call. |
Dependent item | nomad.server.plan.submit Preprocessing
|
Plan raft index processing time | Time elapsed that planner waits for the raft index of the plan to be processed. |
Dependent item | nomad.server.plan.waitforindex Preprocessing
|
RPC list time | Time elapsed for Node.List RPC call. |
Dependent item | nomad.server.client.list Preprocessing
|
RPC update allocations time | Time elapsed for Node.UpdateAlloc RPC call. |
Dependent item | nomad.server.client.update_alloc Preprocessing
|
RPC update status time | Time elapsed for Node.UpdateStatus RPC call. |
Dependent item | nomad.server.client.update_status Preprocessing
|
RPC get client allocs time | Time elapsed for Node.GetClientAllocs RPC call. |
Dependent item | nomad.server.client.getclientallocs Preprocessing
|
RPC eval dequeue time | Time elapsed for Eval.Dequeue RPC call. |
Dependent item | nomad.server.client.dequeue Preprocessing
|
Vault token last renewal | Time since last successful Vault token renewal. |
Dependent item | nomad.server.vault.tokenlastrenewal Preprocessing
|
Vault token next renewal | Time until next Vault token renewal attempt. |
Dependent item | nomad.server.vault.tokennextrenewal Preprocessing
|
Vault token TTL | Time to live for Vault token. |
Dependent item | nomad.server.vault.token_ttl Preprocessing
|
Vault tokens revoked | Count of revoked tokens. |
Dependent item | nomad.server.vault.distributedtokensrevoked Preprocessing
|
Jobs dead | Number of dead jobs. |
Dependent item | nomad.server.job_status.dead Preprocessing
|
Jobs pending | Number of pending jobs. |
Dependent item | nomad.server.job_status.pending Preprocessing
|
Jobs running | Number of running jobs. |
Dependent item | nomad.server.job_status.running Preprocessing
|
Job allocations completed | Number of complete allocations for a job. |
Dependent item | nomad.server.job_summary.complete Preprocessing
|
Job allocations failed | Number of failed allocations for a job. |
Dependent item | nomad.server.job_summary.failed Preprocessing
|
Job allocations lost | Number of lost allocations for a job. |
Dependent item | nomad.server.job_summary.lost Preprocessing
|
Job allocations unknown | Number of unknown allocations for a job. |
Dependent item | nomad.server.job_summary.unknown Preprocessing
|
Job allocations queued | Number of queued allocations for a job. |
Dependent item | nomad.server.job_summary.queued Preprocessing
|
Job allocations running | Number of running allocations for a job. |
Dependent item | nomad.server.job_summary.running Preprocessing
|
Job allocations starting | Number of starting allocations for a job. |
Dependent item | nomad.server.job_summary.starting Preprocessing
|
Gossip time | Time elapsed to broadcast gossip messages. |
Dependent item | nomad.server.memberlist.gossip Preprocessing
|
Leader barrier time | Time elapsed to establish a raft barrier during leader transition. |
Dependent item | nomad.server.leader.barrier Preprocessing
|
Reconcile peer time | Time elapsed to reconcile a serf peer with state store. |
Dependent item | nomad.server.leader.reconcile_member Preprocessing
|
Total reconcile time | Time elapsed to reconcile all serf peers with state store. |
Dependent item | nomad.server.leader.reconcile Preprocessing
|
Leader last contact | Time since last contact to leader. General indicator of Raft latency. |
Dependent item | nomad.server.raft.leader.lastContact Preprocessing
|
Plan queue | Count of evals in the plan queue. |
Dependent item | nomad.server.plan.queue_depth Preprocessing
|
Worker evaluation create time | Time elapsed for worker to create an eval. |
Dependent item | nomad.server.worker.create_eval Preprocessing
|
Worker evaluation dequeue time | Time elapsed for worker to dequeue an eval. |
Dependent item | nomad.server.worker.dequeue_eval Preprocessing
|
Worker invoke scheduler time | Time elapsed for worker to invoke the scheduler. |
Dependent item | nomad.server.worker.invokeschedulerservice Preprocessing
|
Worker acknowledgement send time | Time elapsed for worker to send acknowledgement. |
Dependent item | nomad.server.worker.send_ack Preprocessing
|
Worker submit plan time | Time elapsed for worker to submit plan. |
Dependent item | nomad.server.worker.submit_plan Preprocessing
|
Worker update evaluation time | Time elapsed for worker to submit updated eval. |
Dependent item | nomad.server.worker.update_eval Preprocessing
|
Worker log replication time | Time elapsed that worker waits for the raft index of the eval to be processed. |
Dependent item | nomad.server.worker.waitforindex Preprocessing
|
Raft calls blocked, rate | Count of blocking raft API calls. |
Dependent item | nomad.server.raft.barrier Preprocessing
|
Raft commit logs enqueued | Count of logs enqueued. |
Dependent item | nomad.server.raft.commitnumlogs Preprocessing
|
Raft transactions, rate | Number of Raft transactions. |
Dependent item | nomad.server.raft.apply Preprocessing
|
Raft commit time | Time elapsed to commit writes. |
Dependent item | nomad.server.raft.commit_time Preprocessing
|
Raft transaction commit time | Raft transaction commit time. |
Dependent item | nomad.server.raft.replication.appendEntries Preprocessing
|
FSM apply time | Time elapsed to apply write to FSM. |
Dependent item | nomad.server.raft.fsm.apply Preprocessing
|
FSM enqueue time | Time elapsed to enqueue write to FSM. |
Dependent item | nomad.server.raft.fsm.enqueue Preprocessing
|
FSM autopilot time | Time elapsed to apply Autopilot raft entry. |
Dependent item | nomad.server.raft.fsm.autopilot Preprocessing
|
FSM register node time | Time elapsed to apply RegisterNode raft entry. |
Dependent item | nomad.server.raft.fsm.register_node Preprocessing
|
FSM index | Current index applied to FSM. |
Dependent item | nomad.server.raft.applied_index Preprocessing
|
Raft last index | Most recent index seen. |
Dependent item | nomad.server.raft.last_index Preprocessing
|
Dispatch log time | Time elapsed to write log, mark in flight, and start replication. |
Dependent item | nomad.server.raft.leader.dispatch_log Preprocessing
|
Logs dispatched | Count of logs dispatched. |
Dependent item | nomad.server.raft.leader.dispatchnumlogs Preprocessing
|
Heartbeat fails | Count of failing to heartbeat and starting election. |
Dependent item | nomad.server.raft.transition.heartbeat_timeout Preprocessing
|
Objects freed, rate | Count of objects freed from heap by go runtime GC. |
Dependent item | nomad.server.runtime.free_count Preprocessing
|
GC pause time | Go runtime GC pause times. |
Dependent item | nomad.server.runtime.gcpausens Preprocessing
|
GC metadata size | Go runtime GC metadata size in bytes. |
Dependent item | nomad.server.runtime.sys_bytes Preprocessing
|
GC runs | Count of go runtime GC runs. |
Dependent item | nomad.server.runtime.totalgcruns Preprocessing
|
Memberlist events | Count of memberlist events received. |
Dependent item | nomad.server.serf.queue.event Preprocessing
|
Memberlist changes | Count of memberlist changes. |
Dependent item | nomad.server.serf.queue.intent Preprocessing
|
Memberlist queries | Count of memberlist queries. |
Dependent item | nomad.server.serf.queue.queries Preprocessing
|
Snapshot index | Current snapshot index. |
Dependent item | nomad.server.state.snapshot.index Preprocessing
|
Services ready to schedule | Count of service evals ready to be scheduled. |
Dependent item | nomad.server.broker.service_ready Preprocessing
|
Services unacknowledged | Count of unacknowledged service evals. |
Dependent item | nomad.server.broker.service_unacked Preprocessing
|
System evaluations ready to schedule | Count of system evals ready to be scheduled. |
Dependent item | nomad.server.broker.system_ready Preprocessing
|
System evaluations unacknowledged | Count of unacknowledged system evals. |
Dependent item | nomad.server.broker.system_unacked Preprocessing
|
BoltDB free pages | Number of BoltDB free pages. |
Dependent item | nomad.server.raft.boltdb.numfreepages Preprocessing
|
BoltDB pending pages | Number of BoltDB pending pages. |
Dependent item | nomad.server.raft.boltdb.numpendingpages Preprocessing
|
BoltDB free page bytes | Number of free page bytes. |
Dependent item | nomad.server.raft.boltdb.freepagebytes Preprocessing
|
BoltDB freelist bytes | Number of freelist bytes. |
Dependent item | nomad.server.raft.boltdb.freelist_bytes Preprocessing
|
BoltDB read transactions, rate | Count of total read transactions. |
Dependent item | nomad.server.raft.boltdb.totalreadtxn Preprocessing
|
BoltDB open read transactions | Number of current open read transactions. |
Dependent item | nomad.server.raft.boltdb.openreadtxn Preprocessing
|
BoltDB pages in use | Number of pages in use. |
Dependent item | nomad.server.raft.boltdb.txstats.page_count Preprocessing
|
BoltDB page allocations, rate | Number of page allocations. |
Dependent item | nomad.server.raft.boltdb.txstats.page_alloc Preprocessing
|
BoltDB cursors | Count of total database cursors. |
Dependent item | nomad.server.raft.boltdb.txstats.cursor_count Preprocessing
|
BoltDB nodes, rate | Count of total database nodes. |
Dependent item | nomad.server.raft.boltdb.txstats.node_count Preprocessing
|
BoltDB node dereferences, rate | Count of total database node dereferences. |
Dependent item | nomad.server.raft.boltdb.txstats.node_deref Preprocessing
|
BoltDB rebalance operations, rate | Count of total rebalance operations. |
Dependent item | nomad.server.raft.boltdb.txstats.rebalance Preprocessing
|
BoltDB split operations, rate | Count of total split operations. |
Dependent item | nomad.server.raft.boltdb.txstats.split Preprocessing
|
BoltDB spill operations, rate | Count of total spill operations. |
Dependent item | nomad.server.raft.boltdb.txstats.spill Preprocessing
|
BoltDB write operations, rate | Count of total write operations. |
Dependent item | nomad.server.raft.boltdb.txstats.write Preprocessing
|
BoltDB rebalance time | Sample of rebalance operation times. |
Dependent item | nomad.server.raft.boltdb.txstats.rebalance_time Preprocessing
|
BoltDB spill time | Sample of spill operation times. |
Dependent item | nomad.server.raft.boltdb.txstats.spill_time Preprocessing
|
BoltDB write time | Sample of write operation times. |
Dependent item | nomad.server.raft.boltdb.txstats.write_time Preprocessing
|
Service [rpc] state | Current [rpc] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.SERVER.RPC.PORT}] Preprocessing
|
Service [serf] state | Current [serf] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.SERVER.SERF.PORT}] Preprocessing
|
Namespace list time | Time elapsed for Namespace.ListNamespaces. |
Dependent item | nomad.server.namespace.list_namespace Preprocessing
|
Autopilot state | Current autopilot state. |
Dependent item | nomad.server.autopilot.state Preprocessing
|
Autopilot failure tolerance | The number of redundant healthy servers that can fail without causing an outage. |
Dependent item | nomad.server.autopilot.failure_tolerance Preprocessing
|
FSM allocation client update time | Time elapsed to apply AllocClientUpdate raft entry. |
Dependent item | nomad.server.allocclientupdate Preprocessing
|
FSM apply plan results time | Time elapsed to apply ApplyPlanResults raft entry. |
Dependent item | nomad.server.fsm.applyplanresults Preprocessing
|
FSM update evaluation time | Time elapsed to apply UpdateEval raft entry. |
Dependent item | nomad.server.fsm.update_eval Preprocessing
|
FSM job registration time | Time elapsed to apply RegisterJob raft entry. |
Dependent item | nomad.server.fsm.register_job Preprocessing
|
Allocation reschedule attempts | Count of attempts to reschedule an allocation. |
Dependent item | nomad.server.scheduler.allocs.rescheduled.attempted Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Server: Monitoring API connection has failed | Monitoring API connection has failed. |
find(/HashiCorp Nomad Server by HTTP/nomad.server.data.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Internal stats API connection has failed | Internal stats API connection has failed. |
find(/HashiCorp Nomad Server by HTTP/nomad.server.stats.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Server: Nomad server version has changed | Nomad server version has changed. |
change(/HashiCorp Nomad Server by HTTP/nomad.server.version)<>0 |Info |
Manual close: Yes | |
HashiCorp Nomad Server: Cluster role has changed | Cluster role has changed. |
change(/HashiCorp Nomad Server by HTTP/nomad.server.raft.cluster_role) <> 0 |Info |
Manual close: Yes | |
HashiCorp Nomad Server: Current number of open files is too high | Heavy file descriptor usage (i.e., near the process file descriptor limit) indicates a potential file descriptor exhaustion issue. |
min(/HashiCorp Nomad Server by HTTP/nomad.server.process_open_fds,5m)/last(/HashiCorp Nomad Server by HTTP/nomad.server.process_max_fds)*100>{$NOMAD.OPEN.FDS.MAX} |Warning |
||
HashiCorp Nomad Server: Dead jobs found | Jobs with the |
last(/HashiCorp Nomad Server by HTTP/nomad.server.job_status.dead) > 0 and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.job_status.dead,5m) = 0 |Warning |
Manual close: Yes | |
HashiCorp Nomad Server: Leader last contact timeout exceeded | The nomad.raft.leader.lastContact metric is a general indicator of Raft latency which can be used to observe how Raft timing is performing and guide infrastructure provisioning. |
min(/HashiCorp Nomad Server by HTTP/nomad.server.raft.leader.lastContact,5m) >= {$NOMAD.SERVER.LEADER.LATENCY} and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.raft.leader.lastContact,5m) = 0 |Warning |
||
HashiCorp Nomad Server: Service [rpc] is down | Cannot establish the connection to [rpc] service port {$NOMAD.SERVER.RPC.PORT}. |
last(/HashiCorp Nomad Server by HTTP/net.tcp.service[tcp,,{$NOMAD.SERVER.RPC.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Service [serf] is down | Cannot establish the connection to [serf] service port {$NOMAD.SERVER.SERF.PORT}. |
last(/HashiCorp Nomad Server by HTTP/net.tcp.service[tcp,,{$NOMAD.SERVER.SERF.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Autopilot is unhealthy | The autopilot is in an unhealthy state. The probability of a successful failover is extremely low. |
last(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.state) = 0 and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.state,5m) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Autopilot redundancy is low | The autopilot redundancy is low. |
last(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.failure_tolerance) < {$NOMAD.REDUNDANCY.MIN} and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.failure_tolerance,5m) = 0 |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Nginx Plus monitoring by Zabbix via HTTP and doesn't require any external scripts.
The monitoring data of the live activity is generated by the NGINX Plus API.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the {$NGINX.API.ENDPOINT} macro to the NGINX Plus API URL in the format <scheme>://<host>:<port>/<location>/.
Note that, depending on the number of zones and upstreams, the discovery operation may be expensive. Therefore, use the following filters with these macros:
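As a quick sanity check of the endpoint before linking the template, you can request the API root and the basic server info. This is a hedged example: the host, port, and API version are placeholders, and the /api location must match the api directive configured in your NGINX Plus instance.
curl -s "http://nginx.example.com:80/api/"
curl -s "http://nginx.example.com:80/api/9/nginx"
The first call returns the list of available API versions; the second returns basic NGINX status information similar to what the "Get info" item collects.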
Name | Description | Default |
---|---|---|
{$NGINX.API.ENDPOINT} | NGINX Plus API URL in the format <scheme>://<host>:<port>/<location>/. |
|
{$NGINX.LLD.FILTER.HTTP.ZONE.MATCHES} | The filter to include the necessary discovered HTTP server zones. |
.* |
{$NGINX.LLD.FILTER.HTTP.ZONE.NOT_MATCHES} | The filter to exclude discovered HTTP server zones. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.HTTP.LOCATION.ZONE.MATCHES} | The filter to include the necessary discovered HTTP location zones. |
.* |
{$NGINX.LLD.FILTER.HTTP.LOCATION.ZONE.NOT_MATCHES} | The filter to exclude discovered HTTP location zones. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.HTTP.UPSTREAM.MATCHES} | The filter to include the necessary discovered HTTP upstreams. |
.* |
{$NGINX.LLD.FILTER.HTTP.UPSTREAM.NOT_MATCHES} | The filter to exclude discovered HTTP upstreams. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.STREAM.ZONE.MATCHES} | The filter to include discovered server zones of the "stream" directive. |
.* |
{$NGINX.LLD.FILTER.STREAM.ZONE.NOT_MATCHES} | The filter to exclude discovered server zones of the "stream" directive. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.STREAM.UPSTREAM.MATCHES} | The filter to include the necessary discovered upstreams of the "stream" directive. |
.* |
{$NGINX.LLD.FILTER.STREAM.UPSTREAM.NOT_MATCHES} | The filter to exclude discovered upstreams of the "stream" directive |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.RESOLVER.MATCHES} | The filter to include the necessary discovered |
.* |
{$NGINX.LLD.FILTER.RESOLVER.NOT_MATCHES} | The filter to exclude discovered |
CHANGE_IF_NEEDED |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
{$NGINX.HTTP.UPSTREAM.4XX.MAX.WARN} | The maximum percentage of errors with the status code |
5 |
{$NGINX.HTTP.UPSTREAM.5XX.MAX.WARN} | The maximum percentage of errors with the status code |
5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get info | Return status of the NGINX running instance. |
HTTP agent | nginx.info |
Get connections | Returns the statistics of client connections. |
HTTP agent | nginx.connections |
Get SSL | Returns the SSL statistics. |
HTTP agent | nginx.ssl |
Get requests | Returns the status of the client's HTTP requests. |
HTTP agent | nginx.requests |
Get HTTP zones | Returns the status information for each HTTP server zone. |
HTTP agent | nginx.http.server_zones |
Get HTTP location zones | Returns the status information for each HTTP location zone. |
HTTP agent | nginx.http.location_zones |
Get HTTP upstreams | Returns the status of each HTTP upstream server group and its servers. |
HTTP agent | nginx.http.upstreams |
Get Stream server zones | Returns the status information for each server zone configured in the "stream" directive. |
HTTP agent | nginx.stream.server_zones |
Get Stream upstreams | Returns status of each stream upstream server group and its servers. |
HTTP agent | nginx.stream.upstreams |
Get resolvers | Returns the status information for each Resolver zone. |
HTTP agent | nginx.resolvers |
Get info error | The description of NGINX errors. |
Dependent item | nginx.info.error Preprocessing
|
Version | A version number of NGINX. |
Dependent item | nginx.info.version Preprocessing
|
Address | The address of the server that accepted status request. |
Dependent item | nginx.info.address Preprocessing
|
Generation | The total number of configuration reloads. |
Dependent item | nginx.info.generation Preprocessing
|
Uptime | The server uptime. |
Dependent item | nginx.info.uptime Preprocessing
|
Connections accepted, rate | The total number of accepted client connections per second. |
Dependent item | nginx.connections.accepted.rate Preprocessing
|
Connections dropped | The total number of dropped client connections. |
Dependent item | nginx.connections.dropped Preprocessing
|
Connections active | The current number of active client connections. |
Dependent item | nginx.connections.active Preprocessing
|
Connections idle | The current number of idle client connections. |
Dependent item | nginx.connections.idle Preprocessing
|
SSL handshakes, rate | The total number of successful SSL handshakes per second. |
Dependent item | nginx.ssl.handshakes.rate Preprocessing
|
SSL handshakes failed, rate | The total number of failed SSL handshakes per second. |
Dependent item | nginx.ssl.handshakes_failed.rate Preprocessing
|
SSL session reuses, rate | The total number of session reuses during SSL handshake per second. |
Dependent item | nginx.ssl.session_reuses.rate Preprocessing
|
Requests total, rate | The total number of client requests per second. |
Dependent item | nginx.requests.total.rate Preprocessing
|
Requests current | The current number of client requests. |
Dependent item | nginx.requests.current Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
NGINX Plus: Server response error | length(last(/NGINX Plus by HTTP/nginx.info.error))>0 |High |
|||
NGINX Plus: Version has changed | The Nginx version has changed. Acknowledge to close the problem manually. |
last(/NGINX Plus by HTTP/nginx.info.version,#1)<>last(/NGINX Plus by HTTP/nginx.info.version,#2) and length(last(/NGINX Plus by HTTP/nginx.info.version))>0 |Info |
Manual close: Yes | |
NGINX Plus: Host has been restarted | Uptime is less than 10 minutes. |
last(/NGINX Plus by HTTP/nginx.info.uptime)<10m |Info |
Manual close: Yes | |
NGINX Plus: Failed to fetch info data | Zabbix has not received any data for metrics for the last 30 minutes |
nodata(/NGINX Plus by HTTP/nginx.info.uptime,30m)=1 |Warning |
Manual close: Yes | |
NGINX Plus: High connections drop rate | The rate of dropped connections is greater than |
min(/NGINX Plus by HTTP/nginx.connections.dropped,5m) > {$NGINX.DROP_RATE.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP server zones discovery | Dependent item | nginx.http.server_zones.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP server zone [{#NAME}]: Raw data | The raw data of the HTTP server zone with the name |
Dependent item | nginx.http.server_zones.raw[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Processing | The number of client requests that are currently being processed. |
Dependent item | nginx.http.server_zones.processing[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Requests, rate | The total number of client requests received from clients per second. |
Dependent item | nginx.http.server_zones.requests.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Responses 1xx, rate | The number of responses with |
Dependent item | nginx.http.server_zones.responses.1xx.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Responses 2xx, rate | The number of responses with |
Dependent item | nginx.http.server_zones.responses.2xx.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Responses 3xx, rate | The number of responses with |
Dependent item | nginx.http.server_zones.responses.3xx.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Responses 4xx, rate | The number of responses with |
Dependent item | nginx.http.server_zones.responses.4xx.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Responses 5xx, rate | The number of responses with |
Dependent item | nginx.http.server_zones.responses.5xx.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Responses total, rate | The total number of responses sent to clients per second. |
Dependent item | nginx.http.server_zones.responses.total.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Discarded, rate | The total number of requests completed without sending a response per second. |
Dependent item | nginx.http.server_zones.discarded.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
Dependent item | nginx.http.server_zones.received.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
Dependent item | nginx.http.server_zones.sent.rate[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP location zones discovery | Dependent item | nginx.http.location_zones.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP location zone [{#NAME}]: Raw data | The raw data of the location zone with the name |
Dependent item | nginx.http.location_zones.raw[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Requests, rate | The total number of client requests received from clients per second. |
Dependent item | nginx.http.location_zones.requests.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Responses 1xx, rate | The number of responses with |
Dependent item | nginx.http.location_zones.responses.1xx.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Responses 2xx, rate | The number of responses with |
Dependent item | nginx.http.location_zones.responses.2xx.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Responses 3xx, rate | The number of responses with |
Dependent item | nginx.http.location_zones.responses.3xx.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Responses 4xx, rate | The number of responses with |
Dependent item | nginx.http.location_zones.responses.4xx.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Responses 5xx, rate | The number of responses with |
Dependent item | nginx.http.location_zones.responses.5xx.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Responses total, rate | The total number of responses sent to clients per second. |
Dependent item | nginx.http.location_zones.responses.total.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Discarded, rate | The total number of requests completed without sending a response per second. |
Dependent item | nginx.http.location_zones.discarded.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
Dependent item | nginx.http.location_zones.received.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
Dependent item | nginx.http.location_zones.sent.rate[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP upstreams discovery | Dependent item | nginx.http.upstreams.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP upstream [{#NAME}]: Raw data | The raw data of the HTTP upstream with the name |
Dependent item | nginx.http.upstreams.raw[{#NAME}] Preprocessing
|
HTTP upstream [{#NAME}]: Keepalive | The current number of idle keepalive connections. |
Dependent item | nginx.http.upstreams.keepalive[{#NAME}] Preprocessing
|
HTTP upstream [{#NAME}]: Zombies | The current number of servers removed from the group but still processing active client requests. |
Dependent item | nginx.http.upstreams.zombies[{#NAME}] Preprocessing
|
HTTP upstream [{#NAME}]: Zone | The name of the shared memory zone that keeps the group's configuration and run-time state. |
Dependent item | nginx.http.upstreams.zone[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP upstream peers discovery | Dependent item | nginx.http.upstream.peers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Raw data | The raw data of the HTTP upstream with the name |
Dependent item | nginx.http.upstream.peer.raw[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: State | The current state, which may be one of “up”, “draining”, “down”, “unavail”, “checking”, and “unhealthy”. |
Dependent item | nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Active | The current number of active connections. |
Dependent item | nginx.http.upstream.peer.active[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Requests, rate | The total number of client requests forwarded to this server per second. |
Dependent item | nginx.http.upstream.peer.requests.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 1xx, rate | The number of responses with |
Dependent item | nginx.http.upstream.peer.responses.1xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 2xx, rate | The number of responses with |
Dependent item | nginx.http.upstream.peer.responses.2xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 3xx, rate | The number of responses with |
Dependent item | nginx.http.upstream.peer.responses.3xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 4xx, rate | The number of responses with |
Dependent item | nginx.http.upstream.peer.responses.4xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 5xx, rate | The number of responses with |
Dependent item | nginx.http.upstream.peer.responses.5xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses total, rate | The total number of responses obtained from this server. |
Dependent item | nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Sent, rate | The total number of bytes sent to this server per second. |
Dependent item | nginx.http.upstream.peer.sent.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Received, rate | The total number of bytes received from this server per second. |
Dependent item | nginx.http.upstream.peer.received.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Fails, rate | The total number of unsuccessful attempts to communicate with the server per second. |
Dependent item | nginx.http.upstream.peer.fails.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Unavail | Displays how many times the server has become unavailable for client requests (the state - “unavail”) due to the number of unsuccessful attempts reaching the |
Dependent item | nginx.http.upstream.peer.unavail.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Header time | The average time to get the response header from the server. |
Dependent item | nginx.http.upstream.peer.header_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Response time | The average time to get the full response from the server. |
Dependent item | nginx.http.upstream.peer.response_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, check | The total number of health check requests made. |
Dependent item | nginx.http.upstream.peer.health_checks.checks[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, fails | The number of failed health checks. |
Dependent item | nginx.http.upstream.peer.health_checks.fails[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, unhealthy | Displays how many times the server has become |
Dependent item | nginx.http.upstream.peer.health_checks.unhealthy[{#UPSTREAM},{#PEER}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
NGINX Plus: HTTP upstream server is not in UP or DOWN state. | find(/NGINX Plus by HTTP/nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","up")=0 and find(/NGINX Plus by HTTP/nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","down")=0 |Warning |
|||
NGINX Plus: Too many HTTP requests with code 4xx | sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.4xx.rate[{#UPSTREAM},{#PEER}],5m) > (sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}],5m)*({$NGINX.HTTP.UPSTREAM.4XX.MAX.WARN}/100)) |Warning |
|||
NGINX Plus: Too many HTTP requests with code 5xx | sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.5xx.rate[{#UPSTREAM},{#PEER}],5m) > (sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}],5m)*({$NGINX.HTTP.UPSTREAM.5XX.MAX.WARN}/100)) |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream server zones discovery | Dependent item | nginx.stream.server_zones.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream server zone [{#NAME}]: Raw data | The raw data of server zone with the name |
Dependent item | nginx.stream.server_zones.raw[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Processing | The number of client connections that are currently being processed. |
Dependent item | nginx.stream.server_zones.processing[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Connections, rate | The total number of connections accepted from clients per second. |
Dependent item | nginx.stream.server_zones.connections.rate[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Sessions 2xx, rate | The total number of sessions completed with status code |
Dependent item | nginx.stream.server_zones.sessions.2xx.rate[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Sessions 4xx, rate | The total number of sessions completed with status code |
Dependent item | nginx.stream.server_zones.sessions.4xx.rate[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Sessions 5xx, rate | The total number of sessions completed with status code |
Dependent item | nginx.stream.server_zones.sessions.5xx.rate[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Sessions total, rate | The total number of completed client sessions per second. |
Dependent item | nginx.stream.server_zones.sessions.total.rate[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Discarded, rate | The total number of connections completed without creating a session per second. |
Dependent item | nginx.stream.server_zones.discarded.rate[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
Dependent item | nginx.stream.server_zones.received.rate[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
Dependent item | nginx.stream.server_zones.sent.rate[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream upstreams discovery | Dependent item | nginx.stream.upstreams.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream upstream [{#NAME}]: Raw data | The raw data of the upstream with the name |
Dependent item | nginx.stream.upstreams.raw[{#NAME}] Preprocessing
|
Stream upstream [{#NAME}]: Zombies | Dependent item | nginx.stream.upstreams.zombies[{#NAME}] Preprocessing
|
|
Stream upstream [{#NAME}]: Zone | Dependent item | nginx.stream.upstreams.zone[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream upstream peers discovery | Dependent item | nginx.stream.upstream.peers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Raw data | The raw data of the upstream with the name |
Dependent item | nginx.stream.upstream.peer.raw[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: State | The current state, which may be one of “up”, “draining”, “down”, “unavail”, “checking”, and “unhealthy”. |
Dependent item | nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Active | The current number of connections. |
Dependent item | nginx.stream.upstream.peer.active[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Sent, rate | The total number of bytes sent to this server per second. |
Dependent item | nginx.stream.upstream.peer.sent.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Received, rate | The total number of bytes received from this server per second. |
Dependent item | nginx.stream.upstream.peer.received.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Fails, rate | The total number of unsuccessful attempts to communicate with the server per second. |
Dependent item | nginx.stream.upstream.peer.fails.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Unavail | Displays how many times the server has become unavailable for client requests (the state - “unavail”) due to the number of unsuccessful attempts reaching the |
Dependent item | nginx.stream.upstream.peer.unavail.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Connections | The total number of client connections forwarded to this server. |
Dependent item | nginx.stream.upstream.peer.connections.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Connect time | The average time to connect to the upstream server. |
Dependent item | nginx.stream.upstream.peer.connect_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: First byte time | The average time to receive the first byte of data. |
Dependent item | nginx.stream.upstream.peer.firstbytetime.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Response time | The average time to receive the last byte of data. |
Dependent item | nginx.stream.upstream.peer.response_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, check | The total number of health check requests made. |
Dependent item | nginx.stream.upstream.peer.health_checks.checks[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, fails | The number of failed health checks. |
Dependent item | nginx.stream.upstream.peer.health_checks.fails[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, unhealthy | Displays how many times the server has become |
Dependent item | nginx.stream.upstream.peer.health_checks.unhealthy[{#UPSTREAM},{#PEER}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
NGINX Plus: Stream upstream server is not in UP or DOWN state. | find(/NGINX Plus by HTTP/nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","up")=0 and find(/NGINX Plus by HTTP/nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","down")=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Resolvers discovery | Dependent item | nginx.resolvers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Resolver [{#NAME}]: Raw data | The raw data of the |
Dependent item | nginx.resolvers.raw[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Requests name, rate | The total number of requests to resolve names to addresses per second. |
Dependent item | nginx.resolvers.requests.name.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Requests srv, rate | The total number of requests to resolve SRV records per second. |
Dependent item | nginx.resolvers.requests.srv.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Requests addr, rate | The total number of requests to resolve addresses to names per second. |
Dependent item | nginx.resolvers.requests.addr.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses noerror, rate | The total number of successful responses per second. |
Dependent item | nginx.resolvers.responses.noerror.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses formerr, rate | The total number of |
Dependent item | nginx.resolvers.responses.formerr.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses servfail, rate | The total number of |
Dependent item | nginx.resolvers.responses.servfail.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses nxdomain, rate | The total number of |
Dependent item | nginx.resolvers.responses.nxdomain.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses notimp, rate | The total number of |
Dependent item | nginx.resolvers.responses.notimp.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses refused, rate | The total number of |
Dependent item | nginx.resolvers.responses.refused.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses timedout, rate | The total number of timed out requests per second. |
Dependent item | nginx.resolvers.responses.timedout.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses unknown, rate | The total number of requests completed with an unknown error per second. |
Dependent item | nginx.resolvers.responses.unknown.rate[{#NAME}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor Nginx by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by polling the module ngx_http_stub_status_module
with HTTP agent remotely:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up the ngx_http_stub_status_module.
Test the availability of the http_stub_status_module with nginx -V 2>&1 | grep -o with-http_stub_status_module.
Example configuration of Nginx:
location = /basic_status {
stub_status;
allow <IP of your Zabbix server/proxy>;
deny all;
}
Set the hostname or IP address of the Nginx stub_status host in the {$NGINX.STUB_STATUS.HOST} macro. You can also change the status page port in the {$NGINX.STUB_STATUS.PORT} macro, the status page scheme in the {$NGINX.STUB_STATUS.SCHEME} macro, and the status page path in the {$NGINX.STUB_STATUS.PATH} macro if necessary.
Example answer from Nginx:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
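A hedged way to reproduce this answer from the Zabbix server or proxy side (the host is a placeholder, and the port and path correspond to the macro defaults below):
curl -s "http://<SET STUB_STATUS HOST>:80/basic_status"
If the command returns the status lines shown above, the {$NGINX.STUB_STATUS.*} macros point at a working status page.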
Name | Description | Default |
---|---|---|
{$NGINX.STUB_STATUS.HOST} | The hostname or IP address of the Nginx host or Nginx container of a stub_status. |
<SET STUB_STATUS HOST> |
{$NGINX.STUB_STATUS.SCHEME} | The protocol http or https of Nginx stub_status host or container. |
http |
{$NGINX.STUB_STATUS.PATH} | The path of the |
basic_status |
{$NGINX.STUB_STATUS.PORT} | The port of the |
80 |
{$NGINX.RESPONSE_TIME.MAX.WARN} | The maximum response time of Nginx expressed in seconds for a trigger expression. |
10 |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get stub status page | The following status information is provided:
See also Module ngx_http_stub_status_module. |
HTTP agent | nginx.get_stub_status |
Service status | Simple check | net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] Preprocessing
|
|
Service response time | Simple check | net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] | |
Requests total | The total number of client requests. |
Dependent item | nginx.requests.total Preprocessing
|
Requests per second | The total number of client requests. |
Dependent item | nginx.requests.total.rate Preprocessing
|
Connections accepted per second | The total number of accepted client connections. |
Dependent item | nginx.connections.accepted.rate Preprocessing
|
Connections dropped per second | The total number of dropped client connections. |
Dependent item | nginx.connections.dropped.rate Preprocessing
|
Connections handled per second | The total number of handled connections. Generally, the parameter value is the same as for the accepted connections, unless some resource limits have been reached (for example, the |
Dependent item | nginx.connections.handled.rate Preprocessing
|
Connections active | The current number of active client connections including waiting connections. |
Dependent item | nginx.connections.active Preprocessing
|
Connections reading | The current number of connections where Nginx is reading the request header. |
Dependent item | nginx.connections.reading Preprocessing
|
Connections waiting | The current number of idle client connections waiting for a request. |
Dependent item | nginx.connections.waiting Preprocessing
|
Connections writing | The current number of connections where Nginx is writing a response back to the client. |
Dependent item | nginx.connections.writing Preprocessing
|
Version | Dependent item | nginx.version Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Failed to fetch stub status page | Zabbix has not received any data for items for the last 30 minutes. |
find(/Nginx by HTTP/nginx.get_stub_status,,"iregexp","HTTP\\/[\\d.]+\\s+200")=0 or nodata(/Nginx by HTTP/nginx.get_stub_status,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Nginx: Service is down | last(/Nginx by HTTP/net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"])=0 |Average |
Manual close: Yes | ||
Nginx: Service response time is too high | min(/Nginx by HTTP/net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"],5m)>{$NGINX.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
Nginx: High connections drop rate | The rate of dropping connections has been greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/Nginx by HTTP/nginx.connections.dropped.rate,5m) > {$NGINX.DROP_RATE.MAX.WARN} |Warning |
Depends on:
|
|
Nginx: Version has changed | The Nginx version has changed. Acknowledge to close the problem manually. |
last(/Nginx by HTTP/nginx.version,#1)<>last(/Nginx by HTTP/nginx.version,#2) and length(last(/Nginx by HTTP/nginx.version))>0 |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor Nginx by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Nginx by Zabbix agent collects metrics by polling the module ngx_http_stub_status_module locally with Zabbix agent:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get).
It also uses Zabbix agent to collect Nginx Linux process statistics, such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See the setup instructions for ngx_http_stub_status_module.
Test the availability of the http_stub_status_module with nginx -V 2>&1 | grep -o with-http_stub_status_module.
Example configuration of Nginx:
location = /basic_status {
stub_status;
allow 127.0.0.1;
allow ::1;
deny all;
}
If you use another location, then don't forget to change the {$NGINX.STUB_STATUS.PATH} macro.
Example answer from Nginx:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Note that this template doesn't support https and redirects (limitations of web.page.get).
Install and setup Zabbix agent.
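After the agent is installed, you can test the item keys used by this template directly on the Nginx host. This is a sketch with the default macro values: "nginx" stands in for the {$NGINX.PROCESS.NAME.PARAMETER} macro value, and the zabbix_agentd binary is assumed (for Agent 2, use zabbix_agent2 -t instead).
zabbix_agentd -t 'web.page.get["localhost","basic_status","80"]'
zabbix_agentd -t 'proc.get[nginx,,,summary]'
The first key should print the raw stub_status page; the second should print a JSON summary of the running nginx processes.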
Name | Description | Default |
---|---|---|
{$NGINX.STUB_STATUS.HOST} | The hostname or IP address of the Nginx host or Nginx container of |
localhost |
{$NGINX.STUB_STATUS.PATH} | The path of the |
basic_status |
{$NGINX.STUB_STATUS.PORT} | The port of the |
80 |
{$NGINX.RESPONSE_TIME.MAX.WARN} | The maximum response time of Nginx expressed in seconds for a trigger expression. |
10 |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
{$NGINX.PROCESS_NAME} | The process name filter for the Nginx process discovery. |
nginx |
{$NGINX.PROCESS.NAME.PARAMETER} | The process name of the Nginx server used in the item key |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get stub status page | The following status information is provided:
See also Module ngx_http_stub_status_module. |
Zabbix agent (active) | web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"] |
Service status | Zabbix agent (active) | net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] Preprocessing
|
|
Service response time | Zabbix agent (active) | net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] | |
Requests total | The total number of client requests. |
Dependent item | nginx.requests.total Preprocessing
|
Requests per second | The total number of client requests. |
Dependent item | nginx.requests.total.rate Preprocessing
|
Connections accepted per second | The total number of accepted client connections. |
Dependent item | nginx.connections.accepted.rate Preprocessing
|
Connections dropped per second | The total number of dropped client connections. |
Dependent item | nginx.connections.dropped.rate Preprocessing
|
Connections handled per second | The total number of handled connections. Generally, the parameter value is the same as for the accepted connections, unless some resource limits have been reached (for example, the worker_connections limit). |
Dependent item | nginx.connections.handled.rate Preprocessing
|
Connections active | The current number of active client connections including waiting connections. |
Dependent item | nginx.connections.active Preprocessing
|
Connections reading | The current number of connections where Nginx is reading the request header. |
Dependent item | nginx.connections.reading Preprocessing
|
Connections waiting | The current number of idle client connections waiting for a request. |
Dependent item | nginx.connections.waiting Preprocessing
|
Connections writing | The current number of connections where Nginx is writing a response back to the client. |
Dependent item | nginx.connections.writing Preprocessing
|
Version | Dependent item | nginx.version Preprocessing
|
|
Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent (active) | proc.get[{$NGINX.PROCESS.NAME.PARAMETER},,,summary] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Version has changed | The Nginx version has changed. Acknowledge to close the problem manually. |
last(/Nginx by Zabbix agent active/nginx.version,#1)<>last(/Nginx by Zabbix agent active/nginx.version,#2) and length(last(/Nginx by Zabbix agent active/nginx.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx process discovery | The discovery of Nginx process summary. |
Dependent item | nginx.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
CPU utilization | The percentage of the CPU utilization by a process {#NGINX.NAME}. |
Zabbix agent (active) | proc.cpu.util[{#NGINX.NAME}] |
Get process data | The summary metrics aggregated by a process {#NGINX.NAME}. |
Dependent item | nginx.proc.get[{#NGINX.NAME}] Preprocessing
|
Memory usage (vsize) | The summary of virtual memory used by a process {#NGINX.NAME} expressed in bytes. |
Dependent item | nginx.proc.vmem[{#NGINX.NAME}] Preprocessing
|
Memory usage (rss) | The summary of resident set size memory used by a process {#NGINX.NAME} expressed in bytes. |
Dependent item | nginx.proc.rss[{#NGINX.NAME}] Preprocessing
|
Memory usage, % | The percentage of real memory used by a process {#NGINX.NAME}. |
Dependent item | nginx.proc.pmem[{#NGINX.NAME}] Preprocessing
|
Number of running processes | The number of running processes {#NGINX.NAME}. |
Dependent item | nginx.proc.num[{#NGINX.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Process is not running | last(/Nginx by Zabbix agent active/nginx.proc.num[{#NGINX.NAME}])=0 |High |
|||
Nginx: Service is down | last(/Nginx by Zabbix agent active/net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"])=0 and last(/Nginx by Zabbix agent active/nginx.proc.num[{#NGINX.NAME}])>0 |Average |
Manual close: Yes | ||
Nginx: High connections drop rate | The rate of dropping connections has been greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/Nginx by Zabbix agent active/nginx.connections.dropped.rate,5m) > {$NGINX.DROP_RATE.MAX.WARN} and last(/Nginx by Zabbix agent active/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Depends on:
|
|
Nginx: Service response time is too high | min(/Nginx by Zabbix agent active/net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"],5m)>{$NGINX.RESPONSE_TIME.MAX.WARN} and last(/Nginx by Zabbix agent active/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
||
Nginx: Failed to fetch stub status page | Zabbix has not received any data for items for the last 30 minutes. |
(find(/Nginx by Zabbix agent active/web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"],,"iregexp","HTTP\\/[\\d.]+\\s+200")=0 or nodata(/Nginx by Zabbix agent active/web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"],30m)) and last(/Nginx by Zabbix agent active/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor Nginx with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Nginx by Zabbix agent collects metrics by polling the Module ngx_http_stub_status_module locally with Zabbix agent:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get).
It also uses Zabbix agent to collect Nginx
Linux process statistics, such as CPU usage, memory usage and whether the process is running or not.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See the setup instructions for ngx_http_stub_status_module.
Test the availability of the http_stub_status_module:
nginx -V 2>&1 | grep -o with-http_stub_status_module
Example configuration of Nginx:
location = /basic_status {
stub_status;
allow 127.0.0.1;
allow ::1;
deny all;
}
If you use another location, then don't forget to change the {$NGINX.STUB_STATUS.PATH} macro.
Example answer from Nginx:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get).
Install and set up Zabbix agent.
Name | Description | Default |
---|---|---|
{$NGINX.STUB_STATUS.HOST} | The hostname or IP address of the Nginx host or Nginx container of stub_status. |
localhost |
{$NGINX.STUB_STATUS.PATH} | The path of the stub_status location. |
basic_status |
{$NGINX.STUB_STATUS.PORT} | The port of the Nginx stub_status host or container. |
80 |
{$NGINX.RESPONSE_TIME.MAX.WARN} | The maximum response time of Nginx expressed in seconds for a trigger expression. |
10 |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
{$NGINX.PROCESS_NAME} | The process name filter for the Nginx process discovery. |
nginx |
{$NGINX.PROCESS.NAME.PARAMETER} | The process name of the Nginx server used in the item key proc.get. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get stub status page | The following status information is provided:
See also Module ngx_http_stub_status_module. |
Zabbix agent | web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"] |
Service status | Zabbix agent | net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] Preprocessing
|
|
Service response time | Zabbix agent | net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] | |
Requests total | The total number of client requests. |
Dependent item | nginx.requests.total Preprocessing
|
Requests per second | The total number of client requests. |
Dependent item | nginx.requests.total.rate Preprocessing
|
Connections accepted per second | The total number of accepted client connections. |
Dependent item | nginx.connections.accepted.rate Preprocessing
|
Connections dropped per second | The total number of dropped client connections. |
Dependent item | nginx.connections.dropped.rate Preprocessing
|
Connections handled per second | The total number of handled connections. Generally, the parameter value is the same as for the accepted connections, unless some resource limits have been reached (for example, the worker_connections limit). |
Dependent item | nginx.connections.handled.rate Preprocessing
|
Connections active | The current number of active client connections including waiting connections. |
Dependent item | nginx.connections.active Preprocessing
|
Connections reading | The current number of connections where Nginx is reading the request header. |
Dependent item | nginx.connections.reading Preprocessing
|
Connections waiting | The current number of idle client connections waiting for a request. |
Dependent item | nginx.connections.waiting Preprocessing
|
Connections writing | The current number of connections where Nginx is writing a response back to the client. |
Dependent item | nginx.connections.writing Preprocessing
|
Version | Dependent item | nginx.version Preprocessing
|
|
Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$NGINX.PROCESS.NAME.PARAMETER},,,summary] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Version has changed | The Nginx version has changed. Acknowledge to close the problem manually. |
last(/Nginx by Zabbix agent/nginx.version,#1)<>last(/Nginx by Zabbix agent/nginx.version,#2) and length(last(/Nginx by Zabbix agent/nginx.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx process discovery | The discovery of Nginx process summary. |
Dependent item | nginx.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
CPU utilization | The percentage of the CPU utilization by a process {#NGINX.NAME}. |
Zabbix agent | proc.cpu.util[{#NGINX.NAME}] |
Get process data | The summary metrics aggregated by a process {#NGINX.NAME}. |
Dependent item | nginx.proc.get[{#NGINX.NAME}] Preprocessing
|
Memory usage (vsize) | The summary of virtual memory used by a process {#NGINX.NAME} expressed in bytes. |
Dependent item | nginx.proc.vmem[{#NGINX.NAME}] Preprocessing
|
Memory usage (rss) | The summary of resident set size memory used by a process {#NGINX.NAME} expressed in bytes. |
Dependent item | nginx.proc.rss[{#NGINX.NAME}] Preprocessing
|
Memory usage, % | The percentage of real memory used by a process {#NGINX.NAME}. |
Dependent item | nginx.proc.pmem[{#NGINX.NAME}] Preprocessing
|
Number of running processes | The number of running processes {#NGINX.NAME}. |
Dependent item | nginx.proc.num[{#NGINX.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Process is not running | last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])=0 |High |
|||
Nginx: Service is down | last(/Nginx by Zabbix agent/net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"])=0 and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Average |
Manual close: Yes | ||
Nginx: High connections drop rate | The rate of dropping connections has been greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/Nginx by Zabbix agent/nginx.connections.dropped.rate,5m) > {$NGINX.DROP_RATE.MAX.WARN} and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Depends on:
|
|
Nginx: Service response time is too high | min(/Nginx by Zabbix agent/net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"],5m)>{$NGINX.RESPONSE_TIME.MAX.WARN} and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
||
Nginx: Failed to fetch stub status page | Zabbix has not received any data for items for the last 30 minutes. |
(find(/Nginx by Zabbix agent/web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"],,"iregexp","HTTP\\/[\\d.]+\\s+200")=0 or nodata(/Nginx by Zabbix agent/web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"],30m)) and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for monitoring Nextcloud by HTTP via Zabbix, and it works without any external scripts.
Nextcloud is a suite of client-server software for creating and using file hosting services.
For more information, see the official documentation
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the {$NEXTCLOUD.USER.NAME}, {$NEXTCLOUD.USER.PASSWORD}, and {$NEXTCLOUD.ADDRESS} macros.
The user must be included in the Administrators group.
Name | Description | Default |
---|---|---|
{$NEXTCLOUD.SCHEMA} | HTTP or HTTPS protocol of Nextcloud. |
https |
{$NEXTCLOUD.USER.NAME} | Nextcloud username. |
root |
{$NEXTCLOUD.USER.PASSWORD} | Nextcloud user password. |
<Put the password here> |
{$NEXTCLOUD.ADDRESS} | IP or DNS name of Nextcloud server. |
127.0.0.1 |
{$NEXTCLOUD.LLD.FILTER.USER.MATCHES} | Filter of discoverable users by name. |
.* |
{$NEXTCLOUD.LLD.FILTER.USER.NOT_MATCHES} | Filter to exclude discovered users by name. |
CHANGE_IF_NEEDED |
{$NEXTCLOUD.USER.QUOTA.PUSED.MAX} | Storage utilization threshold. |
90 |
{$NEXTCLOUD.USER.MAX.INACTIVE} | How many days a user can be inactive. |
30 |
{$NEXTCLOUD.CPU.LOAD.MAX} | CPU load threshold (the number of processes in the system run queue). |
95 |
{$NEXTCLOUD.MEM.PUSED.MAX} | Memory utilization threshold. |
90 |
{$NEXTCLOUD.SWAP.PUSED.MAX} | Swap utilization threshold. |
90 |
{$NEXTCLOUD.PHP.MEM.PUSED.MAX} | PHP memory utilization threshold. |
90 |
{$NEXTCLOUD.STORAGE.FREE.MIN} | Free space threshold. |
1G |
{$NEXTCLOUD.PROXY} | Proxy HTTP(S) address. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get server information | This item provides useful server information, such as CPU load, RAM usage, disk usage, number of users, etc. https://github.com/nextcloud/serverinfo |
HTTP agent | nextcloud.serverinfo.get_data Preprocessing
|
Server information status | Server information API status |
Dependent item | nextcloud.serverinfo.status Preprocessing
|
Version | Nextcloud service version. |
Dependent item | nextcloud.serverinfo.version Preprocessing
|
Free space | The amount of free disk space. |
Dependent item | nextcloud.serverinfo.freespace Preprocessing
|
CPU load, avg 1m | The average system load (the number of processes in the system run queue), last 1 minute. |
Dependent item | nextcloud.serverinfo.cpu.avg.1m Preprocessing
|
CPU load, avg 5m | The average system load (the number of processes in the system run queue), last 5 minutes. |
Dependent item | nextcloud.serverinfo.cpu.avg.5m Preprocessing
|
CPU load, avg 15m | The average system load (the number of processes in the system run queue), last 15 minutes. |
Dependent item | nextcloud.serverinfo.cpu.avg.15m Preprocessing
|
Memory total | The size of the RAM. |
Dependent item | nextcloud.serverinfo.mem.total Preprocessing
|
Memory free | The amount of free RAM. |
Dependent item | nextcloud.serverinfo.mem.free Preprocessing
|
Memory used, in % | RAM usage, in percent. |
Dependent item | nextcloud.serverinfo.mem.pused Preprocessing
|
Swap total | The size of the swap memory. |
Dependent item | nextcloud.serverinfo.swap.total Preprocessing
|
Swap free | The amount of free swap. |
Dependent item | nextcloud.serverinfo.swap.free Preprocessing
|
Swap used, in % | Swap usage, in percent. |
Dependent item | nextcloud.serverinfo.swap.pused Preprocessing
|
Apps installed | The number of installed applications. |
Dependent item | nextcloud.serverinfo.apps.installed Preprocessing
|
Apps update available | The number of applications for which an update is available. |
Dependent item | nextcloud.serverinfo.apps.update Preprocessing
|
Web server | Web server description. |
Dependent item | nextcloud.serverinfo.apps.webserver Preprocessing
|
PHP version | PHP version |
Dependent item | nextcloud.serverinfo.php.version Preprocessing
|
PHP memory limit | By default, the PHP memory limit is generally set to 128 MB, but it can be customized based on the application's specific needs. The php.ini file is usually the standard location to set the PHP memory limit. |
Dependent item | nextcloud.serverinfo.php.memory.limit Preprocessing
|
PHP memory used | PHP memory used |
Dependent item | nextcloud.serverinfo.php.memory.used Preprocessing
|
PHP memory free | PHP free memory size. |
Dependent item | nextcloud.serverinfo.php.memory.free Preprocessing
|
PHP memory wasted | Memory allocated to the service but not in use. |
Dependent item | nextcloud.serverinfo.php.memory.wasted Preprocessing
|
PHP memory wasted, in % | Memory allocated to the service but not in use, in percent. |
Dependent item | nextcloud.serverinfo.php.memory.wasted_percentage Preprocessing
|
PHP memory used, in % | PHP memory used percentage |
Dependent item | nextcloud.serverinfo.php.memory.pused Preprocessing
|
PHP maximum execution time | By default, the maximum execution time for PHP scripts is set to 30 seconds. If a script runs for longer than 30 seconds, PHP stops the script and reports an error. You can control the amount of time PHP allows scripts to run by changing the 'max_execution_time' directive in your php.ini file. |
Dependent item | nextcloud.serverinfo.php.max_execution_time Preprocessing
|
PHP maximum upload file size | By default, the maximum upload file size for PHP scripts is set to 128 megabytes. However, you may want to change this limit. For example, you can set a lower limit to prevent users from uploading large files to your site. To do this, change the 'upload_max_filesize' and 'post_max_size' directives. |
Dependent item | nextcloud.serverinfo.php.upload_max_filesize Preprocessing
|
Database type | Database type. |
Dependent item | nextcloud.serverinfo.db.type Preprocessing
|
Database version | Database description. |
Dependent item | nextcloud.serverinfo.db.version Preprocessing
|
Database size | Size of database. |
Dependent item | nextcloud.serverinfo.db.size Preprocessing
|
Active users, last 5 minutes | The number of active users in the last 5 minutes. |
Dependent item | nextcloud.serverinfo.active_users.last5m Preprocessing
|
Active users, last 1 hour | The number of active users in the last 1 hour. |
Dependent item | nextcloud.serverinfo.active_users.last1h Preprocessing
|
Active users, last 24 hours | The number of active users in the last day. |
Dependent item | nextcloud.serverinfo.active_users.last24hours Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nextcloud: Server information unavailable | Failed to get server information. |
last(/Nextcloud by HTTP/nextcloud.serverinfo.status)<>"OK" |High |
||
Nextcloud: Version has changed | Nextcloud version has changed. Acknowledge to close the problem manually. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.version)=1 and length(last(/Nextcloud by HTTP/nextcloud.serverinfo.version))>0 |Info |
Manual close: Yes | |
Nextcloud: Disk space is low | Condition should be the following: the free disk space is less than {$NEXTCLOUD.STORAGE.FREE.MIN}. |
last(/Nextcloud by HTTP/nextcloud.serverinfo.freespace)<{$NEXTCLOUD.STORAGE.FREE.MIN} |Average |
Manual close: Yes | |
Nextcloud: CPU load is too high | High CPU load. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.cpu.avg.1m,5m) > {$NEXTCLOUD.CPU.LOAD.MAX} |Average |
||
Nextcloud: High memory utilization | The system is running out of free memory. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.mem.pused,5m) > {$NEXTCLOUD.MEM.PUSED.MAX} |Average |
||
Nextcloud: High swap utilization | The system is running out of free swap. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.swap.pused,5m) > {$NEXTCLOUD.SWAP.PUSED.MAX} |Average |
||
Nextcloud: Number of installed apps has been changed | Applications have been installed or removed. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.apps.installed)<>0 |Info |
Manual close: Yes | |
Nextcloud: Application updates are available | Updates are available for some of the installed applications. |
last(/Nextcloud by HTTP/nextcloud.serverinfo.apps.update)<>0 |Warning |
Manual close: Yes | |
Nextcloud: PHP version has changed | The PHP version has changed. Acknowledge to close the problem manually. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.php.version)=1 and length(last(/Nextcloud by HTTP/nextcloud.serverinfo.php.version))>0 |Info |
Manual close: Yes | |
Nextcloud: High PHP memory utilization | PHP is running out of free memory. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.php.memory.pused,5m) > {$NEXTCLOUD.PHP.MEM.PUSED.MAX} |Average |
||
Nextcloud: Database version has changed | The database version has changed. Acknowledge to close the problem manually. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.db.version)=1 and length(last(/Nextcloud by HTTP/nextcloud.serverinfo.db.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nextcloud: User discovery | User discovery. |
HTTP agent | nextcloud.user.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
User "{#NEXTCLOUD.USER}": Get data | Get common information about user |
HTTP agent | nextcloud.user.get_data[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Status | User account status. |
Dependent item | nextcloud.user.enabled[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Storage location | The location of the user's store. |
Dependent item | nextcloud.user.storageLocation[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Last login | The time the user has last logged in. |
Dependent item | nextcloud.user.lastLogin[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Last login, days ago | The number of days since the user has last logged in. |
Dependent item | nextcloud.user.inactive[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Quota free space | The size of the free available space in the user's storage. |
Dependent item | nextcloud.user.quota.free[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Quota used space | The size of the used available space in the user storage. |
Dependent item | nextcloud.user.quota.used[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Quota total space | The size of space available in the user's storage. |
Dependent item | nextcloud.user.quota.total[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Quota used space, in % | Usage of the allocated storage space, in percent. |
Dependent item | nextcloud.user.quota.pused[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Quota | The size of space available in the user's storage. |
Dependent item | nextcloud.user.quota[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Display name | User visible name. |
Dependent item | nextcloud.user.displayname[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Language | User language. |
Dependent item | nextcloud.user.language[{#NEXTCLOUD.USER}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nextcloud: User "{#NEXTCLOUD.USER}" status changed | User account status has changed. |
change(/Nextcloud by HTTP/nextcloud.user.enabled[{#NEXTCLOUD.USER}]) = 1 |Info |
||
Nextcloud: User "{#NEXTCLOUD.USER}": inactive | The user has not logged in for more than {$NEXTCLOUD.USER.MAX.INACTIVE:"{#NEXTCLOUD.USER}"} days. |
last(/Nextcloud by HTTP/nextcloud.user.inactive[{#NEXTCLOUD.USER}]) > {$NEXTCLOUD.USER.MAX.INACTIVE:"{#NEXTCLOUD.USER}"} |Info |
||
Nextcloud: User "{#NEXTCLOUD.USER}": High quota utilization | More than {$NEXTCLOUD.USER.QUOTA.PUSED.MAX:"{#NEXTCLOUD.USER}"} percent of the allocated storage space has been used. |
min(/Nextcloud by HTTP/nextcloud.user.quota.pused[{#NEXTCLOUD.USER}],5m) > {$NEXTCLOUD.USER.QUOTA.PUSED.MAX:"{#NEXTCLOUD.USER}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
The registered Microsoft application must be granted the following API permissions:
Reports.Read.All - required for app usage and activity metrics
ServiceHealth.Read.All - required for service discovery and service status metrics
Set the {$MS365.APP.ID}, {$MS365.PASSWORD}, and {$MS365.TENANT.ID} macros.
Name | Description | Default |
---|---|---|
{$MS365.APP.ID} | Microsoft application ID. |
|
{$MS365.PASSWORD} | The secret for the registered Microsoft application. |
|
{$MS365.TENANT.ID} | Microsoft tenant ID. |
|
{$MS365.SERVICE.NAME.MATCHES} | This macro is used in the Microsoft cloud service discovery rule. |
.* |
{$MS365.SERVICE.NAME.NOT.MATCHES} | This macro is used in the Microsoft cloud service discovery rule. |
CHANGE_IF_NEEDED |
{$MS365.HTTP.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. |
|
{$MS365.API.TIMEOUT} | API response timeout. |
15s |
Name | Description | Type | Key and additional info |
---|---|---|---|
Services: Get services | The list of Microsoft cloud services subscribed to by a tenant, and their health statuses. More information: https://learn.microsoft.com/en-us/graph/api/servicehealth-get?view=graph-rest-beta&tabs=http |
Script | ms365.services.get |
Teams: Get reports | Accumulated data from the Microsoft Graph API "reports" endpoints. More information: https://learn.microsoft.com/en-us/graph/api/resources/report?view=graph-rest-beta |
Script | ms365.teams.reports.get |
Outlook: Get reports | Accumulated data from the Microsoft Graph API "reports" endpoints. More information: https://learn.microsoft.com/en-us/graph/api/resources/report?view=graph-rest-beta |
Script | ms365.outlook.reports.get |
OneDrive: Get reports | Accumulated data from the Microsoft Graph API "reports" endpoints. More information: https://learn.microsoft.com/en-us/graph/api/resources/report?view=graph-rest-beta |
Script | ms365.onedrive.reports.get |
SharePoint: Get reports | Accumulated data from the Microsoft Graph API "reports" endpoints. More information: https://learn.microsoft.com/en-us/graph/api/resources/report?view=graph-rest-beta |
Script | ms365.sharepoint.reports.get |
Apps: Get reports | Accumulated data from the Microsoft Graph API "reports" endpoints. More information: https://learn.microsoft.com/en-us/graph/api/resources/report?view=graph-rest-beta |
Script | ms365.apps.reports.get |
Services: Get errors | A list of errors from API requests for Services metrics. |
Dependent item | ms365.services.errors Preprocessing
|
Teams: Get errors | A list of errors from API requests for Teams metrics. |
Dependent item | ms365.teams.errors Preprocessing
|
Outlook: Get errors | A list of errors from API requests for Outlook metrics. |
Dependent item | ms365.outlook.errors Preprocessing
|
OneDrive: Get errors | A list of errors from API requests for OneDrive metrics. |
Dependent item | ms365.onedrive.errors Preprocessing
|
SharePoint: Get errors | A list of errors from API requests for SharePoint metrics. |
Dependent item | ms365.sharepoint.errors Preprocessing
|
Apps: Get errors | A list of errors from API requests for Apps metrics. |
Dependent item | ms365.apps.errors Preprocessing
|
Teams: Device usage (users), web client | The number of unique licensed Microsoft Teams users recorded via the Teams web client over the week before the report refresh date. |
Dependent item | ms365.teams.device.users.web Preprocessing
|
Teams: Device usage (users), Android | The number of unique licensed Microsoft Teams users recorded via the Teams mobile client for Android over the week before the report refresh date. |
Dependent item | ms365.teams.device.users.android Preprocessing
|
Teams: Device usage (users), iOS | The number of unique licensed Microsoft Teams users recorded via the Teams mobile client for iOS over the week before the report refresh date. |
Dependent item | ms365.teams.device.users.ios Preprocessing
|
Teams: Device usage (users), Mac | The number of unique licensed Microsoft Teams users recorded via the Teams desktop client on a macOS computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.users.mac Preprocessing
|
Teams: Device usage (users), Windows | The number of unique licensed Microsoft Teams users recorded via the Teams desktop client on a Windows-based computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.users.windows Preprocessing
|
Teams: Device usage (users), Chrome OS | The number of unique licensed Microsoft Teams users recorded via the Teams desktop client on a ChromeOS computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.users.chromeos Preprocessing
|
Teams: Device usage (users), Linux | The number of unique licensed Microsoft Teams users recorded via the Teams desktop client on a Linux computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.users.linux Preprocessing
|
Teams: Device usage (total), report date | The date of the report of device usage of both licensed and non-licensed users. |
Dependent item | ms365.teams.device.total.report_date Preprocessing
|
Teams: Device usage (total), web client | The number of unique Microsoft Teams users (licensed or non-licensed) recorded via the Teams web client over the week before the report refresh date. |
Dependent item | ms365.teams.device.total.web Preprocessing
|
Teams: Device usage (total), Android | The number of unique Microsoft Teams users (licensed or non-licensed) recorded via the Teams mobile client for Android over the week before the report refresh date. |
Dependent item | ms365.teams.device.total.android Preprocessing
|
Teams: Device usage (total), iOS | The number of unique Microsoft Teams users (licensed or non-licensed) recorded via the Teams mobile client for iOS over the week before the report refresh date. |
Dependent item | ms365.teams.device.total.ios Preprocessing
|
Teams: Device usage (total), Mac | The number of unique Microsoft Teams users (licensed or non-licensed) recorded via the Teams desktop client on a macOS computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.total.mac Preprocessing
|
Teams: Device usage (total), Windows | The number of unique Microsoft Teams users (licensed or non-licensed) recorded via the Teams desktop client on a Windows-based computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.total.windows Preprocessing
|
Teams: Device usage (total), Chrome OS | The number of unique Microsoft Teams users (licensed or non-licensed) recorded via the Teams desktop client on a ChromeOS computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.total.chromeos Preprocessing
|
Teams: Device usage (total), Linux | The number of unique Microsoft Teams users (licensed or non-licensed) recorded via the Teams desktop client on a Linux computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.total.linux Preprocessing
|
Teams: Device usage (guests), web client | The number of unique non-licensed Microsoft Teams users (guests) recorded via the Teams web client over the week before the report refresh date. |
Dependent item | ms365.teams.device.guests.web Preprocessing
|
Teams: Device usage (guests), Android | The number of unique non-licensed Microsoft Teams users (guests) recorded via the Teams mobile client for Android over the week before the report refresh date. |
Dependent item | ms365.teams.device.guests.android Preprocessing
|
Teams: Device usage (guests), iOS | The number of unique non-licensed Microsoft Teams users (guests) recorded via the Teams mobile client for iOS over the week before the report refresh date. |
Dependent item | ms365.teams.device.guests.ios Preprocessing
|
Teams: Device usage (guests), Mac | The number of unique non-licensed Microsoft Teams users (guests) recorded via the Teams desktop client on a macOS computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.guests.mac Preprocessing
|
Teams: Device usage (guests), Windows | The number of unique non-licensed Microsoft Teams users (guests) recorded via the Teams desktop client on a Windows-based computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.guests.windows Preprocessing
|
Teams: Device usage (guests), Chrome OS | The number of unique non-licensed Microsoft Teams users (guests) recorded via the Teams desktop client on a ChromeOS computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.guests.chromeos Preprocessing
|
Teams: Device usage (guests), Linux | The number of unique non-licensed Microsoft Teams users (guests) recorded via the Teams desktop client on a Linux computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.guests.linux Preprocessing
|
Teams: User activity, report date | The date of the report of the number of activities by all users. |
Dependent item | ms365.teams.activity.user.report_date Preprocessing
|
Teams: User activity, team chat messages | The number of unique messages that were posted in team chats by licensed or non-licensed Microsoft Teams users during the week before the report refresh date. This includes original posts and replies. |
Dependent item | ms365.teams.activity.user.messages.in_team Preprocessing
|
Teams: User activity, private chat messages | The number of unique messages that were posted in private chats by licensed or non-licensed Microsoft Teams users during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.messages.private Preprocessing
|
Teams: User activity, calls | The number of 1:1 calls licensed or non-licensed Microsoft Teams users participated in during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.calls Preprocessing
|
Teams: User activity, meetings | The sum of the scheduled one-time and recurring, ad-hoc, and unclassified meetings licensed or non-licensed Microsoft Teams users participated in during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.meetings.total Preprocessing
|
Teams: User activity, organized meetings | The sum of the scheduled one-time and recurring, ad-hoc, and unclassified meetings organized by licensed or non-licensed Microsoft Teams users during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.meetings.organized Preprocessing
|
Teams: User activity, attended meetings | The sum of the scheduled one-time and recurring, ad-hoc, and unclassified meetings attended by licensed or non-licensed Microsoft Teams users during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.meetings.attended Preprocessing
|
Teams: User activity, audio duration | The sum of the audio duration of licensed or non-licensed Microsoft Teams users during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.duration.audio Preprocessing
|
Teams: User activity, video duration | The sum of the video duration of licensed or non-licensed Microsoft Teams users during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.duration.video Preprocessing
|
Teams: User activity, screen share duration | The sum of the screen share duration of licensed or non-licensed Microsoft Teams users during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.duration.screen_share Preprocessing
|
Teams: User activity, post messages | The number of post messages in all channels made by licensed or non-licensed Microsoft Teams users during the week before the report refresh date. A post is the original message in a Teams chat. |
Dependent item | ms365.teams.activity.user.messages.posts Preprocessing
|
Teams: User activity, reply messages | The number of reply messages in all channels made by licensed or non-licensed Microsoft Teams users during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.messages.replies Preprocessing
|
Teams: User count (users), team chat messages | The number of licensed Microsoft Teams users who posted or replied in team chats during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.users.messages.in_team Preprocessing
|
Teams: User count (users), private chat messages | The number of licensed Microsoft Teams users who posted or replied in private chats during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.users.messages.private Preprocessing
|
Teams: User count (users), calls | The number of licensed Microsoft Teams users who participated in calls during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.users.calls Preprocessing
|
Teams: User count (users), meetings | The number of licensed Microsoft Teams users who participated in meetings during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.users.meetings Preprocessing
|
Teams: User count (total), report date | The date of the report of the number of licensed or non-licensed Microsoft Teams users in activity. |
Dependent item | ms365.teams.user_count.total.report_date Preprocessing
|
Teams: User count (total), team chat messages | The number of licensed or non-licensed Microsoft Teams users who posted or replied in team chats during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.total.messages.in_team Preprocessing
|
Teams: User count (total), private chat messages | The number of licensed or non-licensed Microsoft Teams users who posted or replied in private chats during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.total.messages.private Preprocessing
|
Teams: User count (total), calls | The number of licensed or non-licensed Microsoft Teams users who participated in calls during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.total.calls Preprocessing
|
Teams: User count (total), meetings | The number of licensed or non-licensed Microsoft Teams users who participated in meetings during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.total.meetings Preprocessing
|
Teams: User count (guests), team chat messages | The number of non-licensed Microsoft Teams users (guests) who posted or replied in team chats during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.guests.messages.in_team Preprocessing
|
Teams: User count (guests), private chat messages | The number of non-licensed Microsoft Teams users (guests) who posted or replied in private chats during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.guests.messages.private Preprocessing
|
Teams: User count (guests), calls | The number of non-licensed Microsoft Teams users (guests) who participated in calls during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.guests.calls Preprocessing
|
Teams: User count (guests), meetings | The number of non-licensed Microsoft Teams users (guests) who participated in meetings during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.guests.meetings Preprocessing
|
Teams: Team activity, report date | The date of the report of the number of team activities. |
Dependent item | ms365.teams.activity.team.report_date Preprocessing
|
Teams: Team activity, active shared channels | The number of active shared channels across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.active_shared_channels Preprocessing
|
Teams: Team activity, active external users | The number of active external users across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.active_external_users Preprocessing
|
Teams: Team activity, active users | The number of active users across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.active_users Preprocessing
|
Teams: Team activity, active channels | The number of active channels across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.active_channels Preprocessing
|
Teams: Team activity, channel messages | The number of channel messages across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.channel_messages Preprocessing
|
Teams: Team activity, guests | The number of guests across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.guests Preprocessing
|
Teams: Team activity, reactions | The number of reactions across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.reactions Preprocessing
|
Teams: Team activity, meetings organized | The number of organized meetings across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.meetings_organized Preprocessing
|
Teams: Team activity, post messages | The number of post messages across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.messages.posts Preprocessing
|
Teams: Team activity, reply messages | The number of reply messages across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.messages.replies Preprocessing
|
Teams: Team activity, urgent messages | The number of urgent messages across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.urgent_messages Preprocessing
|
Teams: Team activity, mentions | The number of mentions across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.mentions Preprocessing
|
Outlook: Activity, report date | The date of the Outlook activity count report. |
Dependent item | ms365.outlook.activity.report_date Preprocessing
|
Outlook: Activity, emails sent | The number of times an "Email sent" action was recorded. |
Dependent item | ms365.outlook.activity.sent Preprocessing
|
Outlook: Activity, emails received | The number of times an "Email received" action was recorded. |
Dependent item | ms365.outlook.activity.received Preprocessing
|
Outlook: Activity, emails read | The number of times an "Email read" action was recorded. |
Dependent item | ms365.outlook.activity.read Preprocessing
|
Outlook: Activity, meetings created | The number of times a "Meeting request sent" action was recorded. |
Dependent item | ms365.outlook.activity.meetings_created Preprocessing
|
Outlook: Activity, meetings interacted | The number of times a meeting request accept, tentative, decline, or cancel action was recorded. |
Dependent item | ms365.outlook.activity.meetings_interacted Preprocessing
|
Outlook: User count, report date | The date of the Outlook activity user count report. |
Dependent item | ms365.outlook.user_count.report_date Preprocessing
|
Outlook: User count, emails sent | The number of users an "Email sent" action was recorded for. |
Dependent item | ms365.outlook.user_count.sent Preprocessing
|
Outlook: User count, emails received | The number of users an "Email received" action was recorded for. |
Dependent item | ms365.outlook.user_count.received Preprocessing
|
Outlook: User count, emails read | The number of users an "Email read" action was recorded for. |
Dependent item | ms365.outlook.user_count.read Preprocessing
|
Outlook: User count, meetings created | The number of users a "Meeting request sent" action was recorded for. |
Dependent item | ms365.outlook.user_count.meetings_created Preprocessing
|
Outlook: User count, meetings interacted | The number of users a meeting request accept, tentative, decline, or cancel action was recorded for. |
Dependent item | ms365.outlook.user_count.meetings_interacted Preprocessing
|
Outlook: User count, app usage, report date | The date of the report of the unique user count per app. |
Dependent item | ms365.outlook.user_count.report_date.app Preprocessing
|
Outlook: User count, mail for Mac | The number of unique users of mail for Mac. |
Dependent item | ms365.outlook.user_count.mail_for_mac Preprocessing
|
Outlook: User count, Outlook for Mac | The number of unique users of Outlook for Mac. |
Dependent item | ms365.outlook.user_count.mac Preprocessing
|
Outlook: User count, Outlook for Windows | The number of unique users of Outlook for Windows. |
Dependent item | ms365.outlook.user_count.windows Preprocessing
|
Outlook: User count, Outlook for mobile | The number of unique users of Outlook for mobile. |
Dependent item | ms365.outlook.user_count.mobile Preprocessing
|
Outlook: User count, Outlook for web | The number of unique users of Outlook for web. |
Dependent item | ms365.outlook.user_count.web Preprocessing
|
Outlook: User count, POP3 applications | The number of unique users of other POP3 applications. |
Dependent item | ms365.outlook.user_count.pop3app Preprocessing
|
Outlook: User count, IMAP4 applications | The number of unique users of other IMAP4 applications. |
Dependent item | ms365.outlook.user_count.imap4app Preprocessing
|
Outlook: User count, SMTP applications | The number of unique users of other SMTP applications. |
Dependent item | ms365.outlook.user_count.smtpapp Preprocessing
|
Outlook: Mailbox, report date | The date of the report of the unique user count per app. |
Dependent item | ms365.outlook.mailbox.report_date Preprocessing
|
Outlook: Mailbox, total | The total number of user mailboxes in your organization. |
Dependent item | ms365.outlook.mailbox.total Preprocessing
|
Outlook: Mailbox, active | The number of active user mailboxes in your organization. A mailbox is considered active if the user has sent or read any emails. |
Dependent item | ms365.outlook.mailbox.active Preprocessing
|
Outlook: Mailbox, active in % | Percentage of active user mailboxes in your organization. A mailbox is considered active if the user has sent or read any emails. |
Dependent item | ms365.outlook.mailbox.active.percentage Preprocessing
|
Outlook: Mailbox, storage, report date | The date of the mailbox storage report. |
Dependent item | ms365.outlook.storage.report_date Preprocessing
|
Outlook: Mailbox, storage used | The amount of mailbox storage used in your organization. |
Dependent item | ms365.outlook.storage.used Preprocessing
|
OneDrive: Users, report date | The date of the report of the number of active OneDrive users. |
Dependent item | ms365.onedrive.users.report_date Preprocessing
|
OneDrive: Users, viewed or edited | The number of users who have viewed or edited OneDrive files. |
Dependent item | ms365.onedrive.users.viewed_or_edited Preprocessing
|
OneDrive: Users, synced | The number of users who have synced OneDrive files. |
Dependent item | ms365.onedrive.users.synced Preprocessing
|
OneDrive: Users, shared internally | The number of users who have shared OneDrive files internally. |
Dependent item | ms365.onedrive.users.shared_internally Preprocessing
|
OneDrive: Users, shared externally | The number of users who have shared OneDrive files externally. |
Dependent item | ms365.onedrive.users.shared_externally Preprocessing
|
OneDrive: Files, activity, report date | The date of report of the number of active OneDrive files. |
Dependent item | ms365.onedrive.files.report_date Preprocessing
|
OneDrive: Files, viewed or edited | The number of viewed or edited OneDrive files. |
Dependent item | ms365.onedrive.files.viewed_or_edited Preprocessing
|
OneDrive: Files, synced | The number of synced OneDrive files. |
Dependent item | ms365.onedrive.files.synced Preprocessing
|
OneDrive: Files, shared internally | The number of internally shared OneDrive files. |
Dependent item | ms365.onedrive.files.shared_internally Preprocessing
|
OneDrive: Files, shared externally | The number of externally shared OneDrive files. |
Dependent item | ms365.onedrive.files.shared_externally Preprocessing
|
OneDrive: Business sites, report date | The date of the number of OneDrive for Business sites report. |
Dependent item | ms365.onedrive.sites.report_date Preprocessing
|
OneDrive: Business sites, total | The number of OneDrive for Business sites. |
Dependent item | ms365.onedrive.sites.total Preprocessing
|
OneDrive: Business sites, active | The number of active OneDrive for Business sites. Any site on which users have viewed, modified, uploaded, downloaded, shared, or synced files is considered an active site. |
Dependent item | ms365.onedrive.sites.active Preprocessing
|
OneDrive: File count, report date | The date of the OneDrive file count report. |
Dependent item | ms365.onedrive.file_count.report_date Preprocessing
|
OneDrive: File count, total | The total number of files across all sites. |
Dependent item | ms365.onedrive.file_count.total Preprocessing
|
OneDrive: File count, active | The total number of active files across all sites. A file is considered active if it has been saved, synced, modified, or shared. |
Dependent item | ms365.onedrive.file_count.active Preprocessing
|
OneDrive: Storage, report date | The date of the report of the amount of storage used in OneDrive for Business. |
Dependent item | ms365.onedrive.storage.report_date Preprocessing
|
OneDrive: Storage, total | The total amount of storage used in OneDrive for Business. |
Dependent item | ms365.onedrive.storage.total Preprocessing
|
SharePoint: Files, activity, report date | The date of the report of the number of active SharePoint files. |
Dependent item | ms365.sharepoint.files.report_date Preprocessing
|
SharePoint: Files, viewed or edited | The number of viewed or edited SharePoint files. |
Dependent item | ms365.sharepoint.files.viewed_or_edited Preprocessing
|
SharePoint: Files, synced | The number of files synced to a SharePoint site. |
Dependent item | ms365.sharepoint.files.synced Preprocessing
|
SharePoint: Files, shared internally | The number of internally shared SharePoint files. |
Dependent item | ms365.sharepoint.files.shared_internally Preprocessing
|
SharePoint: Files, shared externally | The number of externally shared SharePoint files. |
Dependent item | ms365.sharepoint.files.shared_externally Preprocessing
|
SharePoint: Users, report date | The date of the report of the number of active SharePoint users. |
Dependent item | ms365.sharepoint.user_count.report_date Preprocessing
|
SharePoint: Users, viewed or edited | The number of users who have viewed or edited SharePoint files. |
Dependent item | ms365.sharepoint.user_count.viewed_or_edited Preprocessing
|
SharePoint: Users, synced | The number of users who have synced SharePoint files. |
Dependent item | ms365.sharepoint.user_count.synced Preprocessing
|
SharePoint: Users, shared internally | The number of users who have shared SharePoint files internally. |
Dependent item | ms365.sharepoint.user_count.shared_internally Preprocessing
|
SharePoint: Users, shared externally | The number of users who have shared SharePoint files externally. |
Dependent item | ms365.sharepoint.user_count.shared_externally Preprocessing
|
SharePoint: Users, pages visited | The number of users who have visited unique pages. |
Dependent item | ms365.sharepoint.user_count.visited_page Preprocessing
|
SharePoint: Pages, visited, report date | The date of the report of the number of pages visited. |
Dependent item | ms365.sharepoint.pages_visited.report_date Preprocessing
|
SharePoint: Pages, visited | The number of unique pages visited by users. |
Dependent item | ms365.sharepoint.pages_visited.count Preprocessing
|
SharePoint: File count, report date | The date of the SharePoint file count report. |
Dependent item | ms365.sharepoint.file_count.report_date Preprocessing
|
SharePoint: File count, total | The total number of files across all sites. |
Dependent item | ms365.sharepoint.file_count.total Preprocessing
|
SharePoint: File count, active | The total number of active files across all sites. A file is considered active if it has been saved, synced, modified, or shared. |
Dependent item | ms365.sharepoint.file_count.active Preprocessing
|
SharePoint: Sites, report date | The date of the report of the number of SharePoint sites. |
Dependent item | ms365.sharepoint.site.report_date Preprocessing
|
SharePoint: Sites, total | The number of SharePoint sites. |
Dependent item | ms365.sharepoint.sites.total Preprocessing
|
SharePoint: Sites, active | The number of active SharePoint sites. |
Dependent item | ms365.sharepoint.sites.active Preprocessing
|
SharePoint: Storage, report date | The date of the report of the amount of storage used in SharePoint. |
Dependent item | ms365.sharepoint.storage.report_date Preprocessing
|
SharePoint: Storage, total | The total amount of storage used in SharePoint. |
Dependent item | ms365.sharepoint.storage.total Preprocessing
|
SharePoint: Pages, viewed, report date | The date of the report of the number of pages viewed. |
Dependent item | ms365.sharepoint.pages_viewed.report_date Preprocessing
|
SharePoint: Pages, view count | The number of pages viewed across all sites. |
Dependent item | ms365.sharepoint.pages_viewed.count Preprocessing
|
Apps: Users, report date | The date of the active user count report. |
Dependent item | ms365.apps.users.report_date Preprocessing
|
Apps: Users, Office 365 | The number of daily Office 365 users. |
Dependent item | ms365.apps.users.office365 Preprocessing
|
Apps: Users, Exchange | The number of daily Exchange users. |
Dependent item | ms365.apps.users.exchange Preprocessing
|
Apps: Users, OneDrive | The number of daily OneDrive users. |
Dependent item | ms365.apps.users.onedrive Preprocessing
|
Apps: Users, SharePoint | The number of daily SharePoint users. |
Dependent item | ms365.apps.users.sharepoint Preprocessing
|
Apps: Users, Skype for Business | The number of daily Skype for Business users. |
Dependent item | ms365.apps.users.skype Preprocessing
|
Apps: Users, Yammer | The number of daily Yammer users. |
Dependent item | ms365.apps.users.yammer Preprocessing
|
Apps: Users, Teams | The number of daily Teams users. |
Dependent item | ms365.apps.users.teams Preprocessing
|
Apps: Activity, report date | The date of the report of the user count by activity. |
Dependent item | ms365.apps.activity.report_date Preprocessing
|
Apps: Activity, Exchange active users | The number of active Exchange users during the week before the report date. |
Dependent item | ms365.apps.activity.exchange.users.active Preprocessing
|
Apps: Activity, Exchange inactive users | The number of inactive Exchange users during the week before the report date. |
Dependent item | ms365.apps.activity.exchange.users.inactive Preprocessing
|
Apps: Activity, OneDrive active users | The number of active OneDrive users during the week before the report date. |
Dependent item | ms365.apps.activity.onedrive.users.active Preprocessing
|
Apps: Activity, OneDrive inactive users | The number of inactive OneDrive users during the week before the report date. |
Dependent item | ms365.apps.activity.onedrive.users.inactive Preprocessing
|
Apps: Activity, SharePoint active users | The number of active SharePoint users during the week before the report date. |
Dependent item | ms365.apps.activity.sharepoint.users.active Preprocessing
|
Apps: Activity, SharePoint inactive users | The number of inactive SharePoint users during the week before the report date. |
Dependent item | ms365.apps.activity.sharepoint.users.inactive Preprocessing
|
Apps: Activity, Skype for Business active users | The number of active Skype for Business users during the week before the report date. |
Dependent item | ms365.apps.activity.skypeforbusiness.users.active Preprocessing
|
Apps: Activity, Skype for Business inactive users | The number of inactive Skype for Business users during the week before the report date. |
Dependent item | ms365.apps.activity.skypeforbusiness.users.inactive Preprocessing
|
Apps: Activity, Yammer active users | The number of active Yammer users during the week before the report date. |
Dependent item | ms365.apps.activity.yammer.users.active Preprocessing
|
Apps: Activity, Yammer inactive users | The number of inactive Yammer users during the week before the report date. |
Dependent item | ms365.apps.activity.yammer.users.inactive Preprocessing
|
Apps: Activity, Teams active users | The number of active Teams users during the week before the report date. |
Dependent item | ms365.apps.activity.teams.users.active Preprocessing
|
Apps: Activity, Teams inactive users | The number of inactive Teams users during the week before the report date. |
Dependent item | ms365.apps.activity.teams.users.inactive Preprocessing
|
Apps: Activity, Office 365 active users | The number of active Office 365 users during the week before the report date. |
Dependent item | ms365.apps.activity.office365.users.active Preprocessing
|
Apps: Activity, Office 365 inactive users | The number of inactive Office 365 users during the week before the report date. |
Dependent item | ms365.apps.activity.office365.users.inactive Preprocessing
|
Apps: Office, user count report date | The date of the report of the number of active users for each app. |
Dependent item | ms365.apps.office.user_count.report_date Preprocessing
|
Apps: Office, Outlook user count | The number of active Outlook users. |
Dependent item | ms365.apps.office.user_count.outlook Preprocessing
|
Apps: Office, Word user count | The number of active Word users. |
Dependent item | ms365.apps.office.user_count.word Preprocessing
|
Apps: Office, Excel user count | The number of active Excel users. |
Dependent item | ms365.apps.office.user_count.excel Preprocessing
|
Apps: Office, PowerPoint user count | The number of active PowerPoint users. |
Dependent item | ms365.apps.office.user_count.powerpoint Preprocessing
|
Apps: Office, OneNote user count | The number of active OneNote users. |
Dependent item | ms365.apps.office.user_count.onenote Preprocessing
|
Apps: Office, Teams user count | The number of active Teams users. |
Dependent item | ms365.apps.office.user_count.teams Preprocessing
|
Apps: Platform, user count report date | The date of the report of the number of active users per platform. |
Dependent item | ms365.apps.platform.user_count.report_date Preprocessing
|
Apps: Platform, Windows user count | The number of active users on the Windows platform. |
Dependent item | ms365.apps.platform.user_count.windows Preprocessing
|
Apps: Platform, Mac user count | The number of active users on the Mac platform. |
Dependent item | ms365.apps.platform.user_count.mac Preprocessing
|
Apps: Platform, mobile user count | The number of active users on the mobile platform. |
Dependent item | ms365.apps.platform.user_count.mobile Preprocessing
|
Apps: Platform, web user count | The number of active users on the web platform. |
Dependent item | ms365.apps.platform.user_count.web Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Microsoft 365: Services: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Microsoft 365 reports by HTTP/ms365.services.errors))>0 |Average |
||
Microsoft 365: Teams: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Microsoft 365 reports by HTTP/ms365.teams.errors))>0 |Average |
||
Microsoft 365: Outlook: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Microsoft 365 reports by HTTP/ms365.outlook.errors))>0 |Average |
||
Microsoft 365: OneDrive: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Microsoft 365 reports by HTTP/ms365.onedrive.errors))>0 |Average |
||
Microsoft 365: SharePoint: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Microsoft 365 reports by HTTP/ms365.sharepoint.errors))>0 |Average |
||
Microsoft 365: Apps: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Microsoft 365 reports by HTTP/ms365.apps.errors))>0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Microsoft cloud service discovery | The list of Microsoft cloud services to which the tenant is subscribed. |
Dependent item | ms365.service.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Services: {#NAME} health status | Overall service health status of the service. More information about health status values can be found here: https://learn.microsoft.com/en-us/graph/api/resources/servicehealthissue?view=graph-rest-beta#servicehealthstatus-values |
Dependent item | ms365.service.health[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Microsoft 365: Services: {#NAME} is degraded. | The service has the "Degraded" health status. |
last(/Microsoft 365 reports by HTTP/ms365.service.health[{#NAME}])=6 |Warning |
Manual close: Yes | |
Microsoft 365: Services: {#NAME} is interrupted. | The service has the "Interruption" health status. |
last(/Microsoft 365 reports by HTTP/ms365.service.health[{#NAME}])=7 |Average |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Memcached monitoring by Zabbix via Zabbix agent 2 and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up and configure zabbix-agent2 compiled with the Memcached monitoring plugin.
Test availability: zabbix_get -s memcached-host -k memcached.ping
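A quick way to confirm the plugin works end to end (a minimal sketch; the host name memcached-host and the default URI are placeholders, adjust them to your environment):
# Ping the Memcached instance through Zabbix agent 2; "1" is expected when the service responds
zabbix_get -s memcached-host -k 'memcached.ping["tcp://localhost:11211"]'
# Fetch the raw statistics blob used by the dependent items
zabbix_get -s memcached-host -k 'memcached.stats["tcp://localhost:11211"]'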
Name | Description | Default |
---|---|---|
{$MEMCACHED.CONN.URI} | Connection string in the URI format (password is not used). This param overwrites a value configured in the "Plugins.Memcached.Uri" option of the configuration file (if it's set), otherwise, the plugin's default value is used: "tcp://localhost:11211" |
tcp://localhost:11211 |
{$MEMCACHED.CONN.THROTTLED.MAX.WARN} | Maximum number of throttled connections per second |
1 |
{$MEMCACHED.CONN.QUEUED.MAX.WARN} | Maximum number of queued connections per second |
1 |
{$MEMCACHED.CONN.PRC.MAX.WARN} | Maximum percentage of connected clients |
80 |
{$MEMCACHED.MEM.PUSED.MAX.WARN} | Maximum percentage of memory used |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get status | Zabbix agent | memcached.stats["{$MEMCACHED.CONN.URI}"] | |
Ping | Zabbix agent | memcached.ping["{$MEMCACHED.CONN.URI}"] Preprocessing
|
|
Max connections | Max number of concurrent connections |
Dependent item | memcached.connections.max Preprocessing
|
Maximum number of bytes | Maximum number of bytes allowed in cache. You can adjust this setting via a config file or the command line while starting your Memcached server. |
Dependent item | memcached.config.limit_maxbytes Preprocessing
|
CPU sys | System CPU consumed by the Memcached server |
Dependent item | memcached.cpu.sys Preprocessing
|
CPU user | User CPU consumed by the Memcached server |
Dependent item | memcached.cpu.user Preprocessing
|
Queued connections per second | Number of times that memcached has hit its connections limit and disabled its listener |
Dependent item | memcached.connections.queued.rate Preprocessing
|
New connections per second | Number of connections opened per second |
Dependent item | memcached.connections.rate Preprocessing
|
Throttled connections | Number of times a client connection was throttled. When sending GETs in batch mode and the connection contains too many requests (limited by -R parameter) the connection might be throttled to prevent starvation. |
Dependent item | memcached.connections.throttled.rate Preprocessing
|
Connection structures | Number of connection structures allocated by the server |
Dependent item | memcached.connections.structures Preprocessing
|
Open connections | The number of clients presently connected |
Dependent item | memcached.connections.current Preprocessing
|
Commands: FLUSH per second | The flush_all command invalidates all items in the database. This operation incurs a performance penalty and shouldn't take place in production, so check your debug scripts. |
Dependent item | memcached.commands.flush.rate Preprocessing
|
Commands: GET per second | Number of GET requests received by server per second. |
Dependent item | memcached.commands.get.rate Preprocessing
|
Commands: SET per second | Number of SET requests received by server per second. |
Dependent item | memcached.commands.set.rate Preprocessing
|
Process id | PID of the server process |
Dependent item | memcached.process_id Preprocessing
|
Memcached version | Version of the Memcached server |
Dependent item | memcached.version Preprocessing
|
Uptime | Number of seconds since Memcached server start |
Dependent item | memcached.uptime Preprocessing
|
Bytes used | Current number of bytes used to store items. |
Dependent item | memcached.stats.bytes Preprocessing
|
Written bytes per second | The network's write rate per second in B/sec |
Dependent item | memcached.stats.bytes_written.rate Preprocessing
|
Read bytes per second | The network's read rate per second in B/sec |
Dependent item | memcached.stats.bytes_read.rate Preprocessing
|
Hits per second | Number of successful GET requests (items requested and found) per second. |
Dependent item | memcached.stats.hits.rate Preprocessing
|
Misses per second | Number of missed GET requests (items requested but not found) per second. |
Dependent item | memcached.stats.misses.rate Preprocessing
|
Evictions per second | "An eviction is when an item that still has time to live is removed from the cache because a brand new item needs to be allocated. The item is selected with a pseudo-LRU mechanism. A high number of evictions coupled with a low hit rate means your application is setting a large number of keys that are never used again." |
Dependent item | memcached.stats.evictions.rate Preprocessing
|
New items per second | Number of new items stored per second. |
Dependent item | memcached.stats.total_items.rate Preprocessing
|
Current number of items stored | Current number of items stored by this instance. |
Dependent item | memcached.stats.curr_items Preprocessing
|
Threads | Number of worker threads requested |
Dependent item | memcached.stats.threads Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Memcached: Service is down | last(/Memcached by Zabbix agent 2/memcached.ping["{$MEMCACHED.CONN.URI}"])=0 |Average |
Manual close: Yes | ||
Memcached: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Memcached by Zabbix agent 2/memcached.cpu.sys,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Memcached: Too many queued connections | The max number of connections is reached and a new connection had to wait in the queue as a result. |
min(/Memcached by Zabbix agent 2/memcached.connections.queued.rate,5m)>{$MEMCACHED.CONN.QUEUED.MAX.WARN} |Warning |
||
Memcached: Too many throttled connections | Number of times a client connection was throttled is too high. |
min(/Memcached by Zabbix agent 2/memcached.connections.throttled.rate,5m)>{$MEMCACHED.CONN.THROTTLED.MAX.WARN} |Warning |
||
Memcached: Total number of connected clients is too high | When the number of connections reaches the value of the "max_connections" parameter, new connections will be rejected. |
min(/Memcached by Zabbix agent 2/memcached.connections.current,5m)/last(/Memcached by Zabbix agent 2/memcached.connections.max)*100>{$MEMCACHED.CONN.PRC.MAX.WARN} |Warning |
||
Memcached: Version has changed | The Memcached version has changed. Acknowledge to close the problem manually. |
last(/Memcached by Zabbix agent 2/memcached.version,#1)<>last(/Memcached by Zabbix agent 2/memcached.version,#2) and length(last(/Memcached by Zabbix agent 2/memcached.version))>0 |Info |
Manual close: Yes | |
Memcached: Service has been restarted | Uptime is less than 10 minutes. |
last(/Memcached by Zabbix agent 2/memcached.uptime)<10m |Info |
Manual close: Yes | |
Memcached: Memory usage is too high | min(/Memcached by Zabbix agent 2/memcached.stats.bytes,5m)/last(/Memcached by Zabbix agent 2/memcached.config.limit_maxbytes)*100>{$MEMCACHED.MEM.PUSED.MAX.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Mantis BT monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$MANTIS.URL} | MantisBT URL. |
|
{$MANTIS.TOKEN} | MantisBT Token. |
|
{$MANTIS.LLD.FILTER.PROJECTS.MATCHES} | Filter of discoverable projects. |
.* |
{$MANTIS.LLD.FILTER.PROJECTS.NOT_MATCHES} | Filter to exclude discovered projects. |
CHANGE_IF_NEEDED |
{$MANTIS.HTTP.PROXY} | Proxy for http requests. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get projects | Get projects from Mantis BT. |
HTTP agent | mantisbt.get.projects |
Name | Description | Type | Key and additional info |
---|---|---|---|
Projects discovery | Discovery rule for Mantis BT projects. |
Dependent item | mantisbt.projects.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Project [{#NAME}]: Get issues | Getting project issues. |
HTTP agent | mantisbt.get.issues[{#NAME}] |
Project [{#NAME}]: Total issues | Count of issues in project. |
Dependent item | mantis.project.total_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: New issues | Count of issues with 'new' status. |
Dependent item | mantis.project.status.new_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Resolved issues | Count of issues with 'resolved' status. |
Dependent item | mantis.project.status.resolved_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Closed issues | Count of issues with 'closed' status. |
Dependent item | mantis.project.status.closed_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Assigned issues | Count of issues with 'assigned' status. |
Dependent item | mantis.project.status.assigned_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Feedback issues | Count of issues with 'feedback' status. |
Dependent item | mantis.project.status.feedback_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Acknowledged issues | Count of issues with 'acknowledged' status. |
Dependent item | mantis.project.status.acknowledged_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Confirmed issues | Count of issues with 'confirmed' status. |
Dependent item | mantis.project.status.confirmed_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Open issues | Count of "open" resolution issues. |
Dependent item | mantis.project.resolution.open_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Fixed issues | Count of "fixed" resolution issues. |
Dependent item | mantis.project.resolution.fixed_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Reopened issues | Count of "reopened" resolution issues. |
Dependent item | mantis.project.resolution.reopened_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Unable to reproduce issues | Count of "unable to reproduce" resolution issues. |
Dependent item | mantis.project.resolution.unabletoreproduce_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Not fixable issues | Count of "not fixable" resolution issues. |
Dependent item | mantis.project.resolution.notfixableissues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Duplicate issues | Count of "duplicate" resolution issues. |
Dependent item | mantis.project.resolution.duplicate_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: No change required issues | Count of "no change required" resolution issues. |
Dependent item | mantis.project.resolution.nochangerequired_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Suspended issues | Count of "suspended" resolution issues. |
Dependent item | mantis.project.resolution.suspended_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Will not fix issues | Count of "wont fix" resolution issues. |
Dependent item | mantis.project.resolution.wontfixissues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Feature severity issues | Count of "feature" severity issues. |
Dependent item | mantis.project.severity.feature_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Trivial severity issues | Count of "trivial" severity issues. |
Dependent item | mantis.project.severity.trivial_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Text severity issues | Count of "text" severity issues. |
Dependent item | mantis.project.severity.text_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Tweak severity issues | Count of "tweak" severity issues. |
Dependent item | mantis.project.severity.tweak_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Minor severity issues | Count of "minor" severity issues. |
Dependent item | mantis.project.severity.minor_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Major severity issues | Count of "major" severity issues. |
Dependent item | mantis.project.severity.major_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Crash severity issues | Count of "crash" severity issues. |
Dependent item | mantis.project.severity.crash_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Block severity issues | Count of "block" severity issues. |
Dependent item | mantis.project.severity.block_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: None priority issues | Count of "none" priority issues. |
Dependent item | mantis.project.priority.none_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Low priority issues | Count of "low" priority issues. |
Dependent item | mantis.project.priority.low_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Normal priority issues | Count of "normal" priority issues. |
Dependent item | mantis.project.priority.normal_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: High priority issues | Count of "high" priority issues. |
Dependent item | mantis.project.priority.high_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Urgent priority issues | Count of "urgent" priority issues. |
Dependent item | mantis.project.priority.urgent_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Immediate priority issues | Count of "immediate" priority issues. |
Dependent item | mantis.project.priority.immediate_issues[{#NAME}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes state. It works without external scripts and uses the script item to make HTTP requests to the Kubernetes API.
Template Kubernetes cluster state by HTTP
- collects metrics by HTTP agent from kube-state-metrics endpoint and Kubernetes API.
Don't forget to change macros {$KUBE.API.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Kubernetes version and configuration.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Install the Zabbix Helm Chart in your Kubernetes cluster. Internal service metrics are collected from the kube-state-metrics endpoint.
The template needs to use authorization via an API token.
Set the {$KUBE.API.URL} macro in the format <scheme>://<host>:<port>.
Get the generated service account token using the command:
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
Then set it to the macro {$KUBE.API.TOKEN}
.
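As a quick sanity check (a sketch, assuming the secret name and namespace shown above, and run from a host or pod that can reach the in-cluster API address; substitute your external API URL otherwise):
TOKEN=$(kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d)
# "ok" from the readyz endpoint indicates the token is accepted by the API server
curl -sk -H "Authorization: Bearer $TOKEN" "https://kubernetes.default.svc.cluster.local:443/readyz"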
Set {$KUBE.STATE.ENDPOINT.NAME} to the kube-state-metrics endpoint name (see kubectl -n monitoring get ep). Default: zabbix-kube-state-metrics.
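To double-check which endpoint name to put into the macro (a sketch; the namespace and the exact name depend on how the Helm chart was installed):
# List endpoints in the monitoring namespace and look for the kube-state-metrics entry
kubectl -n monitoring get ep | grep -i state-metrics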
NOTE. If you wish to monitor the Controller Manager and Scheduler components, you might need to set their --bind-address option to an address that the Zabbix proxy can reach.
For example, for clusters created with kubeadm
it can be set in the following manifest files (changes will be applied immediately):
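For reference, on a default kubeadm installation these are the static Pod manifests on the control plane node (the paths assume the standard kubeadm layout; the kubelet re-creates the Pods automatically once the files are saved):
# Adjust the bind address option mentioned above in the "command" section of:
sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml
sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml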
Depending on your Kubernetes distribution, you might need to adjust {$KUBE.CONTROL_PLANE.TAINT}
macro (for example, set it to node-role.kubernetes.io/master
for OpenShift).
NOTE. Some metrics may not be collected depending on your Kubernetes version and configuration.
Also, see the Macros section for a list of macros used to set trigger values.
Set up the macros to filter the metrics of discovered Kubelets by node names:
Set up macros to filter metrics by namespace:
Set up macros to filter node metrics by nodename:
Note: If you have a large cluster, it is highly recommended to set a filter for discoverable namespaces.
You can use the {$KUBE.KUBELET.FILTER.LABELS}
and {$KUBE.KUBELET.FILTER.ANNOTATIONS}
macros for advanced filtering of kubelets by node labels and annotations.
Notes about labels and annotations filters:
Values are given as comma-separated key: value pairs, with regular expression support in the value part (key1: value, key2: regexp).
Use ! to invert a filter (!key: value).
For example: kubernetes.io/hostname: kubernetes-node[5-25], !node-role.kubernetes.io/ingress: .*. As a result, the kubelets on nodes 5-25 without the "ingress" role will be discovered.
See the Kubernetes documentation for details about labels and annotations:
You can also set up evaluation periods for replica mismatch triggers (Deployments, ReplicaSets, StatefulSets) with the macro {$KUBE.REPLICA.MISMATCH.EVAL_PERIOD}
, which supports context and regular expressions. For example, you can create the following macros:
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"deployment:default:nginx-deployment"} = #3
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:regex:"deployment:.*:.*"} = #10
or {$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:regex:"^deployment.*"} = #10
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:regex:".*:default:.*"} = 15m
Note that different context macros with regular expressions matching the same string can be applied in an undefined order, and simple context macros (without regular expressions) have higher priority. Read the Important notes section in Zabbix documentation
for details.
Name | Description | Default |
---|---|---|
{$KUBE.API.URL} | Kubernetes API endpoint URL in the format <scheme>://<host>:<port>. |
https://kubernetes.default.svc.cluster.local:443 |
{$KUBE.API.READYZ.ENDPOINT} | Kubernetes API readyz endpoint /readyz |
/readyz |
{$KUBE.API.LIVEZ.ENDPOINT} | Kubernetes API livez endpoint /livez |
/livez |
{$KUBE.API.COMPONENTSTATUSES.ENDPOINT} | Kubernetes API componentstatuses endpoint /api/v1/componentstatuses |
/api/v1/componentstatuses |
{$KUBE.API.TOKEN} | Service account bearer token. |
|
{$KUBE.HTTP.PROXY} | Sets the HTTP proxy to |
|
{$KUBE.STATE.ENDPOINT.NAME} | Kubernetes state endpoint name. |
zabbix-kube-state-metrics |
{$OPENSHIFT.STATE.ENDPOINT.NAME} | OpenShift state endpoint name. |
openshift-state-metrics |
{$KUBE.API_SERVER.SCHEME} | Kubernetes API servers metrics endpoint scheme. Used in ControlPlane LLD. |
https |
{$KUBE.API_SERVER.PORT} | Kubernetes API servers metrics endpoint port. Used in ControlPlane LLD. |
6443 |
{$KUBE.CONTROL_PLANE.TAINT} | Taint that applies to control plane nodes. Change if needed. Used in ControlPlane LLD. |
node-role.kubernetes.io/control-plane |
{$KUBE.CONTROLLER_MANAGER.SCHEME} | Kubernetes Controller manager metrics endpoint scheme. Used in ControlPlane LLD. |
https |
{$KUBE.CONTROLLER_MANAGER.PORT} | Kubernetes Controller manager metrics endpoint port. Used in ControlPlane LLD. |
10257 |
{$KUBE.SCHEDULER.SCHEME} | Kubernetes Scheduler metrics endpoint scheme. Used in ControlPlane LLD. |
https |
{$KUBE.SCHEDULER.PORT} | Kubernetes Scheduler metrics endpoint port. Used in ControlPlane LLD. |
10259 |
{$KUBE.KUBELET.SCHEME} | Kubernetes Kubelet metrics endpoint scheme. Used in Kubelet LLD. |
https |
{$KUBE.KUBELET.PORT} | Kubernetes Kubelet metrics endpoint port. Used in Kubelet LLD. |
10250 |
{$KUBE.LLD.FILTER.NAMESPACE.MATCHES} | Filter of discoverable metrics by namespace. |
.* |
{$KUBE.LLD.FILTER.NAMESPACE.NOT_MATCHES} | Filter to exclude discovered metrics by namespace. |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.NODE.MATCHES} | Filter of discoverable nodes by nodename. |
.* |
{$KUBE.LLD.FILTER.NODE.NOT_MATCHES} | Filter to exclude discovered nodes by nodename. |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.KUBELET_NODE.MATCHES} | Filter of discoverable Kubelets by nodename. |
.* |
{$KUBE.LLD.FILTER.KUBELET_NODE.NOT_MATCHES} | Filter to exclude discovered Kubelets by nodename. |
CHANGE_IF_NEEDED |
{$KUBE.KUBELET.FILTER.ANNOTATIONS} | Node annotations to filter Kubelets (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.KUBELET.FILTER.LABELS} | Node labels to filter Kubelets (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.LLD.FILTER.PV.MATCHES} | Filter of discoverable persistent volumes by name. |
.* |
{$KUBE.LLD.FILTER.PV.NOT_MATCHES} | Filter to exclude discovered persistent volumes by name. |
CHANGE_IF_NEEDED |
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD} | The evaluation period range which is used for calculation of expressions in trigger prototypes (time period or value range). Can be used with context. |
#5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get state metrics | Collecting Kubernetes metrics from kube-state-metrics. |
Script | kube.state.metrics |
Control plane LLD | Generation of data for Control plane discovery rules. |
Script | kube.control_plane.lld Preprocessing
|
Node LLD | Generation of data for Kubelet discovery rules. |
Script | kube.node.lld Preprocessing
|
Get component statuses | HTTP agent | kube.componentstatuses Preprocessing
|
|
Get readyz | HTTP agent | kube.readyz Preprocessing
|
|
Get livez | HTTP agent | kube.livez Preprocessing
|
|
Namespace count | The number of namespaces. |
Dependent item | kube.namespace.count Preprocessing
|
CronJob count | Number of cronjobs. |
Dependent item | kube.cronjob.count Preprocessing
|
Job count | Number of jobs (generated by cronjob + job). |
Dependent item | kube.job.count Preprocessing
|
Endpoint count | Number of endpoints. |
Dependent item | kube.endpoint.count Preprocessing
|
Deployment count | The number of deployments. |
Dependent item | kube.deployment.count Preprocessing
|
Service count | The number of services. |
Dependent item | kube.service.count Preprocessing
|
StatefulSet count | The number of statefulsets. |
Dependent item | kube.statefulset.count Preprocessing
|
Node count | The number of nodes. |
Dependent item | kube.node.count Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
API servers discovery | Dependent item | kube.api_servers.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Controller manager nodes discovery | Dependent item | kube.controller_manager.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Scheduler servers nodes discovery | Dependent item | kube.scheduler.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubelet discovery | Dependent item | kube.kubelet.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Daemonset discovery | Dependent item | kube.daemonset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Ready | The number of nodes that should be running the daemon pod and have one or more running and ready. |
Dependent item | kube.daemonset.ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Scheduled | The number of nodes that run at least one daemon pod and are supposed to. |
Dependent item | kube.daemonset.scheduled[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Desired | The number of nodes that should be running the daemon pod. |
Dependent item | kube.daemonset.desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Misscheduled | The number of nodes that run a daemon pod but are not supposed to. |
Dependent item | kube.daemonset.misscheduled[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Updated number scheduled | The total number of nodes that are running updated daemon pod. |
Dependent item | kube.daemonset.updated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
PVC discovery | Dependent item | kube.pvc.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] PVC [{#NAME}] Status phase | The current status phase of the persistent volume claim. |
Dependent item | kube.pvc.status_phase[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] PVC [{#NAME}] Requested storage | The capacity of storage requested by the persistent volume claim. |
Dependent item | kube.pvc.requested.storage[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] PVC status phase: Bound, sum | The total amount of persistent volume claims in the Bound phase. |
Dependent item | kube.pvc.status_phase.bound.sum[{#NAMESPACE}] Preprocessing
|
Namespace [{#NAMESPACE}] PVC status phase: Lost, sum | The total amount of persistent volume claims in the Lost phase. |
Dependent item | kube.pvc.status_phase.lost.sum[{#NAMESPACE}] Preprocessing
|
Namespace [{#NAMESPACE}] PVC status phase: Pending, sum | The total amount of persistent volume claims in the Pending phase. |
Dependent item | kube.pvc.status_phase.pending.sum[{#NAMESPACE}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: NS [{#NAMESPACE}] PVC [{#NAME}]: PVC is pending | count(/Kubernetes cluster state by HTTP/kube.pvc.status_phase[{#NAMESPACE}/{#NAME}],2m,,5)>=2 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PV discovery | Dependent item | kube.pv.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
PV [{#NAME}] Status phase | The current status phase of the persistent volume. |
Dependent item | kube.pv.status_phase[{#NAME}] Preprocessing
|
PV [{#NAME}] Capacity bytes | A capacity of the persistent volume in bytes. |
Dependent item | kube.pv.capacity.bytes[{#NAME}] Preprocessing
|
PV status phase: Pending, sum | The total amount of persistent volumes in the Pending phase. |
Dependent item | kube.pv.status_phase.pending.sum[{#SINGLETON}] Preprocessing
|
PV status phase: Available, sum | The total amount of persistent volumes in the Available phase. |
Dependent item | kube.pv.status_phase.available.sum[{#SINGLETON}] Preprocessing
|
PV status phase: Bound, sum | The total amount of persistent volumes in the Bound phase. |
Dependent item | kube.pv.status_phase.bound.sum[{#SINGLETON}] Preprocessing
|
PV status phase: Released, sum | The total amount of persistent volumes in the Released phase. |
Dependent item | kube.pv.status_phase.released.sum[{#SINGLETON}] Preprocessing
|
PV status phase: Failed, sum | The total amount of persistent volumes in the Failed phase. |
Dependent item | kube.pv.status_phase.failed.sum[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: PV [{#NAME}]: PV has failed | count(/Kubernetes cluster state by HTTP/kube.pv.status_phase[{#NAME}],2m,,3)>=2 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployment discovery | Dependent item | kube.deployment.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Paused | Whether the deployment is paused and will not be processed by the deployment controller. |
Dependent item | kube.deployment.spec_paused[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas desired | Number of desired pods for a deployment. |
Dependent item | kube.deployment.replicas_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Rollingupdate max unavailable | Maximum number of unavailable replicas during a rolling update of a deployment. |
Dependent item | kube.deployment.rollingupdate.max_unavailable[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas | The number of replicas per deployment. |
Dependent item | kube.deployment.replicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas available | The number of available replicas per deployment. |
Dependent item | kube.deployment.replicas_available[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas unavailable | The number of unavailable replicas per deployment. |
Dependent item | kube.deployment.replicas_unavailable[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas updated | The number of updated replicas per deployment. |
Dependent item | kube.deployment.replicas_updated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas mismatched | The number of available replicas not matching the desired number of replicas. |
Dependent item | kube.deployment.replicas_mismatched[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Deployment replicas mismatch | Deployment has not matched the expected number of replicas during the specified trigger evaluation period. |
min(/Kubernetes cluster state by HTTP/kube.deployment.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"deployment:{#NAMESPACE}:{#NAME}"})>0 and last(/Kubernetes cluster state by HTTP/kube.deployment.replicas_desired[{#NAMESPACE}/{#NAME}])>=0 and last(/Kubernetes cluster state by HTTP/kube.deployment.replicas_available[{#NAMESPACE}/{#NAME}])>=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Endpoint discovery | Dependent item | kube.endpoint.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Address available | Number of addresses available in endpoint. |
Dependent item | kube.endpoint.address_available[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Address not ready | Number of addresses not ready in endpoint. |
Dependent item | kube.endpoint.address_not_ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Age | Endpoint age (number of seconds since creation). |
Dependent item | kube.endpoint.age[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node discovery | Dependent item | kube.node.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node [{#NAME}]: CPU allocatable | The CPU resources of a node that are available for scheduling. |
Dependent item | kube.node.cpu_allocatable[{#NAME}] Preprocessing
|
Node [{#NAME}]: Memory allocatable | The memory resources of a node that are available for scheduling. |
Dependent item | kube.node.memory_allocatable[{#NAME}] Preprocessing
|
Node [{#NAME}]: Pods allocatable | The pods resources of a node that are available for scheduling. |
Dependent item | kube.node.pods_allocatable[{#NAME}] Preprocessing
|
Node [{#NAME}]: Ephemeral storage allocatable | The allocatable ephemeral storage of a node that is available for scheduling. |
Dependent item | kube.node.ephemeral_storage_allocatable[{#NAME}] Preprocessing
|
Node [{#NAME}]: CPU capacity | The capacity for CPU resources of a node. |
Dependent item | kube.node.cpu_capacity[{#NAME}] Preprocessing
|
Node [{#NAME}]: Memory capacity | The capacity for memory resources of a node. |
Dependent item | kube.node.memory_capacity[{#NAME}] Preprocessing
|
Node [{#NAME}]: Ephemeral storage capacity | The ephemeral storage capacity of a node. |
Dependent item | kube.node.ephemeral_storage_capacity[{#NAME}] Preprocessing
|
Node [{#NAME}]: Pods capacity | The capacity for pods resources of a node. |
Dependent item | kube.node.pods_capacity[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pod discovery | Dependent item | kube.pod.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Pending | Pod is in pending state. |
Dependent item | kube.pod.phase.pending[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Succeeded | Pod is in succeeded state. |
Dependent item | kube.pod.phase.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Failed | Pod is in failed state. |
Dependent item | kube.pod.phase.failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Unknown | Pod is in unknown state. |
Dependent item | kube.pod.phase.unknown[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Running | Pod is in running state. |
Dependent item | kube.pod.phase.running[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers terminated | Describes whether the container is currently in terminated state. |
Dependent item | kube.pod.containers_terminated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers waiting | Describes whether the container is currently in waiting state. |
Dependent item | kube.pod.containers_waiting[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers ready | Describes whether the containers readiness check succeeded. |
Dependent item | kube.pod.containers_ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers restarts | The number of container restarts. |
Dependent item | kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers running | Describes whether the container is currently in running state. |
Dependent item | kube.pod.containers_running[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Ready | Describes whether the pod is ready to serve requests. |
Dependent item | kube.pod.ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Scheduled | Describes the status of the scheduling process for the pod. |
Dependent item | kube.pod.scheduled[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Unschedulable | Describes the unschedulable status for the pod. |
Dependent item | kube.pod.unschedulable[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers CPU limits | The limit on CPU cores to be used by a container. |
Dependent item | kube.pod.containers.limits.cpu[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers memory limits | The limit on memory to be used by a container. |
Dependent item | kube.pod.containers.limits.memory[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers CPU requests | The number of requested CPU cores by a container. |
Dependent item | kube.pod.containers.requests.cpu[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers memory requests | The number of requested memory bytes by a container. |
Dependent item | kube.pod.containers.requests.memory[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Pod is not healthy | min(/Kubernetes cluster state by HTTP/kube.pod.phase.failed[{#NAMESPACE}/{#NAME}],10m)>0 or min(/Kubernetes cluster state by HTTP/kube.pod.phase.pending[{#NAMESPACE}/{#NAME}],10m)>0 or min(/Kubernetes cluster state by HTTP/kube.pod.phase.unknown[{#NAMESPACE}/{#NAME}],10m)>0 |High |
|||
Kubernetes cluster state: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Pod is crash looping | Containers of the pod keep restarting. This most likely indicates that the pod is in the CrashLoopBackOff state. |
(last(/Kubernetes cluster state by HTTP/kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}])-min(/Kubernetes cluster state by HTTP/kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}],15m))>1 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
ReplicaSet discovery | Dependent item | kube.replicaset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Replicas | The number of replicas per ReplicaSet. |
Dependent item | kube.replicaset.replicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Desired replicas | Number of desired pods for a ReplicaSet. |
Dependent item | kube.replicaset.replicas_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Fully labeled replicas | The number of fully labeled replicas per ReplicaSet. |
Dependent item | kube.replicaset.fully_labeled_replicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Ready | The number of ready replicas per ReplicaSet. |
Dependent item | kube.replicaset.ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Replicas mismatched | The number of ready replicas not matching the desired number of replicas. |
Dependent item | kube.replicaset.replicas_mismatched[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Namespace [{#NAMESPACE}] RS [{#NAME}]: ReplicaSet mismatch | ReplicaSet has not matched the expected number of replicas during the specified trigger evaluation period. |
min(/Kubernetes cluster state by HTTP/kube.replicaset.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"replicaset:{#NAMESPACE}:{#NAME}"})>0 and last(/Kubernetes cluster state by HTTP/kube.replicaset.replicas_desired[{#NAMESPACE}/{#NAME}])>=0 and last(/Kubernetes cluster state by HTTP/kube.replicaset.ready[{#NAMESPACE}/{#NAME}])>=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
StatefulSet discovery | Dependent item | kube.statefulset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Replicas | The number of replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Desired replicas | Number of desired pods for a StatefulSet. |
Dependent item | kube.statefulset.replicas_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Current replicas | The number of current replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas_current[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Ready replicas | The number of ready replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Updated replicas | The number of updated replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas_updated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Replicas mismatched | The number of ready replicas not matching the number of replicas. |
Dependent item | kube.statefulset.replicas_mismatched[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: StatefulSet is down | (last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}]) / last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_current[{#NAMESPACE}/{#NAME}]))<>1 |High |
|||
Kubernetes cluster state: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: StatefulSet replicas mismatch | StatefulSet has not matched the number of replicas during the specified trigger evaluation period. |
min(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"statefulset:{#NAMESPACE}:{#NAME}"})>0 and last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas[{#NAMESPACE}/{#NAME}])>=0 and last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}])>=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PodDisruptionBudget discovery | Dependent item | kube.pdb.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods healthy | Current number of healthy pods. |
Dependent item | kube.pdb.pods_healthy[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods desired | Minimum desired number of healthy pods. |
Dependent item | kube.pdb.pods_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Disruptions allowed | Number of pod disruptions that are allowed. |
Dependent item | kube.pdb.disruptions_allowed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods total | Total number of pods counted by this disruption budget. |
Dependent item | kube.pdb.pods_total[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
CronJob discovery | Dependent item | kube.cronjob.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Suspend | Suspend flag tells the controller to suspend subsequent executions. |
Dependent item | kube.cronjob.spec_suspend[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Active | Active holds pointers to currently running jobs. |
Dependent item | kube.cronjob.status_active[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Last schedule | LastScheduleTime keeps information about the last time the job was successfully scheduled. |
Dependent item | kube.cronjob.last_schedule_time[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Next schedule | Next time the cronjob should be scheduled. The time after lastScheduleTime or after the cron job's creation time if it's never been scheduled. Use this to determine if the job is delayed. |
Dependent item | kube.cronjob.next_schedule_time[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Failed | The number of pods which reached the Failed phase and the reason for failure. |
Dependent item | kube.cronjob.status_failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Succeeded | The number of pods which reached the Succeeded phase. |
Dependent item | kube.cronjob.status_succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Completion succeeded | Number of jobs the execution of which has been completed. |
Dependent item | kube.cronjob.completion.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Completion failed | Number of jobs the execution of which has failed. |
Dependent item | kube.cronjob.completion.failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job discovery | Dependent item | kube.job.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Job [{#NAME}]: Failed | The number of pods which reached the Failed phase and the reason for failure. |
Dependent item | kube.job.status_failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Job [{#NAME}]: Succeeded | The number of pods which reached the Succeeded phase. |
Dependent item | kube.job.status_succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Job [{#NAME}]: Completion succeeded | Number of jobs the execution of which has been completed. |
Dependent item | kube.job.completion.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Job [{#NAME}]: Completion failed | Number of jobs the execution of which has failed. |
Dependent item | kube.job.completion.failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Component statuses discovery | Dependent item | kube.componentstatuses.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Component [{#NAME}]: Healthy | Cluster component healthy. |
Dependent item | kube.componentstatuses.healthy[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Component [{#NAME}] is unhealthy | count(/Kubernetes cluster state by HTTP/kube.componentstatuses.healthy[{#NAME}],#2,"ne","True")=2 and length(last(/Kubernetes cluster state by HTTP/kube.componentstatuses.healthy[{#NAME}]))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Readyz discovery | Dependent item | kube.readyz.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Readyz [{#NAME}]: Healthcheck | Result of readyz healthcheck for component. |
Dependent item | kube.readyz.healthcheck[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Readyz [{#NAME}] is unhealthy | count(/Kubernetes cluster state by HTTP/kube.readyz.healthcheck[{#NAME}],#2,"ne","ok")=2 and length(last(/Kubernetes cluster state by HTTP/kube.readyz.healthcheck[{#NAME}]))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Livez discovery | Dependent item | kube.livez.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Livez [{#NAME}]: Healthcheck | Result of livez healthcheck for component. |
Dependent item | kube.livez.healthcheck[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Livez [{#NAME}] is unhealthy | count(/Kubernetes cluster state by HTTP/kube.livez.healthcheck[{#NAME}],#2,"ne","ok")=2 and length(last(/Kubernetes cluster state by HTTP/kube.livez.healthcheck[{#NAME}]))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift BuildConfig discovery | Dependent item | openshift.buildconfig.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] BuildConfig [{#NAME}]: Created | OpenShift BuildConfig Unix creation timestamp. |
Dependent item | openshift.buildconfig.created.time[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] BuildConfig [{#NAME}]: Generation | Sequence number representing a specific generation of the desired state. |
Dependent item | openshift.buildconfig.generation[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] BuildConfig [{#NAME}]: Latest version | The latest version of BuildConfig. |
Dependent item | openshift.buildconfig.status[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift Build discovery | Dependent item | openshift.build.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Build [{#NAME}]: Created | OpenShift Build Unix creation timestamp. |
Dependent item | openshift.build.created.time[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Build [{#NAME}]: Generation | Sequence number representing a specific generation of the desired state. |
Dependent item | openshift.build.sequence.number[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Build [{#NAME}]: Status phase | The Build phase. |
Dependent item | openshift.build.status_phase[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Build [{#NAME}]: Build has failed | count(/Kubernetes cluster state by HTTP/openshift.build.status_phase[{#NAMESPACE}/{#NAME}],2m,"ge",6)>=2 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift ClusterResourceQuota discovery | Dependent item | openshift.cluster.resource.quota.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Quota [{#NAME}] Resource [{#RESOURCE}]: Type [{#TYPE}] | Usage information about the resource quota. |
Dependent item | openshift.cluster.resource.quota[{#RESOURCE}/{#NAME}/{#TYPE}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift Route discovery | Dependent item | openshift.route.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Route [{#NAME}]: Created | OpenShift Route Unix creation timestamp. |
Dependent item | openshift.route.created.time[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Route [{#NAME}]: Status | Information about route status. |
Dependent item | openshift.route.status[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Route [{#NAME}] with issue: Status is false | count(/Kubernetes cluster state by HTTP/openshift.route.status[{#NAMESPACE}/{#NAME}],2m,,0)>=2 |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes Scheduler by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Scheduler by HTTP
- collects metrics by HTTP agent from Scheduler /metrics endpoint.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template needs to use authorization via an API token.
Don't forget to change the macros {$KUBE.SCHEDULER.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. You might need to set the Scheduler's --bind-address option to an address that the Zabbix proxy can reach.
For example, for clusters created with kubeadm
it can be set in the following manifest file (changes will be applied immediately):
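For reference, on a default kubeadm installation the Scheduler runs as a static Pod (the path assumes the standard kubeadm layout; the kubelet restarts the Pod automatically after the file is saved):
# Adjust the bind address option mentioned above in the "command" section of:
sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml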
NOTE. Some metrics may not be collected depending on your Kubernetes Scheduler instance version and configuration.
Name | Description | Default |
---|---|---|
{$KUBE.SCHEDULER.SERVER.URL} | Kubernetes Scheduler metrics endpoint URL. |
https://localhost:10259/metrics |
{$KUBE.API.TOKEN} | API Authorization Token. |
|
{$KUBE.SCHEDULER.HTTP.CLIENT.ERROR} | Maximum number of HTTP client requests failures used for trigger. |
2 |
{$KUBE.SCHEDULER.UNSCHEDULABLE} | Maximum number of scheduling failures with 'unschedulable' used for trigger. |
2 |
{$KUBE.SCHEDULER.ERROR} | Maximum number of scheduling failures with 'error' used for trigger. |
2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get Scheduler metrics | Get raw metrics from Scheduler instance /metrics endpoint. |
HTTP agent | kubernetes.scheduler.get_metrics Preprocessing
|
Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kubernetes.scheduler.processvirtualmemory_bytes Preprocessing
|
Resident memory, bytes | Resident memory size in bytes. |
Dependent item | kubernetes.scheduler.processresidentmemory_bytes Preprocessing
|
CPU | Total user and system CPU usage ratio. |
Dependent item | kubernetes.scheduler.cpu.util Preprocessing
|
Goroutines | Number of goroutines that currently exist. |
Dependent item | kubernetes.scheduler.go_goroutines Preprocessing
|
Go threads | Number of OS threads created. |
Dependent item | kubernetes.scheduler.go_threads Preprocessing
|
Fds open | Number of open file descriptors. |
Dependent item | kubernetes.scheduler.open_fds Preprocessing
|
Fds max | Maximum allowed open file descriptors. |
Dependent item | kubernetes.scheduler.max_fds Preprocessing
|
REST Client requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
Dependent item | kubernetes.scheduler.client_http_requests_200.rate Preprocessing
|
REST Client requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
Dependent item | kubernetes.scheduler.client_http_requests_300.rate Preprocessing
|
REST Client requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
Dependent item | kubernetes.scheduler.client_http_requests_400.rate Preprocessing
|
REST Client requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
Dependent item | kubernetes.scheduler.client_http_requests_500.rate Preprocessing
|
Schedule attempts: scheduled | Number of attempts to schedule pods with result "scheduled" per second. |
Dependent item | kubernetes.scheduler.scheduler_schedule_attempts.scheduled.rate Preprocessing
|
Schedule attempts: unschedulable | Number of attempts to schedule pods with result "unschedulable" per second. |
Dependent item | kubernetes.scheduler.scheduler_schedule_attempts.unschedulable.rate Preprocessing
|
Schedule attempts: error | Number of attempts to schedule pods with result "error" per second. |
Dependent item | kubernetes.scheduler.scheduler_schedule_attempts.error.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes Scheduler: Too many REST Client errors | Kubernetes Scheduler REST Client requests are experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.client_http_requests_500.rate,5m)>{$KUBE.SCHEDULER.HTTP.CLIENT.ERROR} |Warning |
||
Kubernetes Scheduler: Too many unschedulable pods | Number of attempts to schedule pods with 'unschedulable' result is too high. 'unschedulable' means a pod could not be scheduled. |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.scheduler_schedule_attempts.unschedulable.rate,5m)>{$KUBE.SCHEDULER.UNSCHEDULABLE} |Warning |
||
Kubernetes Scheduler: Too many schedule attempts with errors | Number of attempts to schedule pods with 'error' result is too high. 'error' means an internal scheduler problem. |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.scheduler_schedule_attempts.error.rate,5m)>{$KUBE.SCHEDULER.ERROR} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Scheduling algorithm histogram | Discovery raw data of scheduling algorithm latency. |
Dependent item | kubernetes.scheduler.scheduling_algorithm.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Scheduling algorithm duration bucket, {#LE} | Scheduling algorithm latency in seconds. |
Dependent item | kubernetes.scheduler.schedulingalgorithmduration[{#LE}] Preprocessing
|
Scheduling algorithm duration, p90 | 90 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p90[{#SINGLETON}] |
Scheduling algorithm duration, p95 | 95 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p95[{#SINGLETON}] |
Scheduling algorithm duration, p99 | 99 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p99[{#SINGLETON}] |
Scheduling algorithm duration, p50 | 50 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p50[{#SINGLETON}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Binding histogram | Discovery raw data of binding latency. |
Dependent item | kubernetes.scheduler.binding.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Binding duration bucket, {#LE} | Binding latency in seconds. |
Dependent item | kubernetes.scheduler.binding_duration[{#LE}] Preprocessing
|
Binding duration, p90 | 90 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp90[{#SINGLETON}] |
Binding duration, p95 | 95 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp95[{#SINGLETON}] |
Binding duration, p99 | 99 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp99[{#SINGLETON}] |
Binding duration, p50 | 50 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp50[{#SINGLETON}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
e2e scheduling histogram | Discovery raw data and percentile items of e2e scheduling latency. |
Dependent item | kubernetes.controller.e2e_scheduling.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
["{#RESULT}"]: e2e scheduling seconds bucket, {#LE} | E2e scheduling latency in seconds (scheduling algorithm + binding) |
Dependent item | kubernetes.scheduler.e2eschedulingbucket[{#LE},"{#RESULT}"] Preprocessing
|
["{#RESULT}"]: e2e scheduling, p50 | 50 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp50["{#RESULT}"] |
["{#RESULT}"]: e2e scheduling, p90 | 90 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp90["{#RESULT}"] |
["{#RESULT}"]: e2e scheduling, p95 | 95 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp95["{#RESULT}"] |
["{#RESULT}"]: e2e scheduling, p99 | 95 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp99["{#RESULT}"] |
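The pNN items above are Calculated items built from the per-bucket Dependent items. A minimal sketch of such a formula, assuming Zabbix's histogram_quantile() and bucket_rate_foreach() functions and a wildcarded bucket key (the exact key parameters used by the template may differ):

```
histogram_quantile(0.90, bucket_rate_foreach(//kubernetes.scheduler.e2e_scheduling_bucket[*,"{#RESULT}"], 5m))
```

This estimates the 90th percentile of e2e scheduling latency from the Prometheus-style histogram buckets collected over the last 5 minutes.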
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes nodes that works without any external scripts. It uses the script item to make HTTP requests to the Kubernetes API. Install the Zabbix Helm Chart (https://git.zabbix.com/projects/ZT/repos/kubernetes-helm/browse?at=refs%2Fheads%2Frelease%2F7.0) in your Kubernetes cluster.
Change the values according to the environment in the file $HOME/zabbix_values.yaml.
For example:
Set the {$KUBE.API.URL} macro, such as <scheme>://<host>:<port>.
Get the generated service account token using the command:
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
Then set it to the {$KUBE.API.TOKEN} macro.
Set up the macros to filter the metrics of discovered nodes
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Install the Zabbix Helm Chart in your Kubernetes cluster.
Set the {$KUBE.API.URL} macro, such as <scheme>://<host>:<port>.
Get the generated service account token using the command:
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
Then set it to the {$KUBE.API.TOKEN} macro.
Set {$KUBE.NODES.ENDPOINT.NAME} with the Zabbix agent's endpoint name. See kubectl -n monitoring get ep. Default: zabbix-zabbix-helm-chrt-agent.
Set up the macros to filter the metrics of discovered nodes and host creation based on host prototypes:
Set up macros to filter pod metrics by namespace:
Note: if you have a large cluster, it is highly recommended to set a filter for discoverable pods.
You can use the {$KUBE.NODE.FILTER.LABELS}
, {$KUBE.POD.FILTER.LABELS}
, {$KUBE.NODE.FILTER.ANNOTATIONS}
and {$KUBE.POD.FILTER.ANNOTATIONS}
macros for advanced filtering of nodes and pods by labels and annotations.
Notes about the labels and annotations filters:
Macro values should be specified as comma-separated key: value pairs, with regular expressions supported in the value (key1: value, key2: regexp).
Use the exclamation point (!) to invert a filter (!key: value).
For example: kubernetes.io/hostname: kubernetes-node[5-25], !node-role.kubernetes.io/ingress: .*. As a result, the nodes 5-25 without the "ingress" role will be discovered.
See the Kubernetes documentation for details about labels and annotations:
Note, the discovered nodes will be created as separate hosts in Zabbix with the Linux template automatically assigned to them.
Name | Description | Default |
---|---|---|
{$KUBE.API.URL} | Kubernetes API endpoint URL in the format <scheme>://<host>:<port>. |
https://kubernetes.default.svc.cluster.local:443 |
{$KUBE.API.TOKEN} | Service account bearer token. |
|
{$KUBE.HTTP.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. |
|
{$KUBE.NODES.ENDPOINT.NAME} | Kubernetes nodes endpoint name. See "kubectl -n monitoring get ep". |
zabbix-zabbix-helm-chrt-agent |
{$KUBE.LLD.FILTER.NODE.MATCHES} | Filter of discoverable nodes. |
.* |
{$KUBE.LLD.FILTER.NODE.NOT_MATCHES} | Filter to exclude discovered nodes. |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.NODE.ROLE.MATCHES} | Filter of discoverable nodes by role. |
.* |
{$KUBE.LLD.FILTER.NODE.ROLE.NOT_MATCHES} | Filter to exclude discovered node by role. |
CHANGE_IF_NEEDED |
{$KUBE.NODE.FILTER.ANNOTATIONS} | Annotations to filter nodes (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.NODE.FILTER.LABELS} | Labels to filter nodes (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.POD.FILTER.ANNOTATIONS} | Annotations to filter pods (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.POD.FILTER.LABELS} | Labels to filter Pods (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.LLD.FILTER.POD.NAMESPACE.MATCHES} | Filter of discoverable pods by namespace. |
.* |
{$KUBE.LLD.FILTER.POD.NAMESPACE.NOT_MATCHES} | Filter to exclude discovered pods by namespace. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get nodes | Collecting and processing cluster nodes data via Kubernetes API. |
Script | kube.nodes |
Get nodes check | Data collection check. |
Dependent item | kube.nodes.check Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes nodes: Failed to get nodes | length(last(/Kubernetes nodes by HTTP/kube.nodes.check))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node discovery | Dependent item | kube.node.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node [{#NAME}]: Get data | Collecting and processing cluster by node [{#NAME}] data via Kubernetes API. |
Dependent item | kube.node.get[{#NAME}] Preprocessing
|
Node [{#NAME}] Addresses: External IP | Typically the IP address of the node that is externally routable (available from outside the cluster). |
Dependent item | kube.node.addresses.external_ip[{#NAME}] Preprocessing
|
Node [{#NAME}] Addresses: Internal IP | Typically the IP address of the node that is routable only within the cluster. |
Dependent item | kube.node.addresses.internal_ip[{#NAME}] Preprocessing
|
Node [{#NAME}] Allocatable: CPU | Allocatable CPU. 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now. |
Dependent item | kube.node.allocatable.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Allocatable: Memory | Allocatable Memory. 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now. |
Dependent item | kube.node.allocatable.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Allocatable: Pods | https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/ |
Dependent item | kube.node.allocatable.pods[{#NAME}] Preprocessing
|
Node [{#NAME}] Capacity: CPU | CPU resource capacity. https://kubernetes.io/docs/concepts/architecture/nodes/#capacity |
Dependent item | kube.node.capacity.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Capacity: Memory | Memory resource capacity. https://kubernetes.io/docs/concepts/architecture/nodes/#capacity |
Dependent item | kube.node.capacity.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Capacity: Pods | https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/ |
Dependent item | kube.node.capacity.pods[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Disk pressure | True if pressure exists on the disk size - that is, if the disk capacity is low; otherwise False. |
Dependent item | kube.node.conditions.diskpressure[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Memory pressure | True if pressure exists on the node memory - that is, if the node memory is low; otherwise False. |
Dependent item | kube.node.conditions.memorypressure[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Network unavailable | True if the network for the node is not correctly configured, otherwise False. |
Dependent item | kube.node.conditions.networkunavailable[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: PID pressure | True if pressure exists on the processes - that is, if there are too many processes on the node; otherwise False. |
Dependent item | kube.node.conditions.pidpressure[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Ready | True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds). |
Dependent item | kube.node.conditions.ready[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Architecture | Node architecture. |
Dependent item | kube.node.info.architecture[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Container runtime | Container runtime. https://kubernetes.io/docs/setup/production-environment/container-runtimes/ |
Dependent item | kube.node.info.containerruntime[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Kernel version | Node kernel version. |
Dependent item | kube.node.info.kernelversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Kubelet version | Version of Kubelet. |
Dependent item | kube.node.info.kubeletversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: KubeProxy version | Version of KubeProxy. |
Dependent item | kube.node.info.kubeproxyversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Operating system | Node operating system. |
Dependent item | kube.node.info.operatingsystem[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: OS image | Node OS image. |
Dependent item | kube.node.info.osversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Roles | Node roles. |
Dependent item | kube.node.info.roles[{#NAME}] Preprocessing
|
Node [{#NAME}] Limits: CPU | Node CPU limits. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.limits.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Limits: Memory | Node Memory limits. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.limits.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Requests: CPU | Node CPU requests. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.requests.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Requests: Memory | Node Memory requests. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.requests.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Uptime | Node uptime. |
Dependent item | kube.node.uptime[{#NAME}] Preprocessing
|
Node [{#NAME}] Used: Pods | Current number of pods on the node. |
Dependent item | kube.node.used.pods[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes nodes: Node [{#NAME}] Conditions: Pressure exists on the disk size | True - pressure exists on the disk size - that is, if the disk capacity is low; otherwise False. |
last(/Kubernetes nodes by HTTP/kube.node.conditions.diskpressure[{#NAME}])=1 |Warning |
||
Kubernetes nodes: Node [{#NAME}] Conditions: Pressure exists on the node memory | True - pressure exists on the node memory - that is, if the node memory is low; otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.memorypressure[{#NAME}])=1 |Warning |
||
Kubernetes nodes: Node [{#NAME}] Conditions: Network is not correctly configured | True - the network for the node is not correctly configured, otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.networkunavailable[{#NAME}])=1 |Warning |
||
Kubernetes nodes: Node [{#NAME}] Conditions: Pressure exists on the processes | True - pressure exists on the processes - that is, if there are too many processes on the node; otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.pidpressure[{#NAME}])=1 |Warning |
||
Kubernetes nodes: Node [{#NAME}] Conditions: Is not in Ready state | False - if the node is not healthy and is not accepting pods. |
last(/Kubernetes nodes by HTTP/kube.node.conditions.ready[{#NAME}])<>1 |Warning |
||
Kubernetes nodes: Node [{#NAME}] Limits: Total CPU limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.9 |Warning |
Depends on:
|
||
Kubernetes nodes: Node [{#NAME}] Limits: Total CPU limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 1 |Average |
|||
Kubernetes nodes: Node [{#NAME}] Limits: Total memory limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.9 |Warning |
Depends on:
|
||
Kubernetes nodes: Node [{#NAME}] Limits: Total memory limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 1 |Average |
|||
Kubernetes nodes: Node [{#NAME}] Requests: Total CPU requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.5 |Warning |
Depends on:
|
||
Kubernetes nodes: Node [{#NAME}] Requests: Total CPU requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.8 |Average |
|||
Kubernetes nodes: Node [{#NAME}] Requests: Total memory requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.5 |Warning |
Depends on:
|
||
Kubernetes nodes: Node [{#NAME}] Requests: Total memory requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.8 |Average |
|||
Kubernetes nodes: Node [{#NAME}] has been restarted | Uptime is less than 10 minutes. |
last(/Kubernetes nodes by HTTP/kube.node.uptime[{#NAME}])<10 |Info |
||
Kubernetes nodes: Node [{#NAME}] Used: Kubelet too many pods | Kubelet is running at capacity. |
last(/Kubernetes nodes by HTTP/kube.node.used.pods[{#NAME}])/ last(/Kubernetes nodes by HTTP/kube.node.capacity.pods[{#NAME}]) > 0.9 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Pod discovery | Dependent item | kube.pod.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}]: Get data | Collecting and processing cluster by node [{#NODE}] data via Kubernetes API. |
Dependent item | kube.pod.get[{#NAMESPACE}/{#POD}] Preprocessing
|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}] Conditions: Containers ready | All containers in the Pod are ready. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.containers_ready[{#NAMESPACE}/{#POD}] Preprocessing
|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}] Conditions: Initialized | All init containers have started successfully. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.initialized[{#NAMESPACE}/{#POD}] Preprocessing
|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}] Conditions: Ready | The Pod is able to serve requests and should be added to the load balancing pools of all matching Services. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.ready[{#NAMESPACE}/{#POD}] Preprocessing
|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}] Conditions: Scheduled | The Pod has been scheduled to a node. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.scheduled[{#NAMESPACE}/{#POD}] Preprocessing
|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}] Containers: Restarts | The number of times the container has been restarted, currently based on the number of dead containers that have not yet been removed. Note that this is calculated from dead containers. But those containers are subject to garbage collection. |
Dependent item | kube.pod.containers.restartcount[{#NAMESPACE}/{#POD}] Preprocessing
|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}] Status: Phase | The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-phase |
Dependent item | kube.pod.status.phase[{#NAMESPACE}/{#POD}] Preprocessing
|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}]: Uptime | Pod uptime. |
Dependent item | kube.pod.uptime[{#NAMESPACE}/{#POD}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes nodes: Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}]: Pod is crash looping | Containers of the pod keep restarting. This most likely indicates that the pod is in the CrashLoopBackOff state. |
(last(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#NAMESPACE}/{#POD}])-min(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#NAMESPACE}/{#POD}],15m))>1 |Warning |
||
Kubernetes nodes: Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}] Status: Kubernetes Pod not healthy | Pod has been in a non-ready state for longer than 10 minutes. |
count(/Kubernetes nodes by HTTP/kube.pod.status.phase[{#NAMESPACE}/{#POD}],10m, "regexp","^(1|4|5)$")>=9 |High |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes Kubelet by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Kubelet by HTTP
- collects metrics by HTTP agent from Kubelet /metrics endpoint.
Don't forget to change the macros {$KUBE.KUBELET.URL} and {$KUBE.API.TOKEN}.
NOTE. Some metrics may not be collected depending on your Kubernetes instance version and configuration.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template needs to use authorization via an API token.
Don't forget to change the macros {$KUBE.KUBELET.URL} and {$KUBE.API.TOKEN}.
NOTE. Some metrics may not be collected depending on your Kubernetes instance version and configuration.
Name | Description | Default |
---|---|---|
{$KUBE.API.TOKEN} | Service account bearer token. |
|
{$KUBE.KUBELET.URL} | Kubernetes Kubelet instance URL. |
https://localhost:10250 |
{$KUBE.KUBELET.METRIC.ENDPOINT} | Kubelet /metrics endpoint. |
/metrics |
{$KUBE.KUBELET.CADVISOR.ENDPOINT} | cAdvisor metrics from Kubelet /metrics/cadvisor endpoint. |
/metrics/cadvisor |
{$KUBE.KUBELET.PODS.ENDPOINT} | Kubelet /pods endpoint. |
/pods |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get kubelet metrics | Collecting raw Kubelet metrics from /metrics endpoint. |
HTTP agent | kube.kubelet.metrics |
Get cadvisor metrics | Collecting raw Kubelet metrics from /metrics/cadvisor endpoint. |
HTTP agent | kube.cadvisor.metrics |
Get pods | Collecting raw Kubelet metrics from /pods endpoint. |
HTTP agent | kube.pods |
Pods running | The number of running pods. |
Dependent item | kube.kubelet.pods.running Preprocessing
|
Containers started | The number of started containers. |
Dependent item | kube.kubelet.containers.started Preprocessing
|
Containers ready | The number of ready containers. |
Dependent item | kube.kubelet.containers.ready Preprocessing
|
Containers last state terminated | The number of containers that were previously terminated. |
Dependent item | kube.kublet.containers.terminated Preprocessing
|
Containers restarts | The number of times the container has been restarted. |
Dependent item | kube.kubelet.containers.restarts Preprocessing
|
CPU cores, total | The number of cores in this machine (available until kubernetes v1.18). |
Dependent item | kube.kubelet.cpu.cores Preprocessing
|
Machine memory, bytes | Resident memory size in bytes. |
Dependent item | kube.kubelet.machine.memory Preprocessing
|
Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kube.kubelet.virtual.memory Preprocessing
|
File descriptors, max | Maximum number of open file descriptors. |
Dependent item | kube.kubelet.processmaxfds Preprocessing
|
File descriptors, open | Number of open file descriptors. |
Dependent item | kube.kubelet.processopenfds Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Runtime operations discovery | Dependent item | kube.kubelet.runtimeoperationsbucket.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#OP_TYPE}] Runtime operations bucket: {#LE} | Duration in seconds of runtime operations. Broken down by operation type. |
Dependent item | kube.kublet.runtimeopsdurationsecondsbucket[{#LE},"{#OP_TYPE}"] Preprocessing
|
[{#OP_TYPE}] Runtime operations total, rate | Cumulative number of runtime operations by operation type. |
Dependent item | kube.kublet.runtimeopstotal.rate["{#OP_TYPE}"] Preprocessing
|
[{#OP_TYPE}] Operations, p90 | 90 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp90["{#OP_TYPE}"] |
[{#OP_TYPE}] Operations, p95 | 95 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp95["{#OP_TYPE}"] |
[{#OP_TYPE}] Operations, p99 | 99 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp99["{#OP_TYPE}"] |
[{#OP_TYPE}] Operations, p50 | 50 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp50["{#OP_TYPE}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Pods discovery | Dependent item | kube.kubelet.pods.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: Load average, 10s | Pods cpu load average over the last 10 seconds. |
Dependent item | kube.pod.containercpuloadaverage10s[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: System seconds, total | System cpu time consumed. It is calculated from the cumulative value using the Change per second preprocessing step. |
Dependent item | kube.pod.containercpusystemsecondstotal[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: Usage seconds, total | Consumed cpu time. It is calculated from the cumulative value using the Change per second preprocessing step. |
Dependent item | kube.pod.containercpuusagesecondstotal[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: User seconds, total | User cpu time consumed. It is calculated from the cumulative value using the Change per second preprocessing step. |
Dependent item | kube.pod.containercpuusersecondstotal[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
REST client requests discovery | Dependent item | kube.kubelet.rest.requests.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Host [{#HOST}] Request method [{#METHOD}] Code:[{#CODE}] | Number of HTTP requests, partitioned by status code, method, and host. |
Dependent item | kube.kubelet.rest.requests["{#CODE}", "{#HOST}", "{#METHOD}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Container memory discovery | Dependent item | kube.kubelet.container.memory.cache.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Memory page cache | Number of bytes of page cache memory. |
Dependent item | kube.kubelet.container.memory.cache["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Memory max usage | Maximum memory usage recorded in bytes. |
Dependent item | kube.kubelet.container.memory.max_usage["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: RSS | Size of RSS in bytes. |
Dependent item | kube.kubelet.container.memory.rss["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Swap | Container swap usage in bytes. |
Dependent item | kube.kubelet.container.memory.swap["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Usage | Current memory usage in bytes, including all memory regardless of when it was accessed. |
Dependent item | kube.kubelet.container.memory.usage["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Working set | Current working set in bytes. |
Dependent item | kube.kubelet.container.memory.working_set["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes Controller manager by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Controller manager by HTTP
- collects metrics by HTTP agent from Controller manager /metrics endpoint.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template needs to use authorization via an API token.
Don't forget to change the macros {$KUBE.CONTROLLER.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. You might need to set the --bind-address option for the Controller Manager to an address where the Zabbix proxy can reach it. For example, for clusters created with kubeadm, it can be set in the following manifest file (changes will be applied immediately):
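As with the Scheduler above, a minimal sketch assuming the default kubeadm static Pod manifest path (/etc/kubernetes/manifests/kube-controller-manager.yaml) and the kube-controller-manager --bind-address flag:

```yaml
# Fragment of /etc/kubernetes/manifests/kube-controller-manager.yaml (assumed default kubeadm path)
spec:
  containers:
    - command:
        - kube-controller-manager
        # Listen on an address reachable by the Zabbix proxy
        # instead of the kubeadm default of 127.0.0.1
        - --bind-address=0.0.0.0
```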
NOTE. Some metrics may not be collected depending on your Kubernetes Controller manager instance version and configuration.
Name | Description | Default |
---|---|---|
{$KUBE.CONTROLLER.SERVER.URL} | Kubernetes Controller manager metrics endpoint URL. |
https://localhost:10257/metrics |
{$KUBE.API.TOKEN} | API Authorization Token |
|
{$KUBE.CONTROLLER.HTTP.CLIENT.ERROR} | Maximum number of HTTP client requests failures used for trigger. |
2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Controller: Get Controller metrics | Get raw metrics from Controller instance /metrics endpoint. |
HTTP agent | kubernetes.controller.get_metrics Preprocessing
|
Leader election status | Gauge indicating whether the reporting system is the master of the relevant lease: 0 indicates backup, 1 indicates master. |
Dependent item | kubernetes.controller.leaderelectionmaster_status Preprocessing
|
Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kubernetes.controller.processvirtualmemory_bytes Preprocessing
|
Resident memory, bytes | Resident memory size in bytes. |
Dependent item | kubernetes.controller.processresidentmemory_bytes Preprocessing
|
CPU | Total user and system CPU usage ratio. |
Dependent item | kubernetes.controller.cpu.util Preprocessing
|
Goroutines | Number of goroutines that currently exist. |
Dependent item | kubernetes.controller.go_goroutines Preprocessing
|
Go threads | Number of OS threads created. |
Dependent item | kubernetes.controller.go_threads Preprocessing
|
Fds open | Number of open file descriptors. |
Dependent item | kubernetes.controller.open_fds Preprocessing
|
Fds max | Maximum allowed open file descriptors. |
Dependent item | kubernetes.controller.max_fds Preprocessing
|
REST Client requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
Dependent item | kubernetes.controller.client_http_requests_200.rate Preprocessing
|
REST Client requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
Dependent item | kubernetes.controller.client_http_requests_300.rate Preprocessing
|
REST Client requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
Dependent item | kubernetes.controller.client_http_requests_400.rate Preprocessing
|
REST Client requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
Dependent item | kubernetes.controller.client_http_requests_500.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes Controller manager: Too many HTTP client errors | Kubernetes Controller manager is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes Controller manager by HTTP/kubernetes.controller.client_http_requests_500.rate,5m)>{$KUBE.CONTROLLER.HTTP.CLIENT.ERROR} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Workqueue metrics discovery | Dependent item | kubernetes.controller.workqueue.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
["{#NAME}"]: Workqueue adds total, rate | Total number of adds handled by workqueue per second. |
Dependent item | kubernetes.controller.workqueueaddstotal["{#NAME}"] Preprocessing
|
["{#NAME}"]: Workqueue depth | Current depth of workqueue. |
Dependent item | kubernetes.controller.workqueue_depth["{#NAME}"] Preprocessing
|
["{#NAME}"]: Workqueue unfinished work, sec | How many seconds of work has done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases. |
Dependent item | kubernetes.controller.workqueueunfinishedwork_seconds["{#NAME}"] Preprocessing
|
["{#NAME}"]: Workqueue retries, rate | Total number of retries handled by workqueue per second. |
Dependent item | kubernetes.controller.workqueueretriestotal["{#NAME}"] Preprocessing
|
["{#NAME}"]: Workqueue longest running processor, sec | How many seconds has the longest running processor for workqueue been running. |
Dependent item | kubernetes.controller.workqueuelongestrunningprocessorseconds["{#NAME}"] Preprocessing
|
["{#NAME}"]: Workqueue work duration, p90 | 90 percentile of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp90["{#NAME}"] |
["{#NAME}"]: Workqueue work duration, p95 | 95 percentile of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp95["{#NAME}"] |
["{#NAME}"]: Workqueue work duration, p99 | 99 percentile of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp99["{#NAME}"] |
["{#NAME}"]: Workqueue work duration, 50p | 50 percentiles of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp50["{#NAME}"] |
["{#NAME}"]: Workqueue queue duration, p90 | 90 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp90["{#NAME}"] |
["{#NAME}"]: Workqueue queue duration, p95 | 95 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp95["{#NAME}"] |
["{#NAME}"]: Workqueue queue duration, p99 | 99 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp99["{#NAME}"] |
["{#NAME}"]: Workqueue queue duration, 50p | 50 percentile of how long in seconds an item stays in workqueue before being requested. If there are no requests for 5 minute, item value will be discarded. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp50["{#NAME}"] Preprocessing
|
["{#NAME}"]: Workqueue duration seconds bucket, {#LE} | How long in seconds processing an item from workqueue takes. |
Dependent item | kubernetes.controller.durationsecondsbucket[{#LE},"{#NAME}"] Preprocessing
|
["{#NAME}"]: Queue duration seconds bucket, {#LE} | How long in seconds an item stays in workqueue before being requested. |
Dependent item | kubernetes.controller.queuedurationseconds_bucket[{#LE},"{#NAME}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes API server that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes API server by HTTP
- collects metrics by HTTP agent from API server /metrics endpoint.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template needs to use authorization via an API token.
Don't forget to change the macros {$KUBE.API.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Kubernetes API server instance version and configuration.
Name | Description | Default |
---|---|---|
{$KUBE.API.SERVER.URL} | Kubernetes API server metrics endpoint URL. |
https://localhost:6443/metrics |
{$KUBE.API.TOKEN} | API Authorization Token. |
|
{$KUBE.API.CERT.EXPIRATION} | Number of days for alert of client certificate used for trigger. |
7 |
{$KUBE.API.HTTP.CLIENT.ERROR} | Maximum number of HTTP client requests failures used for trigger. |
2 |
{$KUBE.API.HTTP.SERVER.ERROR} | Maximum number of HTTP server requests failures used for trigger. |
2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get API instance metrics | Get raw metrics from API instance /metrics endpoint. |
HTTP agent | kubernetes.api.get_metrics Preprocessing
|
Audit events, total | Accumulated number of audit events generated and sent to the audit backend. |
Dependent item | kubernetes.api.auditeventtotal Preprocessing
|
Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kubernetes.api.processvirtualmemory_bytes Preprocessing
|
Resident memory, bytes | Resident memory size in bytes. |
Dependent item | kubernetes.api.processresidentmemory_bytes Preprocessing
|
CPU | Total user and system CPU usage ratio. |
Dependent item | kubernetes.api.cpu.util Preprocessing
|
Goroutines | Number of goroutines that currently exist. |
Dependent item | kubernetes.api.go_goroutines Preprocessing
|
Go threads | Number of OS threads created. |
Dependent item | kubernetes.api.go_threads Preprocessing
|
Fds open | Number of open file descriptors. |
Dependent item | kubernetes.api.open_fds Preprocessing
|
Fds max | Maximum allowed open file descriptors. |
Dependent item | kubernetes.api.max_fds Preprocessing
|
gRPCs client started, rate | Total number of RPCs started per second. |
Dependent item | kubernetes.api.grpcclientstarted.rate Preprocessing
|
gRPCs messages received, rate | Total number of gRPC stream messages received per second. |
Dependent item | kubernetes.api.grpcclientmsg_received.rate Preprocessing
|
gRPCs messages sent, rate | Total number of gRPC stream messages sent per second. |
Dependent item | kubernetes.api.grpcclientmsg_sent.rate Preprocessing
|
Request terminations, rate | Number of requests which apiserver terminated in self-defense per second. |
Dependent item | kubernetes.api.apiserverrequestterminations Preprocessing
|
TLS handshake errors, rate | Number of requests dropped with 'TLS handshake error from' error per second. |
Dependent item | kubernetes.api.apiservertlshandshakeerrorstotal.rate Preprocessing
|
API server requests: 5xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserver_request_total_500.rate Preprocessing
|
API server requests: 4xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserver_request_total_400.rate Preprocessing
|
API server requests: 3xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserver_request_total_300.rate Preprocessing
|
API server requests: 0, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserver_request_total_0.rate Preprocessing
|
API server requests: 2xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserver_request_total_200.rate Preprocessing
|
HTTP requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
Dependent item | kubernetes.api.rest_client_requests_total_500.rate Preprocessing
|
HTTP requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
Dependent item | kubernetes.api.rest_client_requests_total_400.rate Preprocessing
|
HTTP requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
Dependent item | kubernetes.api.rest_client_requests_total_300.rate Preprocessing
|
HTTP requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
Dependent item | kubernetes.api.rest_client_requests_total_200.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes API server: Too many server errors | Kubernetes API server is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes API server by HTTP/kubernetes.api.apiserver_request_total_500.rate,5m)>{$KUBE.API.HTTP.SERVER.ERROR} |Warning |
||
Kubernetes API server: Too many client errors | Kubernetes API client is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes API server by HTTP/kubernetes.api.rest_client_requests_total_500.rate,5m)>{$KUBE.API.HTTP.CLIENT.ERROR} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Long-running requests | Discovery of long-running requests by verb, resource and scope. |
Dependent item | kubernetes.api.longrunning_gauge.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Long-running ["{#VERB}"] requests ["{#RESOURCE}"]: {#SCOPE} | Gauge of all active long-running apiserver requests broken out by verb, resource and scope. Not all requests are tracked this way. |
Dependent item | kubernetes.api.longrunning_gauge["{#RESOURCE}","{#SCOPE}","{#VERB}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Request duration histogram | Discovery raw data and percentile items of request duration. |
Dependent item | kubernetes.api.requests_bucket.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
["{#VERB}"] Requests bucket: {#LE} | Response latency distribution in seconds for each verb. |
Dependent item | kubernetes.api.requestdurationseconds_bucket[{#LE},"{#VERB}"] Preprocessing
|
["{#VERB}"] Requests, p90 | 90 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p90["{#VERB}"] |
["{#VERB}"] Requests, p95 | 95 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p95["{#VERB}"] |
["{#VERB}"] Requests, p99 | 99 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p99["{#VERB}"] |
["{#VERB}"] Requests, p50 | 50 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p50["{#VERB}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Requests inflight discovery | Discovery requests inflight by kind. |
Dependent item | kubernetes.api.inflight_requests.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Requests current: {#KIND} | Maximal number of currently used inflight request limit of this apiserver per request kind in last second. |
Dependent item | kubernetes.api.currentinflightrequests["{#KIND}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC completed requests discovery | Discovery grpc completed requests by grpc code. |
Dependent item | kubernetes.api.grpcclienthandled.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPCs completed: {#GRPC_CODE}, rate | Total number of RPCs completed by the client regardless of success or failure per second. |
Dependent item | kubernetes.api.grpcclienthandledtotal.rate["{#GRPCCODE}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Authentication attempts discovery | Discovery authentication attempts by result. |
Dependent item | kubernetes.api.authentication_attempts.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Authentication attempts: {#RESULT}, rate | Authentication attempts by result per second. |
Dependent item | kubernetes.api.authentication_attempts.rate["{#RESULT}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Authentication requests discovery | Discovery authentication attempts by name. |
Dependent item | kubernetes.api.authenticateduserrequests.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Authenticated requests: {#NAME}, rate | Counter of authenticated requests broken out by username per second. |
Dependent item | kubernetes.api.authenticateduserrequests.rate["{#NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Watchers metrics discovery | Discovery watchers by kind. |
Dependent item | kubernetes.api.apiserverregisteredwatchers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Watchers: {#KIND} | Number of currently registered watchers for a given resource. |
Dependent item | kubernetes.api.apiserverregisteredwatchers["{#KIND}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd objects metrics discovery | Discovery etcd objects by resource. |
Dependent item | kubernetes.api.etcdobjectcounts.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
etcd objects: {#RESOURCE} | Number of stored objects at the time of last check split by kind. |
Dependent item | kubernetes.api.etcdobjectcounts["{#RESOURCE}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Workqueue metrics discovery | Discovery workqueue metrics by name. |
Dependent item | kubernetes.api.workqueue.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
["{#NAME}"] Workqueue depth | Current depth of workqueue. |
Dependent item | kubernetes.api.workqueue_depth["{#NAME}"] Preprocessing
|
["{#NAME}"] Workqueue adds total, rate | Total number of adds handled by workqueue per second. |
Dependent item | kubernetes.api.workqueueaddstotal.rate["{#NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Client certificate expiration histogram | Discovery raw data of client certificate expiration |
Dependent item | kubernetes.api.certificate_expiration.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Certificate expiration seconds bucket, {#LE} | Distribution of the remaining lifetime on the certificate used to authenticate a request. |
Dependent item | kubernetes.api.clientcertificateexpirationsecondsbucket[{#LE}] Preprocessing
|
Client certificate expiration, p1 | 1 percentile of the remaining lifetime on the certificate used to authenticate a request. |
Calculated | kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes API server: Kubernetes client certificate is expiring | A client certificate used to authenticate to the apiserver is expiring in {$KUBE.API.CERT.EXPIRATION} days. |
last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < {$KUBE.API.CERT.EXPIRATION}*24*60*60 |Warning |
Depends on:
|
|
Kubernetes API server: Kubernetes client certificate expires soon | A client certificate used to authenticate to the apiserver is expiring in less than 24.0 hours. |
last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < 24*60*60 |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache Kafka monitoring by Zabbix via JMX and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
Name | Description | Default |
---|---|---|
{$KAFKA.USER} | zabbix |
|
{$KAFKA.PASSWORD} | zabbix |
|
{$KAFKA.TOPIC.MATCHES} | Filter of discoverable topics |
.* |
{$KAFKA.TOPIC.NOT_MATCHES} | Filter to exclude discovered topics |
__consumer_offsets |
{$KAFKA.NETPROCAVG_IDLE.MIN.WARN} | The minimum Network processor average idle percent for trigger expression. |
30 |
{$KAFKA.REQUESTHANDLERAVG_IDLE.MIN.WARN} | The minimum Request handler average idle percent for trigger expression. |
30 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Leader election per second | Number of leader elections per second. |
JMX agent | jmx["kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs","Count"] |
Unclean leader election per second | Number of “unclean” elections per second. |
JMX agent | jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"] Preprocessing
|
Controller state on broker | One indicates that the broker is the controller for the cluster. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=ActiveControllerCount","Value"] Preprocessing
|
Ineligible pending replica deletes | The number of ineligible pending replica deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=ReplicasIneligibleToDeleteCount","Value"] |
Pending replica deletes | The number of pending replica deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=ReplicasToDeleteCount","Value"] |
Ineligible pending topic deletes | The number of ineligible pending topic deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=TopicsIneligibleToDeleteCount","Value"] |
Pending topic deletes | The number of pending topic deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=TopicsToDeleteCount","Value"] |
Offline log directory count | The number of offline log directories (for example, after a hardware failure). |
JMX agent | jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"] |
Offline partitions count | Number of partitions that don't have an active leader. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"] |
Bytes out per second | The rate at which data is fetched and read from the broker by consumers. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec","Count"] Preprocessing
|
Bytes in per second | The rate at which data sent from producers is consumed by the broker. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec","Count"] Preprocessing
|
Messages in per second | The rate at which individual messages are consumed by the broker. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec","Count"] Preprocessing
|
Bytes rejected per second | The rate at which bytes are rejected per second by the broker. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec","Count"] Preprocessing
|
Client fetch request failed per second | Number of client fetch request failures per second. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec","Count"] Preprocessing
|
Produce requests failed per second | Number of failed produce requests per second. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec","Count"] Preprocessing
|
Request handler average idle percent | Indicates the percentage of time that the request handler (IO) threads are not in use. |
JMX agent | jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"] Preprocessing
|
Fetch-Consumer response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","Mean"] |
Fetch-Consumer response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","95thPercentile"] |
Fetch-Consumer response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","99thPercentile"] |
Fetch-Follower response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","Mean"] |
Fetch-Follower response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","95thPercentile"] |
Fetch-Follower response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","99thPercentile"] |
Produce response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","Mean"] |
Produce response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","95thPercentile"] |
Produce response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","99thPercentile"] |
Fetch-Consumer request total time, mean | Average time in ms to serve the Fetch-Consumer request. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","Mean"] |
Fetch-Consumer request total time, p95 | Time in ms to serve the Fetch-Consumer request for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","95thPercentile"] |
Fetch-Consumer request total time, p99 | Time in ms to serve the Fetch-Consumer request for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","99thPercentile"] |
Fetch-Follower request total time, mean | Average time in ms to serve the Fetch-Follower request. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","Mean"] |
Fetch-Follower request total time, p95 | Time in ms to serve the Fetch-Follower request for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","95thPercentile"] |
Fetch-Follower request total time, p99 | Time in ms to serve the Fetch-Follower request for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","99thPercentile"] |
Produce request total time, mean | Average time in ms to serve the Produce request. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","Mean"] |
Produce request total time, p95 | Time in ms to serve the Produce requests for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","95thPercentile"] |
Produce request total time, p99 | Time in ms to serve the Produce requests for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","99thPercentile"] |
UpdateMetadata request total time, mean | Average time for a request to update metadata. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","Mean"] |
UpdateMetadata request total time, p95 | Time for update metadata requests for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","95thPercentile"] |
UpdateMetadata request total time, p99 | Time for update metadata requests for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","99thPercentile"] |
Temporary memory size in bytes (Fetch), max | The maximum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Max"] |
Temporary memory size in bytes (Fetch), min | The minimum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Mean"] |
Temporary memory size in bytes (Produce), max | The maximum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Max"] |
Temporary memory size in bytes (Produce), avg | The amount of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Mean"] |
Temporary memory size in bytes (Produce), min | The minimum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Min"] |
Network processor average idle percent | The average percentage of time that the network processors are idle. |
JMX agent | jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"] Preprocessing
|
Requests in fetch purgatory | Number of requests waiting in fetch purgatory. |
JMX agent | jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Fetch","Value"] |
Requests in producer purgatory | Number of requests waiting in producer purgatory. |
JMX agent | jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce","Value"] |
Replication maximum lag | The maximum lag between the time that messages are received by the leader replica and by the follower replicas. |
JMX agent | jmx["kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica","Value"] |
Under minimum ISR partition count | The number of partitions under the minimum In-Sync Replica (ISR) count. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"] |
Under replicated partitions | The number of partitions that have not been fully replicated in the follower replicas (i.e. the number of non-reassigning replicas minus the number of in-sync replicas is greater than 0). |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"] |
ISR expands per second | The rate at which the number of ISRs in the broker increases. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=IsrExpandsPerSec","Count"] Preprocessing
|
ISR shrink per second | Rate of replicas leaving the ISR pool. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=IsrShrinksPerSec","Count"] Preprocessing
|
Leader count | The number of replicas for which this broker is the leader. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=LeaderCount","Value"] |
Partition count | The number of partitions in the broker. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=PartitionCount","Value"] |
Number of reassigning partitions | The number of reassigning leader partitions on a broker. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=ReassigningPartitions","Value"] |
Request queue size | The size of the delay queue. |
JMX agent | jmx["kafka.server:type=Request","queue-size"] |
Version | Current version of broker. |
JMX agent | jmx["kafka.server:type=app-info","version"] Preprocessing
|
Uptime | The service uptime expressed in seconds. |
JMX agent | jmx["kafka.server:type=app-info","start-time-ms"] Preprocessing
|
ZooKeeper client request latency | Latency in milliseconds for ZooKeeper requests from broker. |
JMX agent | jmx["kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs","Count"] |
ZooKeeper connection status | Connection status of broker's ZooKeeper session. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"] Preprocessing
|
ZooKeeper disconnect rate | ZooKeeper client disconnect per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperDisconnectsPerSec","Count"] Preprocessing
|
ZooKeeper session expiration rate | ZooKeeper client session expiration per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperExpiresPerSec","Count"] Preprocessing
|
ZooKeeper readonly rate | ZooKeeper client readonly per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperReadOnlyConnectsPerSec","Count"] Preprocessing
|
ZooKeeper sync rate | ZooKeeper client sync per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec","Count"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache Kafka: Unclean leader election detected | Unclean leader elections occur when there is no qualified partition leader among Kafka brokers. If Kafka is configured to allow an unclean leader election, a leader is chosen from the out-of-sync replicas, and any messages that were not synced prior to the loss of the former leader are lost forever. Essentially, unclean leader elections sacrifice consistency for availability. |
last(/Apache Kafka by JMX/jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"])>0 |Average |
||
Apache Kafka: There are offline log directories | The offline log directory count metric indicates the number of log directories which are offline (for example, due to a hardware failure), so the broker can no longer store incoming messages in them. |
last(/Apache Kafka by JMX/jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"]) > 0 |Warning |
||
Apache Kafka: One or more partitions have no leader | Any partition without an active leader will be completely inaccessible, and both consumers and producers of that partition will be blocked until a leader becomes available. |
last(/Apache Kafka by JMX/jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"]) > 0 |Warning |
||
Apache Kafka: Request handler average idle percent is too low | The request handler idle ratio metric indicates the percentage of time the request handlers are not in use. The lower this number, the more loaded the broker is. |
max(/Apache Kafka by JMX/jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"],15m)<{$KAFKA.REQUEST_HANDLER_AVG_IDLE.MIN.WARN} |Average |
||
Apache Kafka: Network processor average idle percent is too low | The network processor idle ratio metric indicates the percentage of time the network processors are not in use. The lower this number, the more loaded the broker is. |
max(/Apache Kafka by JMX/jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"],15m)<{$KAFKA.NET_PROC_AVG_IDLE.MIN.WARN} |Average |
||
Apache Kafka: Failed to fetch info data | Zabbix has not received data for items for the last 15 minutes |
nodata(/Apache Kafka by JMX/jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"],15m)=1 |Warning |
||
Apache Kafka: There are partitions under the min ISR | The Under min ISR partitions metric displays the number of partitions, where the number of In-Sync Replicas (ISR) is less than the minimum number of in-sync replicas specified. The two most common causes of under-min ISR partitions are that one or more brokers is unresponsive, or the cluster is experiencing performance issues and one or more brokers are falling behind. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"])>0 |Average |
||
Apache Kafka: There are under replicated partitions | The Under replicated partitions metric displays the number of partitions that do not have enough replicas to meet the desired replication factor. A partition will also be considered under-replicated if the correct number of replicas exist, but one or more of the replicas have fallen significantly behind the partition leader. The two most common causes of under-replicated partitions are that one or more brokers is unresponsive, or the cluster is experiencing performance issues and one or more brokers have fallen behind. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"])>0 |Average |
||
Apache Kafka: Version has changed | The Kafka version has changed. Acknowledge to close the problem manually. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"],#1)<>last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"],#2) and length(last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"]))>0 |Info |
Manual close: Yes | |
Apache Kafka: Kafka service has been restarted | Uptime is less than 10 minutes. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","start-time-ms"])<10m |Info |
Manual close: Yes | |
Apache Kafka: Broker is not connected to ZooKeeper | find(/Apache Kafka by JMX/jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"],,"regexp","CONNECTED")=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Topic Metrics (write) | JMX agent | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kafka {#JMXTOPIC}: Messages in per second | The rate at which individual messages are consumed by topic. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Kafka {#JMXTOPIC}: Bytes in per second | The rate at which data sent from producers is consumed by topic. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Topic Metrics (read) | JMX agent | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kafka {#JMXTOPIC}: Bytes out per second | The rate at which data is fetched and read from the broker by consumers (by topic). |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Topic Metrics (errors) | JMX agent | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kafka {#JMXTOPIC}: Bytes rejected per second | Rejected bytes rate by topic. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is used for monitoring Jira Data Center health. It is designed for standalone operation for on-premises Jira installations.
This template uses a single data source, JMX, which requires JMX RMI setup of your Jira application and Java Gateway setup on the Zabbix side. If you need "Garbage collector" and "Web server" monitoring, add "Generic Java JMX" and "Apache Tomcat by JMX" templates on the same host.
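A quick way to confirm the prerequisite is in place is to check that the Zabbix Java gateway host can actually reach the JMX RMI endpoint exposed by Jira. Below is a minimal Python sketch; the host name and port are placeholders (there is no fixed default port, so use whatever you configured for remote JMX), and it only verifies TCP reachability, not JMX authentication.

```python
import socket

# Placeholder values - substitute your Jira host and the JMX RMI port
# configured when remote JMX was enabled (no fixed default exists).
JIRA_HOST = "jira.example.com"
JMX_RMI_PORT = 8686

try:
    # A plain TCP connect only proves the port is open from this host;
    # it does not authenticate or speak the RMI/JMX protocol.
    with socket.create_connection((JIRA_HOST, JMX_RMI_PORT), timeout=5):
        print(f"JMX RMI port {JMX_RMI_PORT} on {JIRA_HOST} is reachable")
except OSError as exc:
    print(f"Cannot reach {JIRA_HOST}:{JMX_RMI_PORT}: {exc}")
```

Run this from the host where the Java gateway is installed; authentication itself is handled by the JMX macros described below.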
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
If JMX authentication is used, set the {$JMX.USER} and {$JMX.PASSWORD} macros.
Name | Description | Default |
---|---|---|
{$JMX.USER} | User for JMX. |
|
{$JMX.PASSWORD} | Password for JMX. |
|
{$JIRA_DC.LICENSE.USER.CAPACITY.WARN} | User capacity warning threshold (%). |
80 |
{$JIRA_DC.DB.CONNECTION.USAGE.WARN} | Warning threshold for database connections usage (%). |
80 |
{$JIRA_DC.ISSUE.LATENCY.WARN} | Warning threshold for issue operation latency (in seconds). |
5 |
{$JIRA_DC.STORAGE.LATENCY.WARN} | Warning threshold for storage write operation latency (in seconds). |
5 |
{$JIRA_DC.INDEXING.LATENCY.WARN} | Warning threshold for indexing operation latency (in seconds). |
5 |
{$JIRA_DC.LLD.FILTER.MATCHES.HOMEFOLDERS} | Used for storage metric discovery. |
local|share |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.HOMEFOLDERS} | Used for storage metric discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.INDEXING} | Used for indexing metric discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.INDEXING} | Used for indexing metric discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.ISSUE} | Used for issue discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.ISSUE} | Used for issue discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.MAIL} | Used for mail server connection metric discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.MAIL} | Used for mail server connection metric discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.LICENSE} | Used for license discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.LICENSE} | Used for license discovery. |
NO MATCH |
Name | Description | Type | Key and additional info |
---|---|---|---|
DB: Connections: State | The state of the database connection. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=state,name=value",Value] |
DB: Connections: Failed per minute | The count of database connection failures registered in one minute. Units: fpm - fails per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=failures,name=counter",Count] Preprocessing
|
DB: Pool: Connections: Idle | Idle connection count of the database pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=pool,category03=numIdle,name=value",Value] |
DB: Pool: Connections: Active | Active connection count of the database pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=pool,category03=numActive,name=value",Value] |
DB: Reads | Database read operations from Jira per second. Units: rps - read operations per second. |
JMX agent | jmx["com.atlassian.jira:type=db.reads",invocation.count] Preprocessing
|
DB: Writes | Database write operations from Jira per second. Units: wps - write operations per second. |
JMX agent | jmx["com.atlassian.jira:type=db.writes",invocation.count] Preprocessing
|
DB: Connections: Limit | Total allowed database connection count. |
JMX agent | jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",MaxTotal] |
DB: Connections: Active | Active database connection count. |
JMX agent | jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",NumActive] |
DB: Connections: Latency | The latest measure of latency when querying the database. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=latency,name=value",Value] |
License: Users: Get | License data for the discovery rule. |
JMX agent | jmx.discovery[attributes,"com.atlassian.jira:type=jira.license"] Preprocessing
|
HTTP: Pool: Connections: Active | The latest measure of the number of active connections in the HTTP connection pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=pool,category03=numActive,name=value",Value] |
HTTP: Pool: Connections: Idle | The latest measure of the number of idle connections in the HTTP connection pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=pool,category03=numIdle,name=value",Value] |
HTTP: Sessions: Active | The latest measure of the number of active user sessions. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=sessions,category03=active,name=value",Value] |
HTTP: Requests per minute | The latest measure of the total number of HTTP requests per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=requests,name=value",Value] |
Mail: Queue | The latest measure of the number of items in a mail queue. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItems,name=value",Value] |
Mail: Queue: Error | The latest measure of the number of items in an error mail queue. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numErrors,name=value",Value] |
Mail: Sent per minute | The latest measure of the number of emails sent by the SMTP server per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numEmailsSentPerMin,name=value",Value] |
Mail: Processed per minute | The latest measure of the number of items processed by a mail queue per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItemsProcessedPerMin,name=value",Value] |
Mail: Queue: Processing state | The latest indicator of the state of a mail queue job. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=jobRunning,name=value",Value] |
Entity: Issues | The number of issues. |
JMX agent | jmx["com.atlassian.jira:type=entity.issues.total",Value] |
Entity: Attachments | The number of attachments. |
JMX agent | jmx["com.atlassian.jira:type=entity.attachments.total",Value] |
Entity: Components | The number of components. |
JMX agent | jmx["com.atlassian.jira:type=entity.components.total",Value] |
Entity: Custom fields | The number of custom fields. |
JMX agent | jmx["com.atlassian.jira:type=entity.customfields.total",Value] |
Entity: Filters | The number of filters. |
JMX agent | jmx["com.atlassian.jira:type=entity.filters.total",Value] |
Entity: Versions created | The number of versions created. |
JMX agent | jmx["com.atlassian.jira:type=entity.versions.total",Value] |
Issue: Search per minute | Issue searches performed per minute. |
JMX agent | jmx["com.atlassian.jira:type=issue.search.count",Value] Preprocessing
|
Issue: Created per minute | Issues created per minute. |
JMX agent | jmx["com.atlassian.jira:type=issue.created.count",Value] Preprocessing
|
Issue: Updates per minute | Issue updates performed per minute. |
JMX agent | jmx["com.atlassian.jira:type=issue.updated.count",Value] Preprocessing
|
Quicksearch: Concurrent searches | The number of concurrent searches that are being performed in real-time by using the quick search. |
JMX agent | jmx["com.atlassian.jira:type=quicksearch.concurrent.search",Value] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jira Data Center: DB: Connection lost | Database connection lost |
max(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=state,name=value",Value],3m)=0 |Average |
Manual close: Yes | |
Jira Data Center: DB: Pool: Out of idle connections | Fires when the database pool has had no idle connections for 5 minutes. |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=pool,category03=numIdle,name=value",Value],5m)<=0 |Warning |
Manual close: Yes | |
Jira Data Center: DB: Connection usage is near the limit | 100*min(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",NumActive],5m)/last(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",MaxTotal])>{$JIRA_DC.DB.CONNECTION.USAGE.WARN} |Warning |
Manual close: Yes | ||
Jira Data Center: DB: Connection limit reached | min(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",NumActive],5m)=last(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",MaxTotal]) |Warning |
Manual close: Yes | ||
Jira Data Center: HTTP: Pool: Out of idle connections | All available connections are utilized. It can cause outages for users as the system is unable to serve their requests. |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=pool,category03=numIdle,name=value",Value],5m)<=0 |Warning |
Manual close: Yes | |
Jira Data Center: Mail: Queue: Doesn’t empty over an extended period | Might indicate SMTP performance or connection problems. |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItems,name=value",Value],30m)>0 |Warning |
Manual close: Yes Depends on:
|
|
Jira Data Center: Mail: Error queue contains one or more items | A mail queue attempts to resend items up to 10 times. If the operation fails for the 11th time, the items are put into an error mail queue. |
max(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numErrors,name=value",Value],5m)>0 |Warning |
Manual close: Yes | |
Jira Data Center: Mail: Queue job is not running | It should be running when its queue is not empty. |
max(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=jobRunning,name=value",Value],15m)=0 and min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItems,name=value",Value],15m)>0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage discovery | Discovery of the Jira storage metrics. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=home,category01=,category02=write,category03=latency,,name=value"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage [{#JMXCATEGORY01}]: Latency | The median latency of writing a small file (~30 bytes) to |
JMX agent | jmx["{#JMXOBJ}",Value] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jira Data Center: Storage [{#JMXCATEGORY01}]: Slow performance | Fires when latency grows above the threshold: |
min(/Jira Data Center by JMX/jmx["{#JMXOBJ}",Value],5m)>{$JIRA_DC.STORAGE.LATENCY.WARN:"{#JMXCATEGORY01}"} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mail server discovery | Discovery of the Jira connected mail servers. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=mail,category01=,category02=connection,category03=state,name="] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mail [{#JMXCATEGORY01},{#JMXNAME}]: Connection state | Shows connection state of Jira to discovered mail server: |
JMX agent | jmx["{#JMXOBJ}",Connected] Preprocessing
|
Mail [{#JMXCATEGORY01},{#JMXNAME}]: Failures per minute | Count of failed connections to discovered mail server |
JMX agent | jmx["{#JMXOBJ}",TotalFailures] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jira Data Center: Mail [{#JMXCATEGORY01}-{#JMXNAME}]: Server disconnected | Trigger is fired when discovered mail server |
max(/Jira Data Center by JMX/jmx["{#JMXOBJ}",Connected],5m)=0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Indexing latency discovery | Discovery of the Jira indexing metrics. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=indexing,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Indexing [{#JMXNAME}]: Latency | Average time spent on indexing operations. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=indexing,name={#JMXNAME}",Mean] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jira Data Center: Indexing [{#JMXNAME}]: Slow performance | Fires when latency grows above the threshold: |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=indexing,name={#JMXNAME}",Mean],5m)>{$JIRA_DC.INDEXING.LATENCY.WARN:"{#JMXNAME}"} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Issue latency discovery | Discovery of the Jira issue latency metrics. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=issue,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Issue [{#JMXNAME}]: Latency | Average time spent on issue |
JMX agent | jmx["{#JMXOBJ}",Mean] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jira Data Center: Issue [{#JMXNAME}]: Slow operations | Fires when latency grows above the threshold: |
min(/Jira Data Center by JMX/jmx["{#JMXOBJ}",Mean],5m)>{$JIRA_DC.ISSUE.LATENCY.WARN:"{#JMXNAME}"} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
License discovery | Discovery of the Jira licenses. |
Dependent item | jmx.license.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
License [{#LICENSE.TYPE}]: Users: Current | Current user count for |
Dependent item | jmx.license.get.user.current["{#LICENSE.TYPE}"] Preprocessing
|
License [{#LICENSE.TYPE}]: Users: Maximum | User count limit for
|
Dependent item | jmx.license.get.user.max["{#LICENSE.TYPE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jira Data Center: License [{#LICENSE.TYPE}]: Low user capacity | Fires when relative user quantity grows above the threshold: |
last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])>=0 * (100*last(/Jira Data Center by JMX/jmx.license.get.user.current["{#LICENSE.TYPE}"])/last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])>{$JIRA_DC.LICENSE.USER.CAPACITY.WARN:"{#LICENSE.TYPE}"}) |Warning |
Manual close: Yes Depends on:
|
|
Jira Data Center: License [{#LICENSE.TYPE}]: User count reached the limit | Fires when user quantity reaches the limit. |
last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])>=0 * ((last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])-last(/Jira Data Center by JMX/jmx.license.get.user.current["{#LICENSE.TYPE}"]))<=0) |Average |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Jenkins by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by requests to the Metrics API. For common metrics: install and configure the Metrics plugin according to the official documentation. Do not forget to configure access to the Metrics Servlet by issuing an API key and setting the {$JENKINS.API.KEY} macro.
For monitoring computers and builds: create an API token for the monitoring user according to the official documentation and set the {$JENKINS.USER} and {$JENKINS.API.TOKEN} macros. Don't forget to set the {$JENKINS.URL} macro.
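As a sanity check of both sets of credentials, the two endpoints the template polls can be queried directly. Below is a minimal Python sketch; the URL, Metrics Servlet key, user, and API token are placeholders, and the gauge name `jenkins.executor.count.value` is assumed from the Metrics plugin output and may differ in your version.

```python
import base64
import json
import urllib.request

# Placeholders - mirror the {$JENKINS.URL}, {$JENKINS.API.KEY},
# {$JENKINS.USER} and {$JENKINS.API.TOKEN} macro values.
JENKINS_URL = "http://jenkins.example.com:8080"
METRICS_KEY = "metrics-servlet-api-key"
USER = "zabbix"
API_TOKEN = "jenkins-api-token"

# 1. Common metrics come from the Metrics Servlet (gauges, meters, timers).
with urllib.request.urlopen(f"{JENKINS_URL}/metrics/{METRICS_KEY}/metrics") as resp:
    metrics = json.load(resp)
# Gauge name assumed from the Metrics plugin; adjust if your version differs.
print("Executors:", metrics["gauges"]["jenkins.executor.count.value"]["value"])

# 2. Jobs and computers come from the JSON API with HTTP basic authentication.
request = urllib.request.Request(f"{JENKINS_URL}/api/json?tree=jobs[name,url]")
credentials = base64.b64encode(f"{USER}:{API_TOKEN}".encode()).decode()
request.add_header("Authorization", f"Basic {credentials}")
with urllib.request.urlopen(request) as resp:
    jobs = json.load(resp)["jobs"]
print("Jobs:", [job["name"] for job in jobs])
```

If both requests return JSON, the macros below can be filled in with the same values.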
Name | Description | Default |
---|---|---|
{$JENKINS.URL} | Jenkins URL in the format |
|
{$JENKINS.API.KEY} | API key to access Metrics Servlet |
|
{$JENKINS.USER} | Username for HTTP BASIC authentication |
zabbix |
{$JENKINS.API.TOKEN} | API token for HTTP BASIC authentication. |
|
{$JENKINS.PING.REPLY} | Expected reply to the ping. |
pong |
{$JENKINS.FILE_DESCRIPTORS.MAX.WARN} | Maximum percentage of file descriptors usage alert threshold (for trigger expression). |
85 |
{$JENKINS.JOB.HEALTH.SCORE.MIN.WARN} | Minimum job's health score (for trigger expression). |
50 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get service metrics | HTTP agent | jenkins.get_metrics Preprocessing
|
|
Get healthcheck | HTTP agent | jenkins.healthcheck Preprocessing
|
|
Get jobs info | HTTP agent | jenkins.job_info Preprocessing
|
|
Get computer info | HTTP agent | jenkins.computer_info Preprocessing
|
|
Disk space check message | The message will reference the first node which fails this check. There may be other nodes that fail the check, but this health check is designed to fail fast. |
Dependent item | jenkins.disk_space.message Preprocessing
|
Temporary space check message | The message will reference the first node which fails this check. There may be other nodes that fail the check, but this health check is designed to fail fast. |
Dependent item | jenkins.temporary_space.message Preprocessing
|
Plugins check message | The message of plugins health check. |
Dependent item | jenkins.plugins.message Preprocessing
|
Thread deadlock check message | The message of thread deadlock health check. |
Dependent item | jenkins.thread_deadlock.message Preprocessing
|
Disk space check | Returns FAIL if any of the Jenkins disk space monitors are reporting the disk space as less than the configured threshold. |
Dependent item | jenkins.disk_space Preprocessing
|
Plugins check | Returns FAIL if any of the Jenkins plugins failed to start. |
Dependent item | jenkins.plugins Preprocessing
|
Temporary space check | Returns FAIL if any of the Jenkins temporary space monitors are reporting the temporary space as less than the configured threshold. |
Dependent item | jenkins.temporary_space Preprocessing
|
Thread deadlock check | Returns FAIL if there are any deadlocked threads in the Jenkins master JVM. |
Dependent item | jenkins.thread_deadlock Preprocessing
|
Get gauges | Raw items for gauges metrics. |
Dependent item | jenkins.gauges.raw Preprocessing
|
Executors count | The number of executors available to Jenkins. This corresponds to the sum of all the executors of all the online nodes. |
Dependent item | jenkins.executor.count Preprocessing
|
Executors free | The number of executors available to Jenkins that are not currently in use. |
Dependent item | jenkins.executor.free Preprocessing
|
Executors in use | The number of executors available to Jenkins that are currently in use. |
Dependent item | jenkins.executor.in_use Preprocessing
|
Nodes count | The number of build nodes available to Jenkins, both online and offline. |
Dependent item | jenkins.node.count Preprocessing
|
Nodes offline | The number of build nodes available to Jenkins but currently offline. |
Dependent item | jenkins.node.offline Preprocessing
|
Nodes online | The number of build nodes available to Jenkins and currently online. |
Dependent item | jenkins.node.online Preprocessing
|
Plugins active | The number of plugins in the Jenkins instance that started successfully. |
Dependent item | jenkins.plugins.active Preprocessing
|
Plugins failed | The number of plugins in the Jenkins instance that failed to start. A value other than 0 is typically indicative of a potential issue within the Jenkins installation that will either be solved by explicitly disabling the plugin(s) or by resolving the plugin dependency issues. |
Dependent item | jenkins.plugins.failed Preprocessing
|
Plugins inactive | The number of plugins in the Jenkins instance that are not currently enabled. |
Dependent item | jenkins.plugins.inactive Preprocessing
|
Plugins with update | The number of plugins in the Jenkins instance that have a newer version reported as available in the current Jenkins update center metadata held by Jenkins. This value is not indicative of an issue with Jenkins but high values can be used as a trigger to review the plugins with updates with a view to seeing whether those updates potentially contain fixes for issues that could be affecting your Jenkins instance. |
Dependent item | jenkins.plugins.with_update Preprocessing
|
Projects count | The number of projects. |
Dependent item | jenkins.project.count Preprocessing
|
Jobs count | The number of jobs in Jenkins. |
Dependent item | jenkins.job.count.value Preprocessing
|
Get meters | Raw items for meters metrics. |
Dependent item | jenkins.meters.raw Preprocessing
|
Job scheduled, m1 rate | The rate at which jobs are scheduled. If a job is already in the queue and an identical request for scheduling the job is received then Jenkins will coalesce the two requests. This metric gives a reasonably pure measure of the load requirements of the Jenkins master as it is unaffected by the number of executors available to the system. |
Dependent item | jenkins.job.scheduled.m1.rate Preprocessing
|
Jobs scheduled, m5 rate | The rate at which jobs are scheduled. If a job is already in the queue and an identical request for scheduling the job is received then Jenkins will coalesce the two requests. This metric gives a reasonably pure measure of the load requirements of the Jenkins master as it is unaffected by the number of executors available to the system. |
Dependent item | jenkins.job.scheduled.m5.rate Preprocessing
|
Get timers | Raw items for timers metrics. |
Dependent item | jenkins.timers.raw Preprocessing
|
Job blocked, m1 rate | The rate at which jobs in the build queue enter the blocked state. |
Dependent item | jenkins.job.blocked.m1.rate Preprocessing
|
Job blocked, m5 rate | The rate at which jobs in the build queue enter the blocked state. |
Dependent item | jenkins.job.blocked.m5.rate Preprocessing
|
Job blocked duration, p95 | The amount of time which jobs spend in the blocked state. |
Dependent item | jenkins.job.blocked.duration.p95 Preprocessing
|
Job blocked duration, median | The amount of time which jobs spend in the blocked state. |
Dependent item | jenkins.job.blocked.duration.p50 Preprocessing
|
Job building, m1 rate | The rate at which jobs are built. |
Dependent item | jenkins.job.building.m1.rate Preprocessing
|
Job building, m5 rate | The rate at which jobs are built. |
Dependent item | jenkins.job.building.m5.rate Preprocessing
|
Job building duration, p95 | The amount of time which jobs spend building. |
Dependent item | jenkins.job.building.duration.p95 Preprocessing
|
Job building duration, median | The amount of time which jobs spend building. |
Dependent item | jenkins.job.building.duration.p50 Preprocessing
|
Job buildable, m1 rate | The rate at which jobs in the build queue enter the buildable state. |
Dependent item | jenkins.job.buildable.m1.rate Preprocessing
|
Job buildable, m5 rate | The rate at which jobs in the build queue enter the buildable state. |
Dependent item | jenkins.job.buildable.m5.rate Preprocessing
|
Job buildable duration, p95 | The amount of time which jobs spend in the buildable state. |
Dependent item | jenkins.job.buildable.duration.p95 Preprocessing
|
Job buildable duration, median | The amount of time which jobs spend in the buildable state. |
Dependent item | jenkins.job.buildable.duration.p50 Preprocessing
|
Job queuing, m1 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.queuing.m1.rate Preprocessing
|
Job queuing, m5 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.queuing.m5.rate Preprocessing
|
Job queuing duration, p95 | The total time which jobs spend in the build queue. |
Dependent item | jenkins.job.queuing.duration.p95 Preprocessing
|
Job queuing duration, median | The total time which jobs spend in the build queue. |
Dependent item | jenkins.job.queuing.duration.p50 Preprocessing
|
Job total, m1 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.total.m1.rate Preprocessing
|
Job total, m5 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.total.m5.rate Preprocessing
|
Job total duration, p95 | The total time which jobs spend from entering the build queue to completing building. |
Dependent item | jenkins.job.total.duration.p95 Preprocessing
|
Job total duration, median | The total time which jobs spend from entering the build queue to completing building. |
Dependent item | jenkins.job.total.duration.p50 Preprocessing
|
Job waiting, m1 rate | The rate at which jobs enter the quiet period. |
Dependent item | jenkins.job.waiting.m1.rate Preprocessing
|
Job waiting, m5 rate | The rate at which jobs enter the quiet period. |
Dependent item | jenkins.job.waiting.m5.rate Preprocessing
|
Job waiting duration, p95 | The total amount of time that jobs spend in their quiet period. |
Dependent item | jenkins.job.waiting.duration.p95 Preprocessing
|
Job waiting duration, median | The total amount of time that jobs spend in their quiet period. |
Dependent item | jenkins.job.waiting.duration.p50 Preprocessing
|
Build queue, blocked | The number of jobs that are in the Jenkins build queue and currently in the blocked state. |
Dependent item | jenkins.queue.blocked Preprocessing
|
Build queue, size | The number of jobs that are in the Jenkins build queue. |
Dependent item | jenkins.queue.size Preprocessing
|
Build queue, buildable | The number of jobs that are in the Jenkins build queue and currently in the buildable state. |
Dependent item | jenkins.queue.buildable Preprocessing
|
Build queue, pending | The number of jobs that are in the Jenkins build queue and currently in the pending state. |
Dependent item | jenkins.queue.pending Preprocessing
|
Build queue, stuck | The number of jobs that are in the Jenkins build queue and currently in the stuck state. |
Dependent item | jenkins.queue.stuck Preprocessing
|
HTTP active requests, rate | The number of currently active requests against the Jenkins master Web UI. |
Dependent item | jenkins.http.active_requests.rate Preprocessing
|
HTTP response 400, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/400 status code. |
Dependent item | jenkins.http.bad_request.rate Preprocessing
|
HTTP response 500, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/500 status code. |
Dependent item | jenkins.http.server_error.rate Preprocessing
|
HTTP response 503, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/503 status code. |
Dependent item | jenkins.http.service_unavailable.rate Preprocessing
|
HTTP response 200, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/200 status code. |
Dependent item | jenkins.http.ok.rate Preprocessing
|
HTTP response other, rate | The rate at which the Jenkins master Web UI is responding to requests with a non-informational status code that is not in the list: HTTP/200, HTTP/201, HTTP/204, HTTP/304, HTTP/400, HTTP/403, HTTP/404, HTTP/500, or HTTP/503. |
Dependent item | jenkins.http.other.rate Preprocessing
|
HTTP response 201, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/201 status code. |
Dependent item | jenkins.http.created.rate Preprocessing
|
HTTP response 204, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/204 status code. |
Dependent item | jenkins.http.no_content.rate Preprocessing
|
HTTP response 404, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/404 status code. |
Dependent item | jenkins.http.not_found.rate Preprocessing
|
HTTP response 304, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/304 status code. |
Dependent item | jenkins.http.not_modified.rate Preprocessing
|
HTTP response 403, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/403 status code. |
Dependent item | jenkins.http.forbidden.rate Preprocessing
|
HTTP requests, rate | The rate at which the Jenkins master Web UI is receiving requests. |
Dependent item | jenkins.http.requests.rate Preprocessing
|
HTTP requests, p95 | The time spent generating the corresponding responses. |
Dependent item | jenkins.http.requests_p95.rate Preprocessing
|
HTTP requests, median | The time spent generating the corresponding responses. |
Dependent item | jenkins.http.requests_p50.rate Preprocessing
|
Version | Version of Jenkins server. |
Dependent item | jenkins.version Preprocessing
|
CPU Load | The system load on the Jenkins master as reported by the JVM's Operating System JMX bean. The calculation of system load is operating system dependent. Typically this is the sum of the number of processes that are currently running plus the number that are waiting to run. This is typically comparable against the number of CPU cores. |
Dependent item | jenkins.system.cpu.load Preprocessing
|
Uptime | The number of seconds since the Jenkins master JVM started. |
Dependent item | jenkins.system.uptime Preprocessing
|
File descriptor ratio | The ratio of used to total file descriptors |
Dependent item | jenkins.descriptor.ratio Preprocessing
|
Service ping | HTTP agent | jenkins.ping Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jenkins: Disk space is too low | Jenkins disk space monitors are reporting the disk space as less than the configured threshold. The message will reference the first node which fails this check. |
last(/Jenkins by HTTP/jenkins.disk_space)=0 and length(last(/Jenkins by HTTP/jenkins.disk_space.message))>0 |Warning |
||
Jenkins: One or more Jenkins plugins failed to start | A failure is typically indicative of a potential issue within the Jenkins installation that will either be solved by explicitly disabling the failing plugin(s) or by resolving the corresponding plugin dependency issues. |
last(/Jenkins by HTTP/jenkins.plugins)=0 and length(last(/Jenkins by HTTP/jenkins.plugins.message))>0 |Info |
Manual close: Yes | |
Jenkins: Temporary space is too low | Jenkins temporary space monitors are reporting the temporary space as less than the configured threshold. The message will reference the first node which fails this check. |
last(/Jenkins by HTTP/jenkins.temporary_space)=0 and length(last(/Jenkins by HTTP/jenkins.temporary_space.message))>0 |Warning |
||
Jenkins: There are deadlocked threads in Jenkins master JVM | There are deadlocked threads in the Jenkins master JVM. |
last(/Jenkins by HTTP/jenkins.thread_deadlock)=0 and length(last(/Jenkins by HTTP/jenkins.thread_deadlock.message))>0 |Warning |
||
Jenkins: Service has no online nodes | last(/Jenkins by HTTP/jenkins.node.online)=0 |Average |
|||
Jenkins: Version has changed | The Jenkins version has changed. Acknowledge to close the problem manually. |
last(/Jenkins by HTTP/jenkins.version,#1)<>last(/Jenkins by HTTP/jenkins.version,#2) and length(last(/Jenkins by HTTP/jenkins.version))>0 |Info |
Manual close: Yes | |
Jenkins: Host has been restarted | Uptime is less than 10 minutes. |
last(/Jenkins by HTTP/jenkins.system.uptime)<10m |Info |
Manual close: Yes | |
Jenkins: Current number of used files is too high | min(/Jenkins by HTTP/jenkins.descriptor.ratio,5m)>{$JENKINS.FILE_DESCRIPTORS.MAX.WARN} |Warning |
|||
Jenkins: Service is down | last(/Jenkins by HTTP/jenkins.ping)=0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jobs discovery | HTTP agent | jenkins.jobs Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job [{#NAME}]: Get job | Raw data for a job. |
Dependent item | jenkins.job.get[{#NAME}] Preprocessing
|
Job [{#NAME}]: Health score | Represents the health of the project as a number between 0 and 100. Job description: {#DESCRIPTION} Job URL: {#URL} |
Dependent item | jenkins.build.health[{#NAME}] Preprocessing
|
Job [{#NAME}]: Last Build number | Details: {#URL}/lastBuild/ |
Dependent item | jenkins.last_build.number[{#NAME}] Preprocessing
|
Job [{#NAME}]: Last Build duration | Build duration (in seconds). |
Dependent item | jenkins.last_build.duration[{#NAME}] Preprocessing
|
Job [{#NAME}]: Last Build timestamp | Dependent item | jenkins.last_build.timestamp[{#NAME}] Preprocessing
|
|
Job [{#NAME}]: Last Build result | Dependent item | jenkins.last_build.result[{#NAME}] Preprocessing
|
|
Job [{#NAME}]: Last Failed Build number | Details: {#URL}/lastFailedBuild/ |
Dependent item | jenkins.last_failed_build.number[{#NAME}] Preprocessing
|
Job [{#NAME}]: Last Failed Build duration | Build duration (in seconds). |
Dependent item | jenkins.last_failed_build.duration[{#NAME}] Preprocessing
|
Job [{#NAME}]: Last Failed Build timestamp | Dependent item | jenkins.lastfailedbuild.timestamp[{#NAME}] Preprocessing
|
|
Job [{#NAME}]: Last Successful Build number | Details: {#URL}/lastSuccessfulBuild/ |
Dependent item | jenkins.last_successful_build.number[{#NAME}] Preprocessing
|
Job [{#NAME}]: Last Successful Build duration | Build duration (in seconds). |
Dependent item | jenkins.last_successful_build.duration[{#NAME}] Preprocessing
|
Job [{#NAME}]: Last Successful Build timestamp | Dependent item | jenkins.lastsuccessfulbuild.timestamp[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jenkins: Job [{#NAME}]: Job is unhealthy | last(/Jenkins by HTTP/jenkins.build.health[{#NAME}])<{$JENKINS.JOB.HEALTH.SCORE.MIN.WARN} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Computers discovery | HTTP agent | jenkins.computers Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Computer [{#DISPLAY_NAME}]: Get computer | Raw data for a computer. |
Dependent item | jenkins.computer.get[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Executors | The maximum number of concurrent builds that Jenkins may perform on this node. |
Dependent item | jenkins.computer.numExecutors[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: State | Represents the actual online/offline state. Node description: {#DESCRIPTION} |
Dependent item | jenkins.computer.state[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Offline cause reason | If the computer is offline (either temporarily or not), returns the cause as a string (without user info). Returns an empty string if the system was put offline without a given cause. |
Dependent item | jenkins.computer.offline.reason[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Idle | Returns true if all the executors of this computer are idle. |
Dependent item | jenkins.computer.idle[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Temporarily offline | Returns true if this node is marked temporarily offline. |
Dependent item | jenkins.computer.temp_offline[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Available disk space | The available disk space of $JENKINS_HOME on agent. |
Dependent item | jenkins.computer.disk_space[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Available temp space | The available disk space of the temporary directory. Java tools and tests/builds often create files in the temporary directory, and may not function properly if there's no available space. |
Dependent item | jenkins.computer.temp_space[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Response time average | The round trip network response time from the master to the agent |
Dependent item | jenkins.computer.response_time[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Available physical memory | The available physical memory of the system, in bytes. |
Dependent item | jenkins.computer.available_physical_memory[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Available swap space | Available swap space in bytes. |
Dependent item | jenkins.computer.available_swap_space[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Total physical memory | Total physical memory of the system, in bytes. |
Dependent item | jenkins.computer.total_physical_memory[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Total swap space | Total swap space of the system, in bytes. |
Dependent item | jenkins.computer.total_swap_space[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Clock difference | The clock difference between the master and nodes. |
Dependent item | jenkins.computer.clock_difference[{#DISPLAY_NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jenkins: Computer [{#DISPLAY_NAME}]: Node is down | Node down with reason: {{ITEM.LASTVALUE2}.regsub("(.*)",\1)} |
last(/Jenkins by HTTP/jenkins.computer.state[{#DISPLAY_NAME}])=1 and length(last(/Jenkins by HTTP/jenkins.computer.offline.reason[{#DISPLAY_NAME}]))>0 |Average |
Depends on:
|
|
Jenkins: Computer [{#DISPLAY_NAME}]: Node is temporarily offline | Node is temporarily Offline with reason: {{ITEM.LASTVALUE2}.regsub("(.*)",\1)} |
last(/Jenkins by HTTP/jenkins.computer.temp_offline[{#DISPLAY_NAME}])=1 and length(last(/Jenkins by HTTP/jenkins.computer.offline.reason[{#DISPLAY_NAME}]))>0 |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor IIS (Internet Information Services) by Zabbix that works without any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You have to enable the following Windows Features (Control Panel > Programs and Features > Turn Windows features on or off) on your server; a scripted alternative is sketched after this list:
Web Server (IIS)
Web Server (IIS)\Management Tools\IIS Management Scripts and Tools
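If you prefer to script this step, the same features can be enabled from an elevated prompt with DISM. Below is a minimal Python sketch; the DISM feature names are assumptions, so verify them against your Windows version with `dism /online /get-features` before running it.

```python
import subprocess

# Assumed DISM feature names for "Web Server (IIS)" and
# "IIS Management Scripts and Tools"; verify with: dism /online /get-features
FEATURES = ["IIS-WebServerRole", "IIS-ManagementScriptingTools"]

for feature in FEATURES:
    # /all also enables any required parent features; run from an
    # elevated (administrator) prompt.
    subprocess.run(
        ["dism", "/online", "/enable-feature", f"/featurename:{feature}", "/all"],
        check=True,
    )
```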
Optionally, it is possible to customize the template:
Name | Description | Default |
---|---|---|
{$IIS.PORT} | Listening port. |
80 |
{$IIS.SERVICE} | The service (http/https/etc) for port check. See "net.tcp.service" documentation page for more information: https://www.zabbix.com/documentation/7.0/manual/config/items/itemtypes/simple_checks |
http |
{$IIS.QUEUE.MAX.WARN} | Maximum application pool's request queue length for trigger expression. |
|
{$IIS.QUEUE.MAX.TIME} | The time during which the queue length may exceed the threshold. |
5m |
{$IIS.APPPOOL.NOT_MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
<CHANGE_IF_NEEDED> |
{$IIS.APPPOOL.MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
.+ |
{$IIS.APPPOOL.MONITORED} | Monitoring status for discovered application pools. Use context to avoid trigger firing for specific application pools. "1" - enabled, "0" - disabled. |
1 |
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
World Wide Web Publishing Service (W3SVC) state | The World Wide Web Publishing Service (W3SVC) provides web connectivity and administration of websites through the IIS snap-in. If the World Wide Web Publishing Service stops, the operating system cannot serve any form of web request. This service depends on the "Windows Process Activation Service". |
Zabbix agent (active) | service.info[W3SVC] Preprocessing
|
Windows Process Activation Service (WAS) state | Windows Process Activation Service (WAS) is a tool for managing worker processes that contain applications that host Windows Communication Foundation (WCF) services. Worker processes handle requests that are sent to a Web Server for specific application pools. Each application pool sets boundaries for the applications it contains. |
Zabbix agent (active) | service.info[WAS] Preprocessing
|
{$IIS.PORT} port ping | Simple check | net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}] Preprocessing
|
|
Uptime | The service uptime expressed in seconds. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Service Uptime"] |
Bytes Received per second | The average rate per minute at which data bytes are received by the service at the Application Layer. Does not include protocol headers or control bytes. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Bytes Received/sec", 60] |
Bytes Sent per second | The average rate per minute at which data bytes are sent by the service. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Bytes Sent/sec", 60] |
Bytes Total per second | The average rate per minute of total bytes/sec transferred by the Web service (sum of bytes sent/sec and bytes received/sec). |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Bytes Total/Sec", 60] |
Current connections | The number of active connections. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Current Connections"] |
Total connection attempts | The total number of connections to the Web or FTP service that have been attempted since service startup. The count is the total for all Web sites or FTP sites combined. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Total Connection Attempts (all instances)"] Preprocessing
|
Connection attempts per second | The average rate per minute that connections using the Web service are being attempted. The count is the average for all Web sites combined. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Connection Attempts/Sec", 60] Preprocessing
|
Anonymous users per second | The number of requests from users over an anonymous connection per second. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Anonymous Users/sec", 60] |
NonAnonymous users per second | The number of requests from users over a non-anonymous connection per second. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\NonAnonymous Users/sec", 60] |
Method GET requests per second | The rate of HTTP requests made using the GET method. GET requests are generally used for basic file retrievals or image maps, though they can be used with forms. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Get Requests/Sec", 60] Preprocessing
|
Method COPY requests per second | The rate of HTTP requests made using the COPY method. Copy requests are used for copying files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Copy Requests/Sec", 60] Preprocessing
|
Method CGI requests per second | The rate of CGI requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\CGI Requests/Sec", 60] Preprocessing
|
Method DELETE requests per second | The rate of HTTP requests made using the DELETE method. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Delete Requests/Sec", 60] Preprocessing
|
Method HEAD requests per second | The rate of HTTP requests made using the HEAD method. HEAD requests generally indicate a client is querying the state of a document they already have to see if it needs to be refreshed. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Head Requests/Sec", 60] Preprocessing
|
Method ISAPI requests per second | The rate of ISAPI Extension requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\ISAPI Extension Requests/Sec", 60] Preprocessing
|
Method LOCK requests per second | The rate of HTTP requests made using the LOCK method. Lock requests are used to lock a file for one user so that only that user can modify the file. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Lock Requests/Sec", 60] Preprocessing
|
Method MKCOL requests per second | The rate of HTTP requests made using the MKCOL method. Mkcol requests are used to create directories on the server. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Mkcol Requests/Sec", 60] Preprocessing
|
Method MOVE requests per second | The rate of HTTP requests made using the MOVE method. Move requests are used for moving files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Move Requests/Sec", 60] Preprocessing
|
Method OPTIONS requests per second | The rate of HTTP requests made using the OPTIONS method. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Options Requests/Sec", 60] Preprocessing
|
Method POST requests per second | The rate of HTTP requests made using the POST method. Generally used for forms or gateway requests. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Post Requests/Sec", 60] Preprocessing
|
Method PROPFIND requests per second | The rate of HTTP requests made using the PROPFIND method. Propfind requests retrieve property values on files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Propfind Requests/Sec", 60] Preprocessing
|
Method PROPPATCH requests per second | The rate of HTTP requests made using the PROPPATCH method. Proppatch requests set property values on files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Proppatch Requests/Sec", 60] Preprocessing
|
Method PUT requests per second | The rate of HTTP requests made using the PUT method. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Put Requests/Sec", 60] Preprocessing
|
Method MS-SEARCH requests per second | The rate of HTTP requests made using the MS-SEARCH method. Search requests are used to query the server to find resources that match a set of conditions provided by the client. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Search Requests/Sec", 60] Preprocessing
|
Method TRACE requests per second | The rate of HTTP requests made using the TRACE method. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Trace Requests/Sec", 60] Preprocessing
|
Method UNLOCK requests per second | The rate of HTTP requests made using the UNLOCK method. Unlock requests are used to remove locks from files. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Unlock Requests/Sec", 60] Preprocessing
|
Method Total requests per second | The rate of all HTTP requests received. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Total Method Requests/Sec", 60] Preprocessing
|
Method Total Other requests per second | Total Other Request Methods is the number of HTTP requests that are not OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, MOVE, COPY, MKCOL, PROPFIND, PROPPATCH, SEARCH, LOCK or UNLOCK methods (since service startup). Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Other Request Methods/Sec", 60] Preprocessing
|
Locked errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document was locked. These are generally reported as an HTTP 423 error code to the client. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Locked Errors/Sec", 60] Preprocessing
|
Not Found errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document could not be found. These are generally reported to the client with HTTP error code 404. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Not Found Errors/Sec", 60] Preprocessing
|
Files cache hits percentage | The ratio of user-mode file cache hits to total cache requests (since service startup). Note: This value might be low if the Kernel URI cache hits percentage is high. |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\File Cache Hits %"] Preprocessing
|
URIs cache hits percentage | The ratio of user-mode URI Cache Hits to total cache requests (since service startup) |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\URI Cache Hits %"] Preprocessing
|
File cache misses | The total number of unsuccessful lookups in the user-mode file cache since service startup. |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\File Cache Misses"] Preprocessing
|
URI cache misses | The total number of unsuccessful lookups in the user-mode URI cache since service startup. |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\URI Cache Misses"] Preprocessing
|
Active agent availability | Availability of active checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - unknown 1 - available 2 - not available |
Zabbix internal | zabbix[host,active_agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: The World Wide Web Publishing Service (W3SVC) is not running | The World Wide Web Publishing Service (W3SVC) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent active/service.info[W3SVC])<>0 |High |
Depends on:
|
|
IIS: Windows Process Activation Service (WAS) is not running | Windows Process Activation Service (WAS) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent active/service.info[WAS])<>0 |High |
||
IIS: Port {$IIS.PORT} is down | last(/IIS by Zabbix agent active/net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}])=0 |Average |
Manual close: Yes Depends on:
|
||
IIS: Service has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent active/perf_counter_en["\Web Service(_Total)\Service Uptime"])<10m |Info |
Manual close: Yes | |
IIS: Active checks are not available | Active checks are considered unavailable. The agent has not sent a heartbeat for a prolonged time. |
min(/IIS by Zabbix agent active/zabbix[host,active_agent,available],{$AGENT.TIMEOUT})=2 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Application pools discovery | Zabbix agent (active) | wmi.getall[root\webAdministration, select Name from ApplicationPool] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#APPPOOL} Uptime | The web application uptime period since the last restart. |
Zabbix agent (active) | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"] |
AppPool {#APPPOOL} state | The state of the application pool. |
Zabbix agent (active) | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"] Preprocessing
|
AppPool {#APPPOOL} recycles | The number of times the application pool has been recycled since Windows Process Activation Service (WAS) started. |
Zabbix agent (active) | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"] Preprocessing
|
AppPool {#APPPOOL} current queue size | The number of requests in the queue. |
Zabbix agent (active) | perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: {#APPPOOL} has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"])<10m |Info |
Manual close: Yes | |
IIS: Application pool {#APPPOOL} is not in Running state | last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"])<>3 and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |High |
Depends on:
|
||
IIS: Application pool {#APPPOOL} has been recycled | last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#1)<>last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#2) and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |Info |
|||
IIS: Request queue of {#APPPOOL} is too large | min(/IIS by Zabbix agent active/perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"],{$IIS.QUEUE.MAX.TIME})>{$IIS.QUEUE.MAX.WARN} |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor IIS (Internet Information Services) by Zabbix that works without any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You have to enable the following Windows Features (Control Panel > Programs and Features > Turn Windows features on or off) on your server
Web Server (IIS)
Web Server (IIS)\Management Tools\IIS Management Scripts and Tools
Optionally, it is possible to customize the template:
Name | Description | Default |
---|---|---|
{$IIS.PORT} | Listening port. |
80 |
{$IIS.SERVICE} | The service (http/https/etc) for port check. See "net.tcp.service" documentation page for more information: https://www.zabbix.com/documentation/7.0/manual/config/items/itemtypes/simple_checks |
http |
{$IIS.QUEUE.MAX.WARN} | Maximum application pool's request queue length for trigger expression. |
|
{$IIS.QUEUE.MAX.TIME} | The time during which the queue length may exceed the threshold. |
5m |
{$IIS.APPPOOL.NOT_MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
<CHANGE_IF_NEEDED> |
{$IIS.APPPOOL.MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
.+ |
{$IIS.APPPOOL.MONITORED} | Monitoring status for discovered application pools. Use context to avoid trigger firing for specific application pools. "1" - enabled, "0" - disabled. |
1 |
Name | Description | Type | Key and additional info |
---|---|---|---|
World Wide Web Publishing Service (W3SVC) state | The World Wide Web Publishing Service (W3SVC) provides web connectivity and administration of websites through the IIS snap-in. If the World Wide Web Publishing Service stops, the operating system cannot serve any form of web request. This service is dependent on "Windows Process Activation Service". |
Zabbix agent | service.info[W3SVC] Preprocessing
|
Windows Process Activation Service (WAS) state | Windows Process Activation Service (WAS) is a tool for managing worker processes that contain applications that host Windows Communication Foundation (WCF) services. Worker processes handle requests that are sent to a Web Server for specific application pools. Each application pool sets boundaries for the applications it contains. |
Zabbix agent | service.info[WAS] Preprocessing
|
{$IIS.PORT} port ping | Simple check | net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}] Preprocessing
|
|
Uptime | The service uptime expressed in seconds. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Service Uptime"] |
Bytes Received per second | The average rate per minute at which data bytes are received by the service at the Application Layer. Does not include protocol headers or control bytes. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Bytes Received/sec", 60] |
Bytes Sent per second | The average rate per minute at which data bytes are sent by the service. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Bytes Sent/sec", 60] |
Bytes Total per second | The average rate per minute of total bytes/sec transferred by the Web service (sum of bytes sent/sec and bytes received/sec). |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Bytes Total/Sec", 60] |
Current connections | The number of active connections. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Current Connections"] |
Total connection attempts | The total number of connections to the Web or FTP service that have been attempted since service startup. The count is the total for all Web sites or FTP sites combined. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Total Connection Attempts (all instances)"] Preprocessing
|
Connection attempts per second | The average rate per minute that connections using the Web service are being attempted. The count is the average for all Web sites combined. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Connection Attempts/Sec", 60] Preprocessing
|
Anonymous users per second | The number of requests from users over an anonymous connection per second. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Anonymous Users/sec", 60] |
NonAnonymous users per second | The number of requests from users over a non-anonymous connection per second. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\NonAnonymous Users/sec", 60] |
Method GET requests per second | The rate of HTTP requests made using the GET method. GET requests are generally used for basic file retrievals or image maps, though they can be used with forms. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Get Requests/Sec", 60] Preprocessing
|
Method COPY requests per second | The rate of HTTP requests made using the COPY method. Copy requests are used for copying files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Copy Requests/Sec", 60] Preprocessing
|
Method CGI requests per second | The rate of CGI requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\CGI Requests/Sec", 60] Preprocessing
|
Method DELETE requests per second | The rate of HTTP requests made using the DELETE method. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Delete Requests/Sec", 60] Preprocessing
|
Method HEAD requests per second | The rate of HTTP requests made using the HEAD method. HEAD requests generally indicate a client is querying the state of a document they already have to see if it needs to be refreshed. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Head Requests/Sec", 60] Preprocessing
|
Method ISAPI requests per second | The rate of ISAPI Extension requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\ISAPI Extension Requests/Sec", 60] Preprocessing
|
Method LOCK requests per second | The rate of HTTP requests made using the LOCK method. Lock requests are used to lock a file for one user so that only that user can modify the file. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Lock Requests/Sec", 60] Preprocessing
|
Method MKCOL requests per second | The rate of HTTP requests made using the MKCOL method. Mkcol requests are used to create directories on the server. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Mkcol Requests/Sec", 60] Preprocessing
|
Method MOVE requests per second | The rate of HTTP requests made using the MOVE method. Move requests are used for moving files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Move Requests/Sec", 60] Preprocessing
|
Method OPTIONS requests per second | The rate of HTTP requests made using the OPTIONS method. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Options Requests/Sec", 60] Preprocessing
|
Method POST requests per second | The rate of HTTP requests made using the POST method. Generally used for forms or gateway requests. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Post Requests/Sec", 60] Preprocessing
|
Method PROPFIND requests per second | The rate of HTTP requests made using the PROPFIND method. Propfind requests retrieve property values on files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Propfind Requests/Sec", 60] Preprocessing
|
Method PROPPATCH requests per second | The rate of HTTP requests made using the PROPPATCH method. Proppatch requests set property values on files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Proppatch Requests/Sec", 60] Preprocessing
|
Method PUT requests per second | The rate of HTTP requests made using the PUT method. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Put Requests/Sec", 60] Preprocessing
|
Method MS-SEARCH requests per second | The rate of HTTP requests made using the MS-SEARCH method. Search requests are used to query the server to find resources that match a set of conditions provided by the client. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Search Requests/Sec", 60] Preprocessing
|
Method TRACE requests per second | The rate of HTTP requests made using the TRACE method. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Trace Requests/Sec", 60] Preprocessing
|
Method UNLOCK requests per second | The rate of HTTP requests made using the UNLOCK method. Unlock requests are used to remove locks from files. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Unlock Requests/Sec", 60] Preprocessing
|
Method Total requests per second | The rate of all HTTP requests received. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Total Method Requests/Sec", 60] Preprocessing
|
Method Total Other requests per second | Total Other Request Methods is the number of HTTP requests that are not OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, MOVE, COPY, MKCOL, PROPFIND, PROPPATCH, SEARCH, LOCK or UNLOCK methods (since service startup). Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Other Request Methods/Sec", 60] Preprocessing
|
Locked errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document was locked. These are generally reported as an HTTP 423 error code to the client. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Locked Errors/Sec", 60] Preprocessing
|
Not Found errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document could not be found. These are generally reported to the client with HTTP error code 404. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Not Found Errors/Sec", 60] Preprocessing
|
Files cache hits percentage | The ratio of user-mode file cache hits to total cache requests (since service startup). Note: This value might be low if the Kernel URI cache hits percentage is high. |
Zabbix agent | perf_counter_en["\Web Service Cache\File Cache Hits %"] Preprocessing
|
URIs cache hits percentage | The ratio of user-mode URI Cache Hits to total cache requests (since service startup) |
Zabbix agent | perf_counter_en["\Web Service Cache\URI Cache Hits %"] Preprocessing
|
File cache misses | The total number of unsuccessful lookups in the user-mode file cache since service startup. |
Zabbix agent | perf_counter_en["\Web Service Cache\File Cache Misses"] Preprocessing
|
URI cache misses | The total number of unsuccessful lookups in the user-mode URI cache since service startup. |
Zabbix agent | perf_counter_en["\Web Service Cache\URI Cache Misses"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: The World Wide Web Publishing Service (W3SVC) is not running | The World Wide Web Publishing Service (W3SVC) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent/service.info[W3SVC])<>0 |High |
Depends on:
|
|
IIS: Windows Process Activation Service (WAS) is not running | Windows Process Activation Service (WAS) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent/service.info[WAS])<>0 |High |
||
IIS: Port {$IIS.PORT} is down | last(/IIS by Zabbix agent/net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}])=0 |Average |
Manual close: Yes Depends on:
|
||
IIS: Service has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent/perf_counter_en["\Web Service(_Total)\Service Uptime"])<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Application pools discovery | Zabbix agent | wmi.getall[root\webAdministration, select Name from ApplicationPool] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#APPPOOL} Uptime | The web application uptime period since the last restart. |
Zabbix agent | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"] |
AppPool {#APPPOOL} state | The state of the application pool. |
Zabbix agent | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"] Preprocessing
|
AppPool {#APPPOOL} recycles | The number of times the application pool has been recycled since Windows Process Activation Service (WAS) started. |
Zabbix agent | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"] Preprocessing
|
AppPool {#APPPOOL} current queue size | The number of requests in the queue. |
Zabbix agent | perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: {#APPPOOL} has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"])<10m |Info |
Manual close: Yes | |
IIS: Application pool {#APPPOOL} is not in Running state | last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"])<>3 and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |High |
Depends on:
|
||
IIS: Application pool {#APPPOOL} has been recycled | last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#1)<>last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#2) and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |Info |
|||
IIS: Request queue of {#APPPOOL} is too large | min(/IIS by Zabbix agent/perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"],{$IIS.QUEUE.MAX.TIME})>{$IIS.QUEUE.MAX.WARN} |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HAProxy by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by polling the HAProxy stats page with HTTP agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up the HAProxy stats page. If you want to use authentication, set the username and password in the stats auth option of the configuration file.
The example configuration of HAProxy:
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
#stats auth Username:Password # Authentication credentials
Set the hostname or IP address of the HAProxy stats host or container in the {$HAPROXY.STATS.HOST}
macro. You can also change the status page port in the {$HAPROXY.STATS.PORT}
macro, the status page scheme in the {$HAPROXY.STATS.SCHEME}
macro and the status page path in the {$HAPROXY.STATS.PATH}
macro if necessary.
If you have enabled authentication in the HAProxy configuration file in step 1, set the username and password in the {$HAPROXY.USERNAME}
and {$HAPROXY.PASSWORD}
macros.
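Before linking the template, it may help to verify that the stats page responds and returns CSV data. A minimal sketch using curl, assuming the example configuration above (replace the host and credentials with your own, and omit -u if authentication is disabled):
curl -u Username:Password "http://<haproxy-host>:8404/stats;csv"
The template's master item retrieves the same statistics report in CSV format.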
Name | Description | Default |
---|---|---|
{$HAPROXY.STATS.SCHEME} | The scheme of HAProxy stats page (http/https). |
http |
{$HAPROXY.STATS.HOST} | The hostname or IP address of the HAProxy stats host or container. |
<SET HAPROXY HOST> |
{$HAPROXY.STATS.PORT} | The port of the HAProxy stats host or container. |
8404 |
{$HAPROXY.STATS.PATH} | The path of the HAProxy stats page. |
stats |
{$HAPROXY.USERNAME} | The username of the HAProxy stats page. |
|
{$HAPROXY.PASSWORD} | The password of the HAProxy stats page. |
|
{$HAPROXY.RESPONSE_TIME.MAX.WARN} | The HAProxy stats page maximum response time in seconds for trigger expression. |
10s |
{$HAPROXY.FRONT_DREQ.MAX.WARN} | The HAProxy maximum denied requests for trigger expression. |
10 |
{$HAPROXY.FRONT_EREQ.MAX.WARN} | The HAProxy maximum number of request errors for trigger expression. |
10 |
{$HAPROXY.BACK_QCUR.MAX.WARN} | Maximum number of requests on Backend unassigned in queue for trigger expression. |
10 |
{$HAPROXY.BACK_RTIME.MAX.WARN} | Maximum of average Backend response time for trigger expression. |
10s |
{$HAPROXY.BACK_QTIME.MAX.WARN} | Maximum of average time spent in queue on Backend for trigger expression. |
10s |
{$HAPROXY.BACK_ERESP.MAX.WARN} | Maximum of responses with error on Backend for trigger expression. |
10 |
{$HAPROXY.SERVER_QCUR.MAX.WARN} | Maximum number of requests on server unassigned in queue for trigger expression. |
10 |
{$HAPROXY.SERVER_RTIME.MAX.WARN} | Maximum of average server response time for trigger expression. |
10s |
{$HAPROXY.SERVER_QTIME.MAX.WARN} | Maximum of average time spent in queue on server for trigger expression. |
10s |
{$HAPROXY.SERVER_ERESP.MAX.WARN} | Maximum of responses with error on server for trigger expression. |
10 |
{$HAPROXY.FRONT_SUTIL.MAX.WARN} | Maximum of session usage percentage on frontend for trigger expression. |
80 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get stats | HAProxy Statistics Report in CSV format |
HTTP agent | haproxy.get Preprocessing
|
Get nodes | Array for LLD rules. |
Dependent item | haproxy.get.nodes Preprocessing
|
Get stats page | HAProxy Statistics Report HTML |
HTTP agent | haproxy.get_html |
Version | Dependent item | haproxy.version Preprocessing
|
|
Uptime | Dependent item | haproxy.uptime Preprocessing
|
|
Service status | Simple check | net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] Preprocessing
|
|
Service response time | Simple check | net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: Version has changed | HAProxy version has changed. Acknowledge to close the problem manually. |
last(/HAProxy by HTTP/haproxy.version,#1)<>last(/HAProxy by HTTP/haproxy.version,#2) and length(last(/HAProxy by HTTP/haproxy.version))>0 |Info |
Manual close: Yes | |
HAProxy: Service has been restarted | Uptime is less than 10 minutes. |
last(/HAProxy by HTTP/haproxy.uptime)<10m |Info |
Manual close: Yes | |
HAProxy: Service is down | last(/HAProxy by HTTP/net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"])=0 |Average |
Manual close: Yes | ||
HAProxy: Service response time is too high | min(/HAProxy by HTTP/net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"],5m)>{$HAPROXY.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Backend discovery | Discovery backends |
Dependent item | haproxy.backend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Backend {#PXNAME}: Raw data | The raw data of the Backend with the name {#PXNAME}. |
Dependent item | haproxy.backend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Status | Possible values: UP - The server is reporting as healthy. DOWN - The server is reporting as unhealthy and unable to receive requests. NOLB - You've added http-check disable-on-404 to the backend and the health checked URL has returned an HTTP 404 response. MAINT - The server has been disabled or put into maintenance mode. DRAIN - The server has been put into drain mode. no check - Health checks are not enabled for this server. |
Dependent item | haproxy.backend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Responses time | Average backend response time (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.backend.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.backend.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Response errors per second | Number of requests whose responses yielded an error |
Dependent item | haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.backend.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.backend.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.backend.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.backend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.backend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.backend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.backend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.backend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of active servers | Number of active servers. |
Dependent item | haproxy.backend.act[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of backup servers | Number of backup servers. |
Dependent item | haproxy.backend.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.backend.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Weight | Total effective weight. |
Dependent item | haproxy.backend.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: backend {#PXNAME}: Server is DOWN | Backend is not available. |
count(/HAProxy by HTTP/haproxy.backend.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Average |
||
HAProxy: backend {#PXNAME}: Average response time is high | Average backend response time (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_RTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_RTIME.MAX.WARN} |Warning |
||
HAProxy: backend {#PXNAME}: Number of responses with error is high | Number of requests on backend, whose responses yielded an error, is more than {$HAPROXY.BACK_ERESP.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_ERESP.MAX.WARN} |Warning |
||
HAProxy: backend {#PXNAME}: Current number of requests unassigned in queue is high | Current number of requests on backend unassigned in queue is more than {$HAPROXY.BACK_QCUR.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QCUR.MAX.WARN} |Warning |
||
HAProxy: backend {#PXNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_QTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QTIME.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Frontend discovery | Discovery frontends |
Dependent item | haproxy.frontend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Frontend {#PXNAME}: Raw data | The raw data of the Frontend with the name {#PXNAME}. |
Dependent item | haproxy.frontend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Status | Possible values: OPEN, STOP. When Status is OPEN, the frontend is operating normally and ready to receive traffic. |
Dependent item | haproxy.frontend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Requests rate | HTTP requests per second |
Dependent item | haproxy.frontend.req_rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Sessions rate | Number of sessions created per second |
Dependent item | haproxy.frontend.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Established sessions | The current number of established sessions. |
Dependent item | haproxy.frontend.scur[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Session limits | The maximum number of simultaneous sessions allowed, as defined by the maxconn setting in the frontend. |
Dependent item | haproxy.frontend.slim[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Session utilization | Percentage of sessions used (scur / slim * 100). |
Calculated | haproxy.frontend.sutil[{#PXNAME},{#SVNAME}] |
Frontend {#PXNAME}: Request errors per second | Number of request errors per second. |
Dependent item | haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Denied requests per second | Requests denied due to security concerns (ACL-restricted) per second. |
Dependent item | haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.frontend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.frontend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.frontend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Incoming traffic | Number of bits received by the frontend |
Dependent item | haproxy.frontend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Outgoing traffic | Number of bits sent by the frontend |
Dependent item | haproxy.frontend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: frontend {#PXNAME}: Session utilization is high | Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. Furthermore, if you find your session usage percentage to be hovering above 80%, it could be time to either modify HAProxy's configuration to allow more sessions, or migrate your HAProxy server to a bigger box. |
min(/HAProxy by HTTP/haproxy.frontend.sutil[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_SUTIL.MAX.WARN} |Warning |
||
HAProxy: frontend {#PXNAME}: Number of request errors is high | Number of request errors is more than {$HAPROXY.FRONT_EREQ.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_EREQ.MAX.WARN} |Warning |
||
HAProxy: frontend {#PXNAME}: Number of requests denied is high | Number of requests denied due to security concerns (ACL-restricted) is more than {$HAPROXY.FRONT_DREQ.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_DREQ.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server discovery | Discovery servers |
Dependent item | haproxy.server.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server {#PXNAME} {#SVNAME}: Raw data | The raw data of the Server named {#SVNAME}. |
Dependent item | haproxy.server.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Status | Dependent item | haproxy.server.status[{#PXNAME},{#SVNAME}] Preprocessing
|
|
{#PXNAME} {#SVNAME}: Responses time | Average server response time (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.server.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.server.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Response errors per second | Number of requests whose responses yielded an error. |
Dependent item | haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.server.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.server.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.server.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.server.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.server.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.server.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.server.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.server.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.server.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.server.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Server is active | Shows whether the server is active (marked with a Y) or a backup (marked with a -). |
Dependent item | haproxy.server.act[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Server is backup | Shows whether the server is a backup (marked with a Y) or active (marked with a -). |
Dependent item | haproxy.server.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.server.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Weight | Effective weight. |
Dependent item | haproxy.server.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Configured maxqueue | Configured maxqueue for the server, or nothing if the value is 0 (default, meaning no limit). |
Dependent item | haproxy.server.qlimit[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Server was selected per second | Number of times that server was selected. |
Dependent item | haproxy.server.lbtot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Status of last health check | Status of last health check, one of: UNK -> unknown INI -> initializing SOCKERR -> socket error L4OK -> check passed on layer 4, no upper layers testing enabled L4TOUT -> layer 1-4 timeout L4CON -> layer 1-4 connection problem, for example "Connection refused" (tcp rst) or "No route to host" (icmp) L6OK -> check passed on layer 6 L6TOUT -> layer 6 (SSL) timeout L6RSP -> layer 6 invalid response - protocol error L7OK -> check passed on layer 7 L7OKC -> check conditionally passed on layer 7, for example 404 with disable-on-404 L7TOUT -> layer 7 (HTTP/SMTP) timeout L7RSP -> layer 7 invalid response - protocol error L7STS -> layer 7 response error, for example HTTP 5xx Notice: If a check is currently running, the last known status will be reported, prefixed with "* ". e. g. "* L7OK". |
Dependent item | haproxy.server.check_status[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: {#PXNAME} {#SVNAME}: Server is DOWN | Server is not available. |
count(/HAProxy by HTTP/haproxy.server.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Average response time is high | Average server response time (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_RTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_RTIME.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Number of responses with error is high | Number of requests on server, whose responses yielded an error, is more than {$HAPROXY.SERVER_ERESP.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_ERESP.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Current number of requests unassigned in queue is high | Current number of requests unassigned in queue is more than {$HAPROXY.SERVER_QCUR.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QCUR.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_QTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QTIME.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Health check error | Please check the server for faults. |
find(/HAProxy by HTTP/haproxy.server.check_status[{#PXNAME},{#SVNAME}],#3,"regexp","(?:L[4-7]OK|^$)")=0 |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HAProxy by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by polling the HAProxy stats page with Zabbix agent.
Note that this template doesn't support authentication and redirects (limitations of web.page.get).
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up the HAProxy stats page. The example configuration of HAProxy:
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
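Because this template collects the stats page through the Zabbix agent, the master item can be checked from the Zabbix server or proxy with zabbix_get. A minimal sketch, assuming the agent runs on the HAProxy host and the example configuration and default macros above:
zabbix_get -s <haproxy-host> -k 'web.page.get["http://localhost:8404/stats;csv"]'
The key mirrors the template's Get stats item with the macros resolved to their default values.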
Set the hostname or IP address of the HAProxy stats host or container in the {$HAPROXY.STATS.HOST}
macro. You can also change the status page port in the {$HAPROXY.STATS.PORT}
macro, the status page scheme in the {$HAPROXY.STATS.SCHEME}
macro and the status page path in the {$HAPROXY.STATS.PATH}
macro if necessary.
Name | Description | Default |
---|---|---|
{$HAPROXY.STATS.SCHEME} | The scheme of HAProxy stats page (http/https). |
http |
{$HAPROXY.STATS.HOST} | The hostname or IP address of the HAProxy stats host or container. |
localhost |
{$HAPROXY.STATS.PORT} | The port of the HAProxy stats host or container. |
8404 |
{$HAPROXY.STATS.PATH} | The path of the HAProxy stats page. |
stats |
{$HAPROXY.RESPONSE_TIME.MAX.WARN} | The HAProxy stats page maximum response time in seconds for trigger expression. |
10s |
{$HAPROXY.FRONT_DREQ.MAX.WARN} | The HAProxy maximum denied requests for trigger expression. |
10 |
{$HAPROXY.FRONT_EREQ.MAX.WARN} | The HAProxy maximum number of request errors for trigger expression. |
10 |
{$HAPROXY.BACK_QCUR.MAX.WARN} | Maximum number of requests on BACKEND unassigned in queue for trigger expression. |
10 |
{$HAPROXY.BACK_RTIME.MAX.WARN} | Maximum of average BACKEND response time for trigger expression. |
10s |
{$HAPROXY.BACK_QTIME.MAX.WARN} | Maximum of average time spent in queue on BACKEND for trigger expression. |
10s |
{$HAPROXY.BACK_ERESP.MAX.WARN} | Maximum of responses with error on BACKEND for trigger expression. |
10 |
{$HAPROXY.SERVER_QCUR.MAX.WARN} | Maximum number of requests on server unassigned in queue for trigger expression. |
10 |
{$HAPROXY.SERVER_RTIME.MAX.WARN} | Maximum of average server response time for trigger expression. |
10s |
{$HAPROXY.SERVER_QTIME.MAX.WARN} | Maximum of average time spent in queue on server for trigger expression. |
10s |
{$HAPROXY.SERVER_ERESP.MAX.WARN} | Maximum of responses with error on server for trigger expression. |
10 |
{$HAPROXY.FRONT_SUTIL.MAX.WARN} | Maximum of session usage percentage on frontend for trigger expression. |
80 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get stats | HAProxy Statistics Report in CSV format |
Zabbix agent | web.page.get["{$HAPROXY.STATS.SCHEME}://{$HAPROXY.STATS.HOST}:{$HAPROXY.STATS.PORT}/{$HAPROXY.STATS.PATH};csv"] Preprocessing
|
Get nodes | Array for LLD rules. |
Dependent item | haproxy.get.nodes Preprocessing
|
Get stats page | HAProxy Statistics Report HTML |
Zabbix agent | web.page.get["{$HAPROXY.STATS.SCHEME}://{$HAPROXY.STATS.HOST}:{$HAPROXY.STATS.PORT}/{$HAPROXY.STATS.PATH}"] |
Version | Dependent item | haproxy.version Preprocessing
|
|
Uptime | Dependent item | haproxy.uptime Preprocessing
|
|
Service status | Zabbix agent | net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] Preprocessing
|
|
Service response time | Zabbix agent | net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: Version has changed | HAProxy version has changed. Acknowledge to close the problem manually. |
last(/HAProxy by Zabbix agent/haproxy.version,#1)<>last(/HAProxy by Zabbix agent/haproxy.version,#2) and length(last(/HAProxy by Zabbix agent/haproxy.version))>0 |Info |
Manual close: Yes | |
HAProxy: Service has been restarted | Uptime is less than 10 minutes. |
last(/HAProxy by Zabbix agent/haproxy.uptime)<10m |Info |
Manual close: Yes | |
HAProxy: Service is down | last(/HAProxy by Zabbix agent/net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"])=0 |Average |
Manual close: Yes | ||
HAProxy: Service response time is too high | min(/HAProxy by Zabbix agent/net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"],5m)>{$HAPROXY.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Backend discovery | Discovery backends |
Dependent item | haproxy.backend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Backend {#PXNAME}: Raw data | The raw data of the Backend with the name {#PXNAME}. |
Dependent item | haproxy.backend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Status | Possible values: UP - The server is reporting as healthy. DOWN - The server is reporting as unhealthy and unable to receive requests. NOLB - You've added http-check disable-on-404 to the backend and the health checked URL has returned an HTTP 404 response. MAINT - The server has been disabled or put into maintenance mode. DRAIN - The server has been put into drain mode. no check - Health checks are not enabled for this server. |
Dependent item | haproxy.backend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Responses time | Average backend response time (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.backend.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.backend.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Response errors per second | Number of requests whose responses yielded an error |
Dependent item | haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.backend.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.backend.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.backend.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.backend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.backend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.backend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.backend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.backend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of active servers | Number of active servers. |
Dependent item | haproxy.backend.act[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of backup servers | Number of backup servers. |
Dependent item | haproxy.backend.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.backend.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Weight | Total effective weight. |
Dependent item | haproxy.backend.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: backend {#PXNAME}: Server is DOWN | Backend is not available. |
count(/HAProxy by Zabbix agent/haproxy.backend.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Average |
||
HAProxy: backend {#PXNAME}: Average response time is high | Average backend response time (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_RTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_RTIME.MAX.WARN} |Warning |
||
HAProxy: backend {#PXNAME}: Number of responses with error is high | Number of requests on backend, whose responses yielded an error, is more than {$HAPROXY.BACK_ERESP.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_ERESP.MAX.WARN} |Warning |
||
HAProxy: backend {#PXNAME}: Current number of requests unassigned in queue is high | Current number of requests on backend unassigned in queue is more than {$HAPROXY.BACK_QCUR.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QCUR.MAX.WARN} |Warning |
||
HAProxy: backend {#PXNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_QTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QTIME.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Frontend discovery | Discovery frontends |
Dependent item | haproxy.frontend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Frontend {#PXNAME}: Raw data | The raw data of the frontend with the name {#PXNAME}. |
Dependent item | haproxy.frontend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Status | Possible values: OPEN, STOP. When Status is OPEN, the frontend is operating normally and ready to receive traffic. |
Dependent item | haproxy.frontend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Requests rate | HTTP requests per second |
Dependent item | haproxy.frontend.req_rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Sessions rate | Number of sessions created per second |
Dependent item | haproxy.frontend.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Established sessions | The current number of established sessions. |
Dependent item | haproxy.frontend.scur[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Session limits | The most simultaneous sessions that are allowed, as defined by the maxconn setting in the frontend. |
Dependent item | haproxy.frontend.slim[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Session utilization | Percentage of sessions used (scur / slim * 100); see the calculation sketch after this item list. |
Calculated | haproxy.frontend.sutil[{#PXNAME},{#SVNAME}] |
Frontend {#PXNAME}: Request errors per second | Number of request errors per second. |
Dependent item | haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Denied requests per second | Requests denied due to security concerns (ACL-restricted) per second. |
Dependent item | haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.frontend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.frontend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.frontend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Incoming traffic | Number of bits received by the frontend |
Dependent item | haproxy.frontend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Outgoing traffic | Number of bits sent by the frontend |
Dependent item | haproxy.frontend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
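As a rough illustration of where the Session utilization value and its trigger come from, the sketch below reads the HAProxy stats CSV and computes scur / slim * 100 for each frontend. It is not part of the template; the stats URL is an assumption and must be adapted to your own stats configuration.

```python
# A minimal sketch (not part of the template) of the "Session utilization" calculation:
# scur / slim * 100 per FRONTEND row of the HAProxy stats CSV.
# The stats endpoint URL below is an assumption; adjust it to your setup.
import csv
import io
import urllib.request

STATS_CSV_URL = "http://127.0.0.1:8404/stats;csv"  # assumed HAProxy stats page in CSV mode

def frontend_session_utilization(url: str = STATS_CSV_URL) -> dict:
    raw = urllib.request.urlopen(url, timeout=5).read().decode()
    # The header line starts with "# pxname,svname,..."; strip the leading "# ".
    reader = csv.DictReader(io.StringIO(raw.lstrip("# ")))
    util = {}
    for row in reader:
        if row.get("svname") == "FRONTEND" and row.get("slim"):
            scur, slim = int(row["scur"] or 0), int(row["slim"])
            if slim > 0:
                util[row["pxname"]] = round(scur / slim * 100, 2)
    return util

if __name__ == "__main__":
    for frontend, pct in frontend_session_utilization().items():
        print(f"{frontend}: {pct}% of allowed sessions in use")
```

The template performs the same division with a calculated item (haproxy.frontend.sutil), so the script is only meant to show how scur and slim relate to the utilization percentage used in the trigger below.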
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: frontend {#PXNAME}: Session utilization is high | Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. Furthermore, if you find your session usage percentage to be hovering above 80%, it could be time to either modify HAProxy's configuration to allow more sessions, or migrate your HAProxy server to a bigger box. |
min(/HAProxy by Zabbix agent/haproxy.frontend.sutil[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_SUTIL.MAX.WARN} |Warning |
||
HAProxy: frontend {#PXNAME}: Number of request errors is high | Number of request errors is more than {$HAPROXY.FRONT_EREQ.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_EREQ.MAX.WARN} |Warning |
||
HAProxy: frontend {#PXNAME}: Number of requests denied is high | Number of requests denied due to security concerns (ACL-restricted) is more than {$HAPROXY.FRONT_DREQ.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_DREQ.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server discovery | Discovery servers |
Dependent item | haproxy.server.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server {#PXNAME} {#SVNAME}: Raw data | The raw data of the server named {#SVNAME}. |
Dependent item | haproxy.server.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Status | Dependent item | haproxy.server.status[{#PXNAME},{#SVNAME}] Preprocessing
|
|
{#PXNAME} {#SVNAME}: Responses time | Average server response time (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.server.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.server.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Response errors per second | Number of requests whose responses yielded an error. |
Dependent item | haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.server.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.server.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.server.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.server.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.server.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.server.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.server.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.server.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.server.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.server.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Server is active | Shows whether the server is active (marked with a Y) or a backup (marked with a -). |
Dependent item | haproxy.server.act[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Server is backup | Shows whether the server is a backup (marked with a Y) or active (marked with a -). |
Dependent item | haproxy.server.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.server.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Weight | Effective weight. |
Dependent item | haproxy.server.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Configured maxqueue | Configured maxqueue for the server, or nothing if the value is 0 (default, meaning no limit). |
Dependent item | haproxy.server.qlimit[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Server was selected per second | Number of times that server was selected. |
Dependent item | haproxy.server.lbtot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Status of last health check | Status of the last health check, one of: UNK -> unknown; INI -> initializing; SOCKERR -> socket error; L4OK -> check passed on layer 4, no upper layers testing enabled; L4TOUT -> layer 1-4 timeout; L4CON -> layer 1-4 connection problem, for example "Connection refused" (tcp rst) or "No route to host" (icmp); L6OK -> check passed on layer 6; L6TOUT -> layer 6 (SSL) timeout; L6RSP -> layer 6 invalid response - protocol error; L7OK -> check passed on layer 7; L7OKC -> check conditionally passed on layer 7, for example 404 with disable-on-404; L7TOUT -> layer 7 (HTTP/SMTP) timeout; L7RSP -> layer 7 invalid response - protocol error; L7STS -> layer 7 response error, for example HTTP 5xx. Notice: if a check is currently running, the last known status will be reported, prefixed with "* ", e.g. "* L7OK". |
Dependent item | haproxy.server.check_status[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: {#PXNAME} {#SVNAME}: Server is DOWN | Server is not available. |
count(/HAProxy by Zabbix agent/haproxy.server.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Average response time is high | Average server response time (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_RTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_RTIME.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Number of responses with error is high | Number of requests on server, whose responses yielded an error, is more than {$HAPROXY.SERVER_ERESP.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_ERESP.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Current number of requests unassigned in queue is high | Current number of requests unassigned in queue is more than {$HAPROXY.SERVER_QCUR.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QCUR.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_QTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QTIME.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Health check error | Please check the server for faults. |
find(/HAProxy by Zabbix agent/haproxy.server.check_status[{#PXNAME},{#SVNAME}],#3,"regexp","(?:L[4-7]OK|^$)")=0 |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template for monitoring Hadoop over HTTP that works without any external scripts. It collects metrics by polling the Hadoop API remotely using an HTTP agent and JSONPath preprocessing. Zabbix server (or proxy) execute direct requests to ResourceManager, NodeManagers, NameNode, DataNodes APIs. All metrics are collected at once, thanks to the Zabbix bulk data collection.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You should define the IP address (or FQDN) and Web-UI port for the ResourceManager in {$HADOOP.RESOURCEMANAGER.HOST} and {$HADOOP.RESOURCEMANAGER.PORT} macros and for the NameNode in {$HADOOP.NAMENODE.HOST} and {$HADOOP.NAMENODE.PORT} macros respectively. Macros can be set in the template or overridden at the host level.
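For orientation, the sketch below issues the same kind of HTTP requests that the template's HTTP agent items send to the ResourceManager and NameNode web endpoints and picks out a few values that the template extracts with JSONPath preprocessing. The hostnames and ports mirror the macro defaults and are placeholders only.

```python
# A rough sketch of the requests behind the HTTP agent items; hosts and ports are
# placeholders that mirror the {$HADOOP.*} macro defaults.
import json
import urllib.request

RM = "http://ResourceManager:8088"   # {$HADOOP.RESOURCEMANAGER.HOST}:{$HADOOP.RESOURCEMANAGER.PORT}
NN = "http://NameNode:9870"          # {$HADOOP.NAMENODE.HOST}:{$HADOOP.NAMENODE.PORT}

def get_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

# ResourceManager cluster metrics: NodeManager counters (active, unhealthy, lost, ...).
cluster = get_json(f"{RM}/ws/v1/cluster/metrics")["clusterMetrics"]
print("Active NMs:", cluster["activeNodes"])
print("Unhealthy NMs:", cluster["unhealthyNodes"])

# NameNode JMX: the FSNamesystemState bean holds DataNode and volume counters.
bean = get_json(f"{NN}/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState")["beans"][0]
print("Live DataNodes:", bean["NumLiveDataNodes"])
print("Volume failures:", bean["VolumeFailuresTotal"])
```

In the template itself, one HTTP agent item fetches each JSON document in bulk, and the individual metrics are dependent items that select single fields via JSONPath.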
Name | Description | Default |
---|---|---|
{$HADOOP.RESOURCEMANAGER.HOST} | The Hadoop ResourceManager host IP address or FQDN. |
ResourceManager |
{$HADOOP.RESOURCEMANAGER.PORT} | The Hadoop ResourceManager Web-UI port. |
8088 |
{$HADOOP.RESOURCEMANAGER.RESPONSE_TIME.MAX.WARN} | The Hadoop ResourceManager API page maximum response time in seconds for trigger expression. |
10s |
{$HADOOP.NAMENODE.HOST} | The Hadoop NameNode host IP address or FQDN. |
NameNode |
{$HADOOP.NAMENODE.PORT} | The Hadoop NameNode Web-UI port. |
9870 |
{$HADOOP.NAMENODE.RESPONSE_TIME.MAX.WARN} | The Hadoop NameNode API page maximum response time in seconds for trigger expression. |
10s |
{$HADOOP.CAPACITY_REMAINING.MIN.WARN} | The Hadoop cluster capacity remaining percent for trigger expression. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
ResourceManager: Service status | Hadoop ResourceManager API port availability. |
Simple check | net.tcp.service["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"] Preprocessing
|
ResourceManager: Service response time | Hadoop ResourceManager API performance. |
Simple check | net.tcp.service.perf["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"] |
Get ResourceManager stats | HTTP agent | hadoop.resourcemanager.get | |
ResourceManager: Uptime | Dependent item | hadoop.resourcemanager.uptime Preprocessing
|
|
ResourceManager: Get info | Dependent item | hadoop.resourcemanager.info Preprocessing
|
|
ResourceManager: RPC queue & processing time | Average time spent on processing RPC requests. |
Dependent item | hadoop.resourcemanager.rpcprocessingtime_avg Preprocessing
|
ResourceManager: Active NMs | Number of Active NodeManagers. |
Dependent item | hadoop.resourcemanager.num_active_nm Preprocessing
|
ResourceManager: Decommissioning NMs | Number of Decommissioning NodeManagers. |
Dependent item | hadoop.resourcemanager.num_decommissioning_nm Preprocessing
|
ResourceManager: Decommissioned NMs | Number of Decommissioned NodeManagers. |
Dependent item | hadoop.resourcemanager.num_decommissioned_nm Preprocessing
|
ResourceManager: Lost NMs | Number of Lost NodeManagers. |
Dependent item | hadoop.resourcemanager.num_lost_nm Preprocessing
|
ResourceManager: Unhealthy NMs | Number of Unhealthy NodeManagers. |
Dependent item | hadoop.resourcemanager.num_unhealthy_nm Preprocessing
|
ResourceManager: Rebooted NMs | Number of Rebooted NodeManagers. |
Dependent item | hadoop.resourcemanager.num_rebooted_nm Preprocessing
|
ResourceManager: Shutdown NMs | Number of Shutdown NodeManagers. |
Dependent item | hadoop.resourcemanager.num_shutdown_nm Preprocessing
|
NameNode: Service status | Hadoop NameNode API port availability. |
Simple check | net.tcp.service["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"] Preprocessing
|
NameNode: Service response time | Hadoop NameNode API performance. |
Simple check | net.tcp.service.perf["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"] |
Get NameNode stats | HTTP agent | hadoop.namenode.get | |
NameNode: Uptime | Dependent item | hadoop.namenode.uptime Preprocessing
|
|
NameNode: Get info | Dependent item | hadoop.namenode.info Preprocessing
|
|
NameNode: RPC queue & processing time | Average time spent on processing RPC requests. |
Dependent item | hadoop.namenode.rpcprocessingtime_avg Preprocessing
|
NameNode: Block Pool Renaming | Dependent item | hadoop.namenode.percentblockpool_used Preprocessing
|
|
NameNode: Transactions since last checkpoint | Total number of transactions since last checkpoint. |
Dependent item | hadoop.namenode.transactionssincelast_checkpoint Preprocessing
|
NameNode: Percent capacity remaining | Available capacity in percent. |
Dependent item | hadoop.namenode.percent_remaining Preprocessing
|
NameNode: Capacity remaining | Available capacity. |
Dependent item | hadoop.namenode.capacity_remaining Preprocessing
|
NameNode: Corrupt blocks | Number of corrupt blocks. |
Dependent item | hadoop.namenode.corrupt_blocks Preprocessing
|
NameNode: Missing blocks | Number of missing blocks. |
Dependent item | hadoop.namenode.missing_blocks Preprocessing
|
NameNode: Failed volumes | Number of failed volumes. |
Dependent item | hadoop.namenode.volume_failures_total Preprocessing
|
NameNode: Alive DataNodes | Count of alive DataNodes. |
Dependent item | hadoop.namenode.num_live_data_nodes Preprocessing
|
NameNode: Dead DataNodes | Count of dead DataNodes. |
Dependent item | hadoop.namenode.num_dead_data_nodes Preprocessing
|
NameNode: Stale DataNodes | DataNodes that do not send a heartbeat within 30 seconds are marked as "stale". |
Dependent item | hadoop.namenode.num_stale_data_nodes Preprocessing
|
NameNode: Total files | Total count of files tracked by the NameNode. |
Dependent item | hadoop.namenode.files_total Preprocessing
|
NameNode: Total load | The current number of concurrent file accesses (read/write) across all DataNodes. |
Dependent item | hadoop.namenode.total_load Preprocessing
|
NameNode: Blocks allocable | Maximum number of blocks allocable. |
Dependent item | hadoop.namenode.block_capacity Preprocessing
|
NameNode: Total blocks | Count of blocks tracked by NameNode. |
Dependent item | hadoop.namenode.blocks_total Preprocessing
|
NameNode: Under-replicated blocks | The number of blocks with insufficient replication. |
Dependent item | hadoop.namenode.underreplicatedblocks Preprocessing
|
Get NodeManagers states | HTTP agent | hadoop.nodemanagers.get Preprocessing
|
|
Get DataNodes states | HTTP agent | hadoop.datanodes.get Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Hadoop: ResourceManager: Service is unavailable | last(/Hadoop by HTTP/net.tcp.service["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"])=0 |Average |
Manual close: Yes | ||
Hadoop: ResourceManager: Service response time is too high | min(/Hadoop by HTTP/net.tcp.service.perf["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"],5m)>{$HADOOP.RESOURCEMANAGER.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
Hadoop: ResourceManager: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.resourcemanager.uptime)<10m |Info |
Manual close: Yes | |
Hadoop: ResourceManager: Failed to fetch ResourceManager API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.resourcemanager.uptime,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Hadoop: ResourceManager: Cluster has no active NodeManagers | Cluster is unable to execute any jobs without at least one NodeManager. |
max(/Hadoop by HTTP/hadoop.resourcemanager.num_active_nm,5m)=0 |High |
||
Hadoop: ResourceManager: Cluster has unhealthy NodeManagers | YARN considers any node with disk utilization exceeding the value specified under the property yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage (in yarn-site.xml) to be unhealthy. Ample disk space is critical to ensure uninterrupted operation of a Hadoop cluster, and large numbers of unhealthyNodes (the number to alert on depends on the size of your cluster) should be quickly investigated and resolved. |
min(/Hadoop by HTTP/hadoop.resourcemanager.num_unhealthy_nm,15m)>0 |Average |
||
Hadoop: NameNode: Service is unavailable | last(/Hadoop by HTTP/net.tcp.service["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"])=0 |Average |
Manual close: Yes | ||
Hadoop: NameNode: Service response time is too high | min(/Hadoop by HTTP/net.tcp.service.perf["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"],5m)>{$HADOOP.NAMENODE.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
Hadoop: NameNode: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.namenode.uptime)<10m |Info |
Manual close: Yes | |
Hadoop: NameNode: Failed to fetch NameNode API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.namenode.uptime,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Hadoop: NameNode: Cluster capacity remaining is low | A good practice is to ensure that disk use never exceeds 80 percent capacity. |
max(/Hadoop by HTTP/hadoop.namenode.percent_remaining,15m)<{$HADOOP.CAPACITY_REMAINING.MIN.WARN} |Warning |
||
Hadoop: NameNode: Cluster has missing blocks | A missing block is far worse than a corrupt block, because a missing block cannot be recovered by copying a replica. |
min(/Hadoop by HTTP/hadoop.namenode.missing_blocks,15m)>0 |Average |
||
Hadoop: NameNode: Cluster has volume failures | HDFS now allows for disks to fail in place, without affecting DataNode operations, until a threshold value is reached. This is set on each DataNode via the dfs.datanode.failed.volumes.tolerated property; it defaults to 0, meaning that any volume failure will shut down the DataNode; on a production cluster where DataNodes typically have 6, 8, or 12 disks, setting this parameter to 1 or 2 is typically the best practice. |
min(/Hadoop by HTTP/hadoop.namenode.volume_failures_total,15m)>0 |Average |
||
Hadoop: NameNode: Cluster has DataNodes in Dead state | The death of a DataNode causes a flurry of network activity, as the NameNode initiates replication of blocks lost on the dead nodes. |
min(/Hadoop by HTTP/hadoop.namenode.num_dead_data_nodes,5m)>0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node manager discovery | HTTP agent | hadoop.nodemanager.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Hadoop NodeManager {#HOSTNAME}: Get stats | HTTP agent | hadoop.nodemanager.get[{#HOSTNAME}] | |
{#HOSTNAME}: RPC queue & processing time | Average time spent on processing RPC requests. |
Dependent item | hadoop.nodemanager.rpcprocessingtime_avg[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Container launch avg duration | Dependent item | hadoop.nodemanager.containerlaunchduration_avg[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: JVM Threads | The number of JVM threads. |
Dependent item | hadoop.nodemanager.jvm.threads[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Garbage collection time | The JVM garbage collection time in milliseconds. |
Dependent item | hadoop.nodemanager.jvm.gc_time[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Heap usage | The JVM heap usage in MBytes. |
Dependent item | hadoop.nodemanager.jvm.memheapused[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Uptime | Dependent item | hadoop.nodemanager.uptime[{#HOSTNAME}] Preprocessing
|
|
Hadoop NodeManager {#HOSTNAME}: Get raw info | Dependent item | hadoop.nodemanager.raw_info[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: State | State of the node - valid values are: NEW, RUNNING, UNHEALTHY, DECOMMISSIONING, DECOMMISSIONED, LOST, REBOOTED, SHUTDOWN. |
Dependent item | hadoop.nodemanager.state[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Version | Dependent item | hadoop.nodemanager.version[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Number of containers | Dependent item | hadoop.nodemanager.numcontainers[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Used memory | Dependent item | hadoop.nodemanager.usedmemory[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Available memory | Dependent item | hadoop.nodemanager.availablememory[{#HOSTNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Hadoop: {#HOSTNAME}: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.nodemanager.uptime[{#HOSTNAME}])<10m |Info |
Manual close: Yes | |
Hadoop: {#HOSTNAME}: Failed to fetch NodeManager API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.nodemanager.uptime[{#HOSTNAME}],30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Hadoop: {#HOSTNAME}: NodeManager has state {ITEM.VALUE}. | The state is different from normal. |
last(/Hadoop by HTTP/hadoop.nodemanager.state[{#HOSTNAME}])<>"RUNNING" |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Data node discovery | HTTP agent | hadoop.datanode.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Hadoop DataNode {#HOSTNAME}: Get stats | HTTP agent | hadoop.datanode.get[{#HOSTNAME}] | |
{#HOSTNAME}: Remaining | Remaining disk space. |
Dependent item | hadoop.datanode.remaining[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Used | Used disk space. |
Dependent item | hadoop.datanode.dfs_used[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Number of failed volumes | Number of failed storage volumes. |
Dependent item | hadoop.datanode.numfailedvolumes[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Threads | The number of JVM threads. |
Dependent item | hadoop.datanode.jvm.threads[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Garbage collection time | The JVM garbage collection time in milliseconds. |
Dependent item | hadoop.datanode.jvm.gc_time[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Heap usage | The JVM heap usage in MBytes. |
Dependent item | hadoop.datanode.jvm.memheapused[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Uptime | Dependent item | hadoop.datanode.uptime[{#HOSTNAME}] Preprocessing
|
|
Hadoop DataNode {#HOSTNAME}: Get raw info | Dependent item | hadoop.datanode.raw_info[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Version | DataNode software version. |
Dependent item | hadoop.datanode.version[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Admin state | Administrative state. |
Dependent item | hadoop.datanode.admin_state[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Oper state | Operational state. |
Dependent item | hadoop.datanode.oper_state[{#HOSTNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Hadoop: {#HOSTNAME}: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.datanode.uptime[{#HOSTNAME}])<10m |Info |
Manual close: Yes | |
Hadoop: {#HOSTNAME}: Failed to fetch DataNode API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.datanode.uptime[{#HOSTNAME}],30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Hadoop: {#HOSTNAME}: DataNode has state {ITEM.VALUE}. | The state is different from normal. |
last(/Hadoop by HTTP/hadoop.datanode.oper_state[{#HOSTNAME}])<>"Live" |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor GitLab by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template GitLab by HTTP
— collects metrics by an HTTP agent from the GitLab /-/metrics
endpoint.
See https://docs.gitlab.com/ee/administration/monitoring/prometheus/gitlab_metrics.html.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template works with self-hosted GitLab instances. Internal service metrics are collected from the GitLab /-/metrics
endpoint.
To access the metrics, the following two methods are available: either explicitly allow the Zabbix server (or proxy) IP address in the GitLab monitoring whitelist, or use a health check access token. The token is available on the Admin -> Monitoring -> Health check page: http://your.gitlab.address/admin/health_check. Use this token in the {$GITLAB.HEALTH.TOKEN} macro as a variable path, like: ?token=your_token.
Remember to change the {$GITLAB.URL} macro.
Also, see the Macros section for a list of macros used to set trigger values.
NOTE: Some metrics may not be collected depending on your GitLab instance version and configuration. See GitLab's documentation for further information about its metric collection.
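The sketch below illustrates the requests behind the readiness check and the metrics collection, assuming an instance reachable at the {$GITLAB.URL} default and a health check token; the exact metric names exposed at /-/metrics depend on the GitLab version and configuration.

```python
# A minimal sketch of the "Instance readiness check" and "Get instance metrics" requests.
# The URL and token are placeholders for {$GITLAB.URL} and {$GITLAB.HEALTH.TOKEN}.
import urllib.request

GITLAB_URL = "http://localhost"      # {$GITLAB.URL}
HEALTH_TOKEN = "?token=your_token"   # {$GITLAB.HEALTH.TOKEN}; may be empty if the IP is whitelisted

def fetch(path: str) -> str:
    with urllib.request.urlopen(f"{GITLAB_URL}{path}{HEALTH_TOKEN}", timeout=10) as resp:
        return resp.read().decode()

print(fetch("/-/readiness"))   # JSON status document, e.g. {"status":"ok", ...}

# /-/metrics returns Prometheus text format; the dependent items extract single series
# from it. The metric name below is an example and may differ between GitLab versions.
for line in fetch("/-/metrics").splitlines():
    if line.startswith("ruby_process_start_time_seconds"):
        print(line)
```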
Name | Description | Default |
---|---|---|
{$GITLAB.URL} | URL of a GitLab instance. |
http://localhost |
{$GITLAB.HEALTH.TOKEN} | The token path for GitLab health check. Example: ?token=your_token |
|
{$GITLAB.UNICORN.UTILIZATION.MAX.WARN} | The maximum percentage of Unicorn workers utilization for a trigger expression. |
90 |
{$GITLAB.PUMA.UTILIZATION.MAX.WARN} | The maximum percentage of Puma thread utilization for a trigger expression. |
90 |
{$GITLAB.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures for a trigger expression. |
2 |
{$GITLAB.REDIS.FAIL.MAX.WARN} | The maximum number of Redis client exceptions for a trigger expression. |
2 |
{$GITLAB.UNICORN.QUEUE.MAX.WARN} | The maximum number of Unicorn queued requests for a trigger expression. |
1 |
{$GITLAB.PUMA.QUEUE.MAX.WARN} | The maximum number of Puma queued requests for a trigger expression. |
1 |
{$GITLAB.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors for a trigger expression. |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get instance metrics | HTTP agent | gitlab.get_metrics Preprocessing
|
|
Instance readiness check | The readiness probe checks whether the GitLab instance is ready to accept traffic via Rails Controllers. |
HTTP agent | gitlab.readiness Preprocessing
|
Application server status | Checks whether the application server is running. This probe is used to check that the Rails Controllers are not deadlocked due to multi-threading. |
HTTP agent | gitlab.liveness Preprocessing
|
Version | Version of the GitLab instance. |
Dependent item | gitlab.deployments.version Preprocessing
|
Ruby: First process start time | Minimum UNIX timestamp of ruby processes start time. |
Dependent item | gitlab.ruby.processstarttime_seconds.first Preprocessing
|
Ruby: Last process start time | Maximum UNIX timestamp of ruby processes start time. |
Dependent item | gitlab.ruby.processstarttime_seconds.last Preprocessing
|
User logins, total | Counter of how many users have logged in since GitLab was started or restarted. |
Dependent item | gitlab.usersessionlogins_total Preprocessing
|
User CAPTCHA logins failed, total | Counter of failed CAPTCHA attempts during login. |
Dependent item | gitlab.failedlogincaptcha_total Preprocessing
|
User CAPTCHA logins, total | Counter of successful CAPTCHA attempts during login. |
Dependent item | gitlab.successfullogincaptcha_total Preprocessing
|
Upload file does not exist | Number of times an upload record could not find its file. |
Dependent item | gitlab.uploadfiledoesnotexist Preprocessing
|
Pipelines: Processing events, total | Total amount of pipeline processing events. |
Dependent item | gitlab.pipeline.processingeventstotal Preprocessing
|
Pipelines: Created, total | Counter of pipelines created. |
Dependent item | gitlab.pipeline.created_total Preprocessing
|
Pipelines: Auto DevOps pipelines, total | Counter of completed Auto DevOps pipelines. |
Dependent item | gitlab.pipeline.autodevopscompleted.total Preprocessing
|
Pipelines: Auto DevOps pipelines, failed | Counter of completed Auto DevOps pipelines with status "failed". |
Dependent item | gitlab.pipeline.autodevopscompleted_total.failed Preprocessing
|
Pipelines: CI/CD creation duration | The sum of the time in seconds it takes to create a CI/CD pipeline. |
Dependent item | gitlab.pipeline.pipeline_creation Preprocessing
|
Pipelines: CI/CD creation count | The number of CI/CD pipeline creation time measurements. |
Dependent item | gitlab.pipeline.pipeline_creation.count Preprocessing
|
Database: Connection pool, busy | Connections to the main database in use where the owner is still alive. |
Dependent item | gitlab.database.connectionpoolbusy Preprocessing
|
Database: Connection pool, current | Current connections to the main database in the pool. |
Dependent item | gitlab.database.connectionpoolconnections Preprocessing
|
Database: Connection pool, dead | Connections to the main database in use where the owner is not alive. |
Dependent item | gitlab.database.connectionpooldead Preprocessing
|
Database: Connection pool, idle | Connections to the main database not in use. |
Dependent item | gitlab.database.connectionpoolidle Preprocessing
|
Database: Connection pool, size | Total capacity of the main database connection pool. |
Dependent item | gitlab.database.connectionpoolsize Preprocessing
|
Database: Connection pool, waiting | Threads currently waiting on this queue. |
Dependent item | gitlab.database.connectionpoolwaiting Preprocessing
|
Redis: Client requests rate, queues | Number of Redis client requests per second. (Instance: queues) |
Dependent item | gitlab.redis.client_requests.queues.rate Preprocessing
|
Redis: Client requests rate, cache | Number of Redis client requests per second. (Instance: cache) |
Dependent item | gitlab.redis.client_requests.cache.rate Preprocessing
|
Redis: Client requests rate, shared_state | Number of Redis client requests per second. (Instance: shared_state) |
Dependent item | gitlab.redis.client_requests.shared_state.rate Preprocessing
|
Redis: Client exceptions rate, queues | Number of Redis client exceptions per second. (Instance: queues) |
Dependent item | gitlab.redis.client_exceptions.queues.rate Preprocessing
|
Redis: Client exceptions rate, cache | Number of Redis client exceptions per second. (Instance: cache) |
Dependent item | gitlab.redis.client_exceptions.cache.rate Preprocessing
|
Redis: client exceptions rate, shared_state | Number of Redis client exceptions per second. (Instance: shared_state) |
Dependent item | gitlab.redis.client_exceptions.shared_state.rate Preprocessing
|
Cache: Misses rate, total | The cache read miss count. |
Dependent item | gitlab.cache.misses_total.rate Preprocessing
|
Cache: Operations rate, total | The count of cache operations. |
Dependent item | gitlab.cache.operations_total.rate Preprocessing
|
Ruby: CPU usage per second | Average CPU time util in seconds. |
Dependent item | gitlab.ruby.processcpuseconds.rate Preprocessing
|
Ruby: Running_threads | Number of running Ruby threads. |
Dependent item | gitlab.ruby.threads_running Preprocessing
|
Ruby: File descriptors opened, avg | Average number of opened file descriptors. |
Dependent item | gitlab.ruby.file_descriptors.avg Preprocessing
|
Ruby: File descriptors opened, max | Maximum number of opened file descriptors. |
Dependent item | gitlab.ruby.file_descriptors.max Preprocessing
|
Ruby: File descriptors opened, min | Minimum number of opened file descriptors. |
Dependent item | gitlab.ruby.file_descriptors.min Preprocessing
|
Ruby: File descriptors, max | Maximum number of open file descriptors per process. |
Dependent item | gitlab.ruby.process_max_fds Preprocessing
|
Ruby: RSS memory, avg | Average RSS Memory usage in bytes. |
Dependent item | gitlab.ruby.processresidentmemory_bytes.avg Preprocessing
|
Ruby: RSS memory, min | Minimum RSS Memory usage in bytes. |
Dependent item | gitlab.ruby.processresidentmemory_bytes.min Preprocessing
|
Ruby: RSS memory, max | Maximum RSS Memory usage in bytes. |
Dependent item | gitlab.ruby.processresidentmemory_bytes.max Preprocessing
|
HTTP requests rate, total | Number of requests received into the system. |
Dependent item | gitlab.http.requests.rate Preprocessing
|
HTTP requests rate, 5xx | Number of request handling failures with HTTP code 5xx. |
Dependent item | gitlab.http.requests.5xx.rate Preprocessing
|
HTTP requests rate, 4xx | Number of request handling failures with HTTP code 4xx. |
Dependent item | gitlab.http.requests.4xx.rate Preprocessing
|
Transactions per second | Transactions per second (gitlab_transaction_* metrics). |
Dependent item | gitlab.transactions.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitLab: Gitlab instance is not able to accept traffic | last(/GitLab by HTTP/gitlab.readiness)=0 |High |
Depends on:
|
||
GitLab: Liveness check was failed | The application server is not running or Rails Controllers are deadlocked. |
last(/GitLab by HTTP/gitlab.liveness)=0 |High |
||
GitLab: Version has changed | The GitLab version has changed. Acknowledge to close the problem manually. |
last(/GitLab by HTTP/gitlab.deployments.version,#1)<>last(/GitLab by HTTP/gitlab.deployments.version,#2) and length(last(/GitLab by HTTP/gitlab.deployments.version))>0 |Info |
Manual close: Yes | |
GitLab: Too many Redis queues client exceptions | "Too many Redis client exceptions during the requests to Redis instance queues." |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.queues.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |Warning |
||
GitLab: Too many Redis cache client exceptions | "Too many Redis client exceptions during the requests to Redis instance cache." |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.cache.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |Warning |
||
GitLab: Too many Redis shared_state client exceptions | "Too many Redis client exceptions during the requests to Redis instance shared_state." |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.shared_state.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |Warning |
||
GitLab: Failed to fetch info data | Zabbix has not received any metrics data for the last 30 minutes. |
nodata(/GitLab by HTTP/gitlab.ruby.threads_running,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
GitLab: Current number of open files is too high | min(/GitLab by HTTP/gitlab.ruby.file_descriptors.max,5m)/last(/GitLab by HTTP/gitlab.ruby.process_max_fds)*100>{$GITLAB.OPEN.FDS.MAX.WARN} |Warning |
|||
GitLab: Too many HTTP requests failures | "Too many requests failed on GitLab instance with 5xx HTTP code" |
min(/GitLab by HTTP/gitlab.http.requests.5xx.rate,5m)>{$GITLAB.HTTP.FAIL.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Unicorn metrics discovery | Discovery of Unicorn-specific metrics when Unicorn is used. |
HTTP agent | gitlab.unicorn.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Unicorn: Workers | The number of Unicorn workers |
Dependent item | gitlab.unicorn.unicorn_workers[{#SINGLETON}] Preprocessing
|
Unicorn: Active connections | The number of active Unicorn connections. |
Dependent item | gitlab.unicorn.active_connections[{#SINGLETON}] Preprocessing
|
Unicorn: Queued connections | The number of queued Unicorn connections. |
Dependent item | gitlab.unicorn.queued_connections[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitLab: Unicorn worker utilization is too high | min(/GitLab by HTTP/gitlab.unicorn.active_connections[{#SINGLETON}],5m)/last(/GitLab by HTTP/gitlab.unicorn.unicorn_workers[{#SINGLETON}])*100>{$GITLAB.UNICORN.UTILIZATION.MAX.WARN} |Warning |
|||
GitLab: Unicorn is queueing requests | min(/GitLab by HTTP/gitlab.unicorn.queued_connections[{#SINGLETON}],5m)>{$GITLAB.UNICORN.QUEUE.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Puma metrics discovery | Discovery of Puma specific metrics when Puma is used. |
HTTP agent | gitlab.puma.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Active connections | Number of puma threads processing a request. |
Dependent item | gitlab.puma.active_connections[{#SINGLETON}] Preprocessing
|
Workers | Total number of puma workers. |
Dependent item | gitlab.puma.workers[{#SINGLETON}] Preprocessing
|
Running workers | The number of booted puma workers. |
Dependent item | gitlab.puma.running_workers[{#SINGLETON}] Preprocessing
|
Stale workers | The number of old puma workers. |
Dependent item | gitlab.puma.stale_workers[{#SINGLETON}] Preprocessing
|
Running threads | The number of running puma threads. |
Dependent item | gitlab.puma.running[{#SINGLETON}] Preprocessing
|
Queued connections | The number of connections in that puma worker's "todo" set waiting for a worker thread. |
Dependent item | gitlab.puma.queued_connections[{#SINGLETON}] Preprocessing
|
Pool capacity | The number of requests the puma worker is capable of taking right now. |
Dependent item | gitlab.puma.pool_capacity[{#SINGLETON}] Preprocessing
|
Max threads | The maximum number of puma worker threads. |
Dependent item | gitlab.puma.max_threads[{#SINGLETON}] Preprocessing
|
Idle threads | The number of spawned puma threads which are not processing a request. |
Dependent item | gitlab.puma.idle_threads[{#SINGLETON}] Preprocessing
|
Killer terminations, total | The number of workers terminated by PumaWorkerKiller. |
Dependent item | gitlab.puma.killerterminationstotal[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitLab: Puma instance thread utilization is too high | min(/GitLab by HTTP/gitlab.puma.active_connections[{#SINGLETON}],5m)/last(/GitLab by HTTP/gitlab.puma.max_threads[{#SINGLETON}])*100>{$GITLAB.PUMA.UTILIZATION.MAX.WARN} |Warning |
|||
GitLab: Puma is queueing requests | min(/GitLab by HTTP/gitlab.puma.queued_connections[{#SINGLETON}],15m)>{$GITLAB.PUMA.QUEUE.MAX.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of GitHub repository monitoring by Zabbix via GitHub REST API and doesn't require any external scripts.
For more details about GitHub REST API, refer to the official documentation.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
GitHub limits the number of REST API requests that you can make within a specific amount of time, which also depends on whether you are authenticated or not, the plan, and the token type used. Many REST API endpoints require authentication or return additional information if you are authenticated. Additionally, you can make more requests per hour when you are authenticated.
Additional information is available in the official documentation:
One of the simplest ways to send authenticated requests is to use a personal access token - either a classic or a fine-grained one.
Classic personal access token
You can create a new classic personal access token by following the instructions in the official documentation.
For public repositories, no additional permission scopes are required. For monitoring to work on private repositories, the repo
scope must be set to have full control of private repositories.
Additional information about OAuth scopes is available in the official documentation.
Note that authenticated users must have admin access to the repository and the repo
scope must be set to get information about self-hosted runners.
Fine-grained personal access token
Alternatively, you can use a fine-grained personal access token.
In order to use fine-grained tokens to monitor organization-owned repositories, organizations must opt in to fine-grained personal access tokens and set up a personal access token policy.
The fine-grained token needs to have the required permissions set to provide read access to the repository resources (see the official documentation for the exact permission scopes).
Then, set up the template:
Set the access token in the {$GITHUB.API.TOKEN} macro.
Change the API URL in the {$GITHUB.API.URL} macro if needed (for self-hosted installations).
Set the repository owner in the {$GITHUB.REPO.OWNER} macro.
Set the repository name in the {$GITHUB.REPO.NAME} macro.
Adjust the discovery filters if needed, using the following macros: for branches - {$GITHUB.BRANCH.NAME.MATCHES}, {$GITHUB.BRANCH.NAME.NOT_MATCHES}; for workflows - {$GITHUB.WORKFLOW.NAME.MATCHES}, {$GITHUB.WORKFLOW.NAME.NOT_MATCHES}, {$GITHUB.WORKFLOW.STATE.MATCHES}, {$GITHUB.WORKFLOW.STATE.NOT_MATCHES}; for self-hosted runners - {$GITHUB.RUNNER.NAME.MATCHES}, {$GITHUB.RUNNER.NAME.NOT_MATCHES}, {$GITHUB.RUNNER.OS.MATCHES}, {$GITHUB.RUNNER.OS.NOT_MATCHES}.
Note: Update intervals and timeouts for script items can be changed individually via the {$GITHUB.INTERVAL} and {$GITHUB.TIMEOUT} macros with context. Depending on the repository being monitored, they can be adjusted if needed (if you are exceeding rate limits, you can increase the update intervals of some script items to stay within the per-hour request limits). Be aware that this may also affect the triggers (check whether the item is used in triggers and adjust thresholds and/or evaluation periods if needed).
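The sketch below shows an authenticated request with the same headers the script items send, plus the request-limit utilization calculation (used requests divided by the request limit, in percent). The token, owner, and repository values are placeholders for the corresponding macros.

```python
# A hedged sketch of authenticated GitHub REST API requests as issued by the script items.
# Token, owner, and repository are placeholders for the {$GITHUB.*} macros.
import json
import urllib.request

API_URL = "https://api.github.com/"    # {$GITHUB.API.URL}
TOKEN = "<SET THE ACCESS TOKEN>"       # {$GITHUB.API.TOKEN}
OWNER, REPO = "<OWNER>", "<REPO>"      # {$GITHUB.REPO.OWNER}, {$GITHUB.REPO.NAME}

def gh_get(path: str) -> dict:
    req = urllib.request.Request(
        API_URL + path,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Accept": "application/vnd.github+json",
            "X-GitHub-Api-Version": "2022-11-28",   # {$GITHUB.API_VERSION}
            "User-Agent": "Zabbix/7.0",             # {$GITHUB.USER_AGENT}
        },
    )
    with urllib.request.urlopen(req, timeout=15) as resp:
        return json.load(resp)

# GET /rate_limit does not count against the limit; "core" covers most REST endpoints.
core = gh_get("rate_limit")["resources"]["core"]
print(f"Request limit utilization: {core['used'] / core['limit'] * 100:.1f}%")

# General repository information, as used by the "Get repository" script item.
repo = gh_get(f"repos/{OWNER}/{REPO}")
print("Stargazers:", repo["stargazers_count"], "Forks:", repo["forks_count"])
```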
Name | Description | Default |
---|---|---|
{$GITHUB.API.URL} | Set the API URL here. |
https://api.github.com/ |
{$GITHUB.USER_AGENT} | The user agent that is used in headers for HTTP requests. |
Zabbix/7.0 |
{$GITHUB.API_VERSION} | The API version that is used in headers for HTTP requests. |
2022-11-28 |
{$GITHUB.REPO.OWNER} | Set the repository owner here. |
<SET THE REPO OWNER> |
{$GITHUB.REPO.NAME} | Set the repository name here. |
<SET THE REPO NAME> |
{$GITHUB.API.TOKEN} | Set the access token here. |
|
{$GITHUB.INTERVAL} | The update interval for the script items that retrieve data from the API. Can be used with context if needed (check the context values in relevant items). |
1m |
{$GITHUB.INTERVAL:regex:"get(tags|releases|issues)count"} | The update interval for the script items that retrieve the number of tags, releases, issues, and pull requests (total, open, closed). |
1h |
{$GITHUB.INTERVAL:"get_repo"} | The update interval for the script item that retrieves the repository information. |
15m |
{$GITHUB.INTERVAL:"get_(branches|workflows)"} | The update interval for the script items that retrieve the branches and workflows. Used only for related metric discovery. |
1h |
{$GITHUB.INTERVAL:"get_runners"} | The update interval for the script item that retrieves the information about self-hosted runners. |
15m |
{$GITHUB.INTERVAL:regex:"getlastrun:.+"} | The update interval for the script items that retrieve the information about the last workflow run results. |
15m |
{$GITHUB.INTERVAL:regex:"getcommitscount:.+"} | The update interval for the script items that retrieve the commits count in discovered branches. |
1h |
{$GITHUB.TIMEOUT} | The timeout threshold for the script items that retrieve data from the API. Can be used with context if needed (check the context values in relevant items). |
15s |
{$GITHUB.HTTP_PROXY} | The HTTP proxy for script items (set if needed). If the macro is empty, then no proxy is used. |
|
{$GITHUB.RESULTS_PER_PAGE} | The number of results to fetch per page. Can be used with context and adjusted if needed (check the context values in script parameters of relevant items). |
100 |
{$GITHUB.WORKFLOW.NAME.MATCHES} | The repository workflow name regex filter to use in workflow-related metric discovery - for including. |
.+ |
{$GITHUB.WORKFLOW.NAME.NOT_MATCHES} | The repository workflow name regex filter to use in workflow-related metric discovery - for excluding. |
CHANGE_IF_NEEDED |
{$GITHUB.WORKFLOW.STATE.MATCHES} | The repository workflow state regex filter to use in workflow-related metric discovery - for including. |
active |
{$GITHUB.WORKFLOW.STATE.NOT_MATCHES} | The repository workflow state regex filter to use in workflow-related metric discovery - for excluding. |
CHANGE_IF_NEEDED |
{$GITHUB.BRANCH.NAME.MATCHES} | The repository branch name regex filter to use in branch-related metric discovery - for including. |
.+ |
{$GITHUB.BRANCH.NAME.NOT_MATCHES} | The repository branch name regex filter to use in branch-related metric discovery - for excluding. |
CHANGE_IF_NEEDED |
{$GITHUB.RUNNER.NAME.MATCHES} | The repository self-hosted runner name regex filter to use in discovering metrics related to the self-hosted runner - for including. |
.+ |
{$GITHUB.RUNNER.NAME.NOT_MATCHES} | The repository self-hosted runner name regex filter to use in discovering metrics related to the self-hosted runner - for excluding. |
CHANGE_IF_NEEDED |
{$GITHUB.RUNNER.OS.MATCHES} | The repository self-hosted runner OS regex filter to use in discovering metrics related to the self-hosted runner - for including. |
.+ |
{$GITHUB.RUNNER.OS.NOT_MATCHES} | The repository self-hosted runner OS regex filter to use in discovering metrics related to the self-hosted runner - for excluding. |
CHANGE_IF_NEEDED |
{$GITHUB.REQUESTS.UTIL.WARN} | The threshold percentage of utilized API requests in a Warning trigger expression. |
80 |
{$GITHUB.REQUESTS.UTIL.HIGH} | The threshold percentage of utilized API requests in a High trigger expression. |
90 |
{$GITHUB.WORKFLOW.STATUS.QUEUED.THRESH} | The time threshold used in the trigger of a workflow run that has been in the queue for too long. Can be used with context if needed. |
1h |
{$GITHUB.WORKFLOW.STATUS.IN_PROGRESS.THRESH} | The time threshold used in the trigger of a workflow run that has been in progress for too long. Can be used with context if needed. |
24h |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get self-hosted runners | Get the self-hosted runners of the repository. Note that admin access to the repository is required to use this endpoint: https://docs.github.com/en/rest/actions/self-hosted-runners?apiVersion=2022-11-28#list-self-hosted-runners-for-a-repository |
Script | github.repo.runners.get Preprocessing
|
Get self-hosted runner check | Carry out a self-hosted runners data collection check. |
Dependent item | github.repo.runners.get.check Preprocessing
|
Number of releases | The number of releases in the repository. Note that this number also includes draft releases. Information about endpoint: https://docs.github.com/en/rest/releases/releases?apiVersion=2022-11-28#list-releases |
Script | github.repo.releases.count Preprocessing
|
Number of tags | The number of tags in the repository. Information about endpoint: https://docs.github.com/en/rest/repos/repos?apiVersion=2022-11-28#list-repository-tags |
Script | github.repo.tags.count Preprocessing
|
Get issue count | Get the count of issues and pull requests in the repository (total, open, closed). Information about endpoint for issues: https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues Information about endpoint for pull requests: https://docs.github.com/en/rest/pulls/pulls?apiVersion=2022-11-28#list-pull-requests |
Script | github.repo.issues.get Preprocessing
|
Number of issues | The total number of issues in the repository. |
Dependent item | github.repo.issues.total Preprocessing
|
Number of open issues | The number of open issues in the repository. |
Dependent item | github.repo.issues.open Preprocessing
|
Number of closed issues | The number of closed issues in the repository. |
Dependent item | github.repo.issues.closed Preprocessing
|
Number of PRs | The total number of pull requests in the repository. |
Dependent item | github.repo.pr.total Preprocessing
|
Number of open PRs | The number of open pull requests in the repository. |
Dependent item | github.repo.pr.open Preprocessing
|
Number of closed PRs | The number of closed pull requests in the repository. |
Dependent item | github.repo.pr.closed Preprocessing
|
Request limit | API request limit. Information about request limits in GitHub REST API documentation: https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28 |
Dependent item | github.repo.requests.limit Preprocessing
|
Requests used | The number of used API requests. Information about request limits in GitHub REST API documentation: https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28 |
Dependent item | github.repo.requests.used Preprocessing
|
Request limit utilization, in % | The calculated utilization of the API request limit in %. Information about request limits in GitHub REST API documentation: https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28 |
Dependent item | github.repo.requests.util Preprocessing
|
Get repository | Get the general repository information. If the repository is not a fork, the community profile metrics are also retrieved. Information about endpoint: https://docs.github.com/en/rest/repos/repos?apiVersion=2022-11-28#get-a-repository Information about community profile metrics: https://docs.github.com/en/rest/metrics/community?apiVersion=2022-11-28#get-community-profile-metrics |
Script | github.repo.repository.get |
Get repository data check | Data collection check. |
Dependent item | github.repo.repository.get.check Preprocessing
|
Repository is a fork | Indicates whether the repository is a fork. |
Dependent item | github.repo.repository.is_fork Preprocessing
|
Repository size | The size of the repository. |
Dependent item | github.repo.repository.size Preprocessing
|
Repository stargazers | The number of GitHub users who have starred the repository. |
Dependent item | github.repo.repository.stargazers Preprocessing
|
Repository watchers | The number of GitHub users who are subscribed to the repository. |
Dependent item | github.repo.repository.watchers Preprocessing
|
Repository forks | The number of repository forks. |
Dependent item | github.repo.repository.forks.count Preprocessing
|
Get workflows | Get the repository workflows. Information about endpoint: https://docs.github.com/en/rest/actions/workflows?apiVersion=2022-11-28#list-repository-workflows |
Script | github.repo.workflows.get Preprocessing
|
Get branches | Get the repository branches. Information about endpoint: https://docs.github.com/en/rest/branches/branches?apiVersion=2022-11-28#list-branches |
Script | github.repo.branches.get Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitHub: No access to repository self-hosted runners | Admin access to the repository is required to use this endpoint: |
find(/GitHub repository by HTTP/github.repo.runners.get.check,,"iregexp","Must have admin rights to Repository")=1 |Average |
||
GitHub: The total number of issues has increased | The total number of issues has increased, which means that a new issue (or multiple issues) has been opened. |
last(/GitHub repository by HTTP/github.repo.issues.total)>last(/GitHub repository by HTTP/github.repo.issues.total,#2) |Warning |
||
GitHub: The total number of PRs has increased | The total number of pull requests has increased, which means that a new pull request (or multiple pull requests) has been opened. |
last(/GitHub repository by HTTP/github.repo.pr.total)>last(/GitHub repository by HTTP/github.repo.pr.total,#2) |Info |
||
GitHub: API request limit utilization is high | The API request limit utilization is high. It can be lowered by increasing the update intervals for script items (by setting up higher values in corresponding context macros). |
max(/GitHub repository by HTTP/github.repo.requests.util,1h)>{$GITHUB.REQUESTS.UTIL.WARN} |Warning |
Depends on:
|
|
GitHub: API request limit utilization is very high | The API request limit utilization is very high. It can be lowered by increasing the update intervals for script items (by setting up higher values in corresponding context macros). |
max(/GitHub repository by HTTP/github.repo.requests.util,1h)>{$GITHUB.REQUESTS.UTIL.HIGH} |Average |
||
GitHub: There are errors in requests to API | Errors have been received in response to API requests. Check the latest values for details. |
length(last(/GitHub repository by HTTP/github.repo.repository.get.check))>0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Workflow discovery | Discovers repository workflows. By default, only the active workflows are discovered. Information about endpoint: https://docs.github.com/en/rest/actions/workflows?apiVersion=2022-11-28#list-repository-workflows |
Dependent item | github.repo.workflows.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Workflow [{#WORKFLOW_NAME}]: Get last run | Get the data about the last workflow run. Information about endpoint: https://docs.github.com/en/rest/actions/workflow-runs?apiVersion=2022-11-28#list-workflow-runs-for-a-workflow |
Script | github.repo.workflows.last_run.get[{#WORKFLOW_NAME}] Preprocessing
|
Workflow [{#WORKFLOW_NAME}]: Last run status | The status of the last workflow run. Possible values: 0 - queued 1 - in_progress 2 - completed 10 - unknown |
Dependent item | github.repo.workflows.last_run.status[{#WORKFLOW_NAME}] Preprocessing
|
Workflow [{#WORKFLOW_NAME}]: Last run conclusion | The conclusion of the last workflow run. Possible values: 0 - success 1 - failure 2 - neutral 3 - cancelled 4 - skipped 5 - timed_out 6 - action_required 10 - unknown |
Dependent item | github.repo.workflows.last_run.conclusion[{#WORKFLOW_NAME}] Preprocessing
|
Workflow [{#WORKFLOW_NAME}]: Last run start date | The date when the last workflow run was started. |
Dependent item | github.repo.workflows.last_run.start_date[{#WORKFLOW_NAME}] Preprocessing
|
Workflow [{#WORKFLOW_NAME}]: Last run update date | The date when the last workflow run was updated. |
Dependent item | github.repo.workflows.last_run.update_date[{#WORKFLOW_NAME}] Preprocessing
|
Workflow [{#WORKFLOW_NAME}]: Last run duration | The duration of the last workflow run. |
Dependent item | github.repo.workflows.last_run.duration[{#WORKFLOW_NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitHub: Workflow [{#WORKFLOW_NAME}]: The workflow has been in the queue for too long | The last workflow run has been in the "queued" status for too long. This may mean that it has failed to be assigned to a runner. The default threshold is provided as an example and can be adjusted for relevant workflows with context macros (see the example after this table). |
last(/GitHub repository by HTTP/github.repo.workflows.last_run.status[{#WORKFLOW_NAME}])=0 and changecount(/GitHub repository by HTTP/github.repo.workflows.last_run.status[{#WORKFLOW_NAME}],{$GITHUB.WORKFLOW.STATUS.QUEUED.THRESH:"workflow_queued:{#WORKFLOW_NAME}"})=0 |Warning |
||
GitHub: Workflow [{#WORKFLOW_NAME}]: The workflow has been in progress for too long | The last workflow run has been in the "in_progress" status for too long. The default threshold is provided as an example and can be adjusted for relevant workflows with context macros. |
last(/GitHub repository by HTTP/github.repo.workflows.last_run.status[{#WORKFLOW_NAME}])=1 and changecount(/GitHub repository by HTTP/github.repo.workflows.last_run.status[{#WORKFLOW_NAME}],{$GITHUB.WORKFLOW.STATUS.IN_PROGRESS.THRESH:"workflow_in_progress:{#WORKFLOW_NAME}"})=0 |Warning |
||
GitHub: Workflow [{#WORKFLOW_NAME}]: The workflow has failed | The last workflow run has returned a "failure" conclusion. |
last(/GitHub repository by HTTP/github.repo.workflows.last_run.conclusion[{#WORKFLOW_NAME}])=1 |Warning |
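For example, to give a single workflow a longer allowance in the "queued" state, the threshold macro used in the trigger expressions above can be overridden with a context at the host level. This is a hypothetical override that assumes a workflow named "Build"; the value is the evaluation period consumed by the changecount() function:
{$GITHUB.WORKFLOW.STATUS.QUEUED.THRESH:"workflow_queued:Build"} = 30m
The "in progress" threshold can be adjusted the same way via {$GITHUB.WORKFLOW.STATUS.IN_PROGRESS.THRESH} with a "workflow_in_progress:<workflow name>" context.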
Name | Description | Type | Key and additional info |
---|---|---|---|
Branch discovery | Discovers repository branches. Information about endpoint: https://docs.github.com/en/rest/branches/branches?apiVersion=2022-11-28#list-branches |
Dependent item | github.repo.branches.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Branch [{#BRANCH_NAME}]: Number of commits | Get the number of commits in the branch. Information about endpoint: https://docs.github.com/en/rest/commits/commits?apiVersion=2022-11-28#list-commits |
Script | github.repo.branches.commits.total[{#BRANCH_NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Self-hosted runner discovery | Discovers self-hosted runners of the repository. Note that admin access to the repository is required to use this endpoint: https://docs.github.com/en/rest/actions/self-hosted-runners?apiVersion=2022-11-28#list-self-hosted-runners-for-a-repository |
Dependent item | github.repo.runners.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Runner [{#RUNNER_NAME}]: Busy | Indicates whether the runner is currently executing a job. |
Dependent item | github.repo.runners.busy[{#RUNNER_NAME}] Preprocessing
|
Runner [{#RUNNER_NAME}]: Online | Indicates whether the runner is connected to GitHub and is ready to execute jobs. |
Dependent item | github.repo.runners.online[{#RUNNER_NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitHub: Runner [{#RUNNER_NAME}]: The runner has become offline | The runner was online previously, but is currently not connected to GitHub. This could be because the machine is offline, the self-hosted runner application is not running on the machine, or the self-hosted runner application cannot communicate with GitHub. |
last(/GitHub repository by HTTP/github.repo.runners.online[{#RUNNER_NAME}],#2)=1 and last(/GitHub repository by HTTP/github.repo.runners.online[{#RUNNER_NAME}])=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Discovery of community profile metrics | Discovers community profile metrics (the repository must not be a fork). Information about community profile metrics: https://docs.github.com/en/rest/metrics/community?apiVersion=2022-11-28#get-community-profile-metrics |
Dependent item | github.repo.community_profile.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Health percentage score | The health percentage score is defined as a percentage of how many of the recommended community health files are present. For more information, see the documentation: https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/about-community-profiles-for-public-repositories |
Dependent item | github.repo.repository.health[{#SINGLETON}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official JMX Template from Zabbix distribution. Could be useful for many Java Applications (JMX).
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Refer to the vendor documentation.
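For example, remote JMX can usually be enabled on the monitored JVM with the standard system properties below so that the Zabbix Java gateway can reach it. This is a minimal, unauthenticated sketch for test environments only; the port 12345 and the application JAR name are placeholders, and production setups should enable authentication and SSL:
java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=12345 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     -jar <your-application>.jar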
Name | Description | Default |
---|---|---|
{$JMX.NONHEAP.MEM.USAGE.MAX} | A threshold in percent for Non-heap memory utilization trigger. |
85 |
{$JMX.NONHEAP.MEM.USAGE.TIME} | The time during which the Non-heap memory utilization may exceed the threshold. |
10m |
{$JMX.HEAP.MEM.USAGE.MAX} | A threshold in percent for Heap memory utilization trigger. |
85 |
{$JMX.HEAP.MEM.USAGE.TIME} | The time during which the Heap memory utilization may exceed the threshold. |
10m |
{$JMX.MP.USAGE.MAX} | A threshold in percent for memory pools utilization trigger. Use a context to change the threshold for a specific pool (see the example after this table). |
85 |
{$JMX.MP.USAGE.TIME} | The time during which the memory pools utilization may exceed the threshold. |
10m |
{$JMX.FILE.DESCRIPTORS.MAX} | A threshold in percent for file descriptors count trigger. |
85 |
{$JMX.FILE.DESCRIPTORS.TIME} | The time during which the file descriptors count may exceed the threshold. |
3m |
{$JMX.CPU.LOAD.MAX} | A threshold in percent for CPU utilization trigger. |
85 |
{$JMX.CPU.LOAD.TIME} | The time during which the CPU utilization may exceed the threshold. |
5m |
{$JMX.MEM.POOL.NAME.MATCHES} | This macro is used in the memory pool discovery as a filter. |
Old Gen|G1|Perm Gen|Code Cache|Tenured Gen |
{$JMX.USER} | JMX username. |
|
{$JMX.PASSWORD} | JMX password. |
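For example, the memory pool utilization threshold can be raised for one pool only by defining context macros at the host level. This is a hypothetical override that assumes a discovered pool named "Old Gen" (i.e. the value of {#JMXNAME}):
{$JMX.MP.USAGE.MAX:"Old Gen"} = 90
{$JMX.MP.USAGE.TIME:"Old Gen"} = 15m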
Name | Description | Type | Key and additional info |
---|---|---|---|
ClassLoading: Loaded class count | Displays number of classes that are currently loaded in the Java virtual machine. |
JMX agent | jmx["java.lang:type=ClassLoading","LoadedClassCount"] Preprocessing
|
ClassLoading: Total loaded class count | Displays the total number of classes that have been loaded since the Java virtual machine has started execution. |
JMX agent | jmx["java.lang:type=ClassLoading","TotalLoadedClassCount"] Preprocessing
|
ClassLoading: Unloaded class count | Displays the total number of classes unloaded since the Java virtual machine has started execution. |
JMX agent | jmx["java.lang:type=ClassLoading","UnloadedClassCount"] Preprocessing
|
Compilation: Name of the current JIT compiler | Displays the name of the current JIT compiler. |
JMX agent | jmx["java.lang:type=Compilation","Name"] Preprocessing
|
Compilation: Accumulated time spent | Displays the approximate accumulated elapsed time spent in compilation, in seconds. |
JMX agent | jmx["java.lang:type=Compilation","TotalCompilationTime"] Preprocessing
|
Memory: Heap memory committed | Current heap memory allocated. This amount of memory is guaranteed for the Java virtual machine to use. |
JMX agent | jmx["java.lang:type=Memory","HeapMemoryUsage.committed"] |
Memory: Heap memory maximum size | Maximum amount of heap that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX agent | jmx["java.lang:type=Memory","HeapMemoryUsage.max"] Preprocessing
|
Memory: Heap memory used | Current memory usage inside the heap. |
JMX agent | jmx["java.lang:type=Memory","HeapMemoryUsage.used"] Preprocessing
|
Memory: Non-Heap memory committed | Current memory allocated outside the heap. This amount of memory is guaranteed for the Java virtual machine to use. |
JMX agent | jmx["java.lang:type=Memory","NonHeapMemoryUsage.committed"] Preprocessing
|
Memory: Non-Heap memory maximum size | Maximum amount of non-heap memory that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX agent | jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"] Preprocessing
|
Memory: Non-Heap memory used | Current memory usage outside the heap |
JMX agent | jmx["java.lang:type=Memory","NonHeapMemoryUsage.used"] Preprocessing
|
Memory: Object pending finalization count | The approximate number of objects for which finalization is pending. |
JMX agent | jmx["java.lang:type=Memory","ObjectPendingFinalizationCount"] Preprocessing
|
OperatingSystem: File descriptors maximum count | This is the number of file descriptors we can have opened in the same process, as determined by the operating system. You can never have more file descriptors than this number. |
JMX agent | jmx["java.lang:type=OperatingSystem","MaxFileDescriptorCount"] Preprocessing
|
OperatingSystem: File descriptors opened | This is the number of opened file descriptors at the moment, if this reaches the MaxFileDescriptorCount, the application will throw an IOException: Too many open files. This could mean you are opening file descriptors and never closing them. |
JMX agent | jmx["java.lang:type=OperatingSystem","OpenFileDescriptorCount"] |
OperatingSystem: Process CPU Load | ProcessCpuLoad represents the CPU load in this process. |
JMX agent | jmx["java.lang:type=OperatingSystem","ProcessCpuLoad"] Preprocessing
|
Runtime: JVM uptime | JMX agent | jmx["java.lang:type=Runtime","Uptime"] Preprocessing
|
|
Runtime: JVM name | JMX agent | jmx["java.lang:type=Runtime","VmName"] Preprocessing
|
|
Runtime: JVM version | JMX agent | jmx["java.lang:type=Runtime","VmVersion"] Preprocessing
|
|
Threading: Daemon thread count | Number of daemon threads running. |
JMX agent | jmx["java.lang:type=Threading","DaemonThreadCount"] Preprocessing
|
Threading: Peak thread count | Maximum number of threads being executed at the same time since the JVM was started or the peak was reset. |
JMX agent | jmx["java.lang:type=Threading","PeakThreadCount"] |
Threading: Thread count | The number of threads running at the current moment. |
JMX agent | jmx["java.lang:type=Threading","ThreadCount"] |
Threading: Total started thread count | The number of threads started since the JVM was launched. |
JMX agent | jmx["java.lang:type=Threading","TotalStartedThreadCount"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Generic Java JMX: Compilation: {HOST.NAME} uses suboptimal JIT compiler | find(/Generic Java JMX/jmx["java.lang:type=Compilation","Name"],,"like","Client")=1 |Info |
Manual close: Yes | ||
Generic Java JMX: Memory: Heap memory usage is high | min(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.used"],{$JMX.HEAP.MEM.USAGE.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.max"])*{$JMX.HEAP.MEM.USAGE.MAX}/100) and last(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.max"])>0 |Warning |
|||
Generic Java JMX: Memory: Non-Heap memory usage is high | min(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.used"],{$JMX.NONHEAP.MEM.USAGE.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"])*{$JMX.NONHEAP.MEM.USAGE.MAX}/100) and last(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"])>0 |Warning |
|||
Generic Java JMX: OperatingSystem: Opened file descriptor count is high | min(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","OpenFileDescriptorCount"],{$JMX.FILE.DESCRIPTORS.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","MaxFileDescriptorCount"])*{$JMX.FILE.DESCRIPTORS.MAX}/100) |Warning |
|||
Generic Java JMX: OperatingSystem: Process CPU Load is high | min(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","ProcessCpuLoad"],{$JMX.CPU.LOAD.TIME})>{$JMX.CPU.LOAD.MAX} |Average |
|||
Generic Java JMX: Runtime: JVM is not reachable | nodata(/Generic Java JMX/jmx["java.lang:type=Runtime","Uptime"],5m)=1 |Average |
Manual close: Yes | ||
Generic Java JMX: Runtime: {HOST.NAME} runs suboptimal VM type | find(/Generic Java JMX/jmx["java.lang:type=Runtime","VmName"],,"like","Server")<>1 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Garbage collector discovery | Garbage collectors metrics discovery. |
JMX agent | jmx.discovery["beans","java.lang:name=*,type=GarbageCollector"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
GarbageCollector: {#JMXNAME} number of collections per second | Displays the total number of collections that have occurred per second. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=GarbageCollector","CollectionCount"] Preprocessing
|
GarbageCollector: {#JMXNAME} accumulated time spent in collection | Displays the approximate accumulated collection elapsed time, in seconds. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=GarbageCollector","CollectionTime"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Memory pool discovery | Memory pools metrics discovery. |
JMX agent | jmx.discovery["beans","java.lang:name=*,type=MemoryPool"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Memory pool: {#JMXNAME} committed | Current memory allocated. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.committed"] Preprocessing
|
Memory pool: {#JMXNAME} maximum size | Maximum amount of memory that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"] Preprocessing
|
Memory pool: {#JMXNAME} used | Current memory usage. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.used"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Generic Java JMX: Memory pool: {#JMXNAME} memory usage is high | min(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.used"],{$JMX.MP.USAGE.TIME:"{#JMXNAME}"})>(last(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"])*{$JMX.MP.USAGE.MAX:"{#JMXNAME}"}/100) and last(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"])>0 |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official Template for Microsoft Exchange Server 2016.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by Zabbix agent active.
1. Import the template into Zabbix.
2. Link the imported template to a host with MS Exchange.
Note that the template doesn't provide information about the state of Windows services. It is recommended to use it together with the "OS Windows by Zabbix agent active" template.
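As a quick sanity check, any of the counter keys below can be tested locally on the Exchange host with the agent's test mode. This is a sketch; the counter comes from the items below, and the quoting may need adjustment for your shell:
zabbix_agentd.exe -t "perf_counter_en[\"\MSExchange Active Manager(_total)\Database Mounted\"]"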
Name | Description | Default |
---|---|---|
{$MS.EXCHANGE.PERF.INTERVAL} | Update interval for perf_counter_en items. |
60 |
{$MS.EXCHANGE.DB.FAULTS.TIME} | The time during which the database page faults may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.FAULTS.WARN} | Threshold for database page faults trigger. |
0 |
{$MS.EXCHANGE.LOG.STALLS.TIME} | The time during which the log records stalled may exceed the threshold. |
10m |
{$MS.EXCHANGE.LOG.STALLS.WARN} | Threshold for log records stalled trigger. |
100 |
{$MS.EXCHANGE.DB.ACTIVE.READ.TIME} | The time during which the active database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} | Threshold for active database read operations latency trigger. |
0.02 |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | The time during which the active database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} | Threshold for active database write operations latency trigger. |
0.05 |
{$MS.EXCHANGE.DB.PASSIVE.READ.TIME} | The time during which the passive database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} | Threshold for passive database read operations latency trigger. |
0.2 |
{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | The time during which the passive database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.TIME} | The time during which the RPC requests latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.WARN} | Threshold for RPC requests latency trigger. |
0.05 |
{$MS.EXCHANGE.RPC.COUNT.TIME} | The time during which the RPC total requests may exceed the threshold. |
5m |
{$MS.EXCHANGE.RPC.COUNT.WARN} | Threshold for the RPC requests total trigger. |
70 |
{$MS.EXCHANGE.LDAP.TIME} | The time during which the LDAP metrics may exceed the threshold. |
5m |
{$MS.EXCHANGE.LDAP.WARN} | Threshold for LDAP triggers. |
0.05 |
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases total mounted | Shows the number of active database copies on the server. |
Zabbix agent (active) | perf_counter_en["\MSExchange Active Manager(_total)\Database Mounted"] Preprocessing
|
ActiveSync: ping command pending | Shows the number of ping commands currently pending in the queue. |
Zabbix agent (active) | perf_counter_en["\MSExchange ActiveSync\Ping Commands Pending", {$MS.EXCHANGE.PERF.INTERVAL}] |
ActiveSync: requests per second | Shows the number of HTTP requests received from the client via ASP.NET per second. Determines the current Exchange ActiveSync request rate. Used only to determine current user load. |
Zabbix agent (active) | perf_counter_en["\MSExchange ActiveSync\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
ActiveSync: sync commands per second | Shows the number of sync commands processed per second. Clients use this command to synchronize items within a folder. |
Zabbix agent (active) | perf_counter_en["\MSExchange ActiveSync\Sync Commands/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Autodiscover: requests per second | Shows the number of Autodiscover service requests processed each second. Determines current user load. |
Zabbix agent (active) | perf_counter_en["\MSExchangeAutodiscover\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Availability Service: availability requests per second | Shows the number of requests serviced per second. The request can be only for free/ busy information or include suggestions. One request may contain multiple mailboxes. Determines the rate at which Availability service requests are occurring. |
Zabbix agent (active) | perf_counter_en["\MSExchange Availability Service\Availability Requests (sec)", {$MS.EXCHANGE.PERF.INTERVAL}] |
Outlook Web App: current unique users | Shows the number of unique users currently logged on to Outlook Web App. This value monitors the number of unique active user sessions, so that users are only removed from this counter after they log off or their session times out. Determines current user load. |
Zabbix agent (active) | perf_counter_en["\MSExchange OWA\Current Unique Users", {$MS.EXCHANGE.PERF.INTERVAL}] |
Outlook Web App: requests per second | Shows the number of requests handled by Outlook Web App per second. Determines current user load. |
Zabbix agent (active) | perf_counter_en["\MSExchange OWA\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MSExchangeWS: requests per second | Shows the number of requests processed each second. Determines current user load. |
Zabbix agent (active) | perf_counter_en["\MSExchangeWS\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Active agent availability | Availability of active checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - unknown 1 - available 2 - not available |
Zabbix internal | zabbix[host,active_agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS Exchange 2016: Active checks are not available | Active checks are considered unavailable. Agent is not sending heartbeat for prolonged time. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/zabbix[host,active_agent,available],{$AGENT.TIMEOUT})=2 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases discovery | Discovery of Exchange databases. |
Zabbix agent (active) | perf_instance.discovery["MSExchange Active Manager"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Active Manager [{#INSTANCE}]: Database copy role | Database copy active or passive role. |
Zabbix agent (active) | perf_counter_en["\MSExchange Active Manager({#INSTANCE})\Database Copy Role Active"] Preprocessing
|
Information Store [{#INSTANCE}]: Database state | Database state. Possible values: 0: Database without any copy and dismounted. 1: Database is a primary database and mounted. 2: Database is a passive copy and the state is healthy. |
Zabbix agent (active) | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\Database State"] Preprocessing
|
Information Store [{#INSTANCE}]: Active mailboxes count | Number of active mailboxes in this database. |
Zabbix agent (active) | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\Active mailboxes"] |
Information Store [{#INSTANCE}]: Page faults per second | Indicates the rate of page faults that can't be serviced because there are no pages available for allocation from the database cache. If this counter is above 0, it's an indication that the MSExchange Database\I/O Database Writes (Attached) Average Latency is too high. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log records stalled | Indicates the number of log records that can't be added to the log buffers per second because the log buffers are full. The average value should be below 10 per second. Spikes (maximum values) shouldn't be higher than 100 per second. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log threads waiting | Indicates the number of threads waiting to complete an update of the database by writing their data to the log. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database({#INF.STORE})\Log Threads Waiting", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests per second | Shows the number of RPC operations per second for each database instance. |
Zabbix agent (active) | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Operations/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests latency | RPC Latency average is the average latency of RPC requests per database. Average is calculated over all RPCs since exrpc32 was loaded. Should be less than 50ms at all times, with spikes less than 100ms. |
Zabbix agent (active) | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Information Store [{#INSTANCE}]: RPC requests total | Indicates the overall RPC requests currently executing within the information store process. Should be below 70 at all times. |
Zabbix agent (active) | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations per second | Shows the number of database read operations. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations latency | Shows the average length of time per database read operation. Should be less than 20 ms on average. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database read operations latency | Shows the average length of time per passive database read operation. Should be less than 200ms on average. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Active database write operations per second | Shows the number of database write operations per second for each attached database instance. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database write operations latency | Shows the average length of time per database write operation. Should be less than 50ms on average. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database write operations latency | Shows the average length of time, in ms, per passive database write operation. Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS Exchange 2016: Information Store [{#INSTANCE}]: Page faults is too high | Too many page fault stalls for database "{#INSTANCE}". This counter should be 0 on production servers. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.FAULTS.TIME})>{$MS.EXCHANGE.DB.FAULTS.WARN} |Average |
||
MS Exchange 2016: Information Store [{#INSTANCE}]: Log records stalls is too high | The rate of stalled log records is too high. The average value should be below 10 per second. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LOG.STALLS.TIME})>{$MS.EXCHANGE.LOG.STALLS.WARN} |Average |
||
MS Exchange 2016: Information Store [{#INSTANCE}]: RPC Requests latency is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.TIME})>{$MS.EXCHANGE.RPC.WARN} |Warning |
||
MS Exchange 2016: Information Store [{#INSTANCE}]: RPC Requests total count is too high | Should be below 70 at all times. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.COUNT.TIME})>{$MS.EXCHANGE.RPC.COUNT.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 20ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.READ.TIME})>{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 200ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.READ.TIME})>{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average write time latency is too high for {$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | Should be less than 50ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME})>{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average write time latency is higher than read time latency for {$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME})>avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME}) |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web services discovery | Discovery of Exchange web services. |
Zabbix agent (active) | perf_instance_en.discovery["Web Service"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web Service [{#INSTANCE}]: Current connections | Shows the current number of connections established to each Web Service. |
Zabbix agent (active) | perf_counter_en["\Web Service({#INSTANCE})\Current Connections", {$MS.EXCHANGE.PERF.INTERVAL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
LDAP discovery | Discovery of domain controller. |
Zabbix agent (active) | perf_instance_en.discovery["MSExchange ADAccess Domain Controllers"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Domain Controller [{#INSTANCE}]: Read time | Time that it takes to send an LDAP read request to the domain controller in question and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent (active) | perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Domain Controller [{#INSTANCE}]: Search time | Time that it takes to send an LDAP search request and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent (active) | perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS Exchange 2016: Domain Controller [{#INSTANCE}]: LDAP read time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
||
MS Exchange 2016: Domain Controller [{#INSTANCE}]: LDAP search time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official Template for Microsoft Exchange Server 2016.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by Zabbix agent.
1. Import the template into Zabbix.
2. Link the imported template to a host with MS Exchange.
Note that the template doesn't provide information about the state of Windows services. It is recommended to use it together with the "OS Windows by Zabbix agent" template.
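Because this version of the template uses passive checks, an item key can also be polled directly from the Zabbix server or proxy with zabbix_get. This is a sketch; <exchange-host> is a placeholder for the monitored host, and the quoting may need adjustment for your shell:
zabbix_get -s <exchange-host> -k "perf_counter_en[\"\MSExchange Active Manager(_total)\Database Mounted\"]"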
Name | Description | Default |
---|---|---|
{$MS.EXCHANGE.PERF.INTERVAL} | Update interval for perf_counter_en items. |
60 |
{$MS.EXCHANGE.DB.FAULTS.TIME} | The time during which the database page faults may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.FAULTS.WARN} | Threshold for database page faults trigger. |
0 |
{$MS.EXCHANGE.LOG.STALLS.TIME} | The time during which the log records stalled may exceed the threshold. |
10m |
{$MS.EXCHANGE.LOG.STALLS.WARN} | Threshold for log records stalled trigger. |
100 |
{$MS.EXCHANGE.DB.ACTIVE.READ.TIME} | The time during which the active database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} | Threshold for active database read operations latency trigger. |
0.02 |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | The time during which the active database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} | Threshold for active database write operations latency trigger. |
0.05 |
{$MS.EXCHANGE.DB.PASSIVE.READ.TIME} | The time during which the passive database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} | Threshold for passive database read operations latency trigger. |
0.2 |
{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | The time during which the passive database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.TIME} | The time during which the RPC requests latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.WARN} | Threshold for RPC requests latency trigger. |
0.05 |
{$MS.EXCHANGE.RPC.COUNT.TIME} | The time during which the RPC total requests may exceed the threshold. |
5m |
{$MS.EXCHANGE.RPC.COUNT.WARN} | Threshold for the RPC requests total trigger. |
70 |
{$MS.EXCHANGE.LDAP.TIME} | The time during which the LDAP metrics may exceed the threshold. |
5m |
{$MS.EXCHANGE.LDAP.WARN} | Threshold for LDAP triggers. |
0.05 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases total mounted | Shows the number of active database copies on the server. |
Zabbix agent | perf_counter_en["\MSExchange Active Manager(_total)\Database Mounted"] Preprocessing
|
ActiveSync: ping command pending | Shows the number of ping commands currently pending in the queue. |
Zabbix agent | perf_counter_en["\MSExchange ActiveSync\Ping Commands Pending", {$MS.EXCHANGE.PERF.INTERVAL}] |
ActiveSync: requests per second | Shows the number of HTTP requests received from the client via ASP.NET per second. Determines the current Exchange ActiveSync request rate. Used only to determine current user load. |
Zabbix agent | perf_counter_en["\MSExchange ActiveSync\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
ActiveSync: sync commands per second | Shows the number of sync commands processed per second. Clients use this command to synchronize items within a folder. |
Zabbix agent | perf_counter_en["\MSExchange ActiveSync\Sync Commands/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Autodiscover: requests per second | Shows the number of Autodiscover service requests processed each second. Determines current user load. |
Zabbix agent | perf_counter_en["\MSExchangeAutodiscover\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Availability Service: availability requests per second | Shows the number of requests serviced per second. The request can be only for free/ busy information or include suggestions. One request may contain multiple mailboxes. Determines the rate at which Availability service requests are occurring. |
Zabbix agent | perf_counter_en["\MSExchange Availability Service\Availability Requests (sec)", {$MS.EXCHANGE.PERF.INTERVAL}] |
Outlook Web App: current unique users | Shows the number of unique users currently logged on to Outlook Web App. This value monitors the number of unique active user sessions, so that users are only removed from this counter after they log off or their session times out. Determines current user load. |
Zabbix agent | perf_counter_en["\MSExchange OWA\Current Unique Users", {$MS.EXCHANGE.PERF.INTERVAL}] |
Outlook Web App: requests per second | Shows the number of requests handled by Outlook Web App per second. Determines current user load. |
Zabbix agent | perf_counter_en["\MSExchange OWA\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MSExchangeWS: requests per second | Shows the number of requests processed each second. Determines current user load. |
Zabbix agent | perf_counter_en["\MSExchangeWS\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases discovery | Discovery of Exchange databases. |
Zabbix agent | perf_instance.discovery["MSExchange Active Manager"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Active Manager [{#INSTANCE}]: Database copy role | Database copy active or passive role. |
Zabbix agent | perf_counter_en["\MSExchange Active Manager({#INSTANCE})\Database Copy Role Active"] Preprocessing
|
Information Store [{#INSTANCE}]: Database state | Database state. Possible values: 0: Database without any copy and dismounted. 1: Database is a primary database and mounted. 2: Database is a passive copy and the state is healthy. |
Zabbix agent | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\Database State"] Preprocessing
|
Information Store [{#INSTANCE}]: Active mailboxes count | Number of active mailboxes in this database. |
Zabbix agent | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\Active mailboxes"] |
Information Store [{#INSTANCE}]: Page faults per second | Indicates the rate of page faults that can't be serviced because there are no pages available for allocation from the database cache. If this counter is above 0, it's an indication that the MSExchange Database\I/O Database Writes (Attached) Average Latency is too high. |
Zabbix agent | perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log records stalled | Indicates the number of log records that can't be added to the log buffers per second because the log buffers are full. The average value should be below 10 per second. Spikes (maximum values) shouldn't be higher than 100 per second. |
Zabbix agent | perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log threads waiting | Indicates the number of threads waiting to complete an update of the database by writing their data to the log. |
Zabbix agent | perf_counter_en["\MSExchange Database({#INF.STORE})\Log Threads Waiting", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests per second | Shows the number of RPC operations per second for each database instance. |
Zabbix agent | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Operations/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests latency | RPC Latency average is the average latency of RPC requests per database. Average is calculated over all RPCs since exrpc32 was loaded. Should be less than 50ms at all times, with spikes less than 100ms. |
Zabbix agent | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Information Store [{#INSTANCE}]: RPC requests total | Indicates the overall RPC requests currently executing within the information store process. Should be below 70 at all times. |
Zabbix agent | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations per second | Shows the number of database read operations. |
Zabbix agent | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations latency | Shows the average length of time per database read operation. Should be less than 20 ms on average. |
Zabbix agent | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database read operations latency | Shows the average length of time per passive database read operation. Should be less than 200ms on average. |
Zabbix agent | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Active database write operations per second | Shows the number of database write operations per second for each attached database instance. |
Zabbix agent | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database write operations latency | Shows the average length of time per database write operation. Should be less than 50ms on average. |
Zabbix agent | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database write operations latency | Shows the average length of time, in ms, per passive database write operation. Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
Zabbix agent | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS Exchange 2016: Information Store [{#INSTANCE}]: Page faults is too high | Too many page fault stalls for database "{#INSTANCE}". This counter should be 0 on production servers. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.FAULTS.TIME})>{$MS.EXCHANGE.DB.FAULTS.WARN} |Average |
||
MS Exchange 2016: Information Store [{#INSTANCE}]: Log records stalls is too high | The rate of stalled log records is too high. The average value should be below 10 per second. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LOG.STALLS.TIME})>{$MS.EXCHANGE.LOG.STALLS.WARN} |Average |
||
MS Exchange 2016: Information Store [{#INSTANCE}]: RPC Requests latency is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.TIME})>{$MS.EXCHANGE.RPC.WARN} |Warning |
||
MS Exchange 2016: Information Store [{#INSTANCE}]: RPC Requests total count is too high | Should be below 70 at all times. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.COUNT.TIME})>{$MS.EXCHANGE.RPC.COUNT.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 20ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.READ.TIME})>{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 200ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.READ.TIME})>{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average write time latency is too high for {$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | Should be less than 50ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME})>{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average write time latency is higher than read time latency for {$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME})>avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME}) |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web services discovery | Discovery of Exchange web services. |
Zabbix agent | perf_instance_en.discovery["Web Service"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web Service [{#INSTANCE}]: Current connections | Shows the current number of connections established to each Web Service. |
Zabbix agent | perf_counter_en["\Web Service({#INSTANCE})\Current Connections", {$MS.EXCHANGE.PERF.INTERVAL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
LDAP discovery | Discovery of domain controller. |
Zabbix agent | perf_instance_en.discovery["MSExchange ADAccess Domain Controllers"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Domain Controller [{#INSTANCE}]: Read time | Time that it takes to send an LDAP read request to the domain controller in question and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent | perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Domain Controller [{#INSTANCE}]: Search time | Time that it takes to send an LDAP search request and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent | perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS Exchange 2016: Domain Controller [{#INSTANCE}]: LDAP read time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
||
MS Exchange 2016: Domain Controller [{#INSTANCE}]: LDAP search time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor etcd by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Etcd by HTTP collects metrics with the help of the HTTP agent from the /metrics endpoint.
Refer to the vendor documentation.
For the users of etcd version <= 3.4: in etcd v3.5 some metrics have been deprecated. See more details in Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance, or use an older version of the Etcd by HTTP template.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Make sure that etcd allows the collection of metrics. You can test it by running: curl -L http://localhost:2379/metrics
Check if etcd is accessible from the Zabbix proxy or Zabbix server, depending on where you are planning to do the monitoring. To verify it, run: curl -L http://<etcd_node_address>:2379/metrics
Add the template to the etcd node. Set the hostname or IP address of the etcd host in the {$ETCD.HOST} macro. By default, the template uses the client port.
You can configure the metrics endpoint location by adding the --listen-metrics-urls flag, as shown in the example below. For more details, see the etcd documentation.
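For example, a dedicated metrics listener can be exposed like this (a sketch; the port 2381 is an arbitrary example, adjust it to your environment):
etcd --listen-metrics-urls=http://0.0.0.0:2381
curl -L http://<etcd_node_address>:2381/metrics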
Additional points to consider:
If a non-default port or scheme is used for etcd, don't forget to change the {$ETCD.SCHEME} and {$ETCD.PORT} macros.
Set the {$ETCD.USERNAME} and {$ETCD.PASSWORD} macros in the template to use them on a host level if necessary.
To test availability, run: zabbix_get -s etcd-host -k etcd.health
Name | Description | Default |
---|---|---|
{$ETCD.HOST} | The hostname or IP address of the etcd host. |
<SET ETCD HOST> |
{$ETCD.PORT} | The port of the etcd API endpoint. |
2379 |
{$ETCD.SCHEME} | The request scheme which may be http or https. |
http |
{$ETCD.USER} | ||
{$ETCD.PASSWORD} | ||
{$ETCD.LEADER.CHANGES.MAX.WARN} | The maximum number of leader changes. |
5 |
{$ETCD.PROPOSAL.FAIL.MAX.WARN} | The maximum number of proposal failures. |
2 |
{$ETCD.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures. |
2 |
{$ETCD.PROPOSAL.PENDING.MAX.WARN} | The maximum number of proposals in queue. |
5 |
{$ETCD.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors. |
90 |
{$ETCD.GRPC_CODE.MATCHES} | The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. |
.* |
{$ETCD.GRPC_CODE.NOT_MATCHES} | The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. |
CHANGE_IF_NEEDED |
{$ETCD.GRPC.ERRORS.MAX.WARN} | The maximum number of gRPC request failures. |
1 |
{$ETCD.GRPC_CODE.TRIGGER.MATCHES} | The filter of discoverable gRPC codes, which will create triggers. |
Aborted|Unavailable |
Name | Description | Type | Key and additional info |
---|---|---|---|
Service's TCP port state | Simple check | net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"] Preprocessing
|
|
Get node metrics | HTTP agent | etcd.get_metrics | |
Node health | HTTP agent | etcd.health Preprocessing
|
|
Server is a leader | It defines - whether or not this member is a leader: 1 - it is; 0 - otherwise. |
Dependent item | etcd.is.leader Preprocessing
|
Server has a leader | It defines - whether or not a leader exists: 1 - it exists; 0 - it does not. |
Dependent item | etcd.has.leader Preprocessing
|
Leader changes | The number of leader changes the member has seen since its start. |
Dependent item | etcd.leader.changes Preprocessing
|
Proposals committed per second | The number of consensus proposals committed. |
Dependent item | etcd.proposals.committed.rate Preprocessing
|
Proposals applied per second | The number of consensus proposals applied. |
Dependent item | etcd.proposals.applied.rate Preprocessing
|
Proposals failed per second | The number of failed proposals seen. |
Dependent item | etcd.proposals.failed.rate Preprocessing
|
Proposals pending | The current number of pending proposals to commit. |
Dependent item | etcd.proposals.pending Preprocessing
|
Reads per second | The number of read actions by |
Dependent item | etcd.reads.rate Preprocessing
|
Writes per second | The number of writes (e.g., |
Dependent item | etcd.writes.rate Preprocessing
|
Client gRPC received bytes per second | The number of bytes received from gRPC clients per second. |
Dependent item | etcd.network.grpc.received.rate Preprocessing
|
Client gRPC sent bytes per second | The number of bytes sent from gRPC clients per second. |
Dependent item | etcd.network.grpc.sent.rate Preprocessing
|
HTTP requests received | The number of requests received into the system (successfully parsed and |
Dependent item | etcd.http.requests.rate Preprocessing
|
HTTP 5XX | The number of handled failures of requests (non-watches), by the method ( |
Dependent item | etcd.http.requests.5xx.rate Preprocessing
|
HTTP 4XX | The number of handled failures of requests (non-watches), by the method ( |
Dependent item | etcd.http.requests.4xx.rate Preprocessing
|
RPCs received per second | The number of RPC stream messages received on the server. |
Dependent item | etcd.grpc.received.rate Preprocessing
|
RPCs sent per second | The number of gRPC stream messages sent by the server. |
Dependent item | etcd.grpc.sent.rate Preprocessing
|
RPCs started per second | The number of RPCs started on the server. |
Dependent item | etcd.grpc.started.rate Preprocessing
|
Get version | HTTP agent | etcd.get_version | |
Server version | The version of the etcd server. |
Dependent item | etcd.server.version Preprocessing
|
Cluster version | The version of the etcd cluster. |
Dependent item | etcd.cluster.version Preprocessing
|
DB size | The total size of the underlying database. |
Dependent item | etcd.db.size Preprocessing
|
Keys compacted per second | The number of DB keys compacted per second. |
Dependent item | etcd.keys.compacted.rate Preprocessing
|
Keys expired per second | The number of expired keys per second. |
Dependent item | etcd.keys.expired.rate Preprocessing
|
Keys total | The total number of keys. |
Dependent item | etcd.keys.total Preprocessing
|
Uptime | Etcd server uptime. |
Dependent item | etcd.uptime Preprocessing
|
Virtual memory | The size of virtual memory expressed in bytes. |
Dependent item | etcd.virtual.bytes Preprocessing
|
Resident memory | The size of resident memory expressed in bytes. |
Dependent item | etcd.res.bytes Preprocessing
|
CPU | The total user and system CPU time spent in seconds. |
Dependent item | etcd.cpu.util Preprocessing
|
Open file descriptors | The number of open file descriptors. |
Dependent item | etcd.open.fds Preprocessing
|
Maximum open file descriptors | The maximum number of open file descriptors. |
Dependent item | etcd.max.fds Preprocessing
|
Deletes per second | The number of deletes seen by this member per second. |
Dependent item | etcd.delete.rate Preprocessing
|
PUT per second | The number of puts seen by this member per second. |
Dependent item | etcd.put.rate Preprocessing
|
Range per second | The number of ranges seen by this member per second. |
Dependent item | etcd.range.rate Preprocessing
|
Transaction per second | The number of transactions seen by this member per second. |
Dependent item | etcd.txn.rate Preprocessing
|
Pending events | The total number of pending events to be sent. |
Dependent item | etcd.events.sent.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Service is unavailable | last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"])=0 |Average |
Manual close: Yes | ||
Etcd: Node healthcheck failed | See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check. |
last(/Etcd by HTTP/etcd.health)=0 |Average |
Depends on:
|
|
Etcd: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Etcd by HTTP/etcd.is.leader,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Etcd: Member has no leader | If a member does not have a leader, it is totally unavailable. |
last(/Etcd by HTTP/etcd.has.leader)=0 |Average |
||
Etcd: Instance has seen too many leader changes | Rapid leadership changes significantly impact the performance of etcd. They may also indicate that the leader is unstable, possibly because of network connectivity issues or excessive load. |
(max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN} |Warning |
||
Etcd: Too many proposal failures | Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster. |
min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN} |Warning |
||
Etcd: Too many proposals are queued to commit | Rising pending proposals suggests there is a high client load, or the member cannot commit proposals. |
min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN} |Warning |
||
Etcd: Too many HTTP requests failures | Too many HTTP requests have failed on etcd (5XX response codes). |
min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN} |Warning |
||
Etcd: Server version has changed | Etcd version has changed. Acknowledge to close the problem manually. |
last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0 |Info |
Manual close: Yes | |
Etcd: Cluster version has changed | Etcd version has changed. Acknowledge to close the problem manually. |
last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0 |Info |
Manual close: Yes | |
Etcd: Host has been restarted | Uptime is less than 10 minutes. |
last(/Etcd by HTTP/etcd.uptime)<10m |Info |
Manual close: Yes | |
Etcd: Current number of open files is too high | Heavy file descriptor usage (i.e., close to the limit of the process's file descriptors) indicates a potential file descriptor exhaustion issue. |
min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC codes discovery | Dependent item | etcd.grpc_code.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RPCs completed with code {#GRPC.CODE} | The number of RPCs completed on the server with grpc_code {#GRPC.CODE}. |
Dependent item | etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} | min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Peers discovery | Dependent item | etcd.peer.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd peer {#ETCD.PEER}: Bytes sent | The number of bytes sent to a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing
|
Etcd peer {#ETCD.PEER}: Bytes received | The number of bytes received from a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing
|
Etcd peer {#ETCD.PEER}: Send failures | The number of send failures from a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing
|
Etcd peer {#ETCD.PEER}: Receive failures | The number of receive failures from a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Envoy Proxy by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Envoy Proxy by HTTP
- collects metrics by HTTP agent from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus).
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus). See https://www.envoyproxy.io/docs/envoy/v1.20.0/operations/stats_overview for details.
Don't forget to change the {$ENVOY.URL} and {$ENVOY.METRICS.PATH} macros. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Envoy Proxy instance version and configuration.
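Before linking the template, it can help to fetch the metrics endpoint manually and confirm the expected Prometheus-format output is there; a minimal sketch, assuming the default admin address from {$ENVOY.URL} and the standard envoy_server_* gauge names:

```python
import urllib.request

# Hypothetical values for the {$ENVOY.URL} and {$ENVOY.METRICS.PATH} macros.
ENVOY_URL = "http://localhost:9901"
METRICS_PATH = "/stats/prometheus"

with urllib.request.urlopen(ENVOY_URL + METRICS_PATH) as resp:
    text = resp.read().decode()

# Print a couple of server-level gauges the template turns into items.
for line in text.splitlines():
    if line.startswith(("envoy_server_uptime", "envoy_server_live")):
        print(line)
```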
Name | Description | Default |
---|---|---|
{$ENVOY.URL} | Instance URL. |
http://localhost:9901 |
{$ENVOY.METRICS.PATH} | The path Zabbix will scrape metrics in prometheus format from. |
/stats/prometheus |
{$ENVOY.CERT.MIN} | Minimum number of days before certificate expiration used for trigger expression. |
7 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get node metrics | Get server metrics. |
HTTP agent | envoy.get_metrics Preprocessing
|
Server state | State of the server. Live - (default) Server is live and serving traffic. Draining - Server is draining listeners in response to external health checks failing. Pre initializing - Server has not yet completed cluster manager initialization. Initializing - Server is running the cluster manager initialization callbacks (e.g., RDS). |
Dependent item | envoy.server.state Preprocessing
|
Server live | 1 if the server is not currently draining, 0 otherwise. |
Dependent item | envoy.server.live Preprocessing
|
Uptime | Current server uptime in seconds. |
Dependent item | envoy.server.uptime Preprocessing
|
Certificate expiration, day before | Number of days until the next certificate being managed will expire. |
Dependent item | envoy.server.days_until_first_cert_expiring Preprocessing
|
Server concurrency | Number of worker threads. |
Dependent item | envoy.server.concurrency Preprocessing
|
Memory allocated | Current amount of allocated memory in bytes. Total of both new and old Envoy processes on hot restart. |
Dependent item | envoy.server.memory_allocated Preprocessing
|
Memory heap size | Current reserved heap size in bytes. New Envoy process heap size on hot restart. |
Dependent item | envoy.server.memoryheapsize Preprocessing
|
Memory physical size | Current estimate of total bytes of the physical memory. New Envoy process physical memory size on hot restart. |
Dependent item | envoy.server.memoryphysicalsize Preprocessing
|
Filesystem, flushed by timer rate | Total number of times internal flush buffers are written to a file due to flush timeout per second. |
Dependent item | envoy.filesystem.flushedbytimer.rate Preprocessing
|
Filesystem, write completed rate | Total number of times a file was written per second. |
Dependent item | envoy.filesystem.write_completed.rate Preprocessing
|
Filesystem, write failed rate | Total number of times an error occurred during a file write operation per second. |
Dependent item | envoy.filesystem.write_failed.rate Preprocessing
|
Filesystem, reopen failed rate | Total number of times a file failed to be opened per second. |
Dependent item | envoy.filesystem.reopen_failed.rate Preprocessing
|
Connections, total | Total connections of both new and old Envoy processes. |
Dependent item | envoy.server.total_connections Preprocessing
|
Connections, parent | Total connections of the old Envoy process on hot restart. |
Dependent item | envoy.server.parent_connections Preprocessing
|
Clusters, warming | Number of currently warming (not active) clusters. |
Dependent item | envoy.clustermanager.warmingclusters Preprocessing
|
Clusters, active | Number of currently active (warmed) clusters. |
Dependent item | envoy.clustermanager.activeclusters Preprocessing
|
Clusters, added rate | Total clusters added (either via static config or CDS) per second. |
Dependent item | envoy.clustermanager.clusteradded.rate Preprocessing
|
Clusters, modified rate | Total clusters modified (via CDS) per second. |
Dependent item | envoy.clustermanager.clustermodified.rate Preprocessing
|
Clusters, removed rate | Total clusters removed (via CDS) per second. |
Dependent item | envoy.clustermanager.clusterremoved.rate Preprocessing
|
Clusters, updates rate | Total cluster updates per second. |
Dependent item | envoy.clustermanager.clusterupdated.rate Preprocessing
|
Listeners, active | Number of currently active listeners. |
Dependent item | envoy.listenermanager.totallisteners_active Preprocessing
|
Listeners, draining | Number of currently draining listeners. |
Dependent item | envoy.listenermanager.totallisteners_draining Preprocessing
|
Listener, warming | Number of currently warming listeners. |
Dependent item | envoy.listenermanager.totallisteners_warming Preprocessing
|
Listener manager, initialized | A boolean (1 if started and 0 otherwise) that indicates whether listeners have been initialized on workers. |
Dependent item | envoy.listenermanager.workersstarted Preprocessing
|
Listeners, create failure | Total failed listener object additions to workers per second. |
Dependent item | envoy.listenermanager.listenercreate_failure.rate Preprocessing
|
Listeners, create success | Total listener objects successfully added to workers per second. |
Dependent item | envoy.listenermanager.listenercreate_success.rate Preprocessing
|
Listeners, added | Total listeners added (either via static config or LDS) per second. |
Dependent item | envoy.listenermanager.listeneradded.rate Preprocessing
|
Listeners, stopped | Total listeners stopped per second. |
Dependent item | envoy.listenermanager.listenerstopped.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Envoy Proxy: Server state is not live | last(/Envoy Proxy by HTTP/envoy.server.state) > 0 |Average |
|||
Envoy Proxy: Service has been restarted | Uptime is less than 10 minutes. |
last(/Envoy Proxy by HTTP/envoy.server.uptime)<10m |Info |
Manual close: Yes | |
Envoy Proxy: Failed to fetch metrics data | Zabbix has not received data for items for the last 10 minutes. |
nodata(/Envoy Proxy by HTTP/envoy.server.uptime,10m)=1 |Warning |
Manual close: Yes | |
Envoy Proxy: SSL certificate expires soon | Please check the certificate. Fewer than {$ENVOY.CERT.MIN} days are left until the next certificate being managed expires. |
last(/Envoy Proxy by HTTP/envoy.server.days_until_first_cert_expiring)<{$ENVOY.CERT.MIN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster metrics discovery | Dependent item | envoy.lld.cluster Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster ["{#CLUSTER_NAME}"]: Membership, total | Current cluster membership total. |
Dependent item | envoy.cluster.membership_total["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Membership, healthy | Current cluster healthy total (inclusive of both health checking and outlier detection). |
Dependent item | envoy.cluster.membership_healthy["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Membership, unhealthy | Current cluster unhealthy total. |
Calculated | envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Membership, degraded | Current cluster degraded total. |
Dependent item | envoy.cluster.membership_degraded["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Connections, total | Current cluster total connections. |
Dependent item | envoy.cluster.upstreamcxtotal["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Connections, active | Current cluster total active connections. |
Dependent item | envoy.cluster.upstreamcxactive["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests total, rate | Current cluster request total per second. |
Dependent item | envoy.cluster.upstreamrqtotal.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests timeout, rate | Current cluster requests that timed out waiting for a response per second. |
Dependent item | envoy.cluster.upstreamrqtimeout.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests completed, rate | Total upstream requests completed per second. |
Dependent item | envoy.cluster.upstreamrqcompleted.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests 2xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstreamrq2x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests 3xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstreamrq3x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests 4xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstreamrq4x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests 5xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstreamrq5x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests pending | Total active requests pending a connection pool connection. |
Dependent item | envoy.cluster.upstreamrqpendingactive["{#CLUSTERNAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests active | Total active requests. |
Dependent item | envoy.cluster.upstreamrqactive["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Upstream bytes out, rate | Total sent connection bytes per second. |
Dependent item | envoy.cluster.upstreamcxtxbytestotal.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Upstream bytes in, rate | Total received connection bytes per second. |
Dependent item | envoy.cluster.upstreamcxrxbytestotal.rate["{#CLUSTER_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Envoy Proxy: There are unhealthy clusters | last(/Envoy Proxy by HTTP/envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]) > 0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Listeners metrics discovery | Dependent item | envoy.lld.listeners Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Listener ["{#LISTENER_ADDRESS}"]: Connections, active | Total active connections. |
Dependent item | envoy.listener.downstreamcxactive["{#LISTENER_ADDRESS}"] Preprocessing
|
Listener ["{#LISTENER_ADDRESS}"]: Connections, rate | Total connections per second. |
Dependent item | envoy.listener.downstreamcxtotal.rate["{#LISTENER_ADDRESS}"] Preprocessing
|
Listener ["{#LISTENER_ADDRESS}"]: Sockets, undergoing | Sockets currently undergoing listener filter processing. |
Dependent item | envoy.listener.downstreamprecxactive["{#LISTENERADDRESS}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP metrics discovery | Dependent item | envoy.lld.http Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP ["{#CONN_MANAGER}"]: Requests, rate | Total active connections per second. |
Dependent item | envoy.http.downstreamrqtotal.rate["{#CONN_MANAGER}"] Preprocessing
|
HTTP ["{#CONN_MANAGER}"]: Requests, active | Total active requests. |
Dependent item | envoy.http.downstreamrqactive["{#CONN_MANAGER}"] Preprocessing
|
HTTP ["{#CONN_MANAGER}"]: Requests timeout, rate | Total requests closed due to a timeout on the request path per second. |
Dependent item | envoy.http.downstreamrqtimeout["{#CONN_MANAGER}"] Preprocessing
|
HTTP ["{#CONN_MANAGER}"]: Connections, rate | Total connections per second. |
Dependent item | envoy.http.downstreamcxtotal["{#CONN_MANAGER}"] Preprocessing
|
HTTP ["{#CONN_MANAGER}"]: Connections, active | Total active connections. |
Dependent item | envoy.http.downstreamcxactive["{#CONN_MANAGER}"] Preprocessing
|
HTTP ["{#CONN_MANAGER}"]: Bytes in, rate | Total bytes received per second. |
Dependent item | envoy.http.downstreamcxrxbytestotal.rate["{#CONN_MANAGER}"] Preprocessing
|
HTTP ["{#CONN_MANAGER}"]: Bytes out, rate | Total bytes sent per second. |
Dependent item | envoy.http.downstreamcxtxbytestota.rate["{#CONN_MANAGER}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Elasticsearch by Zabbix that works without any external scripts.
It works with both standalone and cluster instances.
The metrics are collected in one pass remotely using an HTTP agent.
They are gathered via the _cluster/health, _cluster/stats, and _nodes/stats REST API requests.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the hostname or IP address of the Elasticsearch host in the {$ELASTICSEARCH.HOST}
macro.
Set the login and password in the {$ELASTICSEARCH.USERNAME}
and {$ELASTICSEARCH.PASSWORD}
macros.
If you use an atypical location of the ES API, don't forget to change the {$ELASTICSEARCH.SCHEME} and {$ELASTICSEARCH.PORT} macros.
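To verify the macros before linking the template, you can query the same _cluster/health endpoint the template relies on; a minimal sketch with hypothetical credentials (the textual status maps to the numeric values used by the triggers below: green=0, yellow=1, red=2):

```python
import base64
import json
import urllib.request

# Hypothetical values for the {$ELASTICSEARCH.*} macros.
SCHEME, HOST, PORT = "http", "es.example.com", 9200
USERNAME, PASSWORD = "zabbix", "secret"

request = urllib.request.Request(f"{SCHEME}://{HOST}:{PORT}/_cluster/health")
token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
request.add_header("Authorization", f"Basic {token}")

health = json.loads(urllib.request.urlopen(request).read())
status_map = {"green": 0, "yellow": 1, "red": 2}  # unknown statuses map to 255
print(health["status"], status_map.get(health["status"], 255))
```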
Name | Description | Default |
---|---|---|
{$ELASTICSEARCH.USERNAME} | The Elasticsearch username. |
|
{$ELASTICSEARCH.PASSWORD} | The Elasticsearch password. |
|
{$ELASTICSEARCH.HOST} | The hostname or IP address of the Elasticsearch host. |
<SET ELASTICSEARCH HOST> |
{$ELASTICSEARCH.PORT} | The port of the Elasticsearch host. |
9200 |
{$ELASTICSEARCH.SCHEME} | The scheme of the Elasticsearch (http/https). |
http |
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} | The ES cluster maximum response time in seconds for trigger expression. |
10s |
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} | Maximum of query latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} | Maximum of fetch latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} | Maximum of indexing latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} | Maximum of flush latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.HEAP_USED.MAX.WARN} | The maximum percentage of JVM heap in use for the warning trigger expression. |
85 |
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} | The maximum percentage of JVM heap in use for the critical trigger expression. |
95 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Service status | Checks if the service is running and accepting TCP connections. |
Simple check | net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] Preprocessing
|
Service response time | Checks performance of the TCP service. |
Simple check | net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] |
Get cluster health | Returns the health status of a cluster. |
HTTP agent | es.cluster.get_health |
Cluster health status | Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green - all shards are assigned; yellow - all primary shards are assigned, but one or more replica shards are unassigned (if a node in the cluster fails, some data could be unavailable until that node is repaired); red - one or more primary shards are unassigned, so some data is unavailable (this can occur briefly during cluster startup as primary shards are assigned). |
Dependent item | es.cluster.status Preprocessing
|
Number of nodes | The number of nodes within the cluster. |
Dependent item | es.cluster.number_of_nodes Preprocessing
|
Number of data nodes | The number of nodes that are dedicated to data nodes. |
Dependent item | es.cluster.number_of_data_nodes Preprocessing
|
Number of relocating shards | The number of shards that are under relocation. |
Dependent item | es.cluster.relocating_shards Preprocessing
|
Number of initializing shards | The number of shards that are under initialization. |
Dependent item | es.cluster.initializing_shards Preprocessing
|
Number of unassigned shards | The number of shards that are not allocated. |
Dependent item | es.cluster.unassigned_shards Preprocessing
|
Delayed unassigned shards | The number of shards whose allocation has been delayed by the timeout settings. |
Dependent item | es.cluster.delayed_unassigned_shards Preprocessing
|
Number of pending tasks | The number of cluster-level changes that have not yet been executed. |
Dependent item | es.cluster.number_of_pending_tasks Preprocessing
|
Task max waiting in queue | The time in seconds that the earliest initiated task has been waiting to be performed. |
Dependent item | es.cluster.task_max_waiting_in_queue Preprocessing
|
Inactive shards percentage | The ratio of inactive shards in the cluster expressed as a percentage. |
Dependent item | es.cluster.inactive_shards_percent_as_number Preprocessing
|
Get cluster stats | Returns cluster statistics. |
HTTP agent | es.cluster.get_stats |
Cluster uptime | Uptime duration in seconds since JVM has last started. |
Dependent item | es.nodes.jvm.max_uptime Preprocessing
|
Number of non-deleted documents | The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields. |
Dependent item | es.indices.docs.count Preprocessing
|
Indices with shards assigned to nodes | The total number of indices with shards assigned to the selected nodes. |
Dependent item | es.indices.count Preprocessing
|
Total size of all file stores | The total size in bytes of all file stores across all selected nodes. |
Dependent item | es.nodes.fs.total_in_bytes Preprocessing
|
Total available size to JVM in all file stores | The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_bytes. This is the actual amount of free disk space the selected Elasticsearch nodes can use. |
Dependent item | es.nodes.fs.available_in_bytes Preprocessing
|
Nodes with the data role | The number of selected nodes with the data role. |
Dependent item | es.nodes.count.data Preprocessing
|
Nodes with the ingest role | The number of selected nodes with the ingest role. |
Dependent item | es.nodes.count.ingest Preprocessing
|
Nodes with the master role | The number of selected nodes with the master role. |
Dependent item | es.nodes.count.master Preprocessing
|
Get nodes stats | Returns cluster nodes statistics. |
HTTP agent | es.nodes.get_stats |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Elasticsearch: Service is down | The service is unavailable or does not accept TCP connections. |
last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"])=0 |Average |
Manual close: Yes | |
Elasticsearch: Service response time is too high | The performance of the TCP service is very low. |
min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
|
Elasticsearch: Health is YELLOW | All primary shards are assigned, but one or more replica shards are unassigned. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1 |Average |
||
Elasticsearch: Health is RED | One or more primary shards are unassigned, so some data is unavailable. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2 |High |
||
Elasticsearch: Health is UNKNOWN | The health status of the cluster is unknown or cannot be obtained. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255 |High |
||
Elasticsearch: The number of nodes within the cluster has decreased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0 |Info |
Manual close: Yes | ||
Elasticsearch: The number of nodes within the cluster has increased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0 |Info |
Manual close: Yes | ||
Elasticsearch: Cluster has the initializing shards | The cluster has had initializing shards for longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0 |Average |
||
Elasticsearch: Cluster has the unassigned shards | The cluster has had unassigned shards for longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0 |Average |
||
Elasticsearch: Cluster has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m |Info |
Manual close: Yes | |
Elasticsearch: Cluster does not have enough space for resharding | There is not enough disk space for index resharding. |
(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes) |High |
||
Elasticsearch: Cluster has only two master nodes | The cluster has only two nodes with a master role and will be unavailable if one of them breaks. |
last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2 |Disaster |
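The "not enough space for resharding" expression above is a rough heuristic: it takes the data currently stored in the cluster, spreads it over one fewer data node, and checks whether the result still fits into the free space the cluster reports. A small worked sketch of the same arithmetic, with illustrative numbers:

```python
# Illustrative values for es.nodes.fs.total_in_bytes, es.nodes.fs.available_in_bytes
# and es.cluster.number_of_data_nodes.
total_in_bytes = 3 * 1024**4        # 3 TiB of file store capacity
available_in_bytes = 500 * 1024**3  # 500 GiB still free
data_nodes = 3

used = total_in_bytes - available_in_bytes
# Used data averaged over the nodes that would remain after losing one data node.
per_remaining_node = used / (data_nodes - 1)

if per_remaining_node > available_in_bytes:
    print("Trigger condition met: not enough space for resharding")
else:
    print("Enough headroom to reshard after losing one data node")
```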
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster nodes discovery | Discovery of ES cluster nodes. |
HTTP agent | es.nodes.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
ES {#ES.NODE}: Get data | Returns cluster nodes statistics. |
Dependent item | es.node.get.data[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total size | Total size (in bytes) of all file stores. |
Dependent item | es.node.fs.total.totalinbytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total available size | The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process-level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize. |
Dependent item | es.node.fs.total.availableinbytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Node uptime | JVM uptime in seconds. |
Dependent item | es.node.jvm.uptime[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Maximum JVM memory available for use | The maximum amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heapmaxin_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap currently in use | The memory, in bytes, currently in use by the heap. |
Dependent item | es.node.jvm.mem.heapusedin_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Percent of JVM heap currently in use | The percentage of memory currently in use by the heap. |
Dependent item | es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap committed | The amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heapcommittedin_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Number of open HTTP connections | The number of currently open HTTP connections for the node. |
Dependent item | es.node.http.current_open[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of HTTP connections opened | The number of HTTP connections opened for the node per second. |
Dependent item | es.node.http.opened.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling operations | Time in seconds spent throttling operations for the last measuring span. |
Dependent item | es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling recovery operations | Time in seconds spent throttling recovery operations for the last measuring span. |
Dependent item | es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling merge operations | Time in seconds spent throttling merge operations for the last measuring span. |
Dependent item | es.node.indices.merges.totalthrottledtime[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of queries | The number of query operations per second. |
Dependent item | es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of query | The total number of query operations. |
Dependent item | es.node.indices.search.query_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing query | Time in seconds spent performing query operations for the last measuring span. |
Dependent item | es.node.indices.search.query_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing query | Time in milliseconds spent performing query operations. |
Dependent item | es.node.indices.search.querytimein_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Query latency | The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals. |
Calculated | es.node.indices.search.query_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current query operations | The number of query operations currently running. |
Dependent item | es.node.indices.search.query_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of fetch | The number of fetch operations per second. |
Dependent item | es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of fetch | The total number of fetch operations. |
Dependent item | es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing fetch | Time in seconds spent performing fetch operations for the last measuring span. |
Dependent item | es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing fetch | Time in milliseconds spent performing fetch operations. |
Dependent item | es.node.indices.search.fetchtimein_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Fetch latency | The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals. |
Calculated | es.node.indices.search.fetch_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current fetch operations | The number of fetch operations currently running. |
Dependent item | es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks completed | The number of tasks completed by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool active threads | The number of active threads in the write thread pool. |
Dependent item | es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool tasks in queue | The number of tasks in queue for the write thread pool. |
Dependent item | es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks rejected | The number of tasks rejected by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks completed | The number of tasks completed by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool active threads | The number of active threads in the search thread pool. |
Dependent item | es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool tasks in queue | The number of tasks in queue for the search thread pool. |
Dependent item | es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks rejected | The number of tasks rejected by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks completed | The number of tasks completed by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool active threads | The number of active threads in the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool tasks in queue | The number of tasks in queue for the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks rejected | The number of tasks rejected by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of indexing | The total number of indexing operations. |
Dependent item | es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing indexing | Total time in milliseconds spent performing indexing operations. |
Dependent item | es.node.indices.indexing.indextimein_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Indexing latency | The average indexing latency calculated from the available index_total and index_time_in_millis metrics. |
Calculated | es.node.indices.indexing.index_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current indexing operations | The number of indexing operations currently running. |
Dependent item | es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of index flushes to disk | The total number of flush operations. |
Dependent item | es.node.indices.flush.total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent on flushing indices to disk | Total time in milliseconds spent performing flush operations. |
Dependent item | es.node.indices.flush.totaltimein_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Flush latency | The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics. |
Calculated | es.node.indices.flush.latency[{#ES.NODE}] |
ES {#ES.NODE}: Rate of index refreshes | The number of refresh operations per second. |
Dependent item | es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing refresh | Time in seconds spent performing refresh operations for the last measuring span. |
Dependent item | es.node.indices.refresh.time[{#ES.NODE}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Elasticsearch: ES {#ES.NODE}: Node has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m |Info |
Manual close: Yes | |
Elasticsearch: ES {#ES.NODE}: Percent of JVM heap in use is high | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN} |Warning |
Depends on:
|
|
Elasticsearch: ES {#ES.NODE}: Percent of JVM heap in use is critical | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} |High |
||
Elasticsearch: ES {#ES.NODE}: Query latency is too high | If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} |Warning |
||
Elasticsearch: ES {#ES.NODE}: Fetch latency is too high | The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, investigate possible causes such as slow disks or overly expensive result processing. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} |Warning |
||
Elasticsearch: ES {#ES.NODE}: Write thread pool executor has the rejected tasks | The number of tasks rejected by the write thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0 |Warning |
||
Elasticsearch: ES {#ES.NODE}: Search thread pool executor has the rejected tasks | The number of tasks rejected by the search thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0 |Warning |
||
Elasticsearch: ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks | The number of tasks rejected by the refresh thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0 |Warning |
||
Elasticsearch: ES {#ES.NODE}: Indexing latency is too high | If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation recommends starting with a moderate bulk indexing size and increasing it gradually). |
min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} |Warning |
||
Elasticsearch: ES {#ES.NODE}: Flush latency is too high | If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from adding new data to the index. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Docker engine by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Docker by Zabbix agent 2
— collects metrics by polling zabbix-agent2.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up and configure Zabbix agent 2 compiled with the Docker monitoring plugin. The user that Zabbix agent 2 runs as should have access permissions to the Docker socket.
Test availability: zabbix_get -s docker-host -k docker.info
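If zabbix_get is not at hand, the same data the docker.info key relies on can be pulled straight from the Docker Engine API over the socket; a minimal sketch, assuming the default /var/run/docker.sock path and a user with permission to read it:

```python
import http.client
import json
import socket


class DockerSocketConnection(http.client.HTTPConnection):
    """HTTP connection over the Docker UNIX socket."""

    def __init__(self, socket_path="/var/run/docker.sock"):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


conn = DockerSocketConnection()
conn.request("GET", "/info")  # /info is the engine endpoint behind the docker.info key
info = json.loads(conn.getresponse().read())
print(info["ServerVersion"], info["Containers"], info["ContainersRunning"])
```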
Name | Description | Default |
---|---|---|
{$DOCKER.LLD.FILTER.CONTAINER.MATCHES} | Filter of discoverable containers. |
.* |
{$DOCKER.LLD.FILTER.CONTAINER.NOT_MATCHES} | Filter to exclude discovered containers. |
CHANGE_IF_NEEDED |
{$DOCKER.LLD.FILTER.IMAGE.MATCHES} | Filter of discoverable images. |
.* |
{$DOCKER.LLD.FILTER.IMAGE.NOT_MATCHES} | Filter to exclude discovered images. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Ping | Zabbix agent | docker.ping Preprocessing
|
|
Get info | Zabbix agent | docker.info | |
Get containers | Zabbix agent | docker.containers | |
Get images | Zabbix agent | docker.images | |
Get data_usage | Zabbix agent | docker.data_usage | |
Containers total | Total number of containers on this host. |
Dependent item | docker.containers.total Preprocessing
|
Containers running | Total number of containers running on this host. |
Dependent item | docker.containers.running Preprocessing
|
Containers stopped | Total number of containers stopped on this host. |
Dependent item | docker.containers.stopped Preprocessing
|
Containers paused | Total number of containers paused on this host. |
Dependent item | docker.containers.paused Preprocessing
|
Images total | Number of images with intermediate image layers. |
Dependent item | docker.images.total Preprocessing
|
Storage driver | Docker storage driver. https://docs.docker.com/storage/storagedriver/ |
Dependent item | docker.driver Preprocessing
|
Memory limit enabled | Dependent item | docker.mem_limit.enabled Preprocessing
|
|
Swap limit enabled | Dependent item | docker.swap_limit.enabled Preprocessing
|
|
Kernel memory enabled | Dependent item | docker.kernel_mem.enabled Preprocessing
|
|
Kernel memory TCP enabled | Dependent item | docker.kernelmemtcp.enabled Preprocessing
|
|
CPU CFS Period enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpucfsperiod.enabled Preprocessing
|
CPU CFS Quota enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpucfsquota.enabled Preprocessing
|
CPU Shares enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpu_shares.enabled Preprocessing
|
CPU Set enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpu_set.enabled Preprocessing
|
Pids limit enabled | Dependent item | docker.pids_limit.enabled Preprocessing
|
|
IPv4 Forwarding enabled | Dependent item | docker.ipv4_forwarding.enabled Preprocessing
|
|
Debug enabled | Dependent item | docker.debug.enabled Preprocessing
|
|
Nfd | Number of used File Descriptors. |
Dependent item | docker.nfd Preprocessing
|
OomKill disabled | Dependent item | docker.oomkill.disabled Preprocessing
|
|
Goroutines | Number of goroutines. |
Dependent item | docker.goroutines Preprocessing
|
Logging driver | Dependent item | docker.logging_driver Preprocessing
|
|
Cgroup driver | Dependent item | docker.cgroup_driver Preprocessing
|
|
NEvents listener | Dependent item | docker.nevents_listener Preprocessing
|
|
Kernel version | Dependent item | docker.kernel_version Preprocessing
|
|
Operating system | Dependent item | docker.operating_system Preprocessing
|
|
OS type | Dependent item | docker.os_type Preprocessing
|
|
Architecture | Dependent item | docker.architecture Preprocessing
|
|
NCPU | Dependent item | docker.ncpu Preprocessing
|
|
Memory total | Dependent item | docker.mem.total Preprocessing
|
|
Docker root dir | Dependent item | docker.root_dir Preprocessing
|
|
Name | Dependent item | docker.name Preprocessing
|
|
Server version | Dependent item | docker.server_version Preprocessing
|
|
Default runtime | Dependent item | docker.default_runtime Preprocessing
|
|
Live restore enabled | Dependent item | docker.live_restore.enabled Preprocessing
|
|
Layers size | Dependent item | docker.layers_size Preprocessing
|
|
Images size | Dependent item | docker.images_size Preprocessing
|
|
Containers size | Dependent item | docker.containers_size Preprocessing
|
|
Volumes size | Dependent item | docker.volumes_size Preprocessing
|
|
Images available | Number of top-level images. |
Dependent item | docker.images.top_level Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Docker: Service is down | last(/Docker by Zabbix agent 2/docker.ping)=0 |Average |
Manual close: Yes | ||
Docker: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes. |
nodata(/Docker by Zabbix agent 2/docker.name,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Docker: Version has changed | Docker version has changed. Acknowledge to close the problem manually. |
last(/Docker by Zabbix agent 2/docker.server_version,#1)<>last(/Docker by Zabbix agent 2/docker.server_version,#2) and length(last(/Docker by Zabbix agent 2/docker.server_version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Images discovery | Discovery of images metrics. |
Zabbix agent | docker.images.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Image {#NAME}: Created | Dependent item | docker.image.created["{#ID}"] Preprocessing
|
|
Image {#NAME}: Size | Dependent item | docker.image.size["{#ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Containers discovery | Discovery of container metrics. Parameter: true - returns all containers; false - returns only running containers. |
Zabbix agent | docker.containers.discovery[false] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Container {#NAME}: Get stats | Get container stats based on resource usage. |
Zabbix agent | docker.container_stats["{#NAME}"] |
Container {#NAME}: CPU total usage per second | Dependent item | docker.containerstats.cpuusage.total.rate["{#NAME}"] Preprocessing
|
|
Container {#NAME}: CPU percent usage | Dependent item | docker.containerstats.cpupct_usage["{#NAME}"] Preprocessing
|
|
Container {#NAME}: CPU kernelmode usage per second | Dependent item | docker.containerstats.cpuusage.kernel.rate["{#NAME}"] Preprocessing
|
|
Container {#NAME}: CPU usermode usage per second | Dependent item | docker.containerstats.cpuusage.user.rate["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Online CPUs | Dependent item | docker.containerstats.onlinecpus["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Throttling periods | Number of periods with throttling active. |
Dependent item | docker.containerstats.cpuusage.throttling_periods["{#NAME}"] Preprocessing
|
Container {#NAME}: Throttled periods | Number of periods when the container hits its throttling limit. |
Dependent item | docker.containerstats.cpuusage.throttled_periods["{#NAME}"] Preprocessing
|
Container {#NAME}: Throttled time | Aggregate time the container was throttled for in nanoseconds. |
Dependent item | docker.containerstats.cpuusage.throttled_time["{#NAME}"] Preprocessing
|
Container {#NAME}: Memory usage | Dependent item | docker.container_stats.memory.usage["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory maximum usage | Dependent item | docker.containerstats.memory.maxusage["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory commit bytes | Dependent item | docker.containerstats.memory.commitbytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory commit peak bytes | Dependent item | docker.containerstats.memory.commitpeak_bytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory private working set | Dependent item | docker.containerstats.memory.privateworking_set["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Current PIDs count | Current number of PIDs the container has created. |
Dependent item | docker.containerstats.pidsstats.current["{#NAME}"] Preprocessing
|
Container {#NAME}: Networks bytes received per second | Dependent item | docker.networks.rx_bytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks packets received per second | Dependent item | docker.networks.rx_packets["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks errors received per second | Dependent item | docker.networks.rx_errors["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks incoming packets dropped per second | Dependent item | docker.networks.rx_dropped["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks bytes sent per second | Dependent item | docker.networks.tx_bytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks packets sent per second | Dependent item | docker.networks.tx_packets["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks errors sent per second | Dependent item | docker.networks.tx_errors["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks outgoing packets dropped per second | Dependent item | docker.networks.tx_dropped["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Get info | Return low-level information about a container. |
Zabbix agent | docker.container_info["{#NAME}",full] |
Container {#NAME}: Created | Dependent item | docker.container_info.created["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Image | Dependent item | docker.container_info.image["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Restart count | Dependent item | docker.containerinfo.restartcount["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Status | Dependent item | docker.container_info.state.status["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Health status | Container's health status. |
Dependent item | docker.container_info.state.health["{#NAME}"] Preprocessing
|
Container {#NAME}: Health failing streak | Dependent item | docker.container_info.state.health.failing["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Running | Dependent item | docker.container_info.state.running["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Paused | Dependent item | docker.container_info.state.paused["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Restarting | Dependent item | docker.container_info.state.restarting["{#NAME}"] Preprocessing
|
|
Container {#NAME}: OOMKilled | Dependent item | docker.container_info.state.oomkilled["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Dead | Dependent item | docker.container_info.state.dead["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Pid | Dependent item | docker.container_info.state.pid["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Exit code | Dependent item | docker.container_info.state.exitcode["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Error | Dependent item | docker.container_info.state.error["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Started at | Dependent item | docker.container_info.started["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Finished at | Time at which the container last terminated. |
Dependent item | docker.container_info.finished["{#NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Docker: Container {#NAME}: Health state container is unhealthy | Container health state is unhealthy. |
count(/Docker by Zabbix agent 2/docker.container_info.state.health["{#NAME}"],2m,,2)>=2 |High |
||
Docker: Container {#NAME}: Container has been stopped with error code | last(/Docker by Zabbix agent 2/docker.container_info.state.exitcode["{#NAME}"])>0 and last(/Docker by Zabbix agent 2/docker.container_info.state.running["{#NAME}"])=0 |Average |
Manual close: Yes | ||
Docker: Container {#NAME}: An error has occurred in the container | Container {#NAME} has an error. Acknowledge to close the problem manually. |
last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"],#1)<>last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"],#2) and length(last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"]))>0 |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Control-M by Zabbix that works without any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is intended to be used on Control-M Enterprise Manager instances.
It monitors active SLA services and Control-M servers, and creates host prototypes for the discovered servers using the Control-M server by HTTP template.
To use this template, you must set the {$API.TOKEN} and {$API.URI.ENDPOINT} macros.
To access the API token, use one of the following Control-M interfaces:
{$API.URI.ENDPOINT}
- is the Control-M Automation API endpoint for the API requests, including your server IP or DNS address, the Automation API port, and the path.
For example, https://monitored.controlm.instance:8443/automation-api.
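As a quick sanity check of the two macros, you can call the Automation API directly. A minimal sketch with hypothetical values; it assumes the token is passed in the x-api-key header and that /config/servers is the endpoint behind the "Get Control-M servers" item, so verify both against your Control-M Automation API version:

```python
import json
import urllib.request

# Hypothetical values for {$API.URI.ENDPOINT} and {$API.TOKEN}.
API_ENDPOINT = "https://monitored.controlm.instance:8443/automation-api"
API_TOKEN = "<set the token here>"

request = urllib.request.Request(
    f"{API_ENDPOINT}/config/servers",   # assumed path for the server list
    headers={"x-api-key": API_TOKEN},   # assumed token header
)
# For self-signed certificates an SSL context may be needed here.
servers = json.loads(urllib.request.urlopen(request).read())
for server in servers:
    # Field names are illustrative; state/message/version mirror the template items.
    print(server.get("name"), server.get("state"), server.get("message"))
```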
Name | Description | Default |
---|---|---|
{$API.URI.ENDPOINT} | The API endpoint is a URI - for example, https://monitored.controlm.instance:8443/automation-api. |
<set the api uri endpoint here> |
{$API.TOKEN} | A token to use for API connections. |
<set the token here> |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get Control-M servers | Gets a list of servers. |
HTTP agent | controlm.servers |
Get SLA services | Gets all the SLA active services. |
HTTP agent | controlm.services |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server discovery | Discovers the Control-M servers. |
Dependent item | controlm.server.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
SLA services discovery | Discovers the SLA services in the Control-M environment. |
Dependent item | controlm.services.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: stats | Gets the service statistics. |
Dependent item | service.stats['{#SERVICE.NAME}','{#SERVICE.JOB}'] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status | Gets the service status. |
Dependent item | service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'executed' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',executed] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitCondition' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitCondition] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitResource' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitResource] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitHost' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitHost] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitWorkload' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitWorkload] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'completed' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',completed] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'error' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',error] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Control-M: Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status [{ITEM.VALUE}] | The service has encountered an issue. |
last(/Control-M enterprise manager by HTTP/service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'],#1)=0 or last(/Control-M enterprise manager by HTTP/service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'],#1)=10 |Average |
Manual close: Yes | |
Control-M: Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status [{ITEM.VALUE}] | The service has finished its job late. |
last(/Control-M enterprise manager by HTTP/service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'],#1)=3 |Warning |
Manual close: Yes | |
Control-M: Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs in 'error' state | There are services present which are in the state - 'error'. |
last(/Control-M enterprise manager by HTTP/service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',error],#1)>0 |Average |
This template is designed to get metrics from the Control-M server using the Control-M Automation API with HTTP agent.
This template monitors server statistics, discovers jobs and agents using Low Level Discovery.
To use this template, macros {$API.TOKEN}, {$API.URI.ENDPOINT}, and {$SERVER.NAME} need to be set.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is primarily intended to be used in conjunction with the Control-M enterprise manager by HTTP
template in order to create host prototypes.
It monitors Control-M server statistics, and discovers jobs and agents on the server.
However, if you wish to monitor the Control-M server separately with this template, you must set the following macros: {$API.TOKEN}, {$API.URI.ENDPOINT}, and {$SERVER.NAME}.
To obtain the token for the {$API.TOKEN}
macro, use one of the following interfaces:
{$API.URI.ENDPOINT}
- is the Control-M Automation API endpoint for API requests, including your server IP address or DNS name, the Automation API port, and the path.
For example: https://monitored.controlm.instance:8443/automation-api.
{$SERVER.NAME}
- is the name of the Control-M server to be monitored.
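As a quick check that the value you plan to use for {$SERVER.NAME} is valid, you can ask the Automation API for that server's agents, which mirrors what the Get agents item collects. A minimal curl sketch follows; the /config/server/{server}/agents resource path and the Bearer-token header are assumptions about your Control-M Automation API setup, and all values are placeholders.

```bash
#!/usr/bin/env bash
# Placeholders - replace with your endpoint, token, and server name.
API_URI_ENDPOINT="https://monitored.controlm.instance:8443/automation-api"
API_TOKEN="<set the token here>"
SERVER_NAME="<set the server name here>"

# Assumed resource: lists the agents registered to the given Control-M server;
# an error response usually means the server name or token is wrong.
curl -sk -H "Authorization: Bearer ${API_TOKEN}" \
  "${API_URI_ENDPOINT}/config/server/${SERVER_NAME}/agents"
```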
Name | Description | Default |
---|---|---|
{$SERVER.NAME} | The name of the Control-M server. |
<set the server name here> |
{$API.URI.ENDPOINT} | The API endpoint is a URI - for example, https://monitored.controlm.instance:8443/automation-api. |
<set the api uri endpoint here> |
{$API.TOKEN} | A token to use for API connections. |
<set the token here> |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get Control-M server stats | Gets the statistics of the server. |
HTTP agent | controlm.server.stats Preprocessing
|
Get jobs | Gets the status of jobs. |
HTTP agent | controlm.jobs |
Get agents | Gets agents for the server. |
HTTP agent | controlm.agents |
Jobs statistics | Gets the statistics of jobs. |
Dependent item | controlm.jobs.statistics Preprocessing
|
Jobs returned | Gets the count of returned jobs. |
Dependent item | controlm.jobs.statistics.returned Preprocessing
|
Jobs total | Gets the count of total jobs. |
Dependent item | controlm.jobs.statistics.total Preprocessing
|
Server state | Gets the metric of the server state. |
Dependent item | server.state Preprocessing
|
Server message | Gets the metric of the server message. |
Dependent item | server.message Preprocessing
|
Server version | Gets the metric of the server version. |
Dependent item | server.version Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Control-M: Server is down | The server is down. |
last(/Control-M server by HTTP/server.state)=0 or last(/Control-M server by HTTP/server.state)=10 |High |
||
Control-M: Server disconnected | The server is disconnected. |
last(/Control-M server by HTTP/server.message,#1)="Disconnected" |High |
||
Control-M: Server error | The server has encountered an error. |
last(/Control-M server by HTTP/server.message,#1)<>"Connected" and last(/Control-M server by HTTP/server.message,#1)<>"Disconnected" and last(/Control-M server by HTTP/server.message,#1)<>"" |High |
||
Control-M: Server version has changed | The server version has changed. Acknowledge to close the problem manually. |
last(/Control-M server by HTTP/server.version,#1)<>last(/Control-M server by HTTP/server.version,#2) and length(last(/Control-M server by HTTP/server.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jobs discovery | Discovers jobs on the server. |
Dependent item | controlm.jobs.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job [{#JOB.ID}]: stats | Gets the statistics of a job. |
Dependent item | job.stats['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: status | Gets the status of a job. |
Dependent item | job.status['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: number of runs | Gets the number of runs for a job. |
Dependent item | job.numberOfRuns['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: type | Gets the job type. |
Dependent item | job.type['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: held status | Gets the held status of a job. |
Dependent item | job.held['{#JOB.ID}'] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Control-M: Job [{#JOB.ID}]: status [{ITEM.VALUE}] | The job has encountered an issue. |
last(/Control-M server by HTTP/job.status['{#JOB.ID}'],#1)=1 or last(/Control-M server by HTTP/job.status['{#JOB.ID}'],#1)=10 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Agent discovery | Discovers agents on the server. |
Dependent item | controlm.agent.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Agent [{#AGENT.NAME}]: stats | Gets the statistics of an agent. |
Dependent item | agent.stats['{#AGENT.NAME}'] Preprocessing
|
Agent [{#AGENT.NAME}]: status | Gets the status of an agent. |
Dependent item | agent.status['{#AGENT.NAME}'] Preprocessing
|
Agent [{#AGENT.NAME}]: version | Gets the version number of an agent. |
Dependent item | agent.version['{#AGENT.NAME}'] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Control-M: Agent [{#AGENT.NAME}]: status [{ITEM.VALUE}] | The agent has encountered an issue. |
last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=1 or last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=10 |Average |
Manual close: Yes | |
Control-M: Agent [{#AGENT.NAME}]: status disabled | The agent is disabled. |
last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=2 or last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=3 |Info |
Manual close: Yes | |
Control-M: Agent [{#AGENT.NAME}]: version has changed | The agent version has changed. Acknowledge to close the problem manually. |
last(/Control-M server by HTTP/agent.version['{#AGENT.NAME}'],#1)<>last(/Control-M server by HTTP/agent.version['{#AGENT.NAME}'],#2) |Info |
Manual close: Yes | |
Control-M: Agent [{#AGENT.NAME}]: unknown version | The agent version is unknown. |
last(/Control-M server by HTTP/agent.version['{#AGENT.NAME}'],#1)="Unknown" |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HashiCorp Consul by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template HashiCorp Consul Cluster by HTTP
— collects metrics by HTTP agent from API endpoints.
More information about the metrics can be found in the official documentation.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
The template requires authorization via an API token.
Don't forget to change the {$CONSUL.CLUSTER.URL} and {$CONSUL.TOKEN} macros. Also, see the Macros section for a list of macros used to set trigger values.
This template supports Consul namespaces. You can set the {$CONSUL.NAMESPACE} macro if you are interested in only one service namespace. Leave this macro unspecified to get all services. For the Open Source version, leave this macro empty.
NOTE: Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration. NOTE: You may also be interested in the Envoy Proxy by HTTP template.
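To check that {$CONSUL.CLUSTER.URL} and {$CONSUL.TOKEN} are usable before linking the template, you can query the standard Consul HTTP API status and catalog endpoints directly. This is a sketch only; the URL and token below are placeholders.

```bash
#!/usr/bin/env bash
# Placeholders - replace with your cluster URL and ACL token.
CONSUL_CLUSTER_URL="http://localhost:8500"
CONSUL_TOKEN="<PUT YOUR AUTH TOKEN>"

# Current Raft leader address (the same information as the "Cluster leader" item).
curl -s -H "X-Consul-Token: ${CONSUL_TOKEN}" "${CONSUL_CLUSTER_URL}/v1/status/leader"

# Catalog of registered services (used by services discovery).
curl -s -H "X-Consul-Token: ${CONSUL_TOKEN}" "${CONSUL_CLUSTER_URL}/v1/catalog/services"
```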
Name | Description | Default |
---|---|---|
{$CONSUL.CLUSTER.URL} | Consul cluster URL. |
http://localhost:8500 |
{$CONSUL.TOKEN} | Consul auth token. |
<PUT YOUR AUTH TOKEN> |
{$CONSUL.NAMESPACE} | Consul service namespace. Enterprise only; for the Open Source version, leave this macro empty. Leave this macro unspecified to get all services. |
|
{$CONSUL.API.SCHEME} | Consul API scheme. Used in node LLD. |
http |
{$CONSUL.API.PORT} | Consul API port. Used in node LLD. |
8500 |
{$CONSUL.LLD.FILTER.NODE_NAME.MATCHES} | Filter of discoverable nodes. |
.* |
{$CONSUL.LLD.FILTER.NODE_NAME.NOT_MATCHES} | Filter to exclude discovered nodes. |
CHANGE IF NEEDED |
{$CONSUL.LLD.FILTER.SERVICE_NAME.MATCHES} | Filter of discoverable services. |
.* |
{$CONSUL.LLD.FILTER.SERVICE_NAME.NOT_MATCHES} | Filter to exclude discovered services. |
CHANGE IF NEEDED |
{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG} | Maximum number of service nodes in status 'critical' for trigger expression. Can be used with context. |
0 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster leader | Current leader address. |
HTTP agent | consul.get_leader Preprocessing
|
Nodes: peers | The number of Raft peers for the datacenter in which the agent is running. |
HTTP agent | consul.get_peers Preprocessing
|
Get nodes | Catalog of nodes registered in a given datacenter. |
HTTP agent | consul.get_nodes Preprocessing
|
Get nodes Serf health status | Get Serf Health Status for all agents in cluster. |
HTTP agent | consul.getclusterserf Preprocessing
|
Nodes: total | Number of nodes on current dc. |
Dependent item | consul.nodes_total Preprocessing
|
Nodes: passing | Number of agents on current dc with serf health status 'passing'. |
Dependent item | consul.nodes_passing Preprocessing
|
Nodes: critical | Number of agents on current dc with serf health status 'critical'. |
Dependent item | consul.nodes_critical Preprocessing
|
Nodes: warning | Number of agents on current dc with serf health status 'warning'. |
Dependent item | consul.nodes_warning Preprocessing
|
Get services | Catalog of services registered in a given datacenter. |
HTTP agent | consul.getcatalogservices Preprocessing
|
Services: total | Number of services on current dc. |
Dependent item | consul.services_total Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Consul Cluster: Leader has been changed | The Consul cluster leader has changed. Acknowledge to close the problem manually. |
last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#1)<>last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#2) and length(last(/HashiCorp Consul Cluster by HTTP/consul.get_leader))>0 |Info |
Manual close: Yes | |
HashiCorp Consul Cluster: One or more nodes in cluster in 'critical' state | One or more agents on current dc with serf health status 'critical'. |
last(/HashiCorp Consul Cluster by HTTP/consul.nodes_critical)>0 |Average |
||
HashiCorp Consul Cluster: One or more nodes in cluster in 'warning' state | One or more agents on current dc with serf health status 'warning'. |
last(/HashiCorp Consul Cluster by HTTP/consul.nodes_warning)>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul cluster nodes discovery | Dependent item | consul.lld_nodes Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node ["{#NODE_NAME}"]: Serf Health | Node Serf Health Status. |
Dependent item | consul.serf.health["{#NODE_NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul cluster services discovery | Dependent item | consul.lld_services Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Service ["{#SERVICE_NAME}"]: Nodes passing | The number of nodes with service status |
Dependent item | consul.service.nodespassing["{#SERVICENAME}"] Preprocessing
|
Service ["{#SERVICE_NAME}"]: Nodes warning | The number of nodes with service status |
Dependent item | consul.service.nodeswarning["{#SERVICENAME}"] Preprocessing
|
Service ["{#SERVICE_NAME}"]: Nodes critical | The number of nodes with service status |
Dependent item | consul.service.nodescritical["{#SERVICENAME}"] Preprocessing
|
["{#SERVICE_NAME}"]: Get raw service state | Retrieve service instances providing the service indicated on the path. |
HTTP agent | consul.getservicestats["{#SERVICE_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Consul Cluster: Service ["{#SERVICE_NAME}"]: Too many nodes with service status 'critical' | One or more nodes with service status 'critical'. |
last(/HashiCorp Consul Cluster by HTTP/consul.service.nodes_critical["{#SERVICE_NAME}"])>{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG:"{#SERVICE_NAME}"} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HashiCorp Consul by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Do not forget to enable the Prometheus format for exported metrics.
See documentation.
More information about the metrics can be found in the official documentation.
Template HashiCorp Consul Node by HTTP
— collects metrics by HTTP agent from /v1/agent/metrics endpoint.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /v1/agent/metrics endpoint. Do not forget to enable the Prometheus format for exported metrics. See documentation. The template requires authorization via an API token.
Don't forget to change the {$CONSUL.NODE.API.URL} and {$CONSUL.TOKEN} macros.
Also, see the Macros section for a list of macros used to set trigger values.
More information about the metrics can be found in the official documentation.
This template supports Consul namespaces. You can set the {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES} and {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES} macros if you want to filter discovered services by namespace.
For the Open Source version, the service namespace will be set to 'None'.
NOTE: Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE: You may also be interested in the Envoy Proxy by HTTP template.
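To verify the agent URL, token, and Prometheus output before linking the template, you can call the /v1/agent/metrics and /v1/agent/self endpoints of the standard Consul HTTP API; the URL and token below are placeholders.

```bash
#!/usr/bin/env bash
# Placeholders - replace with your node API URL and ACL token.
CONSUL_NODE_API_URL="http://localhost:8500"
CONSUL_TOKEN="<PUT YOUR AUTH TOKEN>"

# Internal telemetry in Prometheus format - the "Get instance metrics" item
# relies on this endpoint returning data.
curl -s -H "X-Consul-Token: ${CONSUL_TOKEN}" \
  "${CONSUL_NODE_API_URL}/v1/agent/metrics?format=prometheus" | head

# Configuration and member information of the local agent (role, version, etc.).
curl -s -H "X-Consul-Token: ${CONSUL_TOKEN}" \
  "${CONSUL_NODE_API_URL}/v1/agent/self" | head -c 500; echo
```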
Name | Description | Default |
---|---|---|
{$CONSUL.NODE.API.URL} | Consul instance URL. |
http://localhost:8500 |
{$CONSUL.TOKEN} | Consul auth token. |
<PUT YOUR AUTH TOKEN> |
{$CONSUL.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. |
90 |
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES} | Filter of discoverable services on the local node. |
.* |
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES} | Filter to exclude discovered services on the local node. |
CHANGE IF NEEDED |
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES} | Filter of discoverable services by namespace on the local node. Enterprise only; for the Open Source version, the namespace will be set to 'None'. |
.* |
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered services by namespace on the local node. Enterprise only; for the Open Source version, the namespace will be set to 'None'. |
CHANGE IF NEEDED |
{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN} | Maximum acceptable value of node's health score for WARNING trigger expression. |
2 |
{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH} | Maximum acceptable value of node's health score for AVERAGE trigger expression. |
4 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get instance metrics | Get raw metrics from Consul instance /metrics endpoint. |
HTTP agent | consul.get_metrics Preprocessing
|
Get node info | Get configuration and member information of the local agent. |
HTTP agent | consul.getnodeinfo Preprocessing
|
Role | Role of current Consul agent. |
Dependent item | consul.role Preprocessing
|
Version | Version of Consul agent. |
Dependent item | consul.version Preprocessing
|
Number of services | Number of services on current node. |
Dependent item | consul.services_number Preprocessing
|
Number of checks | Number of checks on current node. |
Dependent item | consul.checks_number Preprocessing
|
Number of check monitors | Number of check monitors on current node. |
Dependent item | consul.checkmonitorsnumber Preprocessing
|
Process CPU seconds, total | Total user and system CPU time spent in seconds. |
Dependent item | consul.cpusecondstotal.rate Preprocessing
|
Virtual memory size | Virtual memory size in bytes. |
Dependent item | consul.virtualmemorybytes Preprocessing
|
RSS memory usage | Resident memory size in bytes. |
Dependent item | consul.residentmemorybytes Preprocessing
|
Goroutine count | The number of Goroutines on Consul instance. |
Dependent item | consul.goroutines Preprocessing
|
Open file descriptors | Number of open file descriptors. |
Dependent item | consul.process_open_fds Preprocessing
|
Open file descriptors, max | Maximum number of open file descriptors. |
Dependent item | consul.process_max_fds Preprocessing
|
Client RPC, per second | Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server. This gives a measure of how much a given agent is loading the Consul servers. This is only generated by agents in client mode, not Consul servers. |
Dependent item | consul.client_rpc Preprocessing
|
Client RPC failed, per second | Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server and fails. |
Dependent item | consul.clientrpcfailed Preprocessing
|
TCP connections, accepted per second | This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection per second. |
Dependent item | consul.memberlist.tcp_accept Preprocessing
|
TCP connections, per second | This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent per second. |
Dependent item | consul.memberlist.tcp_connect Preprocessing
|
TCP send bytes, per second | This metric measures the total number of bytes sent by a Consul agent through the TCP protocol per second. |
Dependent item | consul.memberlist.tcp_sent Preprocessing
|
UDP received bytes, per second | This metric measures the total number of bytes received by a Consul agent through the UDP protocol per second. |
Dependent item | consul.memberlist.udp_received Preprocessing
|
UDP sent bytes, per second | This metric measures the total number of bytes sent by a Consul agent through the UDP protocol per second. |
Dependent item | consul.memberlist.udp_sent Preprocessing
|
GC pause, p90 | The 90 percentile for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds. |
Dependent item | consul.gc_pause.p90 Preprocessing
|
GC pause, p50 | The 50 percentile (median) for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds. |
Dependent item | consul.gc_pause.p50 Preprocessing
|
Memberlist: degraded | This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate. The agent uses its own health metric as an indicator to perform this action. If its health score is low, it means that the node is healthy, and vice versa. |
Dependent item | consul.memberlist.degraded Preprocessing
|
Memberlist: health score | This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy". |
Dependent item | consul.memberlist.health_score Preprocessing
|
Memberlist: gossip, p90 | The 90 percentile for the number of gossips (messages) broadcasted to a set of randomly selected nodes. |
Dependent item | consul.memberlist.dispatch_log.p90 Preprocessing
|
Memberlist: gossip, p50 | The 50 percentile (median) for the number of gossips (messages) broadcasted to a set of randomly selected nodes. |
Dependent item | consul.memberlist.gossip.p50 Preprocessing
|
Memberlist: msg alive | This metric counts the number of alive Consul agents, that the agent has mapped out so far, based on the message information given by the network layer. |
Dependent item | consul.memberlist.msg.alive Preprocessing
|
Memberlist: msg dead | This metric counts the number of times a Consul agent has marked another agent to be a dead node. |
Dependent item | consul.memberlist.msg.dead Preprocessing
|
Memberlist: msg suspect | The number of times a Consul agent suspects another as failed while probing during gossip protocol. |
Dependent item | consul.memberlist.msg.suspect Preprocessing
|
Memberlist: probe node, p90 | The 90 percentile for the time taken to perform a single round of failure detection on a select Consul agent. |
Dependent item | consul.memberlist.probe_node.p90 Preprocessing
|
Memberlist: probe node, p50 | The 50 percentile (median) for the time taken to perform a single round of failure detection on a select Consul agent. |
Dependent item | consul.memberlist.probe_node.p50 Preprocessing
|
Memberlist: push pull node, p90 | The 90 percentile for the number of Consul agents that have exchanged state with this agent. |
Dependent item | consul.memberlist.pushpullnode.p90 Preprocessing
|
Memberlist: push pull node, p50 | The 50 percentile (median) for the number of Consul agents that have exchanged state with this agent. |
Dependent item | consul.memberlist.pushpullnode.p50 Preprocessing
|
KV store: apply, p90 | The 90 percentile for the time it takes to complete an update to the KV store. |
Dependent item | consul.kvs.apply.p90 Preprocessing
|
KV store: apply, p50 | The 50 percentile (median) for the time it takes to complete an update to the KV store. |
Dependent item | consul.kvs.apply.p50 Preprocessing
|
KV store: apply, rate | The number of updates to the KV store per second. |
Dependent item | consul.kvs.apply.rate Preprocessing
|
Serf member: flap, rate | Increments when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second. |
Dependent item | consul.serf.member.flap.rate Preprocessing
|
Serf member: failed, rate | Increments when an agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second. |
Dependent item | consul.serf.member.failed.rate Preprocessing
|
Serf member: join, rate | Increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins. Shown as events per second. |
Dependent item | consul.serf.member.join.rate Preprocessing
|
Serf member: left, rate | Increments when an agent leaves the cluster. Shown as events per second. |
Dependent item | consul.serf.member.left.rate Preprocessing
|
Serf member: update, rate | Increments when a Consul agent updates. Shown as events per second. |
Dependent item | consul.serf.member.update.rate Preprocessing
|
ACL: resolves, rate | The number of ACL resolves per second. |
Dependent item | consul.acl.resolves.rate Preprocessing
|
Catalog: register, rate | The number of catalog register operation per second. |
Dependent item | consul.catalog.register.rate Preprocessing
|
Catalog: deregister, rate | The number of catalog deregister operation per second. |
Dependent item | consul.catalog.deregister.rate Preprocessing
|
Snapshot: append line, p90 | The 90 percentile for the time taken by the Consul agent to append an entry into the existing log. |
Dependent item | consul.snapshot.append_line.p90 Preprocessing
|
Snapshot: append line, p50 | The 50 percentile (median) for the time taken by the Consul agent to append an entry into the existing log. |
Dependent item | consul.snapshot.append_line.p50 Preprocessing
|
Snapshot: append line, rate | The number of snapshot appendLine operations per second. |
Dependent item | consul.snapshot.append_line.rate Preprocessing
|
Snapshot: compact, p90 | The 90 percentile for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction. |
Dependent item | consul.snapshot.compact.p90 Preprocessing
|
Snapshot: compact, p50 | The 50 percentile (median) for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction. |
Dependent item | consul.snapshot.compact.p50 Preprocessing
|
Snapshot: compact, rate | The number of snapshot compact operations per second. |
Dependent item | consul.snapshot.compact.rate Preprocessing
|
Get local services | Get all the services that are registered with the local agent and their status. |
Script | consul.get_local_services |
Get local services check | Data collection check. |
Dependent item | consul.get_local_services.check Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Consul Node: Version has been changed | Consul version has changed. Acknowledge to close the problem manually. |
last(/HashiCorp Consul Node by HTTP/consul.version,#1)<>last(/HashiCorp Consul Node by HTTP/consul.version,#2) and length(last(/HashiCorp Consul Node by HTTP/consul.version))>0 |Info |
Manual close: Yes | |
HashiCorp Consul Node: Current number of open files is too high | "Heavy file descriptor usage (i.e., near the process’s file descriptor limit) indicates a potential file descriptor exhaustion issue." |
min(/HashiCorp Consul Node by HTTP/consul.process_open_fds,5m)/last(/HashiCorp Consul Node by HTTP/consul.process_max_fds)*100>{$CONSUL.OPEN.FDS.MAX.WARN} |Warning |
||
HashiCorp Consul Node: Node's health score is warning | This metric ranges from 0 to 8, where 0 indicates "totally healthy". |
max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN} |Warning |
Depends on:
|
|
HashiCorp Consul Node: Node's health score is critical | This metric ranges from 0 to 8, where 0 indicates "totally healthy". |
max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH} |Average |
||
HashiCorp Consul Node: Failed to get local services | Failed to get local services. Check debug log for more information. |
length(last(/HashiCorp Consul Node by HTTP/consul.get_local_services.check))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Local node services discovery | Discover metrics for services that are registered with the local agent. |
Dependent item | consul.nodeserviceslld Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
["{#SERVICE_NAME}"]: Aggregated status | Aggregated values of all health checks for the service instance. |
Dependent item | consul.service.aggregated_state["{#SERVICE_ID}"] Preprocessing
|
["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Status | Current state of health check for the service. |
Dependent item | consul.service.check.state["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing
|
["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Output | Current output of health check for the service. |
Dependent item | consul.service.check.output["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Consul Node: Aggregated status is 'warning' | Aggregated state of service on the local agent is 'warning'. |
last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 1 |Warning |
||
HashiCorp Consul Node: Aggregated status is 'critical' | Aggregated state of service on the local agent is 'critical'. |
last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 2 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP API methods discovery | Discovers metrics specific to HTTP API methods. |
Dependent item | consul.httpapidiscovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP request: ["{#HTTP_METHOD}"], p90 | The 90 percentile of how long it takes to service the given HTTP request for the given verb. |
Dependent item | consul.http.api.p90["{#HTTP_METHOD}"] Preprocessing
|
HTTP request: ["{#HTTP_METHOD}"], p50 | The 50 percentile (median) of how long it takes to service the given HTTP request for the given verb. |
Dependent item | consul.http.api.p50["{#HTTP_METHOD}"] Preprocessing
|
HTTP request: ["{#HTTP_METHOD}"], rate | The number of HTTP request for the given verb per second. |
Dependent item | consul.http.api.rate["{#HTTP_METHOD}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Raft server metrics discovery | Discover raft metrics for server nodes. |
Dependent item | consul.raft.server.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Raft state | Current state of Consul agent. |
Dependent item | consul.raft.state[{#SINGLETON}] Preprocessing
|
Raft state: leader | Increments when a server becomes a leader. |
Dependent item | consul.raft.state_leader[{#SINGLETON}] Preprocessing
|
Raft state: candidate | The number of initiated leader elections. |
Dependent item | consul.raft.state_candidate[{#SINGLETON}] Preprocessing
|
Raft: apply, rate | Incremented whenever a leader first passes a message into the Raft commit process (called an Apply operation). This metric describes the arrival rate of new logs into Raft per second. |
Dependent item | consul.raft.apply.rate[{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Raft leader metrics discovery | Discover raft metrics for leader nodes. |
Dependent item | consul.raft.leader.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Raft state: leader last contact, p90 | The 90 percentile of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds. |
Dependent item | consul.raft.leaderlastcontact.p90[{#SINGLETON}] Preprocessing
|
Raft state: leader last contact, p50 | The 50 percentile (median) of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds. |
Dependent item | consul.raft.leaderlastcontact.p50[{#SINGLETON}] Preprocessing
|
Raft state: commit time, p90 | The 90 percentile time it takes to commit a new entry to the raft log on the leader, in milliseconds. |
Dependent item | consul.raft.commit_time.p90[{#SINGLETON}] Preprocessing
|
Raft state: commit time, p50 | The 50 percentile (median) time it takes to commit a new entry to the raft log on the leader, in milliseconds. |
Dependent item | consul.raft.commit_time.p50[{#SINGLETON}] Preprocessing
|
Raft state: dispatch log, p90 | The 90 percentile time it takes for the leader to write log entries to disk, in milliseconds. |
Dependent item | consul.raft.dispatch_log.p90[{#SINGLETON}] Preprocessing
|
Raft state: dispatch log, p50 | The 50 percentile (median) time it takes for the leader to write log entries to disk, in milliseconds. |
Dependent item | consul.raft.dispatch_log.p50[{#SINGLETON}] Preprocessing
|
Raft state: dispatch log, rate | The number of times a Raft leader writes a log to disk per second. |
Dependent item | consul.raft.dispatch_log.rate[{#SINGLETON}] Preprocessing
|
Raft state: commit, rate | The number of commits of new entries to the Raft log on the leader per second. |
Dependent item | consul.raft.commit_time.rate[{#SINGLETON}] Preprocessing
|
Autopilot healthy | Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy. |
Dependent item | consul.autopilot.healthy[{#SINGLETON}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Cloudflare monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Create a host, for example mywebsite.com, for a site in your Cloudflare account.
2. Link the template to the host.
3. Customize the values of {$CLOUDFLARE.API.TOKEN}, {$CLOUDFLARE.ZONE_ID} macros.
Cloudflare API Tokens are available in your Cloudflare account under My Profile > API Tokens.
Zone ID is available in your Cloudflare account under Account Home > Site.
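Before filling in the macros, you can confirm the token and Zone ID against the Cloudflare API itself. A minimal curl sketch using the token-verification and zone-details endpoints of the v4 API; the token and Zone ID values below are placeholders.

```bash
#!/usr/bin/env bash
# Placeholders - replace with your API token and Zone ID.
CLOUDFLARE_API_URL="https://api.cloudflare.com/client/v4"
CLOUDFLARE_API_TOKEN="<change>"
CLOUDFLARE_ZONE_ID="<change>"

# Verifies that the token itself is valid and active.
curl -s -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  "${CLOUDFLARE_API_URL}/user/tokens/verify"

# Verifies that the token can read the zone identified by the Zone ID.
curl -s -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  "${CLOUDFLARE_API_URL}/zones/${CLOUDFLARE_ZONE_ID}"
```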
Name | Description | Default |
---|---|---|
{$CLOUDFLARE.API.URL} | The URL of Cloudflare API endpoint. |
https://api.cloudflare.com/client/v4 |
{$CLOUDFLARE.API.TOKEN} | Your Cloudflare API Token. |
<change> |
{$CLOUDFLARE.ZONE_ID} | Your Cloudflare Site Zone ID. |
<change> |
{$CLOUDFLARE.ERRORS.MAX.WARN} | Maximum responses with errors in %. |
30 |
{$CLOUDFLARE.CACHED_BANDWIDTH.MIN.WARN} | Minimum of cached bandwidth in %. |
50 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Total bandwidth | The volume of all data. |
Dependent item | cloudflare.bandwidth.all Preprocessing
|
Cached bandwidth | The volume of cached data. |
Dependent item | cloudflare.bandwidth.cached Preprocessing
|
Uncached bandwidth | The volume of uncached data. |
Dependent item | cloudflare.bandwidth.uncached Preprocessing
|
Cache hit ratio of bandwidth | The ratio of the amount of cached bandwidth to the total bandwidth, as a percentage. |
Dependent item | cloudflare.bandwidth.cachehitratio Preprocessing
|
SSL encrypted bandwidth | The volume of encrypted data. |
Dependent item | cloudflare.bandwidth.ssl.encrypted Preprocessing
|
Unencrypted bandwidth | The volume of unencrypted data. |
Dependent item | cloudflare.bandwidth.ssl.unencrypted Preprocessing
|
DNS queries | The amount of all DNS queries. |
Dependent item | cloudflare.dns.query.all Preprocessing
|
Stale DNS queries | The number of stale DNS queries. |
Dependent item | cloudflare.dns.query.stale Preprocessing
|
Uncached DNS queries | The number of uncached DNS queries. |
Dependent item | cloudflare.dns.query.uncached Preprocessing
|
Get data | The JSON with result of Cloudflare API request. |
Script | cloudflare.get |
Total page views | The amount of all pageviews. |
Dependent item | cloudflare.pageviews.all Preprocessing
|
Total requests | The amount of all requests. |
Dependent item | cloudflare.requests.all Preprocessing
|
Cached requests | Dependent item | cloudflare.requests.cached Preprocessing
|
|
Uncached requests | The number of uncached requests. |
Dependent item | cloudflare.requests.uncached Preprocessing
|
Cache hit ratio % over time | The ratio of the number of cached requests to all requests, as a percentage. |
Dependent item | cloudflare.requests.cachehitratio Preprocessing
|
Response codes 1xx | The number of requests with 1xx response codes. |
Dependent item | cloudflare.requests.response_100 Preprocessing
|
Response codes 2xx | The number of requests with 2xx response codes. |
Dependent item | cloudflare.requests.response_200 Preprocessing
|
Response codes 3xx | The number of requests with 3xx response codes. |
Dependent item | cloudflare.requests.response_300 Preprocessing
|
Response codes 4xx | The number of requests with 4xx response codes. |
Dependent item | cloudflare.requests.response_400 Preprocessing
|
Response codes 5xx | The number of requests with 5xx response codes. |
Dependent item | cloudflare.requests.response_500 Preprocessing
|
Non-2xx responses ratio | The ratio of the number of requests with non-2xx response codes to all requests, as a percentage. |
Dependent item | cloudflare.requests.others_ratio Preprocessing
|
2xx responses ratio | The ratio of the number of requests with 2xx response codes to all requests, as a percentage. |
Dependent item | cloudflare.requests.success_ratio Preprocessing
|
SSL encrypted requests | The number of encrypted requests. |
Dependent item | cloudflare.requests.ssl.encrypted Preprocessing
|
Unencrypted requests | The number of unencrypted requests. |
Dependent item | cloudflare.requests.ssl.unencrypted Preprocessing
|
Total threats | The number of all threats. |
Dependent item | cloudflare.threats.all Preprocessing
|
Unique visitors | The number of unique visitor IPs. |
Dependent item | cloudflare.uniques.all Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cloudflare: Cached bandwidth is too low | max(/Cloudflare by HTTP/cloudflare.bandwidth.cache_hit_ratio,#3) < {$CLOUDFLARE.CACHED_BANDWIDTH.MIN.WARN} |Warning |
|||
Cloudflare: Ratio of non-2xx responses is too high | A large number of errors can indicate a malfunction of the site. |
min(/Cloudflare by HTTP/cloudflare.requests.others_ratio,#3) > {$CLOUDFLARE.ERRORS.MAX.WARN} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor TLS/SSL certificate on the website by Zabbix agent 2 that works without any external scripts. Zabbix agent 2 with the WebCertificate plugin requests certificate using the web.certificate.get key and returns JSON with certificate attributes.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Set up and configure zabbix-agent2 with the WebCertificate plugin.
2. Test availability: zabbix_get -s <zabbix_agent_addr> -k web.certificate.get[<website_DNS_name>]
3. Create a host for the TLS/SSL certificate with Zabbix agent interface.
4. Link the template to the host.
5. Customize the value of {$CERT.WEBSITE.HOSTNAME} macro.
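For example, the availability test from step 2 could look like this when run from the Zabbix server or proxy; the agent address 192.0.2.10 and the site example.com are placeholders.

```bash
# Availability test for the web.certificate.get key (step 2); adjust the
# agent address and website DNS name to your environment.
zabbix_get -s 192.0.2.10 -k 'web.certificate.get[example.com]'

# The same check with an explicit TLS port, matching {$CERT.WEBSITE.PORT}.
zabbix_get -s 192.0.2.10 -k 'web.certificate.get[example.com,443]'
```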
Name | Description | Default |
---|---|---|
{$CERT.EXPIRY.WARN} | Number of days until the certificate expires. |
7 |
{$CERT.WEBSITE.HOSTNAME} | The website DNS name for the connection. |
<Put DNS name> |
{$CERT.WEBSITE.PORT} | The TLS/SSL port number of the website. |
443 |
{$CERT.WEBSITE.IP} | The website IP address for the connection. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get | Returns the JSON with attributes of a certificate of the requested site. |
Zabbix agent (active) | web.certificate.get[{$CERT.WEBSITE.HOSTNAME},{$CERT.WEBSITE.PORT},{$CERT.WEBSITE.IP}] Preprocessing
|
Validation result | The certificate validation result. Possible values: valid/invalid/valid-but-self-signed |
Dependent item | cert.validation Preprocessing
|
Last validation status | Last check result message. |
Dependent item | cert.message Preprocessing
|
Version | The version of the encoded certificate. |
Dependent item | cert.version Preprocessing
|
Serial number | The serial number is a positive integer assigned by the CA to each certificate. It is unique for each certificate issued by a given CA. Non-conforming CAs may issue certificates with serial numbers that are negative or zero. |
Dependent item | cert.serial_number Preprocessing
|
Signature algorithm | The algorithm identifier for the algorithm used by the CA to sign the certificate. |
Dependent item | cert.signature_algorithm Preprocessing
|
Issuer | The field identifies the entity that has signed and issued the certificate. |
Dependent item | cert.issuer Preprocessing
|
Valid from | The date on which the certificate validity period begins. |
Dependent item | cert.not_before Preprocessing
|
Expires on | The date on which the certificate validity period ends. |
Dependent item | cert.not_after Preprocessing
|
Subject | The field identifies the entity associated with the public key stored in the subject public key field. |
Dependent item | cert.subject Preprocessing
|
Subject alternative name | The subject alternative name extension allows identities to be bound to the subject of the certificate. These identities may be included in addition to or in place of the identity in the subject field of the certificate. Defined options include an Internet electronic mail address, a DNS name, an IP address, and a Uniform Resource Identifier (URI). |
Dependent item | cert.alternative_names Preprocessing
|
Public key algorithm | The digital signature algorithm is used to verify the signature of a certificate. |
Dependent item | cert.publickeyalgorithm Preprocessing
|
Fingerprint | The Certificate Signature (SHA1 Fingerprint or Thumbprint) is the hash of the entire certificate in DER form. |
Dependent item | cert.sha1_fingerprint Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Certificate: SSL certificate is invalid | SSL certificate has expired or it is issued for another domain. |
find(/Website certificate by Zabbix agent 2 active/cert.validation,,"like","invalid")=1 |High |
||
Certificate: SSL certificate expires soon | The SSL certificate should be updated or it will become untrusted. |
(last(/Website certificate by Zabbix agent 2 active/cert.not_after) - now()) / 86400 < {$CERT.EXPIRY.WARN} |Warning |
Depends on:
|
|
Certificate: Fingerprint has changed | The SSL certificate fingerprint has changed. If you did not update the certificate, it may mean your certificate has been hacked. Acknowledge to close the problem manually. |
last(/Website certificate by Zabbix agent 2 active/cert.sha1_fingerprint) <> last(/Website certificate by Zabbix agent 2 active/cert.sha1_fingerprint,#2) |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor TLS/SSL certificate on the website by Zabbix agent 2 that works without any external scripts. Zabbix agent 2 with the WebCertificate plugin requests certificate using the web.certificate.get key and returns JSON with certificate attributes.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Set up and configure zabbix-agent2 with the WebCertificate plugin.
2. Test availability: zabbix_get -s <zabbix_agent_addr> -k web.certificate.get[<website_DNS_name>]
3. Create a host for the TLS/SSL certificate with Zabbix agent interface.
4. Link the template to the host.
5. Customize the value of {$CERT.WEBSITE.HOSTNAME} macro.
Name | Description | Default |
---|---|---|
{$CERT.EXPIRY.WARN} | Number of days until the certificate expires. |
7 |
{$CERT.WEBSITE.HOSTNAME} | The website DNS name for the connection. |
<Put DNS name> |
{$CERT.WEBSITE.PORT} | The TLS/SSL port number of the website. |
443 |
{$CERT.WEBSITE.IP} | The website IP address for the connection. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get | Returns the JSON with attributes of a certificate of the requested site. |
Zabbix agent | web.certificate.get[{$CERT.WEBSITE.HOSTNAME},{$CERT.WEBSITE.PORT},{$CERT.WEBSITE.IP}] Preprocessing
|
Validation result | The certificate validation result. Possible values: valid/invalid/valid-but-self-signed |
Dependent item | cert.validation Preprocessing
|
Last validation status | Last check result message. |
Dependent item | cert.message Preprocessing
|
Version | The version of the encoded certificate. |
Dependent item | cert.version Preprocessing
|
Serial number | The serial number is a positive integer assigned by the CA to each certificate. It is unique for each certificate issued by a given CA. Non-conforming CAs may issue certificates with serial numbers that are negative or zero. |
Dependent item | cert.serial_number Preprocessing
|
Signature algorithm | The algorithm identifier for the algorithm used by the CA to sign the certificate. |
Dependent item | cert.signature_algorithm Preprocessing
|
Issuer | The field identifies the entity that has signed and issued the certificate. |
Dependent item | cert.issuer Preprocessing
|
Valid from | The date on which the certificate validity period begins. |
Dependent item | cert.not_before Preprocessing
|
Expires on | The date on which the certificate validity period ends. |
Dependent item | cert.not_after Preprocessing
|
Subject | The field identifies the entity associated with the public key stored in the subject public key field. |
Dependent item | cert.subject Preprocessing
|
Subject alternative name | The subject alternative name extension allows identities to be bound to the subject of the certificate. These identities may be included in addition to or in place of the identity in the subject field of the certificate. Defined options include an Internet electronic mail address, a DNS name, an IP address, and a Uniform Resource Identifier (URI). |
Dependent item | cert.alternative_names Preprocessing
|
Public key algorithm | The digital signature algorithm is used to verify the signature of a certificate. |
Dependent item | cert.publickeyalgorithm Preprocessing
|
Fingerprint | The Certificate Signature (SHA1 Fingerprint or Thumbprint) is the hash of the entire certificate in DER form. |
Dependent item | cert.sha1_fingerprint Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Certificate: SSL certificate is invalid | SSL certificate has expired or it is issued for another domain. |
find(/Website certificate by Zabbix agent 2/cert.validation,,"like","invalid")=1 |High |
||
Certificate: SSL certificate expires soon | The SSL certificate should be updated or it will become untrusted. |
(last(/Website certificate by Zabbix agent 2/cert.not_after) - now()) / 86400 < {$CERT.EXPIRY.WARN} |Warning |
Depends on:
|
|
Certificate: Fingerprint has changed | The SSL certificate fingerprint has changed. If you did not update the certificate, it may mean your certificate has been hacked. Acknowledge to close the problem manually. |
last(/Website certificate by Zabbix agent 2/cert.sha1_fingerprint) <> last(/Website certificate by Zabbix agent 2/cert.sha1_fingerprint,#2) |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Ceph cluster by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Ceph by Zabbix agent 2
— collects metrics by polling zabbix-agent2.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Test availability: zabbix_get -s ceph-host -k ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
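With the default macro values listed below, the availability test expands to the following; the host name ceph-host is a placeholder.

```bash
# Availability test using the default macro values:
#   {$CEPH.CONNSTRING} = https://localhost:8003
#   {$CEPH.USER}       = zabbix
#   {$CEPH.API.KEY}    = zabbix_pass
zabbix_get -s ceph-host -k 'ceph.ping["https://localhost:8003","zabbix","zabbix_pass"]'
# A non-error response indicates the plugin can reach the Ceph RESTful module.
```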
Name | Description | Default |
---|---|---|
{$CEPH.USER} | zabbix |
|
{$CEPH.API.KEY} | zabbix_pass |
|
{$CEPH.CONNSTRING} | https://localhost:8003 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get overall cluster status | Zabbix agent | ceph.status["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Get OSD stats | Zabbix agent | ceph.osd.stats["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Get OSD dump | Zabbix agent | ceph.osd.dump["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Get df | Zabbix agent | ceph.df.details["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Ping | Zabbix agent | ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] Preprocessing
|
|
Number of Monitors | The number of Monitors configured in a Ceph cluster. |
Dependent item | ceph.num_mon Preprocessing
|
Overall cluster status | The overall Ceph cluster status, e.g., 0 - HEALTH_OK, 1 - HEALTH_WARN or 2 - HEALTH_ERR. |
Dependent item | ceph.overall_status Preprocessing
|
Minimum Mon release version | min_mon_release_name |
Dependent item | ceph.min_mon_release_name Preprocessing
|
Ceph Read bandwidth | The global read bytes per second. |
Dependent item | ceph.rd_bytes.rate Preprocessing
|
Ceph Write bandwidth | The global write bytes per second. |
Dependent item | ceph.wr_bytes.rate Preprocessing
|
Ceph Read operations per sec | The global read operations per second. |
Dependent item | ceph.rd_ops.rate Preprocessing
|
Ceph Write operations per sec | The global write operations per second. |
Dependent item | ceph.wr_ops.rate Preprocessing
|
Total bytes available | The total bytes available in a Ceph cluster. |
Dependent item | ceph.totalavailbytes Preprocessing
|
Total bytes | The total (RAW) capacity of a Ceph cluster in bytes. |
Dependent item | ceph.total_bytes Preprocessing
|
Total bytes used | The total bytes used in a Ceph cluster. |
Dependent item | ceph.totalusedbytes Preprocessing
|
Total number of objects | The total number of objects in a Ceph cluster. |
Dependent item | ceph.total_objects Preprocessing
|
Number of Placement Groups | The total number of Placement Groups in a Ceph cluster. |
Dependent item | ceph.num_pg Preprocessing
|
Number of Placement Groups in Temporary state | The total number of Placement Groups in a pg_temp state |
Dependent item | ceph.numpgtemp Preprocessing
|
Number of Placement Groups in Active state | The total number of Placement Groups in an active state. |
Dependent item | ceph.pg_states.active Preprocessing
|
Number of Placement Groups in Clean state | The total number of Placement Groups in a clean state. |
Dependent item | ceph.pg_states.clean Preprocessing
|
Number of Placement Groups in Peering state | The total number of Placement Groups in a peering state. |
Dependent item | ceph.pg_states.peering Preprocessing
|
Number of Placement Groups in Scrubbing state | The total number of Placement Groups in a scrubbing state. |
Dependent item | ceph.pg_states.scrubbing Preprocessing
|
Number of Placement Groups in Undersized state | The total number of Placement Groups in an undersized state. |
Dependent item | ceph.pg_states.undersized Preprocessing
|
Number of Placement Groups in Backfilling state | The total number of Placement Groups in a backfill state. |
Dependent item | ceph.pg_states.backfilling Preprocessing
|
Number of Placement Groups in degraded state | The total number of Placement Groups in a degraded state. |
Dependent item | ceph.pg_states.degraded Preprocessing
|
Number of Placement Groups in inconsistent state | The total number of Placement Groups in an inconsistent state. |
Dependent item | ceph.pg_states.inconsistent Preprocessing
|
Number of Placement Groups in Unknown state | The total number of Placement Groups in an unknown state. |
Dependent item | ceph.pg_states.unknown Preprocessing
|
Number of Placement Groups in remapped state | The total number of Placement Groups in a remapped state. |
Dependent item | ceph.pg_states.remapped Preprocessing
|
Number of Placement Groups in recovering state | The total number of Placement Groups in a recovering state. |
Dependent item | ceph.pg_states.recovering Preprocessing
|
Number of Placement Groups in backfill_toofull state | The total number of Placement Groups in a backfill_toofull state. |
Dependent item | ceph.pgstates.backfilltoofull Preprocessing
|
Number of Placement Groups in backfill_wait state | The total number of Placement Groups in a backfill_wait state. |
Dependent item | ceph.pgstates.backfillwait Preprocessing
|
Number of Placement Groups in recovery_wait state | The total number of Placement Groups in a recovery_wait state. |
Dependent item | ceph.pgstates.recoverywait Preprocessing
|
Number of Pools | The total number of pools in a Ceph cluster. |
Dependent item | ceph.num_pools Preprocessing
|
Number of OSDs | The number of the known storage daemons in a Ceph cluster. |
Dependent item | ceph.num_osd Preprocessing
|
Number of OSDs in state: UP | The total number of the online storage daemons in a Ceph cluster. |
Dependent item | ceph.numosdup Preprocessing
|
Number of OSDs in state: IN | The total number of the participating storage daemons in a Ceph cluster. |
Dependent item | ceph.numosdin Preprocessing
|
Ceph OSD avg fill | The average fill of OSDs. |
Dependent item | ceph.osd_fill.avg Preprocessing
|
Ceph OSD max fill | The percentage of the most filled OSD. |
Dependent item | ceph.osd_fill.max Preprocessing
|
Ceph OSD min fill | The percentage fill of the minimum filled OSD. |
Dependent item | ceph.osd_fill.min Preprocessing
|
Ceph OSD max PGs | The maximum amount of Placement Groups on OSDs. |
Dependent item | ceph.osd_pgs.max Preprocessing
|
Ceph OSD min PGs | The minimum amount of Placement Groups on OSDs. |
Dependent item | ceph.osd_pgs.min Preprocessing
|
Ceph OSD avg PGs | The average amount of Placement Groups on OSDs. |
Dependent item | ceph.osd_pgs.avg Preprocessing
|
Ceph OSD Apply latency Avg | The average apply latency of OSDs. |
Dependent item | ceph.osdlatencyapply.avg Preprocessing
|
Ceph OSD Apply latency Max | The maximum apply latency of OSDs. |
Dependent item | ceph.osdlatencyapply.max Preprocessing
|
Ceph OSD Apply latency Min | The minimum apply latency of OSDs. |
Dependent item | ceph.osdlatencyapply.min Preprocessing
|
Ceph OSD Commit latency Avg | The average commit latency of OSDs. |
Dependent item | ceph.osdlatencycommit.avg Preprocessing
|
Ceph OSD Commit latency Max | The maximum commit latency of OSDs. |
Dependent item | ceph.osdlatencycommit.max Preprocessing
|
Ceph OSD Commit latency Min | The minimum commit latency of OSDs. |
Dependent item | ceph.osdlatencycommit.min Preprocessing
|
Ceph backfill full ratio | The backfill full ratio setting of the Ceph cluster as configured on OSDMap. |
Dependent item | ceph.osd_backfill_full_ratio Preprocessing
|
Ceph full ratio | The full ratio setting of the Ceph cluster as configured on OSDMap. |
Dependent item | ceph.osd_full_ratio Preprocessing
|
Ceph nearfull ratio | The near full ratio setting of the Ceph cluster as configured on OSDMap. |
Dependent item | ceph.osd_nearfull_ratio Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ceph: Can not connect to cluster | The connection to the Ceph RESTful module is broken (this covers any error presented, including AUTH and configuration issues). |
last(/Ceph by Zabbix agent 2/ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"])=0 |Average |
||
Ceph: Cluster in ERROR state | last(/Ceph by Zabbix agent 2/ceph.overall_status)=2 |Average |
Manual close: Yes | ||
Ceph: Cluster in WARNING state | last(/Ceph by Zabbix agent 2/ceph.overall_status)=1 |Warning |
Manual close: Yes Depends on:
|
||
Ceph: Minimum monitor release version has changed | A Ceph version has changed. Acknowledge to close the problem manually. |
last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#1)<>last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#2) and length(last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
OSD | Zabbix agent | ceph.osd.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
[osd.{#OSDNAME}] OSD in | Dependent item | ceph.osd[{#OSDNAME},in] Preprocessing
|
|
[osd.{#OSDNAME}] OSD up | Dependent item | ceph.osd[{#OSDNAME},up] Preprocessing
|
|
[osd.{#OSDNAME}] OSD PGs | Dependent item | ceph.osd[{#OSDNAME},num_pgs] Preprocessing
|
|
[osd.{#OSDNAME}] OSD fill | Dependent item | ceph.osd[{#OSDNAME},fill] Preprocessing
|
|
[osd.{#OSDNAME}] OSD latency apply | The time taken to flush an update to disks. |
Dependent item | ceph.osd[{#OSDNAME},latency_apply] Preprocessing
|
[osd.{#OSDNAME}] OSD latency commit | The time taken to commit an operation to the journal. |
Dependent item | ceph.osd[{#OSDNAME},latency_commit] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ceph: OSD osd.{#OSDNAME} is down | OSD osd.{#OSDNAME} is marked "down" in the osdmap. |
last(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},up]) = 0 |Average |
||
Ceph: OSD osd.{#OSDNAME} is full | min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_full_ratio)*100 |Average |
|||
Ceph: Ceph OSD osd.{#OSDNAME} is near full | min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_nearfull_ratio)*100 |Warning |
Depends on:
|
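For clarity, the OSD fill triggers above compare the OSD fill percentage against the cluster ratios scaled to percent. Below is a minimal sketch of that comparison with hypothetical values; the real check is performed by the trigger expressions themselves, not by external code:

# Hypothetical values illustrating the "OSD is full / near full" trigger logic.
osd_fill_percent = 96.0       # ceph.osd[{#OSDNAME},fill], in percent
full_ratio = 0.95             # ceph.osd_full_ratio
nearfull_ratio = 0.85         # ceph.osd_nearfull_ratio

is_full = osd_fill_percent > full_ratio * 100            # 96.0 > 95.0 -> True
is_near_full = osd_fill_percent > nearfull_ratio * 100   # 96.0 > 85.0 -> True
print(is_full, is_near_full)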
Name | Description | Type | Key and additional info |
---|---|---|---|
Pool | Zabbix agent | ceph.pool.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#POOLNAME}] Pool Used | The total bytes used in a pool. |
Dependent item | ceph.pool["{#POOLNAME}",bytes_used] Preprocessing
|
[{#POOLNAME}] Max available | The maximum available space in the given pool. |
Dependent item | ceph.pool["{#POOLNAME}",max_avail] Preprocessing
|
[{#POOLNAME}] Pool RAW Used | Bytes used in pool including the copies made. |
Dependent item | ceph.pool["{#POOLNAME}",stored_raw] Preprocessing
|
[{#POOLNAME}] Pool Percent Used | The percentage of the storage used per pool. |
Dependent item | ceph.pool["{#POOLNAME}",percent_used] Preprocessing
|
[{#POOLNAME}] Pool objects | The number of objects in the pool. |
Dependent item | ceph.pool["{#POOLNAME}",objects] Preprocessing
|
[{#POOLNAME}] Pool Read bandwidth | The read rate per pool (bytes per second). |
Dependent item | ceph.pool["{#POOLNAME}",rd_bytes.rate] Preprocessing
|
[{#POOLNAME}] Pool Write bandwidth | The write rate per pool (bytes per second). |
Dependent item | ceph.pool["{#POOLNAME}",wr_bytes.rate] Preprocessing
|
[{#POOLNAME}] Pool Read operations | The read rate per pool (operations per second). |
Dependent item | ceph.pool["{#POOLNAME}",rd_ops.rate] Preprocessing
|
[{#POOLNAME}] Pool Write operations | The write rate per pool (operations per second). |
Dependent item | ceph.pool["{#POOLNAME}",wr_ops.rate] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Refer to the vendor documentation.
Name | Description | Default |
---|---|---|
{$ARANET.API.ENDPOINT} | Aranet Cloud API endpoint. |
https://aranet.cloud/api |
{$ARANET.API.USERNAME} | Aranet Cloud username. |
<PUT YOUR USERNAME> |
{$ARANET.API.PASSWORD} | Aranet Cloud password. |
<PUT YOUR PASSWORD> |
{$ARANET.API.SPACE_NAME} | Aranet Cloud organization name. |
<PUT YOUR SPACE NAME> |
{$ARANET.LLD.FILTER.SENSOR_NAME.MATCHES} | Filter of discoverable sensors by name. |
.+ |
{$ARANET.LLD.FILTER.SENSOR_NAME.NOT_MATCHES} | Filter to exclude discoverable sensors by name. |
CHANGE_IF_NEEDED |
{$ARANET.LLD.FILTER.SENSOR_ID.MATCHES} | Filter of discoverable sensors by id. |
.+ |
{$ARANET.LLD.FILTER.GATEWAY_NAME.MATCHES} | Filter of discoverable sensors by gateway name. |
.+ |
{$ARANET.LLD.FILTER.GATEWAY_NAME.NOT_MATCHES} | Filter to exclude discoverable sensors by gateway name. |
CHANGE_IF_NEEDED |
{$ARANET.LLD.FILTER.GATEWAY_ID.MATCHES} | Filter of discoverable sensors by gateway id. |
.+ |
{$ARANET.BATT.VOLTAGE.MIN.WARN} | Battery voltage warning threshold. |
1 |
{$ARANET.BATT.VOLTAGE.MIN.CRIT} | Battery voltage critical threshold. |
2 |
{$ARANET.HUMIDITY.MIN.WARN} | Minimum humidity threshold. |
20 |
{$ARANET.HUMIDITY.MAX.WARN} | Maximum humidity threshold. |
70 |
{$ARANET.CO2.MAX.WARN} | CO2 warning threshold. |
600 |
{$ARANET.CO2.MAX.CRIT} | CO2 critical threshold. |
1000 |
{$ARANET.LAST_UPDATE.MAX.WARN} | Data update delay threshold. |
1h |
Name | Description | Type | Key and additional info |
---|---|---|---|
Sensors discovery | Discovery for Aranet Cloud sensors |
Dependent item | aranet.sensor.discovery Preprocessing
|
Get data | Script | aranet.get_data |
Name | Description | Type | Key and additional info |
---|---|---|---|
Temperature discovery | Discovery for Aranet Cloud temperature sensors |
Dependent item | aranet.temp.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.temp["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Humidity discovery | Discovery for Aranet Cloud humidity sensors |
Dependent item | aranet.humidity.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.humidity["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Aranet: {#METRIC}: Low humidity on "[{#GATEWAY_NAME}] {#SENSOR_NAME}" | max(/Aranet Cloud/aranet.humidity["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.HUMIDITY.MIN.WARN:"{#SENSOR_NAME}"} |Warning |
Depends on:
|
||
Aranet: {#METRIC}: High humidity on "[{#GATEWAY_NAME}] {#SENSOR_NAME}" | min(/Aranet Cloud/aranet.humidity["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.HUMIDITY.MAX.WARN:"{#SENSOR_NAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
RSSI discovery | Discovery for Aranet Cloud RSSI sensors |
Dependent item | aranet.rssi.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.rssi["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Battery voltage discovery | Discovery for Aranet Cloud Battery voltage sensors |
Dependent item | aranet.battery.voltage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.battery.voltage["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Aranet: {#METRIC}: Low battery voltage on "[{#GATEWAY_NAME}] {#SENSOR_NAME}" | max(/Aranet Cloud/aranet.battery.voltage["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.BATT.VOLTAGE.MIN.WARN:"{#SENSOR_NAME}"} |Warning |
Depends on:
|
||
Aranet: {#METRIC}: Critically low battery voltage on "[{#GATEWAY_NAME}] {#SENSOR_NAME}" | max(/Aranet Cloud/aranet.battery.voltage["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.BATT.VOLTAGE.MIN.CRIT:"{#SENSOR_NAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
CO2 discovery | Discovery for Aranet Cloud CO2 sensors |
Dependent item | aranet.co2.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.co2["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Aranet: {#METRIC}: High CO2 level on "[{#GATEWAY_NAME}] {#SENSOR_NAME}" | min(/Aranet Cloud/aranet.co2["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.CO2.MAX.WARN:"{#SENSOR_NAME}"} |Warning |
Depends on:
|
||
Aranet: {#METRIC}: Critically high CO2 level on "[{#GATEWAY_NAME}] {#SENSOR_NAME}" | min(/Aranet Cloud/aranet.co2["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.CO2.MAX.CRIT:"{#SENSOR_NAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Atmospheric pressure discovery | Discovery for Aranet Cloud atmospheric pressure sensors |
Dependent item | aranet.pressure.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.pressure["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Voltage discovery | Discovery for Aranet Cloud Voltage sensors |
Dependent item | aranet.voltage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.voltage["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Weight discovery | Discovery for Aranet Cloud Weight sensors |
Dependent item | aranet.weight.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.weight["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Volumetric Water Content discovery | Discovery for Aranet Cloud Volumetric Water Content sensors |
Dependent item | aranet.volumwatercontent.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.volumetric.water.content["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
PPFD discovery | Discovery for Aranet Cloud PPFD sensors |
Dependent item | aranet.ppfd.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.ppfd["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Distance discovery | Discovery for Aranet Cloud Distance sensors |
Dependent item | aranet.distance.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.distance["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Illuminance discovery | Discovery for Aranet Cloud Illuminance sensors |
Dependent item | aranet.illuminance.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.illuminance["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
pH discovery | Discovery for Aranet Cloud pH sensors |
Dependent item | aranet.ph.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.ph["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Current discovery | Discovery for Aranet Cloud Current sensors |
Dependent item | aranet.current.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.current["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Soil Dielectric Permittivity discovery | Discovery for Aranet Cloud Soil Dielectric Permittivity sensors |
Dependent item | aranet.soildielectricperm.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.soildielectricperm["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Soil Electrical Conductivity discovery | Discovery for Aranet Cloud Soil Electrical Conductivity sensors |
Dependent item | aranet.soilelectriccond.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.soilelectriccond["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pore Electrical Conductivity discovery | Discovery for Aranet Cloud Pore Electrical Conductivity sensors |
Dependent item | aranet.poreelectriccond.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.poreelectriccond["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pulses discovery | Discovery for Aranet Cloud Pulses sensors |
Dependent item | aranet.pulses.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.pulses["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pulses Cumulative discovery | Discovery for Aranet Cloud Pulses Cumulative sensors |
Dependent item | aranet.pulses_cumulative.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.pulses_cumulative["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Differential Pressure discovery | Discovery for Aranet Cloud Differential Pressure sensors |
Dependent item | aranet.diff_pressure.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.diff_pressure["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Last update discovery | Discovery for Aranet Cloud Last update metric |
Dependent item | aranet.last_update.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.last_update["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Aranet: {#METRIC}: Sensor data "[{#GATEWAY_NAME}] {#SENSOR_NAME}" is not updated | last(/Aranet Cloud/aranet.last_update["{#GATEWAY_ID}", "{#SENSOR_ID}"]) > {$ARANET.LAST_UPDATE.MAX.WARN:"{#SENSOR_NAME}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache monitoring by Zabbix via HTTP and doesn't require any external scripts.
The template collects metrics by polling mod_status
with HTTP agent remotely:
127.0.0.1
ServerVersion: Apache/2.4.41 (Unix)
ServerMPM: event
Server Built: Aug 14 2019 00:35:10
CurrentTime: Friday, 16-Aug-2019 12:38:40 UTC
RestartTime: Wednesday, 14-Aug-2019 07:58:26 UTC
ParentServerConfigGeneration: 1
ParentServerMPMGeneration: 0
ServerUptimeSeconds: 189613
ServerUptime: 2 days 4 hours 40 minutes 13 seconds
Load1: 4.60
Load5: 1.20
Load15: 0.47
Total Accesses: 27860
Total kBytes: 33011
Total Duration: 54118
CPUUser: 18.02
CPUSystem: 31.76
CPUChildrenUser: 0
CPUChildrenSystem: 0
CPULoad: .0262535
Uptime: 189613
ReqPerSec: .146931
BytesPerSec: 178.275
BytesPerReq: 1213.33
DurationPerReq: 1.9425
BusyWorkers: 7
IdleWorkers: 93
Processes: 4
Stopping: 0
BusyWorkers: 7
IdleWorkers: 93
ConnsTotal: 13
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 5
ConnsAsyncClosing: 0
Scoreboard: ...
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See the setup instructions for mod_status.
Check the availability of the module with this command line:
httpd -M 2>/dev/null | grep status_module
This is an example configuration of the Apache web server:
<Location "/server-status">
SetHandler server-status
Require host example.com
</Location>
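Once mod_status is reachable, the machine-readable page can be checked manually before linking the template. Below is a minimal sketch in Python; it assumes the status page is served at http://127.0.0.1:80/server-status?auto (the default macro values), so adjust the scheme, host, port, and path to match your setup:

import urllib.request

# Build the URL from the same pieces the template macros use.
scheme, host, port, path = "http", "127.0.0.1", 80, "server-status?auto"
url = f"{scheme}://{host}:{port}/{path}"

with urllib.request.urlopen(url, timeout=5) as resp:
    body = resp.read().decode()

# The ?auto page is a list of "Key: value" lines; parse it into a dict.
status = {}
for line in body.splitlines():
    key, sep, value = line.partition(":")
    if sep:
        status[key.strip()] = value.strip()

print(status.get("ServerVersion"), status.get("BusyWorkers"), status.get("IdleWorkers"))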
Set the hostname or IP address of the Apache status page host in the {$APACHE.STATUS.HOST}
macro. You can also change the status page port in the {$APACHE.STATUS.PORT}
macro and status page path in the {$APACHE.STATUS.PATH}
macro if necessary.
Name | Description | Default |
---|---|---|
{$APACHE.STATUS.HOST} | The hostname or IP address of the Apache status page host. |
<SET APACHE HOST> |
{$APACHE.STATUS.PORT} | The port of the Apache status page. |
80 |
{$APACHE.STATUS.PATH} | The URL path. |
server-status?auto |
{$APACHE.STATUS.SCHEME} | The request scheme, which may be either HTTP or HTTPS. |
http |
{$APACHE.RESPONSE_TIME.MAX.WARN} | The maximum Apache response time expressed in seconds for a trigger expression. |
10 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get status | Getting data from a machine-readable version of the Apache status page. For more information see Apache Module mod_status. |
HTTP agent | apache.get_status Preprocessing
|
Service ping | Simple check | net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] Preprocessing
|
|
Service response time | Simple check | net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] | |
Total bytes | The total bytes served. |
Dependent item | apache.bytes Preprocessing
|
Bytes per second | It is calculated as a rate of change for total bytes statistics.
|
Dependent item | apache.bytes.rate Preprocessing
|
Requests per second | It is calculated as a rate of change for the "Total requests" statistics.
|
Dependent item | apache.requests.rate Preprocessing
|
Total requests | The total number of the Apache server accesses. |
Dependent item | apache.requests Preprocessing
|
Uptime | The service uptime expressed in seconds. |
Dependent item | apache.uptime Preprocessing
|
Version | The Apache service version. |
Dependent item | apache.version Preprocessing
|
Total workers busy | The total number of busy worker threads/processes. |
Dependent item | apache.workers_total.busy Preprocessing
|
Total workers idle | The total number of idle worker threads/processes. |
Dependent item | apache.workers_total.idle Preprocessing
|
Workers closing connection | The number of workers in closing state. |
Dependent item | apache.workers.closing Preprocessing
|
Workers DNS lookup | The number of workers in dnslookup state. |
Dependent item | apache.workers.dnslookup Preprocessing
|
Workers finishing | The number of workers in finishing state. |
Dependent item | apache.workers.finishing Preprocessing
|
Workers idle cleanup | The number of workers in cleanup state. |
Dependent item | apache.workers.cleanup Preprocessing
|
Workers keepalive (read) | The number of workers in keepalive state. |
Dependent item | apache.workers.keepalive Preprocessing
|
Workers logging | The number of workers in logging state. |
Dependent item | apache.workers.logging Preprocessing
|
Workers reading request | The number of workers in reading state. |
Dependent item | apache.workers.reading Preprocessing
|
Workers sending reply | The number of workers in sending state. |
Dependent item | apache.workers.sending Preprocessing
|
Workers slot with no current process | The number of slots with no current process. |
Dependent item | apache.workers.slot Preprocessing
|
Workers starting up | The number of workers in starting state. |
Dependent item | apache.workers.starting Preprocessing
|
Workers waiting for connection | The number of workers in waiting state. |
Dependent item | apache.workers.waiting Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Failed to fetch status page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Apache by HTTP/apache.get_status,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Apache: Service is down | last(/Apache by HTTP/net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"])=0 |Average |
Manual close: Yes | ||
Apache: Service response time is too high | min(/Apache by HTTP/net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"],5m)>{$APACHE.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
Apache: Service has been restarted | Uptime is less than 10 minutes. |
last(/Apache by HTTP/apache.uptime)<10m |Info |
Manual close: Yes | |
Apache: Version has changed | Apache version has changed. Acknowledge to close the problem manually. |
last(/Apache by HTTP/apache.version,#1)<>last(/Apache by HTTP/apache.version,#2) and length(last(/Apache by HTTP/apache.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Event MPM discovery | The discovery of additional metrics if the event Multi-Processing Module (MPM) is used. For more details see Apache MPM event. |
Dependent item | apache.mpm.event.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Connections async closing | The number of asynchronous connections in closing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_closing{#SINGLETON}] Preprocessing
|
Connections async keepalive | The number of asynchronous connections in keepalive state (applicable only to the event MPM). |
Dependent item | apache.connections[async_keepalive{#SINGLETON}] Preprocessing
|
Connections async writing | The number of asynchronous connections in writing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_writing{#SINGLETON}] Preprocessing
|
Connections total | The number of total connections. |
Dependent item | apache.connections[total{#SINGLETON}] Preprocessing
|
Bytes per request | The average number of bytes served per request. |
Dependent item | apache.bytes[per_request{#SINGLETON}] Preprocessing
|
Number of async processes | The number of asynchronous processes. |
Dependent item | apache.process[num{#SINGLETON}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache monitoring by Zabbix via Zabbix agent and doesn't require any external scripts.
The template Apache by Zabbix agent active
- collects metrics by polling mod_status locally with Zabbix agent:
127.0.0.1
ServerVersion: Apache/2.4.41 (Unix)
ServerMPM: event
Server Built: Aug 14 2019 00:35:10
CurrentTime: Friday, 16-Aug-2019 12:38:40 UTC
RestartTime: Wednesday, 14-Aug-2019 07:58:26 UTC
ParentServerConfigGeneration: 1
ParentServerMPMGeneration: 0
ServerUptimeSeconds: 189613
ServerUptime: 2 days 4 hours 40 minutes 13 seconds
Load1: 4.60
Load5: 1.20
Load15: 0.47
Total Accesses: 27860
Total kBytes: 33011
Total Duration: 54118
CPUUser: 18.02
CPUSystem: 31.76
CPUChildrenUser: 0
CPUChildrenSystem: 0
CPULoad: .0262535
Uptime: 189613
ReqPerSec: .146931
BytesPerSec: 178.275
BytesPerReq: 1213.33
DurationPerReq: 1.9425
BusyWorkers: 7
IdleWorkers: 93
Processes: 4
Stopping: 0
BusyWorkers: 7
IdleWorkers: 93
ConnsTotal: 13
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 5
ConnsAsyncClosing: 0
Scoreboard: ...
It also uses Zabbix agent to collect Apache
Linux process statistics such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See the setup instructions for mod_status.
Check the availability of the module with this command line: httpd -M 2>/dev/null | grep status_module
This is an example configuration of the Apache web server:
<Location "/server-status">
SetHandler server-status
Require host example.com
</Location>
If you use another path, then do not forget to change the {$APACHE.STATUS.PATH}
macro.
Install and set up Zabbix agent.
Name | Description | Default |
---|---|---|
{$APACHE.STATUS.HOST} | The hostname or IP address of the Apache status page. |
127.0.0.1 |
{$APACHE.STATUS.PORT} | The port of the Apache status page. |
80 |
{$APACHE.STATUS.PATH} | The URL path. |
server-status?auto |
{$APACHE.STATUS.SCHEME} | The request scheme, which may be either HTTP or HTTPS. |
http |
{$APACHE.RESPONSE_TIME.MAX.WARN} | The maximum Apache response time expressed in seconds for a trigger expression. |
10 |
{$APACHE.PROCESS_NAME} | The process name filter for the Apache process discovery. |
(httpd|apache2) |
{$APACHE.PROCESS.NAME.PARAMETER} | The process name of the Apache web server used in the item key proc.get. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get status | Getting data from a machine-readable version of the Apache status page. For more information see Apache Module mod_status. |
Zabbix agent (active) | web.page.get["{$APACHE.STATUS.SCHEME}://{$APACHE.STATUS.HOST}:{$APACHE.STATUS.PORT}/{$APACHE.STATUS.PATH}"] Preprocessing
|
Service ping | Zabbix agent (active) | net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] Preprocessing
|
|
Service response time | Zabbix agent (active) | net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] | |
Total bytes | The total bytes served. |
Dependent item | apache.bytes Preprocessing
|
Bytes per second | It is calculated as a rate of change for total bytes statistics.
|
Dependent item | apache.bytes.rate Preprocessing
|
Requests per second | It is calculated as a rate of change for the "Total requests" statistics.
|
Dependent item | apache.requests.rate Preprocessing
|
Total requests | The total number of the Apache server accesses. |
Dependent item | apache.requests Preprocessing
|
Uptime | The service uptime expressed in seconds. |
Dependent item | apache.uptime Preprocessing
|
Version | The Apache service version. |
Dependent item | apache.version Preprocessing
|
Total workers busy | The total number of busy worker threads/processes. |
Dependent item | apache.workers_total.busy Preprocessing
|
Total workers idle | The total number of idle worker threads/processes. |
Dependent item | apache.workers_total.idle Preprocessing
|
Workers closing connection | The number of workers in closing state. |
Dependent item | apache.workers.closing Preprocessing
|
Workers DNS lookup | The number of workers in dnslookup state. |
Dependent item | apache.workers.dnslookup Preprocessing
|
Workers finishing | The number of workers in finishing state. |
Dependent item | apache.workers.finishing Preprocessing
|
Workers idle cleanup | The number of workers in cleanup state. |
Dependent item | apache.workers.cleanup Preprocessing
|
Workers keepalive (read) | The number of workers in keepalive state. |
Dependent item | apache.workers.keepalive Preprocessing
|
Workers logging | The number of workers in logging state. |
Dependent item | apache.workers.logging Preprocessing
|
Workers reading request | The number of workers in reading state. |
Dependent item | apache.workers.reading Preprocessing
|
Workers sending reply | The number of workers in sending state. |
Dependent item | apache.workers.sending Preprocessing
|
Workers slot with no current process | The number of slots with no current process. |
Dependent item | apache.workers.slot Preprocessing
|
Workers starting up | The number of workers in starting state. |
Dependent item | apache.workers.starting Preprocessing
|
Workers waiting for connection | The number of workers in waiting state. |
Dependent item | apache.workers.waiting Preprocessing
|
Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent (active) | proc.get[{$APACHE.PROCESS.NAME.PARAMETER},,,summary] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Service has been restarted | Uptime is less than 10 minutes. |
last(/Apache by Zabbix agent active/apache.uptime)<10m |Info |
Manual close: Yes | |
Apache: Version has changed | Apache version has changed. Acknowledge to close the problem manually. |
last(/Apache by Zabbix agent active/apache.version,#1)<>last(/Apache by Zabbix agent active/apache.version,#2) and length(last(/Apache by Zabbix agent active/apache.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Event MPM discovery | The discovery of additional metrics if the event Multi-Processing Module (MPM) is used. For more details see Apache MPM event. |
Dependent item | apache.mpm.event.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Connections async closing | The number of asynchronous connections in closing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_closing{#SINGLETON}] Preprocessing
|
Connections async keepalive | The number of asynchronous connections in keepalive state (applicable only to the event MPM). |
Dependent item | apache.connections[async_keepalive{#SINGLETON}] Preprocessing
|
Connections async writing | The number of asynchronous connections in writing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_writing{#SINGLETON}] Preprocessing
|
Connections total | The number of total connections. |
Dependent item | apache.connections[total{#SINGLETON}] Preprocessing
|
Bytes per request | The average number of bytes served per request. |
Dependent item | apache.bytes[per_request{#SINGLETON}] Preprocessing
|
Number of async processes | The number of asynchronous processes. |
Dependent item | apache.process[num{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache process discovery | The discovery of the Apache process summary. |
Dependent item | apache.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
CPU utilization | The percentage of the CPU utilization by a process {#APACHE.NAME}. |
Zabbix agent (active) | proc.cpu.util[{#APACHE.NAME}] |
Get process data | The summary metrics aggregated by a process {#APACHE.NAME}. |
Dependent item | apache.proc.get[{#APACHE.NAME}] Preprocessing
|
Memory usage (rss) | The summary of resident set size memory used by a process {#APACHE.NAME} expressed in bytes. |
Dependent item | apache.proc.rss[{#APACHE.NAME}] Preprocessing
|
Memory usage (vsize) | The summary of virtual memory used by a process {#APACHE.NAME} expressed in bytes. |
Dependent item | apache.proc.vmem[{#APACHE.NAME}] Preprocessing
|
Memory usage, % | The percentage of real memory used by a process {#APACHE.NAME}. |
Dependent item | apache.proc.pmem[{#APACHE.NAME}] Preprocessing
|
Number of running processes | The number of running processes {#APACHE.NAME}. |
Dependent item | apache.proc.num[{#APACHE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Process is not running | last(/Apache by Zabbix agent active/apache.proc.num[{#APACHE.NAME}])=0 |High |
|||
Apache: Service is down | last(/Apache by Zabbix agent active/net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"])=0 and last(/Apache by Zabbix agent active/apache.proc.num[{#APACHE.NAME}])>0 |Average |
Manual close: Yes | ||
Apache: Failed to fetch status page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Apache by Zabbix agent active/web.page.get["{$APACHE.STATUS.SCHEME}://{$APACHE.STATUS.HOST}:{$APACHE.STATUS.PORT}/{$APACHE.STATUS.PATH}"],30m)=1 and last(/Apache by Zabbix agent active/apache.proc.num[{#APACHE.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
|
Apache: Service response time is too high | min(/Apache by Zabbix agent active/net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"],5m)>{$APACHE.RESPONSE_TIME.MAX.WARN} and last(/Apache by Zabbix agent active/apache.proc.num[{#APACHE.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache monitoring by Zabbix via Zabbix agent and doesn't require any external scripts.
The template Apache by Zabbix agent
- collects metrics by polling mod_status locally with Zabbix agent:
127.0.0.1
ServerVersion: Apache/2.4.41 (Unix)
ServerMPM: event
Server Built: Aug 14 2019 00:35:10
CurrentTime: Friday, 16-Aug-2019 12:38:40 UTC
RestartTime: Wednesday, 14-Aug-2019 07:58:26 UTC
ParentServerConfigGeneration: 1
ParentServerMPMGeneration: 0
ServerUptimeSeconds: 189613
ServerUptime: 2 days 4 hours 40 minutes 13 seconds
Load1: 4.60
Load5: 1.20
Load15: 0.47
Total Accesses: 27860
Total kBytes: 33011
Total Duration: 54118
CPUUser: 18.02
CPUSystem: 31.76
CPUChildrenUser: 0
CPUChildrenSystem: 0
CPULoad: .0262535
Uptime: 189613
ReqPerSec: .146931
BytesPerSec: 178.275
BytesPerReq: 1213.33
DurationPerReq: 1.9425
BusyWorkers: 7
IdleWorkers: 93
Processes: 4
Stopping: 0
BusyWorkers: 7
IdleWorkers: 93
ConnsTotal: 13
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 5
ConnsAsyncClosing: 0
Scoreboard: ...
It also uses Zabbix agent to collect Apache
Linux process statistics such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See the setup instructions for mod_status.
Check the availability of the module with this command line: httpd -M 2>/dev/null | grep status_module
This is an example configuration of the Apache web server:
<Location "/server-status">
SetHandler server-status
Require host example.com
</Location>
If you use another path, then do not forget to change the {$APACHE.STATUS.PATH}
macro.
Install and set up Zabbix agent.
Name | Description | Default |
---|---|---|
{$APACHE.STATUS.HOST} | The hostname or IP address of the Apache status page. |
127.0.0.1 |
{$APACHE.STATUS.PORT} | The port of the Apache status page. |
80 |
{$APACHE.STATUS.PATH} | The URL path. |
server-status?auto |
{$APACHE.STATUS.SCHEME} | The request scheme, which may be either HTTP or HTTPS. |
http |
{$APACHE.RESPONSE_TIME.MAX.WARN} | The maximum Apache response time expressed in seconds for a trigger expression. |
10 |
{$APACHE.PROCESS_NAME} | The process name filter for the Apache process discovery. |
(httpd|apache2) |
{$APACHE.PROCESS.NAME.PARAMETER} | The process name of the Apache web server used in the item key proc.get. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get status | Getting data from a machine-readable version of the Apache status page. For more information see Apache Module mod_status. |
Zabbix agent | web.page.get["{$APACHE.STATUS.SCHEME}://{$APACHE.STATUS.HOST}:{$APACHE.STATUS.PORT}/{$APACHE.STATUS.PATH}"] Preprocessing
|
Service ping | Zabbix agent | net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] Preprocessing
|
|
Service response time | Zabbix agent | net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] | |
Total bytes | The total bytes served. |
Dependent item | apache.bytes Preprocessing
|
Bytes per second | It is calculated as a rate of change for total bytes statistics.
|
Dependent item | apache.bytes.rate Preprocessing
|
Requests per second | It is calculated as a rate of change for the "Total requests" statistics.
|
Dependent item | apache.requests.rate Preprocessing
|
Total requests | The total number of the Apache server accesses. |
Dependent item | apache.requests Preprocessing
|
Uptime | The service uptime expressed in seconds. |
Dependent item | apache.uptime Preprocessing
|
Version | The Apache service version. |
Dependent item | apache.version Preprocessing
|
Total workers busy | The total number of busy worker threads/processes. |
Dependent item | apache.workers_total.busy Preprocessing
|
Total workers idle | The total number of idle worker threads/processes. |
Dependent item | apache.workers_total.idle Preprocessing
|
Workers closing connection | The number of workers in closing state. |
Dependent item | apache.workers.closing Preprocessing
|
Workers DNS lookup | The number of workers in dnslookup state. |
Dependent item | apache.workers.dnslookup Preprocessing
|
Workers finishing | The number of workers in finishing state. |
Dependent item | apache.workers.finishing Preprocessing
|
Workers idle cleanup | The number of workers in cleanup state. |
Dependent item | apache.workers.cleanup Preprocessing
|
Workers keepalive (read) | The number of workers in keepalive state. |
Dependent item | apache.workers.keepalive Preprocessing
|
Workers logging | The number of workers in logging state. |
Dependent item | apache.workers.logging Preprocessing
|
Workers reading request | The number of workers in reading state. |
Dependent item | apache.workers.reading Preprocessing
|
Workers sending reply | The number of workers in sending state. |
Dependent item | apache.workers.sending Preprocessing
|
Workers slot with no current process | The number of slots with no current process. |
Dependent item | apache.workers.slot Preprocessing
|
Workers starting up | The number of workers in starting state. |
Dependent item | apache.workers.starting Preprocessing
|
Workers waiting for connection | The number of workers in waiting state. |
Dependent item | apache.workers.waiting Preprocessing
|
Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$APACHE.PROCESS.NAME.PARAMETER},,,summary] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Service has been restarted | Uptime is less than 10 minutes. |
last(/Apache by Zabbix agent/apache.uptime)<10m |Info |
Manual close: Yes | |
Apache: Version has changed | Apache version has changed. Acknowledge to close the problem manually. |
last(/Apache by Zabbix agent/apache.version,#1)<>last(/Apache by Zabbix agent/apache.version,#2) and length(last(/Apache by Zabbix agent/apache.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Event MPM discovery | The discovery of additional metrics if the event Multi-Processing Module (MPM) is used. For more details see Apache MPM event. |
Dependent item | apache.mpm.event.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Connections async closing | The number of asynchronous connections in closing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_closing{#SINGLETON}] Preprocessing
|
Connections async keepalive | The number of asynchronous connections in keepalive state (applicable only to the event MPM). |
Dependent item | apache.connections[async_keepalive{#SINGLETON}] Preprocessing
|
Connections async writing | The number of asynchronous connections in writing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_writing{#SINGLETON}] Preprocessing
|
Connections total | The number of total connections. |
Dependent item | apache.connections[total{#SINGLETON}] Preprocessing
|
Bytes per request | The average number of bytes served per request. |
Dependent item | apache.bytes[per_request{#SINGLETON}] Preprocessing
|
Number of async processes | The number of asynchronous processes. |
Dependent item | apache.process[num{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache process discovery | The discovery of the Apache process summary. |
Dependent item | apache.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
CPU utilization | The percentage of the CPU utilization by a process {#APACHE.NAME}. |
Zabbix agent | proc.cpu.util[{#APACHE.NAME}] |
Get process data | The summary metrics aggregated by a process {#APACHE.NAME}. |
Dependent item | apache.proc.get[{#APACHE.NAME}] Preprocessing
|
Memory usage (rss) | The summary of resident set size memory used by a process {#APACHE.NAME} expressed in bytes. |
Dependent item | apache.proc.rss[{#APACHE.NAME}] Preprocessing
|
Memory usage (vsize) | The summary of virtual memory used by a process {#APACHE.NAME} expressed in bytes. |
Dependent item | apache.proc.vmem[{#APACHE.NAME}] Preprocessing
|
Memory usage, % | The percentage of real memory used by a process {#APACHE.NAME}. |
Dependent item | apache.proc.pmem[{#APACHE.NAME}] Preprocessing
|
Number of running processes | The number of running processes {#APACHE.NAME}. |
Dependent item | apache.proc.num[{#APACHE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Process is not running | last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])=0 |High |
|||
Apache: Service is down | last(/Apache by Zabbix agent/net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"])=0 and last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])>0 |Average |
Manual close: Yes | ||
Apache: Failed to fetch status page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Apache by Zabbix agent/web.page.get["{$APACHE.STATUS.SCHEME}://{$APACHE.STATUS.HOST}:{$APACHE.STATUS.PORT}/{$APACHE.STATUS.PATH}"],30m)=1 and last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
|
Apache: Service response time is too high | min(/Apache by Zabbix agent/net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"],5m)>{$APACHE.RESPONSE_TIME.MAX.WARN} and last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache ActiveMQ monitoring by Zabbix via JMX and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
Name | Description | Default |
---|---|---|
{$ACTIVEMQ.USER} | User for JMX |
admin |
{$ACTIVEMQ.PASSWORD} | Password for JMX |
activemq |
{$ACTIVEMQ.PORT} | Port for JMX |
1099 |
{$ACTIVEMQ.LLD.FILTER.BROKER.MATCHES} | Filter to include discovered brokers |
.* |
{$ACTIVEMQ.LLD.FILTER.BROKER.NOT_MATCHES} | Filter to exclude discovered brokers |
CHANGE IF NEEDED |
{$ACTIVEMQ.LLD.FILTER.DESTINATION.MATCHES} | Filter to include discovered destinations |
.* |
{$ACTIVEMQ.LLD.FILTER.DESTINATION.NOT_MATCHES} | Filter to exclude discovered destinations |
CHANGE IF NEEDED |
{$ACTIVEMQ.MSG.RATE.WARN.TIME} | The time for message enqueue/dequeue rate. Can be used with destination or broker name as context. |
15m |
{$ACTIVEMQ.MEM.MAX.WARN} | Memory threshold for AVERAGE trigger. Can be used with destination or broker name as context. |
75 |
{$ACTIVEMQ.MEM.MAX.HIGH} | Memory threshold for HIGH trigger. Can be used with destination or broker name as context. |
90 |
{$ACTIVEMQ.MEM.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.STORE.MAX.WARN} | Storage threshold for AVERAGE trigger. Can be used with broker name as context. |
75 |
{$ACTIVEMQ.STORE.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.STORE.MAX.HIGH} | Storage threshold for HIGH trigger. Can be used with broker name as context. |
90 |
{$ACTIVEMQ.TEMP.MAX.WARN} | Temp threshold for AVERAGE trigger. Can be used with broker name as context. |
75 |
{$ACTIVEMQ.TEMP.MAX.HIGH} | Temp threshold for HIGH trigger. Can be used with broker name as context. |
90 |
{$ACTIVEMQ.TEMP.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.TIME} | Time during which there may be no consumers in destination. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.HIGH} | Minimum amount of consumers for destination. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.TIME} | Time during which there may be no producers on destination. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.HIGH} | Minimum amount of producers for destination. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.BROKER.CONSUMERS.MIN.TIME} | Time during which there may be no consumers on destination. Can be used with broker name as context. |
5m |
{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH} | Minimum amount of consumers for broker. Can be used with broker name as context. |
1 |
{$ACTIVEMQ.BROKER.PRODUCERS.MIN.TIME} | Time during which there may be no producers on broker. Can be used with broker name as context. |
5m |
{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH} | Minimum amount of producers for broker. Can be used with broker name as context. |
1 |
{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT} | Attribute for TotalConsumerCount per destination. Used to suppress destination's triggers when the count of consumers on the broker is lower than threshold. |
TotalConsumerCount |
{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT} | Attribute for TotalProducerCount per destination. Used to suppress destination's triggers when the count of producers on the broker is lower than threshold. |
TotalProducerCount |
{$ACTIVEMQ.QUEUE.TIME} | Time during which the QueueSize can be higher than threshold. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.QUEUE.WARN} | Threshold for QueueSize. Can be used with destination name as context. |
100 |
{$ACTIVEMQ.QUEUE.ENABLED} | Use this to disable alerting for specific destination. 1 = enabled, 0 = disabled. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.EXPIRED.WARN} | Threshold for expired messages count. Can be used with destination name as context. |
0 |
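The consumer/producer suppression described above works by combining a destination-level check with a broker-level check in the same trigger: the destination alert only fires while the broker as a whole still has enough consumers or producers. Below is a minimal sketch of that logic with hypothetical values; the actual evaluation is done by the trigger expressions listed later in this section:

# Hypothetical illustration of destination-trigger suppression.
destination_consumers = 0       # jmx[{#JMXOBJ},ConsumerCount]
destination_min = 1             # {$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.HIGH}
broker_total_consumers = 5      # TotalConsumerCount on the broker
broker_min = 1                  # {$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH}

# The destination alert fires only if the broker-level count is still above its threshold;
# otherwise the broker-level trigger is expected to fire instead.
fire_destination_alert = destination_consumers < destination_min and broker_total_consumers > broker_min
print(fire_destination_alert)   # True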
Name | Description | Type | Key and additional info |
---|---|---|---|
Brokers discovery | Discovery of brokers |
JMX agent | jmx.discovery[beans,"org.apache.activemq:type=Broker,brokerName=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Broker {#JMXBROKERNAME}: Version | The version of the broker. |
JMX agent | jmx[{#JMXOBJ},BrokerVersion] Preprocessing
|
Broker {#JMXBROKERNAME}: Uptime | The uptime of the broker. |
JMX agent | jmx[{#JMXOBJ},UptimeMillis] Preprocessing
|
Broker {#JMXBROKERNAME}: Memory limit | Memory limit, in bytes, used for holding undelivered messages before paging to temporary storage. |
JMX agent | jmx[{#JMXOBJ},MemoryLimit] Preprocessing
|
Broker {#JMXBROKERNAME}: Memory usage in percents | Percent of memory limit used. |
JMX agent | jmx[{#JMXOBJ}, MemoryPercentUsage] |
Broker {#JMXBROKERNAME}: Storage limit | Disk limit, in bytes, used for persistent messages before producers are blocked. |
JMX agent | jmx[{#JMXOBJ},StoreLimit] Preprocessing
|
Broker {#JMXBROKERNAME}: Storage usage in percents | Percent of store limit used. |
JMX agent | jmx[{#JMXOBJ},StorePercentUsage] |
Broker {#JMXBROKERNAME}: Temp limit | Disk limit, in bytes, used for non-persistent messages and temporary data before producers are blocked. |
JMX agent | jmx[{#JMXOBJ},TempLimit] Preprocessing
|
Broker {#JMXBROKERNAME}: Temp usage in percents | Percent of temp limit used. |
JMX agent | jmx[{#JMXOBJ},TempPercentUsage] |
Broker {#JMXBROKERNAME}: Messages enqueue rate | Rate of messages that have been sent to the broker. |
JMX agent | jmx[{#JMXOBJ},TotalEnqueueCount] Preprocessing
|
Broker {#JMXBROKERNAME}: Messages dequeue rate | Rate of messages that have been delivered by the broker and acknowledged by consumers. |
JMX agent | jmx[{#JMXOBJ},TotalDequeueCount] Preprocessing
|
Broker {#JMXBROKERNAME}: Consumers count total | Number of consumers attached to this broker. |
JMX agent | jmx[{#JMXOBJ},TotalConsumerCount] |
Broker {#JMXBROKERNAME}: Producers count total | Number of producers attached to this broker. |
JMX agent | jmx[{#JMXOBJ},TotalProducerCount] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Version has been changed | The Broker {#JMXBROKERNAME} version has changed. Acknowledge to close the problem manually. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion],#1)<>last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion],#2) and length(last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion]))>0 |Info |
Manual close: Yes | |
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Broker has been restarted | Uptime is less than 10 minutes. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},UptimeMillis])<10m |Info |
Manual close: Yes | |
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Memory usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ}, MemoryPercentUsage],{$ACTIVEMQ.MEM.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.MEM.MAX.WARN:"{#JMXBROKERNAME}"} |Average |
Depends on:
|
||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Memory usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ}, MemoryPercentUsage],{$ACTIVEMQ.MEM.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.MEM.MAX.HIGH:"{#JMXBROKERNAME}"} |High |
|||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Storage usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},StorePercentUsage],{$ACTIVEMQ.STORE.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.STORE.MAX.WARN:"{#JMXBROKERNAME}"} |Average |
Depends on:
|
||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Storage usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},StorePercentUsage],{$ACTIVEMQ.STORE.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.STORE.MAX.HIGH:"{#JMXBROKERNAME}"} |High |
|||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Temp usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TempPercentUsage],{$ACTIVEMQ.TEMP.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.TEMP.MAX.WARN} |Average |
Depends on:
|
||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Temp usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TempPercentUsage],{$ACTIVEMQ.TEMP.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.TEMP.MAX.HIGH} |High |
|||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Message enqueue rate is higher than dequeue rate | Enqueue rate is higher than dequeue rate. It may indicate performance problems. |
avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalEnqueueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXBROKERNAME}"})>avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalDequeueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXBROKERNAME}"}) |Average |
||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Consumers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalConsumerCount],{$ACTIVEMQ.BROKER.CONSUMERS.MIN.TIME:"{#JMXBROKERNAME}"})<{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH:"{#JMXBROKERNAME}"} |High |
|||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Producers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalProducerCount],{$ACTIVEMQ.BROKER.PRODUCERS.MIN.TIME:"{#JMXBROKERNAME}"})<{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH:"{#JMXBROKERNAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Destinations discovery | Discovery of destinations |
JMX agent | jmx.discovery[beans,"org.apache.activemq:type=Broker,brokerName=*,destinationType=*,destinationName=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count | Number of consumers attached to this destination. |
JMX agent | jmx[{#JMXOBJ},ConsumerCount] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count total on {#JMXBROKERNAME} | Number of consumers attached to the broker of this destination. Used to suppress destination's triggers when the count of consumers on the broker is lower than threshold. |
JMX agent | jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT: "{#JMXDESTINATIONNAME}"}] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count | Number of producers attached to this destination. |
JMX agent | jmx[{#JMXOBJ},ProducerCount] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count total on {#JMXBROKERNAME} | Number of producers attached to the broker of this destination. Used to suppress destination's triggers when the count of producers on the broker is lower than threshold. |
JMX agent | jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT: "{#JMXDESTINATIONNAME}"}] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage in percents | The percentage of the memory limit used. |
JMX agent | jmx[{#JMXOBJ},MemoryPercentUsage] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Messages enqueue rate | Rate of messages that have been sent to the destination. |
JMX agent | jmx[{#JMXOBJ},EnqueueCount] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Messages dequeue rate | Rate of messages that have been acknowledged (and removed) from the destination. |
JMX agent | jmx[{#JMXOBJ},DequeueCount] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Queue size | Number of messages on this destination, including any that have been dispatched but not acknowledged. |
JMX agent | jmx[{#JMXOBJ},QueueSize] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Expired messages count | Number of messages that have been expired. |
JMX agent | jmx[{#JMXOBJ},ExpiredCount] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache ActiveMQ: {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ConsumerCount],{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.TIME:"{#JMXDESTINATIONNAME}"})<{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.HIGH:"{#JMXDESTINATIONNAME}"} and last(/Apache ActiveMQ by JMX/jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT: "{#JMXDESTINATIONNAME}"}])>{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH:"{#JMXBROKERNAME}"} |Average |
Manual close: Yes | ||
Apache ActiveMQ: {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ProducerCount],{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.TIME:"{#JMXDESTINATIONNAME}"})<{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.HIGH:"{#JMXDESTINATIONNAME}"} and last(/Apache ActiveMQ by JMX/jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT: "{#JMXDESTINATIONNAME}"}])>{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH:"{#JMXBROKERNAME}"} |Average |
Manual close: Yes | ||
Apache ActiveMQ: {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage is too high | last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},MemoryPercentUsage])>{$ACTIVEMQ.MEM.MAX.WARN:"{#JMXDESTINATIONNAME}"} |Average |
|||
Apache ActiveMQ: {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage is too high | last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},MemoryPercentUsage])>{$ACTIVEMQ.MEM.MAX.HIGH:"{#JMXDESTINATIONNAME}"} |High |
|||
Apache ActiveMQ: {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Message enqueue rate is higher than dequeue rate | Enqueue rate is higher than dequeue rate. It may indicate performance problems. |
avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},EnqueueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXDESTINATIONNAME}"})>avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},DequeueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXDESTINATIONNAME}"}) |Average |
||
Apache ActiveMQ: {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Queue size is high | Queue size is higher than threshold. It may indicate performance problems. |
min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},QueueSize],{$ACTIVEMQ.QUEUE.TIME:"{#JMXDESTINATIONNAME}"})>{$ACTIVEMQ.QUEUE.WARN:"{#JMXDESTINATIONNAME}"} and {$ACTIVEMQ.QUEUE.ENABLED:"{#JMXDESTINATIONNAME}"}=1 |Average |
||
Apache ActiveMQ: {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Expired messages count is high | This metric represents the number of messages that expired before they could be delivered. If you expect all messages to be delivered and acknowledged within a certain amount of time, you can set an expiration for each message, and investigate if your ExpiredCount metric rises above zero. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ExpiredCount])>{$ACTIVEMQ.EXPIRED.WARN:"{#JMXDESTINATIONNAME}"} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Acronis Cyber Protect Cloud monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This is a master template that needs to be assigned to a host, and it will automatically create MSP host prototype, which will monitor Acronis Cyber Protect Cloud metrics.
Before using this template, you must create a new MSP-level API client for Zabbix to use. To do that, sign in to your Acronis Cyber Protect Cloud web interface, navigate to Settings
-> API clients
and create a new API client.
You will be shown credentials for this API client. These credentials need to be entered in the following user macros of this template:
{$ACRONIS.CPC.AUTH.CLIENT.ID}
- enter Client ID
here;
{$ACRONIS.CPC.AUTH.SECRET}
- enter Secret
here;
{$ACRONIS.CPC.DATACENTER.URL}
- enter Data center URL here.
This is all the configuration needed for this integration.
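Before entering the credentials into the macros, you may want to verify that the API client works against your data center. The following is a minimal, illustrative sketch only; it assumes the standard OAuth2 client-credentials token endpoint under the /api/2 sub-path (the default of {$ACRONIS.CPC.PATH.ACCOUNT.MANAGEMENT}), so confirm the exact endpoint and payload against the Acronis Cyber Protect Cloud API documentation.

```python
# Minimal sketch for verifying the API client credentials, assuming the
# OAuth2 client-credentials token endpoint is {datacenter}/api/2/idp/token.
# Verify the endpoint path against the Acronis API documentation.
import requests

DATACENTER_URL = "https://eu2-cloud.acronis.com"  # value of {$ACRONIS.CPC.DATACENTER.URL}
CLIENT_ID = "<client id>"                         # value of {$ACRONIS.CPC.AUTH.CLIENT.ID}
CLIENT_SECRET = "<secret>"                        # value of {$ACRONIS.CPC.AUTH.SECRET}

response = requests.post(
    f"{DATACENTER_URL}/api/2/idp/token",
    auth=(CLIENT_ID, CLIENT_SECRET),              # HTTP Basic auth with the API client credentials
    data={"grant_type": "client_credentials"},
    timeout=30,
)
response.raise_for_status()
payload = response.json()
# A successful response contains an access token and its lifetime in seconds.
print("token received, expires_in:", payload.get("expires_in"))
```

If the request succeeds, the same Client ID and Secret can be used in the user macros listed above.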
Name | Description | Default |
---|---|---|
{$ACRONIS.CPC.DATACENTER.URL} | Acronis Cyber Protect Cloud datacenter URL, e.g., https://eu2-cloud.acronis.com. |
|
{$ACRONIS.CPC.AUTH.INTERVAL} | API token regeneration interval, in minutes. By default, Acronis Cyber Protect Cloud tokens expire after 2 hours. |
110m |
{$ACRONIS.CPC.HTTP.PROXY} | Sets the HTTP proxy for the authorization item. Host prototypes will also use this value for HTTP proxy. If this parameter is empty, then no proxy is used. |
|
{$ACRONIS.CPC.AUTH.CLIENT.ID} | Client ID for API user access. |
|
{$ACRONIS.CPC.AUTH.SECRET} | Secret for API user access. |
|
{$ACRONIS.CPC.PATH.ACCOUNT.MANAGEMENT} | Sub-path for the Account Management API. |
/api/2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get access token | Authorizes the API user and receives an access token. |
HTTP agent | acronis.cpc.accountmanager.gettoken Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: MSP Discovery | Discovers MSP and creates host prototype based on that. |
Dependent item | acronis.cpc.lld.msp_discovery |
This template is designed for the effortless deployment of Acronis Cyber Protect Cloud MSP monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is not meant to be used independently. A host with the Acronis Cyber Protect Cloud by HTTP
template will request an API token and automatically create a host prototype with this template assigned to it.
If needed, you can specify an HTTP proxy for the template to use by changing the value of {$ACRONIS.CPC.HTTP.PROXY}
user macro.
Device discovery trigger prototypes that check for scheduled services which have failed to run have the following trigger time offset user macros:
{$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE}
{$ACRONIS.CPC.OFFSET.SCHEDULED.BACKUP}
{$ACRONIS.CPC.OFFSET.SCHEDULED.VULNERABILITY}
{$ACRONIS.CPC.OFFSET.SCHEDULED.PATCH}
Using these macros, their respective triggers can be offset in both directions. For example, if you want the trigger to fire only when the current time is at least 3 minutes past the next scheduled antimalware scan, set the value of the {$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE} user macro to -180.
This is the default behaviour.
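To illustrate how the offset is applied: each of these triggers compares the timestamp of the next scheduled run with now() plus the offset, so a negative offset delays the alert. The sketch below reproduces that comparison outside Zabbix, assuming Unix timestamps; the authoritative logic is the trigger expressions listed in the trigger tables further down.

```python
# Illustrative sketch of the scheduled-run check used by these triggers:
# the trigger fires when last(next_scheduled) < now() + offset.
import time

def scheduled_run_overdue(next_scheduled_ts: float, offset_seconds: int = -180) -> bool:
    """Return True when the scheduled run should already have happened.

    With the default offset of -180, the check only fires once the current
    time is at least 3 minutes past the scheduled timestamp, which tolerates
    small scheduling delays before alerting.
    """
    return next_scheduled_ts < time.time() + offset_seconds

now = time.time()
print(scheduled_run_overdue(now - 300))  # True: scheduled 5 minutes ago and still not rescheduled
print(scheduled_run_overdue(now - 60))   # False: only 1 minute overdue, within the tolerance window
```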
Name | Description | Default |
---|---|---|
{$ACRONIS.CPC.DATACENTER.URL} | Acronis Cyber Protect Cloud datacenter URL, e.g., https://eu2-cloud.acronis.com. |
|
{$ACRONIS.CPC.HTTP.PROXY} | Sets the HTTP proxy for the authorization item. Host prototypes will also use this value for HTTP proxy. If this parameter is empty, then no proxy is used. |
|
{$ACRONIS.CPC.CYBERFIT.WARN} | CyberFit score threshold for "warning" severity trigger. |
669 |
{$ACRONIS.CPC.CYBERFIT.HIGH} | CyberFit score threshold for "high" severity trigger. |
579 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE} | Offset time in seconds for scheduled antimalware scan trigger check. |
-180 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.BACKUP} | Offset time in seconds for scheduled backup run trigger check. |
-180 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.VULNERABILITY} | Offset time in seconds for scheduled vulnerability assessment run trigger check. |
-180 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.PATCH} | Offset time in seconds for scheduled patch management run trigger check. |
-180 |
{$ACRONIS.CPC.DEVICE.RESOURCE.TYPE} | Comma-separated list of resource types used for device retrieval. |
resource.machine |
{$ACRONIS.CPC.ALERT.DISCOVERY.CATEGORY.MATCHES} | Sets the alert category regex filter to use in alert discovery for including. |
.* |
{$ACRONIS.CPC.ALERT.DISCOVERY.CATEGORY.NOT_MATCHES} | Sets the alert category regex filter to use in alert discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.ALERT.DISCOVERY.SEVERITY.MATCHES} | Sets the alert severity regex filter to use in alert discovery for including. |
.* |
{$ACRONIS.CPC.ALERT.DISCOVERY.SEVERITY.NOT_MATCHES} | Sets the alert severity regex filter to use in alert discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.ALERT.DISCOVERY.RESOURCE.MATCHES} | Sets the alert resource name regex filter to use in alert discovery for including. |
.* |
{$ACRONIS.CPC.ALERT.DISCOVERY.RESOURCE.NOT_MATCHES} | Sets the alert resource name regex filter to use in alert discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.CUSTOMER.DISCOVERY.KIND.MATCHES} | Sets the customer kind regex filter to use in customer discovery for including. |
customer |
{$ACRONIS.CPC.CUSTOMER.DISCOVERY.NAME.MATCHES} | Sets the customer name regex filter to use in customer discovery for including. |
.* |
{$ACRONIS.CPC.CUSTOMER.DISCOVERY.NAME.NOT_MATCHES} | Sets the customer name regex filter to use in customer discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.DEVICE.DISCOVERY.TENANT.MATCHES} | Sets the tenant name regex filter to use in device discovery for including. |
.* |
{$ACRONIS.CPC.DEVICE.DISCOVERY.TENANT.NOT_MATCHES} | Sets the tenant name regex filter to use in device discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.ACCESS_TOKEN} | API access token. |
|
{$ACRONIS.CPC.PATH.ACCOUNT.MANAGEMENT} | Sub-path for the Account Management API. |
/api/2 |
{$ACRONIS.CPC.PATH.RESOURCE.MANAGEMENT} | Sub-path for the Resource Management API. |
/api/resource_management/v4 |
{$ACRONIS.CPC.PATH.ALERTS} | Sub-path for the Alerts API. |
/api/alert_manager/v1 |
{$ACRONIS.CPC.PATH.AGENTS} | Sub-path for the Agents API. |
/api/agent_manager/v2 |
{$ACRONIS.CPC.MSP.TENANT.UUID} | UUID for MSP. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Register integration | Registers integration on Acronis services. |
Script | acronis.cpc.register.integration |
Get alerts | Fetches all alerts. |
HTTP agent | acronis.cpc.alerts.get Preprocessing
|
Get customers | Fetches all customers. |
HTTP agent | acronis.cpc.customers.get Preprocessing
|
Get devices | Fetches all devices. |
HTTP agent | acronis.cpc.devices.get Preprocessing
|
Alerts with "ok" severity | Gets count of alerts with "ok" severity. |
Dependent item | acronis.cpc.alerts.severity.ok Preprocessing
|
Alerts with "warning" severity | Gets count of alerts with "warning" severity. |
Dependent item | acronis.cpc.alerts.severity.warn Preprocessing
|
Alerts with "error" severity | Gets count of alerts with "error" severity. |
Dependent item | acronis.cpc.alerts.severity.err Preprocessing
|
Alerts with "critical" severity | Gets count of alerts with "critical" severity. |
Dependent item | acronis.cpc.alerts.severity.crit Preprocessing
|
Alerts with "information" severity | Gets count of alerts with "information" severity. |
Dependent item | acronis.cpc.alerts.severity.info Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Alerts discovery | Discovers alerts. |
Dependent item | acronis.cpc.alerts.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Alert [{#TYPE}]:[{#ALERT_ID}]: Alert severity | Severity for the alert. |
Dependent item | acronis.cpc.alert.severity[{#ALERT_ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Acronis: Alert [{#TYPE}]:[{#ALERT_ID}]: Alert has "critical" severity | Alert has "critical" severity. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.alert.severity[{#ALERT_ID}])=3 |High |
Manual close: Yes | |
Acronis: Alert [{#TYPE}]:[{#ALERT_ID}]: Alert has "error" severity | Alert has "error" severity. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.alert.severity[{#ALERT_ID}])=2 |Average |
Manual close: Yes Depends on:
|
|
Acronis: Alert [{#TYPE}]:[{#ALERT_ID}]: Alert has "warning" severity | Alert has "warning" severity. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.alert.severity[{#ALERT_ID}])=1 |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Customer discovery | Discovers customers. |
Dependent item | acronis.cpc.customer.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Customer [{#NAME}]: Enabled status | Enabled status for customer (true or false). |
Dependent item | acronis.cpc.customer.status[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Device discovery | Discovers devices. |
Dependent item | acronis.cpc.device.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Device [{#NAME}]:[{#ID}]: Raw data resources status | Gets statuses for device resources. |
HTTP agent | acronis.cpc.device.res.status.raw[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: CyberFit score | Acronis "CyberFit" score for the device. Value of "-1" is assigned if "CyberFit" could not be found for device. |
Dependent item | acronis.cpc.device.cyberfit[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Agent version | Agent version for the device. |
Dependent item | acronis.cpc.device.agent.version[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Agent enabled | Agent status (enabled or disabled) for the device. |
Dependent item | acronis.cpc.device.agent.enabled[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Agent online | Agent reachability for the device. |
Dependent item | acronis.cpc.device.agent.online[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Protection status | Protection status for device. |
Dependent item | acronis.cpc.device.protection.status[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Protection plan name | Protection plan name for device. |
Dependent item | acronis.cpc.device.protection.name[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful antimalware protection scan | Previous successful antimalware protection scan for device. |
Dependent item | acronis.cpc.device.protection.scan.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous antimalware protection scan | Previous antimalware protection scan for device. |
Dependent item | acronis.cpc.device.protection.scan.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next antimalware protection scan | Next scheduled antimalware protection scan for device. |
Dependent item | acronis.cpc.device.protection.scan.next[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful machine backup run | Previous successful machine backup run for device. |
Dependent item | acronis.cpc.device.backup.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous machine backup run | Previous machine backup run for device. |
Dependent item | acronis.cpc.device.backup.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next machine backup run | Next scheduled machine backup run for device. |
Dependent item | acronis.cpc.device.backup.next[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful vulnerability assessment | Previous successful vulnerability assessment for device. |
Dependent item | acronis.cpc.device.vuln.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous vulnerability assessment | Previous vulnerability assessment for device. |
Dependent item | acronis.cpc.device.vuln.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next vulnerability assessment | Next scheduled vulnerability assessment for device. |
Dependent item | acronis.cpc.device.vuln.next[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful patch management run | Previous successful patch management run for device. |
Dependent item | acronis.cpc.device.patch.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous patch management run | Previous patch management run for device. |
Dependent item | acronis.cpc.device.patch.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next patch management run | Next scheduled patch management run for device. |
Dependent item | acronis.cpc.device.patch.next[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Acronis: Device [{#NAME}]:[{#ID}]: CyberFit score critical | CyberFit score for this device is critical for at least 3 minutes. |
min(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) < {$ACRONIS.CPC.CYBERFIT.HIGH} and max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) <> -1 |High |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: CyberFit score low | CyberFit score for this device is low for at least 3 minutes. |
min(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) < {$ACRONIS.CPC.CYBERFIT.WARN} and max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) <> -1 |Warning |
Manual close: Yes Depends on:
|
|
Acronis: Device [{#NAME}]:[{#ID}]: Agent disabled | Agent for this device is disabled for at least 3 minutes. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.agent.enabled[{#NAME}],3m) < 1 |Info |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Protection status "error" | Device has "error" protection status. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.status[{#NAME}])="error" |Average |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Protection status "warning" | Device has "warning" protection status. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.status[{#NAME}])="warning" |Warning |
Manual close: Yes Depends on:
|
|
Acronis: Device [{#NAME}]:[{#ID}]: Previous protection scan not successful | Previous antimalware protection scan was not successful. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.scan.prev.ok[{#NAME}])<>last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.scan.prev[{#NAME}]) |Average |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Scheduled antimalware scan failed to run | Scheduled antimalware scan failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.scan.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE}) |Warning |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Previous machine backup run not successful | Previous machine backup did not run successfully. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.backup.prev.ok[{#NAME}],1m)<>max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.backup.prev[{#NAME}],1m) |Average |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Scheduled machine backup failed to run | Scheduled machine backup failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.backup.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.BACKUP}) |Warning |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Previous vulnerability assessment not successful | Previous vulnerability assessment did not run successfully. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.vuln.prev.ok[{#NAME}],1m)<>max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.vuln.prev[{#NAME}],1m) |Average |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Scheduled vulnerability assessment failed to run | Scheduled vulnerability assessment failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.vuln.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.VULNERABILITY}) |Warning |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Previous patch management run not successful | Previous patch management run was not successful. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.patch.prev.ok[{#NAME}],1m)<>max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.patch.prev[{#NAME}],1m) |Average |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Scheduled patch management failed to run | Scheduled patch management failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.patch.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.PATCH}) |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums