This template is designed for the effortless deployment of Apache Zookeeper monitoring by Zabbix via HTTP and doesn't require any external scripts.
This template works with standalone and cluster instances. Metrics are collected from each Zookeeper node by requests to AdminServer.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the AdminServer and configure the parameters according to the official documentation.
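A minimal zoo.cfg fragment for this (a sketch with the default values shown; adjust to your environment) could look like:

```
# zoo.cfg - AdminServer settings this template relies on (defaults shown)
admin.enableServer=true
admin.serverPort=8080
admin.commandURL=/commands
```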
Set the hostname or IP address of the Apache Zookeeper host in the {$ZOOKEEPER.HOST} macro. You can also change the {$ZOOKEEPER.COMMAND_URL}, {$ZOOKEEPER.PORT}, and {$ZOOKEEPER.SCHEME} macros if necessary.
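As a quick sanity check before linking the template, you can query the AdminServer from the machine that will run the HTTP agent checks (the hostname below is a placeholder and should match the macro values above):

```
# List the commands exposed by the AdminServer
curl http://<zookeeper-host>:8080/commands

# The monitor command returns the kind of server metrics this template parses
curl http://<zookeeper-host>:8080/commands/monitor
```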
Name | Description | Default |
---|---|---|
{$ZOOKEEPER.HOST} | The hostname or IP address of the Apache Zookeeper host. | <SET ZOOKEEPER HOST> |
{$ZOOKEEPER.PORT} | The port the embedded Jetty server listens on (admin.serverPort). | 8080 |
{$ZOOKEEPER.COMMAND_URL} | The URL for listing and issuing commands relative to the root URL (admin.commandURL). | commands |
{$ZOOKEEPER.SCHEME} | Request scheme which may be http or https. | http |
{$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN} | Maximum percentage of file descriptors usage alert threshold (for trigger expression). | 85 |
{$ZOOKEEPER.OUTSTANDING_REQ.MAX.WARN} | Maximum number of outstanding requests (for trigger expression). | 10 |
{$ZOOKEEPER.PENDING_SYNCS.MAX.WARN} | Maximum number of pending syncs from the followers (for trigger expression). | 10 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zookeeper: Get server metrics | | HTTP agent | zookeeper.get_metrics |
Zookeeper: Get connections stats | Get information on client connections to server. Note, depending on the number of client connections this operation may be expensive (i.e. impact server performance). | HTTP agent | zookeeper.get_connections_stats |
Zookeeper: Server mode | Mode of the server. In an ensemble, this may either be leader or follower. Otherwise, it is standalone. | Dependent item | zookeeper.server_state Preprocessing |
Zookeeper: Uptime | Uptime that a peer has been in a table leading/following/observing state. | Dependent item | zookeeper.uptime Preprocessing |
Zookeeper: Version | Version of Zookeeper server. | Dependent item | zookeeper.version Preprocessing |
Zookeeper: Approximate data size | Data tree size in bytes. The size includes the znode path and its value. | Dependent item | zookeeper.approximate_data_size Preprocessing |
Zookeeper: File descriptors, max | Maximum number of file descriptors that a zookeeper server can open. | Dependent item | zookeeper.max_file_descriptor_count Preprocessing |
Zookeeper: File descriptors, open | Number of file descriptors that a zookeeper server has open. | Dependent item | zookeeper.open_file_descriptor_count Preprocessing |
Zookeeper: Outstanding requests | The number of queued requests when the server is under load and is receiving more sustained requests than it can process. | Dependent item | zookeeper.outstanding_requests Preprocessing |
Zookeeper: Commit per sec | The number of commits performed per second. | Dependent item | zookeeper.commit_count.rate Preprocessing |
Zookeeper: Diff syncs per sec | Number of diff syncs performed per second. | Dependent item | zookeeper.diff_count.rate Preprocessing |
Zookeeper: Snap syncs per sec | Number of snap syncs performed per second. | Dependent item | zookeeper.snap_count.rate Preprocessing |
Zookeeper: Looking per sec | Rate of transitions into looking state. | Dependent item | zookeeper.looking_count.rate Preprocessing |
Zookeeper: Alive connections | Number of active clients connected to a zookeeper server. | Dependent item | zookeeper.num_alive_connections Preprocessing |
Zookeeper: Global sessions | Number of global sessions. | Dependent item | zookeeper.global_sessions Preprocessing |
Zookeeper: Local sessions | Number of local sessions. | Dependent item | zookeeper.local_sessions Preprocessing |
Zookeeper: Drop connections per sec | Rate of connection drops. | Dependent item | zookeeper.connection_drop_count.rate Preprocessing |
Zookeeper: Rejected connections per sec | Rate of connection rejections. | Dependent item | zookeeper.connection_rejected.rate Preprocessing |
Zookeeper: Revalidate connections per sec | Rate of connection revalidations. | Dependent item | zookeeper.connection_revalidate_count.rate Preprocessing |
Zookeeper: Revalidate per sec | Rate of revalidations. | Dependent item | zookeeper.revalidate_count.rate Preprocessing |
Zookeeper: Latency, max | The maximum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.max_latency Preprocessing |
Zookeeper: Latency, min | The minimum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.min_latency Preprocessing |
Zookeeper: Latency, avg | The average amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.avg_latency Preprocessing |
Zookeeper: Znode count | The number of znodes in the ZooKeeper namespace (the data). | Dependent item | zookeeper.znode_count Preprocessing |
Zookeeper: Ephemeral nodes count | Number of ephemeral nodes that a zookeeper server has in its data tree. | Dependent item | zookeeper.ephemerals_count Preprocessing |
Zookeeper: Watch count | Number of watches currently set on the local ZooKeeper process. | Dependent item | zookeeper.watch_count Preprocessing |
Zookeeper: Packets sent per sec | The number of zookeeper packets sent from a server per second. | Dependent item | zookeeper.packets_sent Preprocessing |
Zookeeper: Packets received per sec | The number of zookeeper packets received by a server per second. | Dependent item | zookeeper.packets_received.rate Preprocessing |
Zookeeper: Bytes received per sec | Number of bytes received per second. | Dependent item | zookeeper.bytes_received_count.rate Preprocessing |
Zookeeper: Election time, avg | Time between entering and leaving election. | Dependent item | zookeeper.avg_election_time Preprocessing |
Zookeeper: Elections | Number of elections that have happened. | Dependent item | zookeeper.cnt_election_time Preprocessing |
Zookeeper: Fsync time, avg | Time to fsync the transaction log. | Dependent item | zookeeper.avg_fsynctime Preprocessing |
Zookeeper: Fsync | Count of performed fsyncs. | Dependent item | zookeeper.cnt_fsynctime Preprocessing |
Zookeeper: Snapshot write time, avg | Average time to write a snapshot. | Dependent item | zookeeper.avg_snapshottime Preprocessing |
Zookeeper: Snapshot writes | Count of performed snapshot writes. | Dependent item | zookeeper.cnt_snapshottime Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zookeeper: Server mode has changed | Zookeeper node state has changed. Acknowledge to close the problem manually. | last(/Zookeeper by HTTP/zookeeper.server_state,#1)<>last(/Zookeeper by HTTP/zookeeper.server_state,#2) and length(last(/Zookeeper by HTTP/zookeeper.server_state))>0 | Info | Manual close: Yes |
Zookeeper: Failed to fetch info data | Zabbix has not received data for items for the last 10 minutes. | nodata(/Zookeeper by HTTP/zookeeper.uptime,10m)=1 | Warning | Manual close: Yes |
Zookeeper: Version has changed | Zookeeper version has changed. Acknowledge to close the problem manually. | last(/Zookeeper by HTTP/zookeeper.version,#1)<>last(/Zookeeper by HTTP/zookeeper.version,#2) and length(last(/Zookeeper by HTTP/zookeeper.version))>0 | Info | Manual close: Yes |
Zookeeper: Too many file descriptors used | Number of file descriptors used is more than {$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN}% of the available number of file descriptors. | min(/Zookeeper by HTTP/zookeeper.open_file_descriptor_count,5m) * 100 / last(/Zookeeper by HTTP/zookeeper.max_file_descriptor_count) > {$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN} | Warning | |
Zookeeper: Too many queued requests | Number of queued requests in the server. This goes up when the server receives more requests than it can process. | min(/Zookeeper by HTTP/zookeeper.outstanding_requests,5m)>{$ZOOKEEPER.OUTSTANDING_REQ.MAX.WARN} | Average | Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Leader metrics discovery | Additional metrics for leader node. | Dependent item | zookeeper.metrics.leader Preprocessing |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zookeeper: Pending syncs{#SINGLETON} | Number of pending syncs to carry out to ZooKeeper ensemble followers. | Dependent item | zookeeper.pending_syncs[{#SINGLETON}] Preprocessing |
Zookeeper: Quorum size{#SINGLETON} | | Dependent item | zookeeper.quorum_size[{#SINGLETON}] Preprocessing |
Zookeeper: Synced followers{#SINGLETON} | Number of synced followers reported when a node server_state is leader. | Dependent item | zookeeper.synced_followers[{#SINGLETON}] Preprocessing |
Zookeeper: Synced non-voting follower{#SINGLETON} | Number of synced non-voting followers reported when a node server_state is leader. | Dependent item | zookeeper.synced_non_voting_followers[{#SINGLETON}] Preprocessing |
Zookeeper: Synced observers{#SINGLETON} | Number of synced observers. | Dependent item | zookeeper.synced_observers[{#SINGLETON}] Preprocessing |
Zookeeper: Learners{#SINGLETON} | Number of learners. | Dependent item | zookeeper.learners[{#SINGLETON}] Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zookeeper: Too many pending syncs | | min(/Zookeeper by HTTP/zookeeper.pending_syncs[{#SINGLETON}],5m)>{$ZOOKEEPER.PENDING_SYNCS.MAX.WARN} | Average | Manual close: Yes |
Zookeeper: Too few active followers | The number of followers should equal the total size of your ZooKeeper ensemble, minus 1 (the leader is not included in the follower count). If the ensemble fails to maintain quorum, all automatic failover features are suspended. | last(/Zookeeper by HTTP/zookeeper.synced_followers[{#SINGLETON}]) < last(/Zookeeper by HTTP/zookeeper.quorum_size[{#SINGLETON}])-1 | Average | |
Name | Description | Type | Key and additional info |
---|---|---|---|
Clients discovery | Get list of client connections. Note, depending on the number of client connections this operation may be expensive (i.e. impact server performance). | HTTP agent | zookeeper.clients Preprocessing |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zookeeper client {#TYPE} [{#CLIENT}]: Get client info | The item gets information about "{#CLIENT}" client of "{#TYPE}" type. | Dependent item | zookeeper.client_info[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Latency, max | The maximum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.max_latency[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Latency, min | The minimum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.min_latency[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Latency, avg | The average amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.avg_latency[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Packets sent per sec | The number of packets sent. | Dependent item | zookeeper.packets_sent[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Packets received per sec | The number of packets received. | Dependent item | zookeeper.packets_received[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Outstanding requests | The number of queued requests when the server is under load and is receiving more sustained requests than it can process. | Dependent item | zookeeper.outstanding_requests[{#TYPE},{#CLIENT}] Preprocessing |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor internal Zabbix metrics on the remote Zabbix server.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Specify the address of the remote Zabbix server by changing the {$ZABBIX.SERVER.ADDRESS} and {$ZABBIX.SERVER.PORT} macros. Don't forget to adjust the StatsAllowedIP parameter in the remote server's configuration file to allow the collection of statistics.
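For example, the change on the remote (monitored) Zabbix server could look like the snippet below; the IP address is a placeholder for the Zabbix server that collects the statistics, and the remote server needs to be restarted for the change to take effect:

```
# zabbix_server.conf on the remote Zabbix server being monitored
# Allow internal statistics to be queried from the collecting Zabbix server
StatsAllowedIP=192.0.2.10
```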
Name | Description | Default |
---|---|---|
{$ZABBIX.SERVER.ADDRESS} | IP/DNS/network mask list of servers to be remotely queried (default is 127.0.0.1). | |
{$ZABBIX.SERVER.PORT} | Port of server to be remotely queried (default is 10051). | |
{$PROXY.LAST_SEEN.MAX} | The maximum number of seconds that Zabbix proxy has not been seen. | 600 |
{$ZABBIX.SERVER.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expression. | 5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Remote Zabbix server: Zabbix stats | The master item of Zabbix server statistics. | Zabbix internal | zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT}] |
Zabbix proxies stats | The master item of Zabbix proxies' statistics. | Dependent item | zabbix.proxies.stats Preprocessing |
Remote Zabbix server: Zabbix stats queue over 10m | The number of monitored items in the queue, which are delayed at least by 10 minutes. | Zabbix internal | zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m] Preprocessing |
Remote Zabbix server: Zabbix stats queue | The number of monitored items in the queue, which are delayed at least by 6 seconds. | Zabbix internal | zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue] Preprocessing |
Remote Zabbix server: Utilization of alert manager internal processes, in % | The average percentage of the time during which the alert manager processes have been busy for the last minute. | Dependent item | process.alert_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of alert syncer internal processes, in % | The average percentage of the time during which the alert syncer processes have been busy for the last minute. | Dependent item | process.alert_syncer.avg.busy Preprocessing |
Remote Zabbix server: Utilization of alerter internal processes, in % | The average percentage of the time during which the alerter processes have been busy for the last minute. | Dependent item | process.alerter.avg.busy Preprocessing |
Remote Zabbix server: Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. | Dependent item | process.availability_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. | Dependent item | process.configuration_syncer.avg.busy Preprocessing |
Remote Zabbix server: Utilization of discoverer data collector processes, in % | The average percentage of the time during which the discoverer processes have been busy for the last minute. | Dependent item | process.discoverer.avg.busy Preprocessing |
Remote Zabbix server: Utilization of escalator internal processes, in % | The average percentage of the time during which the escalator processes have been busy for the last minute. | Dependent item | process.escalator.avg.busy Preprocessing |
Remote Zabbix server: Utilization of history poller data collector processes, in % | The average percentage of the time during which the history poller processes have been busy for the last minute. | Dependent item | process.history_poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. | Dependent item | process.odbc_poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. | Dependent item | process.history_syncer.avg.busy Preprocessing |
Remote Zabbix server: Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. | Dependent item | process.housekeeper.avg.busy Preprocessing |
Remote Zabbix server: Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. | Dependent item | process.http_poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. | Dependent item | process.icmp_pinger.avg.busy Preprocessing |
Remote Zabbix server: Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. | Dependent item | process.ipmi_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. | Dependent item | process.ipmi_poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. | Dependent item | process.java_poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of LLD manager internal processes, in % | The average percentage of the time during which the lld manager processes have been busy for the last minute. | Dependent item | process.lld_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of LLD worker internal processes, in % | The average percentage of the time during which the lld worker processes have been busy for the last minute. | Dependent item | process.lld_worker.avg.busy Preprocessing |
Remote Zabbix server: Utilization of connector manager internal processes, in % | The average percentage of the time during which the connector manager processes have been busy for the last minute. | Dependent item | process.connector_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of connector worker internal processes, in % | The average percentage of the time during which the connector worker processes have been busy for the last minute. | Dependent item | process.connector_worker.avg.busy Preprocessing |
Remote Zabbix server: Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. | Dependent item | process.poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. | Dependent item | process.preprocessing_worker.avg.busy Preprocessing |
Remote Zabbix server: Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. | Dependent item | process.preprocessing_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of proxy poller data collector processes, in % | The average percentage of the time during which the proxy poller processes have been busy for the last minute. | Dependent item | process.proxy_poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of report manager internal processes, in % | The average percentage of the time during which the report manager processes have been busy for the last minute. | Dependent item | process.report_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of report writer internal processes, in % | The average percentage of the time during which the report writer processes have been busy for the last minute. | Dependent item | process.report_writer.avg.busy Preprocessing |
Remote Zabbix server: Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. | Dependent item | process.self-monitoring.avg.busy Preprocessing |
Remote Zabbix server: Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. | Dependent item | process.snmp_trapper.avg.busy Preprocessing |
Remote Zabbix server: Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. | Dependent item | process.task_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of timer internal processes, in % | The average percentage of the time during which the timer processes have been busy for the last minute. | Dependent item | process.timer.avg.busy Preprocessing |
Remote Zabbix server: Utilization of service manager internal processes, in % | The average percentage of the time during which the service manager processes have been busy for the last minute. | Dependent item | process.service_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of trigger housekeeper internal processes, in % | The average percentage of the time during which the trigger housekeeper processes have been busy for the last minute. | Dependent item | process.trigger_housekeeper.avg.busy Preprocessing |
Remote Zabbix server: Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. | Dependent item | process.trapper.avg.busy Preprocessing |
Remote Zabbix server: Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. | Dependent item | process.unreachable_poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of vmware data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. | Dependent item | process.vmware_collector.avg.busy Preprocessing |
Remote Zabbix server: Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. | Dependent item | rcache.buffer.pused Preprocessing |
Remote Zabbix server: Trend function cache, % of unique requests | The effectiveness statistics of Zabbix trend function cache. The percentage of cached items calculated from the sum of cached items plus requests. Low percentage most likely means that the cache size can be reduced. | Dependent item | tcache.pitems Preprocessing |
Remote Zabbix server: Trend function cache, % of misses | The effectiveness statistics of Zabbix trend function cache. The percentage of cache misses. | Dependent item | tcache.pmisses Preprocessing |
Remote Zabbix server: Value cache, % used | The availability statistics of Zabbix value cache. The percentage of used data buffer. | Dependent item | vcache.buffer.pused Preprocessing |
Remote Zabbix server: Value cache hits | The effectiveness statistics of Zabbix value cache. The number of cache hits (history values taken from the cache). | Dependent item | vcache.cache.hits Preprocessing |
Remote Zabbix server: Value cache misses | The effectiveness statistics of Zabbix value cache. The number of cache misses (history values taken from the database). | Dependent item | vcache.cache.misses Preprocessing |
Remote Zabbix server: Value cache operating mode | The operating mode of the value cache. | Dependent item | vcache.cache.mode Preprocessing |
Remote Zabbix server: Version | A version of Zabbix server. | Dependent item | version Preprocessing |
Remote Zabbix server: VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. | Dependent item | vmware.buffer.pused Preprocessing |
Remote Zabbix server: History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates performance problems on the database side. | Dependent item | wcache.history.pused Preprocessing |
Remote Zabbix server: History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. | Dependent item | wcache.index.pused Preprocessing |
Remote Zabbix server: Trend write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used trend buffer. The trend cache stores the aggregate of all items that have received data for the current hour. | Dependent item | wcache.trend.pused Preprocessing |
Remote Zabbix server: Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. | Dependent item | wcache.values Preprocessing |
Remote Zabbix server: Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed float values. | Dependent item | wcache.values.float Preprocessing |
Remote Zabbix server: Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. | Dependent item | wcache.values.log Preprocessing |
Remote Zabbix server: Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or keeping that state. | Dependent item | wcache.values.not_supported Preprocessing |
Remote Zabbix server: Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character/string values. | Dependent item | wcache.values.str Preprocessing |
Remote Zabbix server: Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. | Dependent item | wcache.values.text Preprocessing |
Remote Zabbix server: LLD queue | The count of values enqueued in the low-level discovery processing queue. | Dependent item | lld_queue Preprocessing |
Remote Zabbix server: Preprocessing queue | The count of values enqueued in the preprocessing queue. | Dependent item | preprocessing_queue Preprocessing |
Remote Zabbix server: Connector queue | The count of values enqueued in the connector queue. | Dependent item | connector_queue Preprocessing |
Remote Zabbix server: Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. | Dependent item | wcache.values.uint Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Remote Zabbix server: More than 100 items having missing data for more than 10 minutes | The | min(/Remote Zabbix server health/zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m],10m)>100 | Warning | Manual close: Yes |
Remote Zabbix server: Utilization of alert manager processes is high | | avg(/Remote Zabbix server health/process.alert_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of alert syncer processes is high | | avg(/Remote Zabbix server health/process.alert_syncer.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of alerter processes is high | | avg(/Remote Zabbix server health/process.alerter.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of availability manager processes is high | | avg(/Remote Zabbix server health/process.availability_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of configuration syncer processes is high | | avg(/Remote Zabbix server health/process.configuration_syncer.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of discoverer processes is high | | avg(/Remote Zabbix server health/process.discoverer.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of escalator processes is high | | avg(/Remote Zabbix server health/process.escalator.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of history poller processes is high | | avg(/Remote Zabbix server health/process.history_poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of ODBC poller processes is high | | avg(/Remote Zabbix server health/process.odbc_poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of history syncer processes is high | | avg(/Remote Zabbix server health/process.history_syncer.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of housekeeper processes is high | | avg(/Remote Zabbix server health/process.housekeeper.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of http poller processes is high | | avg(/Remote Zabbix server health/process.http_poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of icmp pinger processes is high | | avg(/Remote Zabbix server health/process.icmp_pinger.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of ipmi manager processes is high | | avg(/Remote Zabbix server health/process.ipmi_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of ipmi poller processes is high | | avg(/Remote Zabbix server health/process.ipmi_poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of java poller processes is high | | avg(/Remote Zabbix server health/process.java_poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of lld manager processes is high | | avg(/Remote Zabbix server health/process.lld_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of lld worker processes is high | | avg(/Remote Zabbix server health/process.lld_worker.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of connector manager processes is high | | avg(/Remote Zabbix server health/process.connector_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of connector worker processes is high | | avg(/Remote Zabbix server health/process.connector_worker.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of poller processes is high | | avg(/Remote Zabbix server health/process.poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of preprocessing worker processes is high | | avg(/Remote Zabbix server health/process.preprocessing_worker.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of preprocessing manager processes is high | | avg(/Remote Zabbix server health/process.preprocessing_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of proxy poller processes is high | | avg(/Remote Zabbix server health/process.proxy_poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of report manager processes is high | | avg(/Remote Zabbix server health/process.report_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of report writer processes is high | | avg(/Remote Zabbix server health/process.report_writer.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of self-monitoring processes is high | | avg(/Remote Zabbix server health/process.self-monitoring.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of snmp trapper processes is high | | avg(/Remote Zabbix server health/process.snmp_trapper.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of task manager processes is high | | avg(/Remote Zabbix server health/process.task_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of timer processes is high | | avg(/Remote Zabbix server health/process.timer.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of service manager processes is high | | avg(/Remote Zabbix server health/process.service_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of trigger housekeeper processes is high | | avg(/Remote Zabbix server health/process.trigger_housekeeper.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of trapper processes is high | | avg(/Remote Zabbix server health/process.trapper.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of unreachable poller processes is high | | avg(/Remote Zabbix server health/process.unreachable_poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of vmware collector processes is high | | avg(/Remote Zabbix server health/process.vmware_collector.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: More than 75% used in the configuration cache | Consider increasing | max(/Remote Zabbix server health/rcache.buffer.pused,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.SERVER.NODATA_TIMEOUT}. | nodata(/Remote Zabbix server health/rcache.buffer.pused,{$ZABBIX.SERVER.NODATA_TIMEOUT})=1 | Warning | |
Remote Zabbix server: More than 95% used in the value cache | Consider increasing | max(/Remote Zabbix server health/vcache.buffer.pused,10m)>95 | Average | Manual close: Yes |
Remote Zabbix server: Zabbix value cache working in low memory mode | Once the low memory mode has been switched on, the value cache will remain in this state for 24 hours, even if the problem that triggered this mode is resolved sooner. | last(/Remote Zabbix server health/vcache.cache.mode)=1 | High | Manual close: Yes |
Remote Zabbix server: Version has changed | Zabbix server version has changed. Acknowledge to close the problem manually. | last(/Remote Zabbix server health/version,#1)<>last(/Remote Zabbix server health/version,#2) and length(last(/Remote Zabbix server health/version))>0 | Info | Manual close: Yes |
Remote Zabbix server: More than 75% used in the vmware cache | Consider increasing | max(/Remote Zabbix server health/vmware.buffer.pused,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: More than 75% used in the history cache | Consider increasing | max(/Remote Zabbix server health/wcache.history.pused,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: More than 75% used in the history index cache | Consider increasing | max(/Remote Zabbix server health/wcache.index.pused,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: More than 75% used in the trends cache | Consider increasing | max(/Remote Zabbix server health/wcache.trend.pused,10m)>75 | Average | Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy discovery | LLD rule with item and trigger prototypes for the proxy discovery. | Dependent item | zabbix.proxy.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxy [{#PROXY.NAME}]: Stats | The statistics for the discovered proxy. | Dependent item | zabbix.proxy.stats[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Mode | The mode of Zabbix proxy. | Dependent item | zabbix.proxy.mode[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Unencrypted | The encryption status for connections from a proxy. | Dependent item | zabbix.proxy.unencrypted[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: PSK | The encryption status for connections from a proxy. | Dependent item | zabbix.proxy.psk[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Certificate | The encryption status for connections from a proxy. | Dependent item | zabbix.proxy.cert[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Compression | The compression status of a proxy. | Dependent item | zabbix.proxy.compression[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Item count | The number of enabled items on enabled hosts assigned to a proxy. | Dependent item | zabbix.proxy.items[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Host count | The number of enabled hosts assigned to a proxy. | Dependent item | zabbix.proxy.hosts[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Version | A version of Zabbix proxy. | Dependent item | zabbix.proxy.version[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Last seen, in seconds | The time when a proxy was last seen by a server. | Dependent item | zabbix.proxy.last_seen[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Compatibility | Version of proxy compared to Zabbix server version. Possible values: 0 - Undefined; 1 - Current version (proxy and server have the same major version); 2 - Outdated version (proxy version is older than server version, but is partially supported); 3 - Unsupported version (proxy version is older than server previous LTS release version or server major version is older than proxy major version). | Dependent item | zabbix.proxy.compatibility[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Required VPS | The required performance of a proxy (the number of values that need to be collected per second). | Dependent item | zabbix.proxy.requiredperformance[{#PROXY.NAME}] Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxy [{#PROXY.NAME}]: Proxy last seen | Zabbix proxy is not updating the configuration data. | last(/Remote Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)>{$PROXY.LAST_SEEN.MAX} | Warning | |
Proxy [{#PROXY.NAME}]: Zabbix proxy never seen | Zabbix proxy is not updating the configuration data. | last(/Remote Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)=-1 | Warning | |
Proxy [{#PROXY.NAME}]: Zabbix proxy is outdated | Zabbix proxy version is older than server version, but is partially supported. Only data collection and remote execution is available. | last(/Remote Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=2 | Warning | |
Proxy [{#PROXY.NAME}]: Zabbix proxy is not supported | Zabbix proxy version is older than server previous LTS release version or server major version is older than proxy major version. | last(/Remote Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=3 | High | |
Name | Description | Type | Key and additional info |
---|---|---|---|
High availability cluster node discovery | LLD rule with item and trigger prototypes for the node discovery. | Dependent item | zabbix.nodes.discovery Preprocessing |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster node [{#NODE.NAME}]: Stats | Provides the statistics of a node. | Dependent item | zabbix.nodes.stats[{#NODE.ID}] Preprocessing |
Cluster node [{#NODE.NAME}]: Address | The IPv4 address of a node. | Dependent item | zabbix.nodes.address[{#NODE.ID}] Preprocessing |
Cluster node [{#NODE.NAME}]: Last access time | Last access time. | Dependent item | zabbix.nodes.lastaccess.time[{#NODE.ID}] Preprocessing |
Cluster node [{#NODE.NAME}]: Last access age | The time between the database's | Dependent item | zabbix.nodes.lastaccess.age[{#NODE.ID}] Preprocessing |
Cluster node [{#NODE.NAME}]: Status | The status of a node. | Dependent item | zabbix.nodes.status[{#NODE.ID}] Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cluster node [{#NODE.NAME}]: Status changed | The state of the node has changed. Acknowledge to close the problem manually. | last(/Remote Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#1)<>last(/Remote Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#2) | Info | Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor internal Zabbix metrics on the local Zabbix server.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Link this template to the local Zabbix server host.
Name | Description | Default |
---|---|---|
{$PROXY.LAST_SEEN.MAX} | The maximum number of seconds that Zabbix proxy has not been seen. | 600 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix server: Zabbix stats cluster | The master item of Zabbix cluster statistics. | Zabbix internal | zabbix[cluster,discovery,nodes] |
Zabbix server: Zabbix proxies stats | The master item of Zabbix proxies' statistics. | Zabbix internal | zabbix[proxy,discovery] |
Zabbix server: Queue over 10 minutes | The number of monitored items in the queue, which are delayed at least by 10 minutes. | Zabbix internal | zabbix[queue,10m] |
Zabbix server: Queue | The number of monitored items in the queue, which are delayed at least by 6 seconds. | Zabbix internal | zabbix[queue] |
Zabbix server: Utilization of alert manager internal processes, in % | The average percentage of the time during which the alert manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,alert manager,avg,busy] |
Zabbix server: Utilization of alert syncer internal processes, in % | The average percentage of the time during which the alert syncer processes have been busy for the last minute. | Zabbix internal | zabbix[process,alert syncer,avg,busy] |
Zabbix server: Utilization of alerter internal processes, in % | The average percentage of the time during which the alerter processes have been busy for the last minute. | Zabbix internal | zabbix[process,alerter,avg,busy] |
Zabbix server: Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,availability manager,avg,busy] |
Zabbix server: Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. | Zabbix internal | zabbix[process,configuration syncer,avg,busy] |
Zabbix server: Utilization of discoverer data collector processes, in % | The average percentage of the time during which the discoverer processes have been busy for the last minute. | Zabbix internal | zabbix[process,discoverer,avg,busy] |
Zabbix server: Utilization of escalator internal processes, in % | The average percentage of the time during which the escalator processes have been busy for the last minute. | Zabbix internal | zabbix[process,escalator,avg,busy] |
Zabbix server: Utilization of history poller data collector processes, in % | The average percentage of the time during which the history poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,history poller,avg,busy] |
Zabbix server: Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,odbc poller,avg,busy] |
Zabbix server: Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. | Zabbix internal | zabbix[process,history syncer,avg,busy] |
Zabbix server: Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. | Zabbix internal | zabbix[process,housekeeper,avg,busy] |
Zabbix server: Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,http poller,avg,busy] |
Zabbix server: Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. | Zabbix internal | zabbix[process,icmp pinger,avg,busy] |
Zabbix server: Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,ipmi manager,avg,busy] |
Zabbix server: Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,ipmi poller,avg,busy] |
Zabbix server: Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,java poller,avg,busy] |
Zabbix server: Utilization of LLD manager internal processes, in % | The average percentage of the time during which the lld manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,lld manager,avg,busy] |
Zabbix server: Utilization of LLD worker internal processes, in % | The average percentage of the time during which the lld worker processes have been busy for the last minute. | Zabbix internal | zabbix[process,lld worker,avg,busy] |
Zabbix server: Utilization of connector manager internal processes, in % | The average percentage of the time during which the connector manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,connector manager,avg,busy] |
Zabbix server: Utilization of connector worker internal processes, in % | The average percentage of the time during which the connector worker processes have been busy for the last minute. | Zabbix internal | zabbix[process,connector worker,avg,busy] |
Zabbix server: Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,poller,avg,busy] |
Zabbix server: Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. | Zabbix internal | zabbix[process,preprocessing worker,avg,busy] |
Zabbix server: Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,preprocessing manager,avg,busy] |
Zabbix server: Utilization of proxy poller data collector processes, in % | The average percentage of the time during which the proxy poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,proxy poller,avg,busy] |
Zabbix server: Utilization of report manager internal processes, in % | The average percentage of the time during which the report manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,report manager,avg,busy] |
Zabbix server: Utilization of report writer internal processes, in % | The average percentage of the time during which the report writer processes have been busy for the last minute. | Zabbix internal | zabbix[process,report writer,avg,busy] |
Zabbix server: Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. | Zabbix internal | zabbix[process,self-monitoring,avg,busy] |
Zabbix server: Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. | Zabbix internal | zabbix[process,snmp trapper,avg,busy] |
Zabbix server: Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,task manager,avg,busy] |
Zabbix server: Utilization of timer internal processes, in % | The average percentage of the time during which the timer processes have been busy for the last minute. | Zabbix internal | zabbix[process,timer,avg,busy] |
Zabbix server: Utilization of service manager internal processes, in % | The average percentage of the time during which the service manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,service manager,avg,busy] |
Zabbix server: Utilization of trigger housekeeper internal processes, in % | The average percentage of the time during which the trigger housekeeper processes have been busy for the last minute. | Zabbix internal | zabbix[process,trigger housekeeper,avg,busy] |
Zabbix server: Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. | Zabbix internal | zabbix[process,trapper,avg,busy] |
Zabbix server: Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,unreachable poller,avg,busy] |
Zabbix server: Utilization of vmware data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. | Zabbix internal | zabbix[process,vmware collector,avg,busy] |
Zabbix server: Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. | Zabbix internal | zabbix[rcache,buffer,pused] |
Zabbix server: Trend function cache, % of unique requests | The effectiveness statistics of Zabbix trend function cache. The percentage of cached items calculated from the sum of cached items plus requests. Low percentage most likely means that the cache size can be reduced. | Zabbix internal | zabbix[tcache,cache,pitems] |
Zabbix server: Trend function cache, % of misses | The effectiveness statistics of Zabbix trend function cache. The percentage of cache misses. | Zabbix internal | zabbix[tcache,cache,pmisses] |
Zabbix server: Value cache, % used | The availability statistics of Zabbix value cache. The percentage of used data buffer. | Zabbix internal | zabbix[vcache,buffer,pused] |
Zabbix server: Value cache hits | The effectiveness statistics of Zabbix value cache. The number of cache hits (history values taken from the cache). | Zabbix internal | zabbix[vcache,cache,hits] Preprocessing |
Zabbix server: Value cache misses | The effectiveness statistics of Zabbix value cache. The number of cache misses (history values taken from the database). | Zabbix internal | zabbix[vcache,cache,misses] Preprocessing |
Zabbix server: Value cache operating mode | The operating mode of the value cache. | Zabbix internal | zabbix[vcache,cache,mode] |
Zabbix server: Version | A version of Zabbix server. | Zabbix internal | zabbix[version] Preprocessing |
Zabbix server: VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. | Zabbix internal | zabbix[vmware,buffer,pused] |
Zabbix server: History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates performance problems on the database side. | Zabbix internal | zabbix[wcache,history,pused] |
Zabbix server: History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. | Zabbix internal | zabbix[wcache,index,pused] |
Zabbix server: Trend write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used trend buffer. The trend cache stores the aggregate of all items that have received data for the current hour. | Zabbix internal | zabbix[wcache,trend,pused] |
Zabbix server: Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. | Zabbix internal | zabbix[wcache,values] Preprocessing |
Zabbix server: Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed float values. | Zabbix internal | zabbix[wcache,values,float] Preprocessing |
Zabbix server: Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. | Zabbix internal | zabbix[wcache,values,log] Preprocessing |
Zabbix server: Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or keeping that state. | Zabbix internal | zabbix[wcache,values,not supported] Preprocessing |
Zabbix server: Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character/string values. | Zabbix internal | zabbix[wcache,values,str] Preprocessing |
Zabbix server: Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. | Zabbix internal | zabbix[wcache,values,text] Preprocessing |
Zabbix server: LLD queue | The count of values enqueued in the low-level discovery processing queue. | Zabbix internal | zabbix[lld_queue] |
Zabbix server: Preprocessing queue | The count of values enqueued in the preprocessing queue. | Zabbix internal | zabbix[preprocessing_queue] |
Zabbix server: Connector queue | The count of values enqueued in the connector queue. | Zabbix internal | zabbix[connector_queue] |
Zabbix server: Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. | Zabbix internal | zabbix[wcache,values,uint] Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: More than 100 items having missing data for more than 10 minutes | The | min(/Zabbix server health/zabbix[queue,10m],10m)>100 | Warning | Manual close: Yes |
Zabbix server: Utilization of alert manager processes is high | | avg(/Zabbix server health/zabbix[process,alert manager,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of alert syncer processes is high | | avg(/Zabbix server health/zabbix[process,alert syncer,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of alerter processes is high | | avg(/Zabbix server health/zabbix[process,alerter,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of availability manager processes is high | | avg(/Zabbix server health/zabbix[process,availability manager,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of configuration syncer processes is high | | avg(/Zabbix server health/zabbix[process,configuration syncer,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of discoverer processes is high | | avg(/Zabbix server health/zabbix[process,discoverer,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of escalator processes is high | | avg(/Zabbix server health/zabbix[process,escalator,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of history poller processes is high | | avg(/Zabbix server health/zabbix[process,history poller,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of ODBC poller processes is high | | avg(/Zabbix server health/zabbix[process,odbc poller,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of history syncer processes is high | | avg(/Zabbix server health/zabbix[process,history syncer,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of housekeeper processes is high | | avg(/Zabbix server health/zabbix[process,housekeeper,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of http poller processes is high | | avg(/Zabbix server health/zabbix[process,http poller,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of icmp pinger processes is high | | avg(/Zabbix server health/zabbix[process,icmp pinger,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of ipmi manager processes is high | | avg(/Zabbix server health/zabbix[process,ipmi manager,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of ipmi poller processes is high | | avg(/Zabbix server health/zabbix[process,ipmi poller,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of java poller processes is high | | avg(/Zabbix server health/zabbix[process,java poller,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of lld manager processes is high | | avg(/Zabbix server health/zabbix[process,lld manager,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of lld worker processes is high | | avg(/Zabbix server health/zabbix[process,lld worker,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of connector manager processes is high | | avg(/Zabbix server health/zabbix[process,connector manager,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of connector worker processes is high | | avg(/Zabbix server health/zabbix[process,connector worker,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of poller processes is high | | avg(/Zabbix server health/zabbix[process,poller,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of preprocessing worker processes is high | | avg(/Zabbix server health/zabbix[process,preprocessing worker,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of preprocessing manager processes is high | | avg(/Zabbix server health/zabbix[process,preprocessing manager,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of proxy poller processes is high | | avg(/Zabbix server health/zabbix[process,proxy poller,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of report manager processes is high | | avg(/Zabbix server health/zabbix[process,report manager,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of report writer processes is high | | avg(/Zabbix server health/zabbix[process,report writer,avg,busy],10m)>75 | Average |
Manual close: Yes | ||
Zabbix server: Utilization of self-monitoring processes is high | avg(/Zabbix server health/zabbix[process,self-monitoring,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of snmp trapper processes is high | avg(/Zabbix server health/zabbix[process,snmp trapper,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of task manager processes is high | avg(/Zabbix server health/zabbix[process,task manager,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of timer processes is high | avg(/Zabbix server health/zabbix[process,timer,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of service manager processes is high | avg(/Zabbix server health/zabbix[process,service manager,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of trigger housekeeper processes is high | avg(/Zabbix server health/zabbix[process,trigger housekeeper,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of trapper processes is high | avg(/Zabbix server health/zabbix[process,trapper,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of unreachable poller processes is high | avg(/Zabbix server health/zabbix[process,unreachable poller,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of vmware collector processes is high | avg(/Zabbix server health/zabbix[process,vmware collector,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: More than 75% used in the configuration cache | Consider increasing CacheSize in the zabbix_server.conf configuration file (see the example excerpt after this table). |
max(/Zabbix server health/zabbix[rcache,buffer,pused],10m)>75 |Average |
Manual close: Yes | |
Zabbix server: More than 95% used in the value cache | Consider increasing ValueCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[vcache,buffer,pused],10m)>95 |Average |
Manual close: Yes | |
Zabbix server: Zabbix value cache working in low memory mode | Once the low memory mode has been switched on, the value cache will remain in this state for 24 hours, even if the problem that triggered this mode is resolved sooner. |
last(/Zabbix server health/zabbix[vcache,cache,mode])=1 |High |
Manual close: Yes | |
Zabbix server: Version has changed | Zabbix server version has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health/zabbix[version],#1)<>last(/Zabbix server health/zabbix[version],#2) and length(last(/Zabbix server health/zabbix[version]))>0 |Info |
Manual close: Yes | |
Zabbix server: More than 75% used in the vmware cache | Consider increasing VMwareCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[vmware,buffer,pused],10m)>75 |Average |
Manual close: Yes | |
Zabbix server: More than 75% used in the history cache | Consider increasing HistoryCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[wcache,history,pused],10m)>75 |Average |
Manual close: Yes | |
Zabbix server: More than 75% used in the history index cache | Consider increasing HistoryIndexCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[wcache,index,pused],10m)>75 |Average |
Manual close: Yes | |
Zabbix server: More than 75% used in the trends cache | Consider increasing TrendCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[wcache,trend,pused],10m)>75 |Average |
Manual close: Yes |
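The cache triggers above all point at sizing parameters in zabbix_server.conf. As a quick reference, the excerpt below is a hypothetical sketch of those parameters; the option names are the real configuration keys, but the sizes are placeholders and should be tuned to the installation (changes require a server restart).

```
# Hypothetical zabbix_server.conf excerpt - sizes are illustrative only
CacheSize=64M              # configuration cache
ValueCacheSize=256M        # value cache (also related to low memory mode)
VMwareCacheSize=16M        # vmware cache
HistoryCacheSize=128M      # history write cache
HistoryIndexCacheSize=32M  # history index cache
TrendCacheSize=32M         # trends cache
```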
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy discovery | LLD rule with item and trigger prototypes for the proxy discovery. |
Dependent item | zabbix.proxy.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxy [{#PROXY.NAME}]: Stats | The statistics for the discovered proxy. |
Dependent item | zabbix.proxy.stats[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Mode | The mode of Zabbix proxy. |
Dependent item | zabbix.proxy.mode[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Unencrypted | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.unencrypted[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: PSK | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.psk[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Certificate | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.cert[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compression | The compression status of a proxy. |
Dependent item | zabbix.proxy.compression[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Item count | The number of enabled items on enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.items[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Host count | The number of enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.hosts[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Version | A version of Zabbix proxy. |
Dependent item | zabbix.proxy.version[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Last seen, in seconds | The time (in seconds) since the proxy was last seen by the server. |
Dependent item | zabbix.proxy.last_seen[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compatibility | Version of proxy compared to Zabbix server version. Possible values: 0 - Undefined; 1 - Current version (proxy and server have the same major version); 2 - Outdated version (proxy version is older than server version, but is partially supported); 3 - Unsupported version (proxy version is older than server previous LTS release version or server major version is older than proxy major version). |
Dependent item | zabbix.proxy.compatibility[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | zabbix.proxy.requiredperformance[{#PROXY.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxy [{#PROXY.NAME}]: Proxy last seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)>{$PROXY.LAST_SEEN.MAX} |Warning |
||
Proxy [{#PROXY.NAME}]: Zabbix proxy never seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)=-1 |Warning |
||
Proxy [{#PROXY.NAME}]: Zabbix proxy is outdated | Zabbix proxy version is older than the server version but is partially supported. Only data collection and remote execution are available. |
last(/Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=2 |Warning |
||
Proxy [{#PROXY.NAME}]: Zabbix proxy is not supported | Zabbix proxy version is older than server previous LTS release version or server major version is older than proxy major version. |
last(/Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=3 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
High availability cluster node discovery | LLD rule with item and trigger prototypes for the node discovery. |
Dependent item | zabbix.nodes.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster node [{#NODE.NAME}]: Stats | Provides the statistics of a node. |
Dependent item | zabbix.nodes.stats[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Address | The IPv4 address of a node. |
Dependent item | zabbix.nodes.address[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access time | Last access time. |
Dependent item | zabbix.nodes.lastaccess.time[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access age | The time between the database's unix_timestamp() and the last access time. |
Dependent item | zabbix.nodes.lastaccess.age[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Status | The status of a node. |
Dependent item | zabbix.nodes.status[{#NODE.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cluster node [{#NODE.NAME}]: Status changed | The state of the node has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#1)<>last(/Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#2) |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.ADDRESS} | IP/DNS/network mask list of proxies to be remotely queried (default is 127.0.0.1). See the proxy-side configuration note after this table. |
|
{$ZABBIX.PROXY.PORT} | Port of proxy to be remotely queried (default is 10051). |
|
{$ZABBIX.PROXY.UTIL.MAX} | Maximum average percentage of time processes busy in the last minute (default is 75). |
75 |
{$ZABBIX.PROXY.UTIL.MIN} | Minimum average percentage of time processes busy in the last minute (default is 65). |
65 |
{$ZABBIX.PROXY.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expression. |
5m |
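The zabbix[stats,...] items in this template query the remote proxy's internal statistics over the network, so the proxy must allow the requesting address. A minimal sketch, assuming the Zabbix server sits at the placeholder address 192.0.2.10 and the proxy listens on the default port 10051:

```
# Hypothetical excerpt from the remote proxy's zabbix_proxy.conf:
# permit the Zabbix server to request internal statistics remotely.
StatsAllowedIP=192.0.2.10

# The master item on the monitoring side then uses the key
#   zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT}]
# with the macros above pointing at the proxy's address and port.
```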
Name | Description | Type | Key and additional info |
---|---|---|---|
Remote Zabbix proxy: Zabbix stats | The master item for remote Zabbix proxy statistics. |
Zabbix internal | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT}] |
Remote Zabbix proxy: Zabbix stats queue over 10m | Number of monitored items in the queue which are delayed by at least 10 minutes. |
Zabbix internal | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m] Preprocessing
|
Remote Zabbix proxy: Zabbix stats queue | Number of monitored items in the queue which are delayed by at least 6 seconds. |
Zabbix internal | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue] Preprocessing
|
Remote Zabbix proxy: Utilization of data sender internal processes, in % | Average percentage of time data sender processes have been busy in the last minute. |
Dependent item | process.data_sender.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of availability manager internal processes, in % | Average percentage of time availability manager processes have been busy in the last minute. |
Dependent item | process.availability_manager.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of configuration syncer internal processes, in % | Average percentage of time configuration syncer processes have been busy in the last minute. |
Dependent item | process.configuration_syncer.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of discoverer data collector processes, in % | Average percentage of time discoverer processes have been busy in the last minute. |
Dependent item | process.discoverer.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of ODBC poller data collector processes, in % | Average percentage of time ODBC poller processes have been busy in the last minute. |
Dependent item | process.odbc_poller.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of history poller data collector processes, in % | Average percentage of time history poller processes have been busy in the last minute. |
Dependent item | process.history_poller.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of history syncer internal processes, in % | Average percentage of time history syncer processes have been busy in the last minute. |
Dependent item | process.history_syncer.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of housekeeper internal processes, in % | Average percentage of time housekeeper processes have been busy in the last minute. |
Dependent item | process.housekeeper.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of http poller data collector processes, in % | Average percentage of time http poller processes have been busy in the last minute. |
Dependent item | process.http_poller.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of icmp pinger data collector processes, in % | Average percentage of time icmp pinger processes have been busy in the last minute. |
Dependent item | process.icmp_pinger.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of ipmi manager internal processes, in % | Average percentage of time ipmi manager processes have been busy in the last minute. |
Dependent item | process.ipmi_manager.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of ipmi poller data collector processes, in % | Average percentage of time ipmi poller processes have been busy in the last minute. |
Dependent item | process.ipmi_poller.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of java poller data collector processes, in % | Average percentage of time java poller processes have been busy in the last minute. |
Dependent item | process.java_poller.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of poller data collector processes, in % | Average percentage of time poller processes have been busy in the last minute. |
Dependent item | process.poller.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of preprocessing worker internal processes, in % | Average percentage of time preprocessing worker processes have been busy in the last minute. |
Dependent item | process.preprocessing_worker.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of preprocessing manager internal processes, in % | Average percentage of time preprocessing manager processes have been busy in the last minute. |
Dependent item | process.preprocessing_manager.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of self-monitoring internal processes, in % | Average percentage of time self-monitoring processes have been busy in the last minute. |
Dependent item | process.self-monitoring.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of snmp trapper data collector processes, in % | Average percentage of time snmp trapper processes have been busy in the last minute. |
Dependent item | process.snmp_trapper.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of task manager internal processes, in % | Average percentage of time task manager processes have been busy in the last minute. |
Dependent item | process.task_manager.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of trapper data collector processes, in % | Average percentage of time trapper processes have been busy in the last minute. |
Dependent item | process.trapper.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of unreachable poller data collector processes, in % | Average percentage of time unreachable poller processes have been busy in the last minute. |
Dependent item | process.unreachable_poller.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of vmware data collector processes, in % | Average percentage of time vmware collector processes have been busy in the last minute. |
Dependent item | process.vmware_collector.avg.busy Preprocessing
|
Remote Zabbix proxy: Configuration cache, % used | Availability statistics of Zabbix configuration cache. Percentage of used buffer. |
Dependent item | rcache.buffer.pused Preprocessing
|
Remote Zabbix proxy: Version | Version of Zabbix proxy. |
Dependent item | version Preprocessing
|
Remote Zabbix proxy: VMware cache, % used | Availability statistics of Zabbix vmware cache. Percentage of used buffer. |
Dependent item | vmware.buffer.pused Preprocessing
|
Remote Zabbix proxy: History write cache, % used | Statistics and availability of Zabbix write cache. Percentage of used history buffer. History cache is used to store item values. A high number indicates performance problems on the database side. |
Dependent item | wcache.history.pused Preprocessing
|
Remote Zabbix proxy: History index cache, % used | Statistics and availability of Zabbix write cache. Percentage of used history index buffer. History index cache is used to index values stored in history cache. |
Dependent item | wcache.index.pused Preprocessing
|
Remote Zabbix proxy: Number of processed values per second | Statistics and availability of Zabbix write cache. Total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Dependent item | wcache.values Preprocessing
|
Remote Zabbix proxy: Number of processed numeric (float) values per second | Statistics and availability of Zabbix write cache. Number of processed float values. |
Dependent item | wcache.values.float Preprocessing
|
Remote Zabbix proxy: Number of processed log values per second | Statistics and availability of Zabbix write cache. Number of processed log values. |
Dependent item | wcache.values.log Preprocessing
|
Remote Zabbix proxy: Number of processed not supported values per second | Statistics and availability of Zabbix write cache. Number of times item processing resulted in item becoming unsupported or keeping that state. |
Dependent item | wcache.values.not_supported Preprocessing
|
Remote Zabbix proxy: Number of processed character values per second | Statistics and availability of Zabbix write cache. Number of processed character/string values. |
Dependent item | wcache.values.str Preprocessing
|
Remote Zabbix proxy: Number of processed text values per second | Statistics and availability of Zabbix write cache. Number of processed text values. |
Dependent item | wcache.values.text Preprocessing
|
Remote Zabbix proxy: Preprocessing queue | Count of values enqueued in the preprocessing queue. |
Dependent item | preprocessing_queue Preprocessing
|
Remote Zabbix proxy: Number of processed numeric (unsigned) values per second | Statistics and availability of Zabbix write cache. Number of processed numeric (unsigned) values. |
Dependent item | wcache.values.uint Preprocessing
|
Remote Zabbix proxy: Required performance | Required performance of Zabbix proxy: the number of new values expected per second. |
Dependent item | requiredperformance Preprocessing
|
Remote Zabbix proxy: Uptime | Uptime of Zabbix proxy process in seconds. |
Dependent item | uptime Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Remote Zabbix proxy: More than 100 items having missing data for more than 10 minutes | The zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m] item collects data about how many items have been missing data for more than 10 minutes. |
min(/Remote Zabbix proxy health/zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Remote Zabbix proxy: Utilization of data sender processes is high | avg(/Remote Zabbix proxy health/process.data_sender.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"data sender"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of availability manager processes is high | avg(/Remote Zabbix proxy health/process.availability_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of configuration syncer processes is high | avg(/Remote Zabbix proxy health/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of discoverer processes is high | avg(/Remote Zabbix proxy health/process.discoverer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discoverer"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of ODBC poller processes is high | avg(/Remote Zabbix proxy health/process.odbc_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of history poller processes is high | avg(/Remote Zabbix proxy health/process.history_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history poller"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of history syncer processes is high | avg(/Remote Zabbix proxy health/process.history_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of housekeeper processes is high | avg(/Remote Zabbix proxy health/process.housekeeper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of http poller processes is high | avg(/Remote Zabbix proxy health/process.http_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of icmp pinger processes is high | avg(/Remote Zabbix proxy health/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of ipmi manager processes is high | avg(/Remote Zabbix proxy health/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of ipmi poller processes is high | avg(/Remote Zabbix proxy health/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of java poller processes is high | avg(/Remote Zabbix proxy health/process.java_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of poller processes is high | avg(/Remote Zabbix proxy health/process.poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"poller"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of preprocessing worker processes is high | avg(/Remote Zabbix proxy health/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of preprocessing manager processes is high | avg(/Remote Zabbix proxy health/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of self-monitoring processes is high | avg(/Remote Zabbix proxy health/process.self-monitoring.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of snmp trapper processes is high | avg(/Remote Zabbix proxy health/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of task manager processes is high | avg(/Remote Zabbix proxy health/process.task_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of trapper processes is high | avg(/Remote Zabbix proxy health/process.trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of unreachable poller processes is high | avg(/Remote Zabbix proxy health/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of vmware collector processes is high | avg(/Remote Zabbix proxy health/process.vmware_collector.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the configuration cache | Consider increasing CacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/rcache.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Remote Zabbix proxy: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.PROXY.NODATA_TIMEOUT}. |
nodata(/Remote Zabbix proxy health/rcache.buffer.pused,{$ZABBIX.PROXY.NODATA_TIMEOUT})=1 |Warning |
||
Remote Zabbix proxy: Version has changed | Zabbix proxy version has changed. Acknowledge to close the problem manually. |
last(/Remote Zabbix proxy health/version,#1)<>last(/Remote Zabbix proxy health/version,#2) and length(last(/Remote Zabbix proxy health/version))>0 |Info |
Manual close: Yes | |
Remote Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the vmware cache | Consider increasing VMwareCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/vmware.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Remote Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the history cache | Consider increasing HistoryCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/wcache.history.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Remote Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the history index cache | Consider increasing HistoryIndexCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/wcache.index.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Remote Zabbix proxy: {HOST.NAME} has been restarted | Uptime is less than 10 minutes. |
last(/Remote Zabbix proxy health/uptime)<10m |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.UTIL.MAX} | Maximum average percentage of time processes busy in the last minute (default is 75). Can be overridden per process type via macro context, as shown after this table. |
75 |
{$ZABBIX.PROXY.UTIL.MIN} | Minimum average percentage of time processes busy in the last minute (default is 65). |
65 |
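The utilization triggers below reference the threshold macro with context, e.g. {$ZABBIX.PROXY.UTIL.MAX:"poller"}, so a single process type can get its own limit while all others keep the default. A hypothetical per-host override (the process name and value are examples only):

```
# User macro overrides on the host (illustrative values):
{$ZABBIX.PROXY.UTIL.MAX}="75"                   # default threshold for all process types
{$ZABBIX.PROXY.UTIL.MAX:"history syncer"}="90"  # raised limit for history syncer only
```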
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy: Queue over 10 minutes | Number of monitored items in the queue which are delayed by at least 10 minutes. |
Zabbix internal | zabbix[queue,10m] |
Zabbix proxy: Queue | Number of monitored items in the queue which are delayed by at least 6 seconds. |
Zabbix internal | zabbix[queue] |
Zabbix proxy: Utilization of data sender internal processes, in % | Average percentage of time data sender processes have been busy in the last minute. |
Zabbix internal | zabbix[process,data sender,avg,busy] |
Zabbix proxy: Utilization of availability manager internal processes, in % | Average percentage of time availability manager processes have been busy in the last minute. |
Zabbix internal | zabbix[process,availability manager,avg,busy] |
Zabbix proxy: Utilization of configuration syncer internal processes, in % | Average percentage of time configuration syncer processes have been busy in the last minute. |
Zabbix internal | zabbix[process,configuration syncer,avg,busy] |
Zabbix proxy: Utilization of discoverer data collector processes, in % | Average percentage of time discoverer processes have been busy in the last minute. |
Zabbix internal | zabbix[process,discoverer,avg,busy] |
Zabbix proxy: Utilization of ODBC poller data collector processes, in % | Average percentage of time ODBC poller processes have been busy in the last minute. |
Zabbix internal | zabbix[process,odbc poller,avg,busy] |
Zabbix proxy: Utilization of history poller data collector processes, in % | Average percentage of time history poller processes have been busy in the last minute. |
Zabbix internal | zabbix[process,history poller,avg,busy] |
Zabbix proxy: Utilization of history syncer internal processes, in % | Average percentage of time history syncer processes have been busy in the last minute. |
Zabbix internal | zabbix[process,history syncer,avg,busy] |
Zabbix proxy: Utilization of housekeeper internal processes, in % | Average percentage of time housekeeper processes have been busy in the last minute. |
Zabbix internal | zabbix[process,housekeeper,avg,busy] |
Zabbix proxy: Utilization of http poller data collector processes, in % | Average percentage of time http poller processes have been busy in the last minute. |
Zabbix internal | zabbix[process,http poller,avg,busy] |
Zabbix proxy: Utilization of icmp pinger data collector processes, in % | Average percentage of time icmp pinger processes have been busy in the last minute. |
Zabbix internal | zabbix[process,icmp pinger,avg,busy] |
Zabbix proxy: Utilization of ipmi manager internal processes, in % | Average percentage of time ipmi manager processes have been busy in the last minute. |
Zabbix internal | zabbix[process,ipmi manager,avg,busy] |
Zabbix proxy: Utilization of ipmi poller data collector processes, in % | Average percentage of time ipmi poller processes have been busy in the last minute. |
Zabbix internal | zabbix[process,ipmi poller,avg,busy] |
Zabbix proxy: Utilization of java poller data collector processes, in % | Average percentage of time java poller processes have been busy in the last minute. |
Zabbix internal | zabbix[process,java poller,avg,busy] |
Zabbix proxy: Utilization of poller data collector processes, in % | Average percentage of time poller processes have been busy in the last minute. |
Zabbix internal | zabbix[process,poller,avg,busy] |
Zabbix proxy: Utilization of preprocessing worker internal processes, in % | Average percentage of time preprocessing worker processes have been busy in the last minute. |
Zabbix internal | zabbix[process,preprocessing worker,avg,busy] |
Zabbix proxy: Utilization of preprocessing manager internal processes, in % | Average percentage of time preprocessing manager processes have been busy in the last minute. |
Zabbix internal | zabbix[process,preprocessing manager,avg,busy] |
Zabbix proxy: Utilization of self-monitoring internal processes, in % | Average percentage of time self-monitoring processes have been busy in the last minute. |
Zabbix internal | zabbix[process,self-monitoring,avg,busy] |
Zabbix proxy: Utilization of snmp trapper data collector processes, in % | Average percentage of time snmp trapper processes have been busy in the last minute. |
Zabbix internal | zabbix[process,snmp trapper,avg,busy] |
Zabbix proxy: Utilization of task manager internal processes, in % | Average percentage of time task manager processes have been busy in the last minute. |
Zabbix internal | zabbix[process,task manager,avg,busy] |
Zabbix proxy: Utilization of trapper data collector processes, in % | Average percentage of time trapper processes have been busy in the last minute. |
Zabbix internal | zabbix[process,trapper,avg,busy] |
Zabbix proxy: Utilization of unreachable poller data collector processes, in % | Average percentage of time unreachable poller processes have been busy in the last minute. |
Zabbix internal | zabbix[process,unreachable poller,avg,busy] |
Zabbix proxy: Utilization of vmware data collector processes, in % | Average percentage of time vmware collector processes have been busy in the last minute. |
Zabbix internal | zabbix[process,vmware collector,avg,busy] |
Zabbix proxy: Configuration cache, % used | Availability statistics of Zabbix configuration cache. Percentage of used buffer. |
Zabbix internal | zabbix[rcache,buffer,pused] |
Zabbix proxy: Version | Version of Zabbix proxy. |
Zabbix internal | zabbix[version] Preprocessing
|
Zabbix proxy: VMware cache, % used | Availability statistics of Zabbix vmware cache. Percentage of used buffer. |
Zabbix internal | zabbix[vmware,buffer,pused] |
Zabbix proxy: History write cache, % used | Statistics and availability of Zabbix write cache. Percentage of used history buffer. History cache is used to store item values. A high number indicates performance problems on the database side. |
Zabbix internal | zabbix[wcache,history,pused] |
Zabbix proxy: History index cache, % used | Statistics and availability of Zabbix write cache. Percentage of used history index buffer. History index cache is used to index values stored in history cache. |
Zabbix internal | zabbix[wcache,index,pused] |
Zabbix proxy: Number of processed values per second | Statistics and availability of Zabbix write cache. Total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Zabbix internal | zabbix[wcache,values] Preprocessing
|
Zabbix proxy: Number of processed numeric (float) values per second | Statistics and availability of Zabbix write cache. Number of processed float values. |
Zabbix internal | zabbix[wcache,values,float] Preprocessing
|
Zabbix proxy: Number of processed log values per second | Statistics and availability of Zabbix write cache. Number of processed log values. |
Zabbix internal | zabbix[wcache,values,log] Preprocessing
|
Zabbix proxy: Number of processed not supported values per second | Statistics and availability of Zabbix write cache. Number of times item processing resulted in item becoming unsupported or keeping that state. |
Zabbix internal | zabbix[wcache,values,not supported] Preprocessing
|
Zabbix proxy: Number of processed character values per second | Statistics and availability of Zabbix write cache. Number of processed character/string values. |
Zabbix internal | zabbix[wcache,values,str] Preprocessing
|
Zabbix proxy: Number of processed text values per second | Statistics and availability of Zabbix write cache. Number of processed text values. |
Zabbix internal | zabbix[wcache,values,text] Preprocessing
|
Zabbix proxy: Preprocessing queue | Count of values enqueued in the preprocessing queue. |
Zabbix internal | zabbix[preprocessing_queue] |
Zabbix proxy: Number of processed numeric (unsigned) values per second | Statistics and availability of Zabbix write cache. Number of processed numeric (unsigned) values. |
Zabbix internal | zabbix[wcache,values,uint] Preprocessing
|
Zabbix proxy: Values waiting to be sent | Number of values in the proxy history table waiting to be sent to the server. |
Zabbix internal | zabbix[proxy_history] |
Zabbix proxy: Required performance | Required performance of Zabbix proxy: the number of new values expected per second. |
Zabbix internal | zabbix[requiredperformance] |
Zabbix proxy: Uptime | Uptime of Zabbix proxy process in seconds. |
Zabbix internal | zabbix[uptime] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix proxy: More than 100 items having missing data for more than 10 minutes | The zabbix[queue,10m] item collects data about how many items have been missing data for more than 10 minutes. |
min(/Zabbix proxy health/zabbix[queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix proxy: Utilization of data sender processes is high | avg(/Zabbix proxy health/zabbix[process,data sender,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"data sender"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of availability manager processes is high | avg(/Zabbix proxy health/zabbix[process,availability manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of configuration syncer processes is high | avg(/Zabbix proxy health/zabbix[process,configuration syncer,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of discoverer processes is high | avg(/Zabbix proxy health/zabbix[process,discoverer,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"discoverer"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of ODBC poller processes is high | avg(/Zabbix proxy health/zabbix[process,odbc poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of history poller processes is high | avg(/Zabbix proxy health/zabbix[process,history poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"history poller"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of history syncer processes is high | avg(/Zabbix proxy health/zabbix[process,history syncer,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of housekeeper processes is high | avg(/Zabbix proxy health/zabbix[process,housekeeper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of http poller processes is high | avg(/Zabbix proxy health/zabbix[process,http poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of icmp pinger processes is high | avg(/Zabbix proxy health/zabbix[process,icmp pinger,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of ipmi manager processes is high | avg(/Zabbix proxy health/zabbix[process,ipmi manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of ipmi poller processes is high | avg(/Zabbix proxy health/zabbix[process,ipmi poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of java poller processes is high | avg(/Zabbix proxy health/zabbix[process,java poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of poller processes is high | avg(/Zabbix proxy health/zabbix[process,poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"poller"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of preprocessing worker processes is high | avg(/Zabbix proxy health/zabbix[process,preprocessing worker,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of preprocessing manager processes is high | avg(/Zabbix proxy health/zabbix[process,preprocessing manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of self-monitoring processes is high | avg(/Zabbix proxy health/zabbix[process,self-monitoring,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of snmp trapper processes is high | avg(/Zabbix proxy health/zabbix[process,snmp trapper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of task manager processes is high | avg(/Zabbix proxy health/zabbix[process,task manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of trapper processes is high | avg(/Zabbix proxy health/zabbix[process,trapper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of unreachable poller processes is high | avg(/Zabbix proxy health/zabbix[process,unreachable poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of vmware collector processes is high | avg(/Zabbix proxy health/zabbix[process,vmware collector,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | ||
Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the configuration cache | Consider increasing CacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[rcache,buffer,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Zabbix proxy: Version has changed | Zabbix proxy version has changed. Acknowledge to close the problem manually. |
last(/Zabbix proxy health/zabbix[version],#1)<>last(/Zabbix proxy health/zabbix[version],#2) and length(last(/Zabbix proxy health/zabbix[version]))>0 |Info |
Manual close: Yes | |
Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the vmware cache | Consider increasing VMwareCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[vmware,buffer,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the history cache | Consider increasing HistoryCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[wcache,history,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the history index cache | Consider increasing HistoryIndexCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[wcache,index,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Zabbix proxy: {HOST.NAME} has been restarted | Uptime is less than 10 minutes. |
last(/Zabbix proxy health/zabbix[uptime])<10m |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Name | Description | Default |
---|---|---|
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. Works only for agents reachable from Zabbix server/proxy (passive mode). |
3m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix agent: Version of Zabbix agent running | Zabbix agent | agent.version Preprocessing
|
|
Zabbix agent: Host name of Zabbix agent running | Zabbix agent | agent.hostname Preprocessing
|
|
Zabbix agent: Zabbix agent ping | The agent always returns 1 for this item. It can be used in combination with nodata() for an availability check. |
Zabbix agent | agent.ping |
Zabbix agent: Zabbix agent availability | Monitoring the availability status of the agent. |
Zabbix internal | zabbix[host,agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix agent: Zabbix agent is not available | For passive-only agents, host availability is used with {$AGENT.TIMEOUT} as the time threshold. |
max(/Zabbix agent/zabbix[host,agent,available],{$AGENT.TIMEOUT})=0 |Average |
Manual close: Yes |
Name | Description | Default |
---|---|---|
{$AGENT.NODATA_TIMEOUT} | No data timeout for active agents. Consider keeping it relatively high. |
30m |
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix agent active: Version of Zabbix agent running | Zabbix agent (active) | agent.version Preprocessing
|
|
Zabbix agent active: Host name of Zabbix agent running | Zabbix agent (active) | agent.hostname Preprocessing
|
|
Zabbix agent active: Zabbix agent ping | The agent always returns 1 for this item. It can be used in combination with nodata() for an availability check. |
Zabbix agent (active) | agent.ping |
Zabbix agent active: Active agent availability | Availability of active checks on the host. The value of this item corresponds to the availability icons in the host list. Possible values: 0 - unknown; 1 - available; 2 - not available. |
Zabbix internal | zabbix[host,active_agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix agent active: Zabbix agent is not available | For active agents, nodata() with agent.ping is used with {$AGENT.NODATA_TIMEOUT} as the time threshold. |
nodata(/Zabbix agent active/agent.ping,{$AGENT.NODATA_TIMEOUT})=1 |Average |
Manual close: Yes | |
Zabbix agent active: Active checks are not available | Active checks are considered unavailable: the agent has not sent a heartbeat for a prolonged time (see the agent configuration sketch after this table). |
min(/Zabbix agent active/zabbix[host,active_agent,available],{$AGENT.TIMEOUT})=2 |High |
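Active-check availability (zabbix[host,active_agent,available]) relies on heartbeats sent by the agent, while the nodata() trigger reacts to agent.ping values stopping. A minimal agent-side sketch, assuming Zabbix agent 6.2 or newer and placeholder host names:

```
# Hypothetical zabbix_agentd.conf excerpt for active checks
ServerActive=zabbix.example.com   # placeholder server address
Hostname=web01.example.com        # must match the host name configured in Zabbix
HeartbeatFrequency=60             # heartbeat interval in seconds; 0 disables heartbeats
```

If heartbeats are disabled or the agent predates them, the internal availability item cannot report the agent as available, and only the nodata()-based check remains meaningful.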
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official JMX Template for WildFly server.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX. This template works with standalone and domain instances.
Copy the JMX client library from /(wildfly,EAP,Jboss,AS)/bin/client into the /usr/share/zabbix-java-gateway/lib directory of the Zabbix Java gateway. A JMX endpoint example follows the macros table below.
Name | Description | Default |
---|---|---|
{$WILDFLY.USER} | zabbix |
|
{$WILDFLY.PASSWORD} | zabbix |
|
{$WILDFLY.JMX.PROTOCOL} | remote+http |
|
{$WILDFLY.DEPLOYMENT.MATCHES} | Filter of discoverable deployments |
.* |
{$WILDFLY.DEPLOYMENT.NOT_MATCHES} | Filter to exclude discovered deployments |
CHANGE_IF_NEEDED |
{$WILDFLY.CONN.USAGE.WARN.MAX} | The maximum connection usage percent for trigger expression. |
80 |
{$WILDFLY.CONN.WAIT.MAX.WARN} | The maximum number of waiting connections for trigger expression. |
300 |
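The template connects through the Zabbix Java gateway using the protocol set in {$WILDFLY.JMX.PROTOCOL}. As a rough sketch (not taken from the template itself), a JMX endpoint for a standalone WildFly instance with the default management port would look like this; the host name is a placeholder:

```
# Hypothetical JMX endpoint, assuming the default management port 9990:
service:jmx:remote+http://wildfly.example.com:9990
# {$WILDFLY.USER} / {$WILDFLY.PASSWORD} should match a WildFly management user,
# for example one created with add-user.sh.
```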
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly: Launch type | The manner in which the server process was launched. Either "DOMAIN" for a domain mode server launched by a Host Controller, "STANDALONE" for a standalone server launched from the command line, or "EMBEDDED" for a standalone server launched as an embedded part of an application running in the same virtual machine. |
JMX agent | jmx["jboss.as:management-root=server","launchType"] Preprocessing
|
WildFly: Name | For standalone mode: The name of this server. If not set, defaults to the runtime value of InetAddress.getLocalHost().getHostName(). For domain mode: The name given to this domain. |
JMX agent | jmx["jboss.as:management-root=server","name"] Preprocessing
|
WildFly: Process type | The type of process represented by this root resource. |
JMX agent | jmx["jboss.as:management-root=server","processType"] Preprocessing
|
WildFly: Runtime configuration state | The current persistent configuration state, one of starting, ok, reload-required, restart-required, stopping or stopped. |
JMX agent | jmx["jboss.as:management-root=server","runtimeConfigurationState"] Preprocessing
|
WildFly: Server controller state | The current state of the server controller; either STARTING, RUNNING, RESTART_REQUIRED, RELOAD_REQUIRED or STOPPING. |
JMX agent | jmx["jboss.as:management-root=server","serverState"] Preprocessing
|
WildFly: Version | The version of the WildFly Core based product release. |
JMX agent | jmx["jboss.as:management-root=server","productVersion"] Preprocessing
|
WildFly: Uptime | WildFly server uptime. |
JMX agent | jmx["java.lang:type=Runtime","Uptime"] Preprocessing
|
WildFly: Transactions: Total, rate | The total number of transactions (top-level and nested) created per second. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfTransactions"] Preprocessing
|
WildFly: Transactions: Aborted, rate | The number of aborted (i.e. rolled back) transactions per second. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfAbortedTransactions"] Preprocessing
|
WildFly: Transactions: Application rollbacks, rate | The number of transactions that have been rolled back by application request. This includes those that timeout, since the timeout behavior is considered an attribute of the application configuration. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfApplicationRollbacks"] Preprocessing
|
WildFly: Transactions: Committed, rate | The number of committed transactions per second. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfCommittedTransactions"] Preprocessing
|
WildFly: Transactions: Heuristics, rate | The number of transactions which have terminated with heuristic outcomes. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfHeuristics"] Preprocessing
|
WildFly: Transactions: Current | The number of transactions that have begun but not yet terminated. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfInflightTransactions"] |
WildFly: Transactions: Nested, rate | The total number of nested (sub) transactions created per second. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfNestedTransactions"] Preprocessing
|
WildFly: Transactions: ResourceRollbacks, rate | The number of transactions that rolled back due to resource (participant) failure. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfResourceRollbacks"] Preprocessing
|
WildFly: Transactions: System rollbacks, rate | The number of transactions that have been rolled back due to internal system errors. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfSystemRollbacks"] Preprocessing
|
WildFly: Transactions: Timed out, rate | The number of transactions that have rolled back due to timeout. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfTimedOutTransactions"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly: Server needs to restart for configuration change. | find(/WildFly Server by JMX/jmx["jboss.as:management-root=server","runtimeConfigurationState"],,"like","ok")=0 |Warning |
|||
WildFly: Server controller is not in RUNNING state | find(/WildFly Server by JMX/jmx["jboss.as:management-root=server","serverState"],,"like","running")=0 |Warning |
Depends on:
|
||
WildFly: Version has changed | WildFly version has changed. Acknowledge to close the problem manually. |
last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"],#1)<>last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"],#2) and length(last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"]))>0 |Info |
Manual close: Yes | |
WildFly: Host has been restarted | Uptime is less than 10 minutes. |
last(/WildFly Server by JMX/jmx["java.lang:type=Runtime","Uptime"])<10m |Info |
Manual close: Yes | |
WildFly: Failed to fetch info data | Zabbix has not received data for items for the last 15 minutes. |
nodata(/WildFly Server by JMX/jmx["java.lang:type=Runtime","Uptime"],15m)=1 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployments discovery | Discovery of deployment metrics. Deployments can be filtered with the {$WILDFLY.DEPLOYMENT.MATCHES} and {$WILDFLY.DEPLOYMENT.NOT_MATCHES} macros (see the example below). |
JMX agent | jmx.get[beans,"jboss.as.expr:deployment=*"] |
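Which deployments are discovered is controlled by the filter macros defined earlier. A purely illustrative override (the regular expressions are examples, not template defaults):

```
# Example macro values - discover only .war deployments and skip internal ones:
{$WILDFLY.DEPLOYMENT.MATCHES}=".*\.war"
{$WILDFLY.DEPLOYMENT.NOT_MATCHES}=".*-internal.*"
```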
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly deployment [{#DEPLOYMENT}]: Status | The current runtime status of a deployment. Possible status modes are OK, FAILED, and STOPPED. FAILED indicates a dependency is missing or a service could not start. STOPPED indicates that the deployment was not enabled or was manually stopped. |
JMX agent | jmx["{#JMXOBJ}",status] Preprocessing
|
WildFly deployment [{#DEPLOYMENT}]: Enabled | Boolean indicating whether the deployment content is currently deployed in the runtime (or should be deployed in the runtime the next time the server starts). |
JMX agent | jmx["{#JMXOBJ}",enabled] Preprocessing
|
WildFly deployment [{#DEPLOYMENT}]: Managed | Indicates if the deployment is managed (aka uses the ContentRepository). |
JMX agent | jmx["{#JMXOBJ}",managed] Preprocessing
|
WildFly deployment [{#DEPLOYMENT}]: Persistent | Indicates whether the existence of the deployment is recorded in the persistent server configuration. |
JMX agent | jmx["{#JMXOBJ}",persistent] Preprocessing
|
WildFly deployment [{#DEPLOYMENT}]: Enabled time | The time at which the deployment was enabled. |
JMX agent | jmx["{#JMXOBJ}",enabledTime] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly deployment [{#DEPLOYMENT}]: Deployment status has changed | Deployment status has changed. Acknowledge to close the problem manually. |
last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status],#1)<>last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status],#2) and length(last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status]))>0 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
JDBC metrics discovery | JMX agent | jmx.get[beans,"jboss.as:subsystem=datasources,data-source=*,statistics=jdbc"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly {#JMXDATASOURCE}: Cache access, rate | The number of times that the statement cache was accessed per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheAccessCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Cache add, rate | The number of statements added to the statement cache per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheAddCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Cache current size | The number of prepared and callable statements currently cached in the statement cache. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheCurrentSize] |
WildFly {#JMXDATASOURCE}: Cache delete, rate | The number of statements discarded from the cache per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheDeleteCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Cache hit, rate | The number of times that statements from the cache were used per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheHitCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Cache miss, rate | The number of times that a statement request could not be satisfied with a statement from the cache per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheMissCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Statistics enabled | Defines whether runtime statistics are enabled or not. |
JMX agent | jmx["{#JMXOBJ}",statisticsEnabled, "JDBC"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly {#JMXDATASOURCE}: JDBC monitoring statistic is not enabled | last(/WildFly Server by JMX/jmx["{#JMXOBJ}",statisticsEnabled, "JDBC"])=0 |Info |
Name | Description | Type | Key and additional info |
---|---|---|---|
Pools metrics discovery | JMX agent | jmx.get[beans,"jboss.as:subsystem=datasources,data-source=*,statistics=pool"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly {#JMXDATASOURCE}: Connections: Active | The number of open connections. |
JMX agent | jmx["{#JMXOBJ}",ActiveCount] |
WildFly {#JMXDATASOURCE}: Connections: Available | The number of available connections. |
JMX agent | jmx["{#JMXOBJ}",AvailableCount] |
WildFly {#JMXDATASOURCE}: Blocking time, avg | The average blocking time for the pool. |
JMX agent | jmx["{#JMXOBJ}",AverageBlockingTime] |
WildFly {#JMXDATASOURCE}: Connections: Creating time, avg | The average time spent creating a physical connection. |
JMX agent | jmx["{#JMXOBJ}",AverageCreationTime] |
WildFly {#JMXDATASOURCE}: Connections: Get time, avg | The average time spent obtaining a physical connection. |
JMX agent | jmx["{#JMXOBJ}",AverageGetTime] |
WildFly {#JMXDATASOURCE}: Connections: Pool time, avg | The average time for a physical connection spent in the pool. |
JMX agent | jmx["{#JMXOBJ}",AveragePoolTime] |
WildFly {#JMXDATASOURCE}: Connections: Usage time, avg | The average time spent using a physical connection. |
JMX agent | jmx["{#JMXOBJ}",AverageUsageTime] |
WildFly {#JMXDATASOURCE}: Connections: Blocking failure, rate | The number of failures trying to obtain a physical connection per second. |
JMX agent | jmx["{#JMXOBJ}",BlockingFailureCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Connections: Created, rate | The number of connections created per second. |
JMX agent | jmx["{#JMXOBJ}",CreatedCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Connections: Destroyed, rate | The number of connections destroyed per second. |
JMX agent | jmx["{#JMXOBJ}",DestroyedCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Connections: Idle | The number of physical connections currently idle. |
JMX agent | jmx["{#JMXOBJ}",IdleCount] |
WildFly {#JMXDATASOURCE}: Connections: In use | The number of physical connections currently in use. |
JMX agent | jmx["{#JMXOBJ}",InUseCount] |
WildFly {#JMXDATASOURCE}: Connections: Used, max | The maximum number of connections used. |
JMX agent | jmx["{#JMXOBJ}",MaxUsedCount] |
WildFly {#JMXDATASOURCE}: Statistics enabled | Defines whether runtime statistics are enabled or not. |
JMX agent | jmx["{#JMXOBJ}",statisticsEnabled] Preprocessing
|
WildFly {#JMXDATASOURCE}: Connections: Timed out, rate | The number of timed-out connections per second. |
JMX agent | jmx["{#JMXOBJ}",TimedOut] Preprocessing
|
WildFly {#JMXDATASOURCE}: Connections: Wait | The number of requests that had to wait to obtain a physical connection. |
JMX agent | jmx["{#JMXOBJ}",WaitCount] |
WildFly {#JMXDATASOURCE}: XA: Commit time, avg | The average time for a XAResource commit invocation. |
JMX agent | jmx["{#JMXOBJ}",XACommitAverageTime] |
WildFly {#JMXDATASOURCE}: XA: Commit, rate | The number of XAResource commit invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XACommitCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: XA: End time, avg | The average time for a XAResource end invocation. |
JMX agent | jmx["{#JMXOBJ}",XAEndAverageTime] |
WildFly {#JMXDATASOURCE}: XA: End, rate | The number of XAResource end invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAEndCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: XA: Forget time, avg | The average time for a XAResource forget invocation. |
JMX agent | jmx["{#JMXOBJ}",XAForgetAverageTime] |
WildFly {#JMXDATASOURCE}: XA: Forget, rate | The number of XAResource forget invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAForgetCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: XA: Prepare time, avg | The average time for a XAResource prepare invocation. |
JMX agent | jmx["{#JMXOBJ}",XAPrepareAverageTime] |
WildFly {#JMXDATASOURCE}: XA: Prepare, rate | The number of XAResource prepare invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAPrepareCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: XA: Recover time, avg | The average time for a XAResource recover invocation. |
JMX agent | jmx["{#JMXOBJ}",XARecoverAverageTime] |
WildFly {#JMXDATASOURCE}: XA: Recover, rate | The number of XAResource recover invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XARecoverCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: XA: Rollback time, avg | The average time for a XAResource rollback invocation. |
JMX agent | jmx["{#JMXOBJ}",XARollbackAverageTime] |
WildFly {#JMXDATASOURCE}: XA: Rollback, rate | The number of XAResource rollback invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XARollbackCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: XA: Start time, avg | The average time for a XAResource start invocation. |
JMX agent | jmx["{#JMXOBJ}",XAStartAverageTime] |
WildFly {#JMXDATASOURCE}: XA: Start, rate | The number of XAResource start invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAStartCount] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly {#JMXDATASOURCE}: There are no active connections for 5m | max(/WildFly Server by JMX/jmx["{#JMXOBJ}",ActiveCount],5m)=0 |Warning |
|||
WildFly {#JMXDATASOURCE}: Connection usage is too high | min(/WildFly Server by JMX/jmx["{#JMXOBJ}",InUseCount],5m)/last(/WildFly Server by JMX/jmx["{#JMXOBJ}",AvailableCount])*100>{$WILDFLY.CONN.USAGE.WARN.MAX} |High |
|||
WildFly {#JMXDATASOURCE}: Pools monitoring statistic is not enabled | Runtime statistics for the connection pool are not enabled. |
last(/WildFly Server by JMX/jmx["{#JMXOBJ}",statisticsEnabled])=0 |Info |
||
WildFly {#JMXDATASOURCE}: There are timeout connections | last(/WildFly Server by JMX/jmx["{#JMXOBJ}",TimedOut])>0 |Warning |
|||
WildFly {#JMXDATASOURCE}: Too many waiting connections | min(/WildFly Server by JMX/jmx["{#JMXOBJ}",WaitCount],5m)>{$WILDFLY.CONN.WAIT.MAX.WARN} |Warning |
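For clarity, the "Connection usage is too high" trigger above compares the in-use count against the available count as a percentage. A minimal Python sketch of that calculation; the 80% default here is only an illustrative threshold, not the macro default:

```python
def connection_usage_too_high(in_use: int, available: int,
                              warn_max_pct: float = 80.0) -> bool:
    """Mirrors InUseCount / AvailableCount * 100 > {$WILDFLY.CONN.USAGE.WARN.MAX}.
    The 80.0 default is illustrative only."""
    if available == 0:
        return False  # avoid division by zero; no meaningful usage figure in this case
    return in_use / available * 100 > warn_max_pct

print(connection_usage_too_high(in_use=45, available=50))  # True  (90% used)
print(connection_usage_too_high(in_use=10, available=50))  # False (20% used)
```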
Name | Description | Type | Key and additional info |
---|---|---|---|
Undertow metrics discovery | JMX agent | jmx.get[beans,"jboss.as:subsystem=undertow,server=*,http-listener=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly listener {#HTTP_LISTENER}: Errors, rate | The number of 500 responses that have been sent by this listener per second. |
JMX agent | jmx["{#JMXOBJ}",errorCount] Preprocessing
|
WildFly listener {#HTTP_LISTENER}: Requests, rate | The number of requests this listener has served per second. |
JMX agent | jmx["{#JMXOBJ}",requestCount] Preprocessing
|
WildFly listener {#HTTP_LISTENER}: Bytes sent, rate | The number of bytes that have been sent out on this listener per second. |
JMX agent | jmx["{#JMXOBJ}",bytesSent] Preprocessing
|
WildFly listener {#HTTP_LISTENER}: Bytes received, rate | The number of bytes that have been received by this listener per second. |
JMX agent | jmx["{#JMXOBJ}",bytesReceived] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly listener {#HTTP_LISTENER}: There are 500 responses by this listener. | last(/WildFly Server by JMX/jmx["{#JMXOBJ}",errorCount])>0 |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official JMX Template for WildFly Domain Controller.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX. This template works with Domain Controller.
Copy jboss-client.jar from /(wildfly,EAP,Jboss,AS)/bin/client to the directory /usr/share/zabbix-java-gateway/lib and restart the Zabbix Java gateway service.
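If you script this preparation step, the following minimal Python sketch copies the client JAR into the Java gateway library directory; the source path is a placeholder and must be adjusted to your WildFly/EAP installation.

```python
import shutil
from pathlib import Path

# Placeholder paths - adjust to your WildFly/EAP installation and gateway layout.
source_jar = Path("/opt/wildfly/bin/client/jboss-client.jar")
gateway_lib = Path("/usr/share/zabbix-java-gateway/lib")

gateway_lib.mkdir(parents=True, exist_ok=True)
copied = Path(shutil.copy2(source_jar, gateway_lib))
print(f"Copied {source_jar} -> {copied}")
# Remember to restart the zabbix-java-gateway service afterwards.
```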
Name | Description | Default |
---|---|---|
{$WILDFLY.USER} | zabbix |
|
{$WILDFLY.PASSWORD} | zabbix |
|
{$WILDFLY.JMX.PROTOCOL} | remote+http |
|
{$WILDFLY.DEPLOYMENT.MATCHES} | Filter of discoverable deployments |
.* |
{$WILDFLY.DEPLOYMENT.NOT_MATCHES} | Filter to exclude discovered deployments |
CHANGE_IF_NEEDED |
{$WILDFLY.SERVER.MATCHES} | Filter of discoverable servers |
.* |
{$WILDFLY.SERVER.NOT_MATCHES} | Filter to exclude discovered servers |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly: Launch type | The manner in which the server process was launched. Either "DOMAIN" for a domain mode server launched by a Host Controller, "STANDALONE" for a standalone server launched from the command line, or "EMBEDDED" for a standalone server launched as an embedded part of an application running in the same virtual machine. |
JMX agent | jmx["jboss.as:management-root=server","launchType"] Preprocessing
|
WildFly: Name | For standalone mode: The name of this server. If not set, defaults to the runtime value of InetAddress.getLocalHost().getHostName(). For domain mode: The name given to this domain |
JMX agent | jmx["jboss.as:management-root=server","name"] Preprocessing
|
WildFly: Process type | The type of process represented by this root resource. |
JMX agent | jmx["jboss.as:management-root=server","processType"] Preprocessing
|
WildFly: Version | The version of the WildFly Core based product release. |
JMX agent | jmx["jboss.as:management-root=server","productVersion"] Preprocessing
|
WildFly: Uptime | WildFly server uptime. |
JMX agent | jmx["java.lang:type=Runtime","Uptime"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly: Version has changed | WildFly version has changed. Acknowledge to close the problem manually. |
last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"],#1)<>last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"],#2) and length(last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"]))>0 |Info |
Manual close: Yes | |
WildFly: Host has been restarted | Uptime is less than 10 minutes. |
last(/WildFly Domain by JMX/jmx["java.lang:type=Runtime","Uptime"])<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployments discovery | Discovery of deployment metrics. |
JMX agent | jmx.get[beans,"jboss.as.expr:deployment=,server-group="] |
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly deployment [{#DEPLOYMENT}]: Enabled | Boolean indicating whether the deployment content is currently deployed in the runtime (or should be deployed in the runtime the next time the server starts). |
JMX agent | jmx["{#JMXOBJ}",enabled] Preprocessing
|
WildFly deployment [{#DEPLOYMENT}]: Managed | Indicates if the deployment is managed (aka uses the ContentRepository). |
JMX agent | jmx["{#JMXOBJ}",managed] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Servers discovery | Discovery of instances in the domain. |
JMX agent | jmx.get[beans,"jboss.as:host=master,server-config=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly domain: Server {#SERVER}: Autostart | Whether or not this server should be started when the Host Controller starts. |
JMX agent | jmx["{#JMXOBJ}",autoStart] Preprocessing
|
WildFly domain: Server {#SERVER}: Status | The current status of the server. |
JMX agent | jmx["{#JMXOBJ}",status] Preprocessing
|
WildFly domain: Server {#SERVER}: Server group | The name of a server group from the domain model. |
JMX agent | jmx["{#JMXOBJ}",group] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly domain: Server {#SERVER}: Server status has changed | Server status has changed. Acknowledge to close the problem manually. |
last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status],#1)<>last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status],#2) and length(last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status]))>0 |Warning |
Manual close: Yes | |
WildFly domain: Server {#SERVER}: Server group has changed | Server group has changed. Acknowledge to close the problem manually. |
last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group],#1)<>last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group],#2) and length(last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group]))>0 |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of both VMware vCenter and ESX hypervisor monitoring and doesn't require any external scripts.
The "VMware Hypervisor" and "VMware Guest" templates are used by discovery and normally should not be manually linked to a host. For additional information please check https://www.zabbix.com/documentation/6.4/manual/vm_monitoring
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Compile Zabbix with the required options (--with-libxml2 and --with-libcurl).
Set the StartVMwareCollectors option in the Zabbix server configuration file to "1" or more.
Set the host macros required for VMware authentication: {$VMWARE.URL}, {$VMWARE.USERNAME}, {$VMWARE.PASSWORD}.
Note: To enable discovery of hardware sensors of VMware Hypervisors, set the macro {$VMWARE.HV.SENSOR.DISCOVERY} to the value true on the discovered host level.
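Before setting {$VMWARE.URL}, it can be useful to verify that the SDK endpoint answers at all. A minimal Python sketch, assuming a placeholder vCenter address and skipping TLS verification only for this one-off check:

```python
import ssl
import urllib.error
import urllib.request

sdk_url = "https://vcenter.example.com/sdk"  # placeholder for the {$VMWARE.URL} value

# Self-signed certificates are common on vCenter/ESXi, so skip verification
# for this one-off reachability check only.
context = ssl.create_default_context()
context.check_hostname = False
context.verify_mode = ssl.CERT_NONE

try:
    with urllib.request.urlopen(sdk_url, context=context, timeout=10) as response:
        print(f"Reachable: HTTP {response.status}")
except urllib.error.HTTPError as exc:
    # Any HTTP status code means the endpoint is listening, even if it rejects plain GETs.
    print(f"Reachable: HTTP {exc.code}")
except (urllib.error.URLError, OSError) as exc:
    print(f"Not reachable: {exc}")
```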
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk) |
|
{$VMWARE.USERNAME} | VMware service user name |
|
{$VMWARE.PASSWORD} | VMware service {$USERNAME} user password |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set "true"/"false" to enable or disable monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to allow in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to ignore in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Event log | Collect VMware event log. See also: https://www.zabbix.com/documentation/6.4/manual/config/items/preprocessing/examples#filteringvmwareeventlogrecords |
Simple check | vmware.eventlog[{$VMWARE.URL},skip] |
VMware: Full name | VMware service full name. |
Simple check | vmware.fullname[{$VMWARE.URL}] Preprocessing
|
VMware: Version | VMware service version. |
Simple check | vmware.version[{$VMWARE.URL}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware clusters | Discovery of clusters |
Simple check | vmware.cluster.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Status of "{#CLUSTER.NAME}" cluster | VMware cluster status. |
Simple check | vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: The {#CLUSTER.NAME} status is Red | A cluster enabled for DRS becomes invalid (red) when the tree is no longer internally consistent, that is, resource constraints are not observed. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-C7417CAA-BD38-41D0-9529-9E7A5898BB12.html |
last(/VMware FQDN/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=3 |High |
||
VMware: The {#CLUSTER.NAME} status is Yellow | A cluster becomes overcommitted (yellow) when the tree of resource pools and virtual machines is internally consistent but the cluster does not have the capacity to support all resources reserved by the child resource pools. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-ED8240A0-FB54-4A31-BD3D-F23FE740F10C.html |
last(/VMware FQDN/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=2 |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware datastores | Simple check | vmware.datastore.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Average read latency of the datastore {#DATASTORE} | Amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.datastore.read[{$VMWARE.URL},{#DATASTORE},latency] |
VMware: Free space on datastore {#DATASTORE} (percentage) | VMware datastore free space as a percentage of the total. |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE},pfree] |
VMware: Total size of datastore {#DATASTORE} | VMware datastore space in bytes. |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE}] |
VMware: Average write latency of the datastore {#DATASTORE} | Amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.datastore.write[{$VMWARE.URL},{#DATASTORE},latency] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: {#DATASTORE}: Free space is critically low | Datastore free space has fallen below critical threshold. |
last(/VMware FQDN/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE},pfree])<{$VMWARE.DATASTORE.SPACE.CRIT} |High |
||
VMware: {#DATASTORE}: Free space is low | Datastore free space has fallen below warning threshold. |
last(/VMware FQDN/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE},pfree])<{$VMWARE.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
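The two free-space triggers above compare the pfree item against the warning and critical macros. A minimal Python sketch of that evaluation; the defaults reflect the macro defaults of 20 and 10 percent listed in the macro table:

```python
def datastore_space_severity(pfree: float, warn: float = 20.0, crit: float = 10.0) -> str:
    """Evaluates the free-space triggers for the 'pfree' item value.
    Defaults match {$VMWARE.DATASTORE.SPACE.WARN} / {$VMWARE.DATASTORE.SPACE.CRIT}."""
    if pfree < crit:
        return "High"     # "Free space is critically low"
    if pfree < warn:
        return "Warning"  # "Free space is low" (depends on the High trigger)
    return "OK"

print(datastore_space_severity(5.0))   # High
print(datastore_space_severity(15.0))  # Warning
print(datastore_space_severity(40.0))  # OK
```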
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware hypervisors | Discovery of hypervisors. |
Simple check | vmware.hv.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware VMs FQDN | Discovery of guest virtual machines. |
Simple check | vmware.vm.discovery[{$VMWARE.URL}] |
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk) |
|
{$VMWARE.USERNAME} | VMware service user name |
|
{$VMWARE.PASSWORD} | VMware service {$USERNAME} user password |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Cluster name | Cluster name of the guest VM. |
Simple check | vmware.vm.cluster.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Number of virtual CPUs | Number of virtual CPUs assigned to the guest. |
Simple check | vmware.vm.cpu.num[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: CPU ready | Time that the virtual machine was ready, but could not get scheduled to run on the physical CPU during the last measurement interval (the VMware vCenter/ESXi Server performance counter sampling interval is 20 seconds). |
Simple check | vmware.vm.cpu.ready[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU usage | Current upper-bound on CPU usage. The upper-bound is based on the host the virtual machine is currently running on, as well as limits configured on the virtual machine itself or any parent resource pool. Valid while the virtual machine is running. |
Simple check | vmware.vm.cpu.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Datacenter name | Datacenter name of the guest VM. |
Simple check | vmware.vm.datacenter.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Hypervisor name | Hypervisor name of the guest VM. |
Simple check | vmware.vm.hv.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. |
Simple check | vmware.vm.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Compressed memory | The amount of memory currently in the compression cache for this VM. |
Simple check | vmware.vm.memory.size.compressed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Private memory | Amount of memory backed by host memory and not being shared. |
Simple check | vmware.vm.memory.size.private[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Shared memory | The amount of guest physical memory shared through transparent page sharing. |
Simple check | vmware.vm.memory.size.shared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Swapped memory | The amount of guest physical memory swapped out to the VM's swap device by ESX. |
Simple check | vmware.vm.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Guest memory usage | The amount of guest physical memory that is being used by the VM. |
Simple check | vmware.vm.memory.size.usage.guest[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Host memory usage | The amount of host physical memory allocated to the VM, accounting for savings from memory sharing with other VMs. |
Simple check | vmware.vm.memory.size.usage.host[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Memory size | Total size of configured memory. |
Simple check | vmware.vm.memory.size[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Power state | The current power state of the virtual machine. |
Simple check | vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Committed storage space | Total storage space, in bytes, committed to this virtual machine across all datastores. |
Simple check | vmware.vm.storage.committed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Uncommitted storage space | Additional storage space, in bytes, potentially used by this virtual machine on all datastores. |
Simple check | vmware.vm.storage.uncommitted[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Unshared storage space | Total storage space, in bytes, occupied by the virtual machine across all datastores, that is not shared with any other virtual machine. |
Simple check | vmware.vm.storage.unshared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Uptime | System uptime. |
Simple check | vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Guest memory swapped | Amount of guest physical memory that is swapped out to the swap space. |
Simple check | vmware.vm.guest.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Host memory consumed | Amount of host physical memory consumed for backing up guest physical memory pages. |
Simple check | vmware.vm.memory.size.consumed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Host memory usage in percents | Percentage of host physical memory that has been consumed. |
Simple check | vmware.vm.memory.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU usage in percents | CPU usage as a percentage during the interval. |
Simple check | vmware.vm.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU latency in percents | Percentage of time the virtual machine is unable to run because it is contending for access to the physical CPU(s). |
Simple check | vmware.vm.cpu.latency[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU readiness latency in percents | Percentage of time that the virtual machine was ready, but could not get scheduled to run on the physical CPU. |
Simple check | vmware.vm.cpu.readiness[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU swap-in latency in percents | Percentage of CPU time spent waiting for swap-in. |
Simple check | vmware.vm.cpu.swapwait[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Uptime of guest OS | Total time elapsed since the last operating system boot-up (in seconds). |
Simple check | vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: VM has been restarted | Uptime is less than 10 minutes. |
last(/VMware Guest/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}])<10m |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network device discovery | Discovery of all network devices. |
Simple check | vmware.vm.net.if.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Number of bytes received on interface {#IFDESC} | VMware virtual machine network interface input statistics (bytes per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
VMware: Number of packets received on interface {#IFDESC} | VMware virtual machine network interface input statistics (packets per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
VMware: Number of bytes transmitted on interface {#IFDESC} | VMware virtual machine network interface output statistics (bytes per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
VMware: Number of packets transmitted on interface {#IFDESC} | VMware virtual machine network interface output statistics (packets per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
VMware: Network utilization on interface {#IFDESC} | VMware virtual machine network utilization (combined transmit-rates and receive-rates) during the interval. |
Simple check | vmware.vm.net.if.usage[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk device discovery | Discovery of all disk devices. |
Simple check | vmware.vm.vfs.dev.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Average number of bytes read from the disk {#DISKDESC} | VMware virtual machine disk device read statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
VMware: Average number of reads from the disk {#DISKDESC} | VMware virtual machine disk device read statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
VMware: Average number of bytes written to the disk {#DISKDESC} | VMware virtual machine disk device write statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
VMware: Average number of writes to the disk {#DISKDESC} | VMware virtual machine disk device write statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
VMware: Average number of outstanding read requests to the disk {#DISKDESC} | Average number of outstanding read requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.readoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware: Average number of outstanding write requests to the disk {#DISKDESC} | Average number of outstanding write requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.writeoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware: Average write latency to the disk {#DISKDESC} | The average time a write to the virtual disk takes. |
Simple check | vmware.vm.storage.totalwritelatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware: Average read latency to the disk {#DISKDESC} | The average time a read from the virtual disk takes. |
Simple check | vmware.vm.storage.totalreadlatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mounted filesystem discovery | Discovery of all guest file systems. |
Simple check | vmware.vm.vfs.fs.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Free disk space on {#FSNAME} | VMware virtual machine file system statistics (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},free] |
VMware: Free disk space on {#FSNAME} (percentage) | VMware virtual machine file system statistics (percentages). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree] |
VMware: Total disk space on {#FSNAME} | VMware virtual machine total disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},total] Preprocessing
|
VMware: Used disk space on {#FSNAME} | VMware virtual machine used disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},used] |
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service {$USERNAME} user password. |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set "true"/"false" to enable or disable monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to allow in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to ignore in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.HV.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.HV.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Hypervisor ping | Checks if the hypervisor is running and accepting ICMP pings. |
Simple check | icmpping[] Preprocessing
|
VMware: Cluster name | Cluster name of the guest VM. |
Simple check | vmware.hv.cluster.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: CPU usage | Aggregated CPU usage across all cores on the host in Hz. This is only available if the host is connected. |
Simple check | vmware.hv.cpu.usage[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU usage in percents | CPU usage as a percentage during the interval. |
Simple check | vmware.hv.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU utilization | CPU usage as a percentage during the interval; the value depends on power management or hyper-threading (HT). |
Simple check | vmware.hv.cpu.utilization[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Power usage | Current power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Power usage maximum allowed | Maximum allowed power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID},max] Preprocessing
|
VMware: Datacenter name | Datacenter name of the hypervisor. |
Simple check | vmware.hv.datacenter.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: Full name | The complete product name, including the version information. |
Simple check | vmware.hv.fullname[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: CPU frequency | The speed of the CPU cores. This is an average value if there are multiple speeds. The product of CPU frequency and number of cores is approximately equal to the sum of the MHz for all the individual cores on the host. |
Simple check | vmware.hv.hw.cpu.freq[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU model | The CPU model. |
Simple check | vmware.hv.hw.cpu.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU cores | Number of physical CPU cores on the host. Physical CPU cores are the processors contained by a CPU package. |
Simple check | vmware.hv.hw.cpu.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: CPU threads | Number of physical CPU threads on the host. |
Simple check | vmware.hv.hw.cpu.threads[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Total memory | The physical memory size. |
Simple check | vmware.hv.hw.memory[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Model | The system model identification. |
Simple check | vmware.hv.hw.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Bios UUID | The hardware BIOS identification. |
Simple check | vmware.hv.hw.uuid[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Vendor | The hardware vendor identification. |
Simple check | vmware.hv.hw.vendor[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. Sum of all guest VMs. |
Simple check | vmware.hv.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Used memory | Physical memory usage on the host. |
Simple check | vmware.hv.memory.used[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Number of bytes received | VMware hypervisor network input statistics (bytes per second). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
VMware: Number of bytes transmitted | VMware hypervisor network output statistics (bytes per second). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
VMware: Overall status | The overall alarm status of the host: gray - unknown, green - ok, red - it has a problem, yellow - it might have a problem. |
Simple check | vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Uptime | System uptime. |
Simple check | vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Version | Dot-separated version string. |
Simple check | vmware.hv.version[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Number of guest VMs | Number of guest virtual machines. |
Simple check | vmware.hv.vm.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Get sensors | Master item for sensors data. |
Simple check | vmware.hv.sensors.get[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: Hypervisor is down | The service is unavailable or does not accept ICMP ping. |
last(/VMware Hypervisor/icmpping[])=0 |Average |
Manual close: Yes | |
VMware: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=3 |High |
||
VMware: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=2 |Average |
Depends on:
|
|
VMware: Hypervisor has been restarted | Uptime is less than 10 minutes. |
last(/VMware Hypervisor/vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}])<10m |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Datastore discovery | Simple check | vmware.hv.datastore.discovery[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Average read latency of the datastore {#DATASTORE} | Average amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.hv.datastore.read[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},latency] |
VMware: Free space on datastore {#DATASTORE} (percentage) | VMware datastore free space as a percentage of the total. |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},pfree] |
VMware: Total size of datastore {#DATASTORE} | VMware datastore space in bytes. |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}] |
VMware: Average write latency of the datastore {#DATASTORE} | Average amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.hv.datastore.write[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},latency] |
VMware: Multipath count for datastore {#DATASTORE} | Number of available datastore paths. |
Simple check | vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: {#DATASTORE}: Free space is critically low | Datastore free space has fallen below critical threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},pfree])<{$VMWARE.HV.DATASTORE.SPACE.CRIT} |High |
||
VMware: {#DATASTORE}: Free space is low | Datastore free space has fallen below warning threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},pfree])<{$VMWARE.HV.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
|
VMware: The multipath count has been changed | The number of available datastore paths is less than the number registered at discovery ({#MULTIPATH.COUNT}). |
last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}],#1)<>last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}],#2) and last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}])<{#MULTIPATH.COUNT} |Average |
Manual close: Yes |
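The multipath trigger above fires when the path count changed between the last two polls and the latest value is below the count registered at discovery. A minimal Python sketch of that logic, with hypothetical samples:

```python
def multipath_problem(history: list[int], registered: int) -> bool:
    """Mirrors the trigger: the last two values differ and the latest value is
    below the path count registered at discovery time ({#MULTIPATH.COUNT})."""
    if len(history) < 2:
        return False
    latest, previous = history[-1], history[-2]
    return latest != previous and latest < registered

# Hypothetical samples: 4 paths registered, one path dropped on the latest poll
print(multipath_problem([4, 3], registered=4))  # True  - problem
print(multipath_problem([3, 3], registered=4))  # False - no change since last poll
```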
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck discovery | VMware Rollup Health State sensor discovery. |
Dependent item | vmware.hv.healthcheck.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Health state rollup | The host health state rollup sensor value: gray - unknown, green - ok, red - it has a problem, yellow - it might have a problem. |
Dependent item | vmware.hv.sensor.health.state[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])="Red" |High |
Depends on:
|
|
VMware: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])="Yellow" |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sensor discovery | VMware hardware sensor discovery. The data is retrieved from numeric sensor probes and provides information about the health of the physical system. |
Dependent item | vmware.hv.sensors.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Sensor [{#NAME}] health state | VMware hardware sensor health state, one of the following: Unknown, Green, Yellow, Red. |
Dependent item | vmware.hv.sensor.state["{#NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: Sensor [{#NAME}] health state is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=3 |High |
Depends on:
|
|
VMware: Sensor [{#NAME}] health state is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=2 |Average |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of both VMware vCenter and ESX hypervisor monitoring and doesn't require any external scripts.
The "VMware Hypervisor" and "VMware Guest" templates are used by discovery and normally should not be manually linked to a host. For additional information please check https://www.zabbix.com/documentation/6.4/manual/vm_monitoring
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Compile Zabbix with the required options (--with-libxml2 and --with-libcurl).
Set the StartVMwareCollectors option in the Zabbix server configuration file to "1" or more.
Set the host macros required for VMware authentication: {$VMWARE.URL}, {$VMWARE.USERNAME}, {$VMWARE.PASSWORD}.
Note: To enable discovery of hardware sensors of VMware Hypervisors, set the macro {$VMWARE.HV.SENSOR.DISCOVERY} to the value true on the discovered host level.
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk) |
|
{$VMWARE.USERNAME} | VMware service user name |
|
{$VMWARE.PASSWORD} | VMware service {$USERNAME} user password |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set "true"/"false" to enable or disable monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to allow in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to ignore in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Event log | Collect VMware event log. See also: https://www.zabbix.com/documentation/6.4/manual/config/items/preprocessing/examples#filteringvmwareeventlogrecords |
Simple check | vmware.eventlog[{$VMWARE.URL},skip] |
VMware: Full name | VMware service full name. |
Simple check | vmware.fullname[{$VMWARE.URL}] Preprocessing
|
VMware: Version | VMware service version. |
Simple check | vmware.version[{$VMWARE.URL}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware clusters | Discovery of clusters |
Simple check | vmware.cluster.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Status of "{#CLUSTER.NAME}" cluster | VMware cluster status. |
Simple check | vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: The {#CLUSTER.NAME} status is Red | A cluster enabled for DRS becomes invalid (red) when the tree is no longer internally consistent, that is, resource constraints are not observed. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-C7417CAA-BD38-41D0-9529-9E7A5898BB12.html |
last(/VMware/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=3 |High |
||
VMware: The {#CLUSTER.NAME} status is Yellow | A cluster becomes overcommitted (yellow) when the tree of resource pools and virtual machines is internally consistent but the cluster does not have the capacity to support all resources reserved by the child resource pools. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-ED8240A0-FB54-4A31-BD3D-F23FE740F10C.html |
last(/VMware/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=2 |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware datastores | Simple check | vmware.datastore.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Average read latency of the datastore {#DATASTORE} | Amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.datastore.read[{$VMWARE.URL},{#DATASTORE},latency] |
VMware: Free space on datastore {#DATASTORE} (percentage) | VMware datastore free space as a percentage of the total. |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE},pfree] |
VMware: Total size of datastore {#DATASTORE} | VMware datastore space in bytes. |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE}] |
VMware: Average write latency of the datastore {#DATASTORE} | Amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.datastore.write[{$VMWARE.URL},{#DATASTORE},latency] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: {#DATASTORE}: Free space is critically low | Datastore free space has fallen below critical threshold. |
last(/VMware/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE},pfree])<{$VMWARE.DATASTORE.SPACE.CRIT} |High |
||
VMware: {#DATASTORE}: Free space is low | Datastore free space has fallen below warning threshold. |
last(/VMware/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE},pfree])<{$VMWARE.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware hypervisors | Discovery of hypervisors. |
Simple check | vmware.hv.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware VMs | Discovery of guest virtual machines. |
Simple check | vmware.vm.discovery[{$VMWARE.URL}] |
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk) |
|
{$VMWARE.USERNAME} | VMware service user name |
|
{$VMWARE.PASSWORD} | VMware service {$USERNAME} user password |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Cluster name | Cluster name of the guest VM. |
Simple check | vmware.vm.cluster.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Number of virtual CPUs | Number of virtual CPUs assigned to the guest. |
Simple check | vmware.vm.cpu.num[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: CPU ready | Time that the virtual machine was ready, but could not get scheduled to run on the physical CPU during the last measurement interval (the VMware vCenter/ESXi Server performance counter sampling interval is 20 seconds). |
Simple check | vmware.vm.cpu.ready[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU usage | Current upper-bound on CPU usage. The upper-bound is based on the host the virtual machine is currently running on, as well as limits configured on the virtual machine itself or any parent resource pool. Valid while the virtual machine is running. |
Simple check | vmware.vm.cpu.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Datacenter name | Datacenter name of the guest VM. |
Simple check | vmware.vm.datacenter.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Hypervisor name | Hypervisor name of the guest VM. |
Simple check | vmware.vm.hv.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. |
Simple check | vmware.vm.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Compressed memory | The amount of memory currently in the compression cache for this VM. |
Simple check | vmware.vm.memory.size.compressed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Private memory | Amount of memory backed by host memory and not being shared. |
Simple check | vmware.vm.memory.size.private[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Shared memory | The amount of guest physical memory shared through transparent page sharing. |
Simple check | vmware.vm.memory.size.shared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Swapped memory | The amount of guest physical memory swapped out to the VM's swap device by ESX. |
Simple check | vmware.vm.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Guest memory usage | The amount of guest physical memory that is being used by the VM. |
Simple check | vmware.vm.memory.size.usage.guest[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Host memory usage | The amount of host physical memory allocated to the VM, accounting for savings from memory sharing with other VMs. |
Simple check | vmware.vm.memory.size.usage.host[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Memory size | Total size of configured memory. |
Simple check | vmware.vm.memory.size[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Power state | The current power state of the virtual machine. |
Simple check | vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Committed storage space | Total storage space, in bytes, committed to this virtual machine across all datastores. |
Simple check | vmware.vm.storage.committed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Uncommitted storage space | Additional storage space, in bytes, potentially used by this virtual machine on all datastores. |
Simple check | vmware.vm.storage.uncommitted[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Unshared storage space | Total storage space, in bytes, occupied by the virtual machine across all datastores, that is not shared with any other virtual machine. |
Simple check | vmware.vm.storage.unshared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Uptime | System uptime. |
Simple check | vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Guest memory swapped | Amount of guest physical memory that is swapped out to the swap space. |
Simple check | vmware.vm.guest.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Host memory consumed | Amount of host physical memory consumed for backing up guest physical memory pages. |
Simple check | vmware.vm.memory.size.consumed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Host memory usage in percents | Percentage of host physical memory that has been consumed. |
Simple check | vmware.vm.memory.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU usage in percents | CPU usage as a percentage during the interval. |
Simple check | vmware.vm.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU latency in percents | Percentage of time the virtual machine is unable to run because it is contending for access to the physical CPU(s). |
Simple check | vmware.vm.cpu.latency[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU readiness latency in percents | Percentage of time that the virtual machine was ready, but could not get scheduled to run on the physical CPU. |
Simple check | vmware.vm.cpu.readiness[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU swap-in latency in percents | Percentage of CPU time spent waiting for swap-in. |
Simple check | vmware.vm.cpu.swapwait[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Uptime of guest OS | Total time elapsed since the last operating system boot-up (in seconds). |
Simple check | vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: VM has been restarted | Uptime is less than 10 minutes. |
last(/VMware Guest/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}])<10m |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network device discovery | Discovery of all network devices. |
Simple check | vmware.vm.net.if.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Number of bytes received on interface {#IFDESC} | VMware virtual machine network interface input statistics (bytes per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
VMware: Number of packets received on interface {#IFDESC} | VMware virtual machine network interface input statistics (packets per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
VMware: Number of bytes transmitted on interface {#IFDESC} | VMware virtual machine network interface output statistics (bytes per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
VMware: Number of packets transmitted on interface {#IFDESC} | VMware virtual machine network interface output statistics (packets per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
VMware: Network utilization on interface {#IFDESC} | VMware virtual machine network utilization (combined transmit-rates and receive-rates) during the interval. |
Simple check | vmware.vm.net.if.usage[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk device discovery | Discovery of all disk devices. |
Simple check | vmware.vm.vfs.dev.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Average number of bytes read from the disk {#DISKDESC} | VMware virtual machine disk device read statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
VMware: Average number of reads from the disk {#DISKDESC} | VMware virtual machine disk device read statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
VMware: Average number of bytes written to the disk {#DISKDESC} | VMware virtual machine disk device write statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
VMware: Average number of writes to the disk {#DISKDESC} | VMware virtual machine disk device write statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
VMware: Average number of outstanding read requests to the disk {#DISKDESC} | Average number of outstanding read requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.readoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware: Average number of outstanding write requests to the disk {#DISKDESC} | Average number of outstanding write requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.writeoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware: Average write latency to the disk {#DISKDESC} | The average time a write to the virtual disk takes. |
Simple check | vmware.vm.storage.totalwritelatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware: Average read latency to the disk {#DISKDESC} | The average time a read from the virtual disk takes. |
Simple check | vmware.vm.storage.totalreadlatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mounted filesystem discovery | Discovery of all guest file systems. |
Simple check | vmware.vm.vfs.fs.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Free disk space on {#FSNAME} | VMware virtual machine file system statistics (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},free] |
VMware: Free disk space on {#FSNAME} (percentage) | VMware virtual machine file system statistics (percentages). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree] |
VMware: Total disk space on {#FSNAME} | VMware virtual machine total disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},total] Preprocessing
|
VMware: Used disk space on {#FSNAME} | VMware virtual machine used disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},used] |
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service {$USERNAME} user password. |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set "true"/"false" to enable or disable monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to allow in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to ignore in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.HV.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.HV.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Hypervisor ping | Checks if the hypervisor is running and accepting ICMP pings. |
Simple check | icmpping[] Preprocessing
|
VMware: Cluster name | Cluster name of the guest VM. |
Simple check | vmware.hv.cluster.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: CPU usage | Aggregated CPU usage across all cores on the host in Hz. This is only available if the host is connected. |
Simple check | vmware.hv.cpu.usage[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU usage in percents | CPU usage as a percentage during the interval. |
Simple check | vmware.hv.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU utilization | CPU usage as a percentage during the interval; the value depends on power management or hyper-threading (HT). |
Simple check | vmware.hv.cpu.utilization[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Power usage | Current power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Power usage maximum allowed | Maximum allowed power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID},max] Preprocessing
|
VMware: Datacenter name | Datacenter name of the hypervisor. |
Simple check | vmware.hv.datacenter.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: Full name | The complete product name, including the version information. |
Simple check | vmware.hv.fullname[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: CPU frequency | The speed of the CPU cores. This is an average value if there are multiple speeds. The product of CPU frequency and number of cores is approximately equal to the sum of the MHz for all the individual cores on the host. |
Simple check | vmware.hv.hw.cpu.freq[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU model | The CPU model. |
Simple check | vmware.hv.hw.cpu.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU cores | Number of physical CPU cores on the host. Physical CPU cores are the processors contained by a CPU package. |
Simple check | vmware.hv.hw.cpu.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: CPU threads | Number of physical CPU threads on the host. |
Simple check | vmware.hv.hw.cpu.threads[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Total memory | The physical memory size. |
Simple check | vmware.hv.hw.memory[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Model | The system model identification. |
Simple check | vmware.hv.hw.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Bios UUID | The hardware BIOS identification. |
Simple check | vmware.hv.hw.uuid[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Vendor | The hardware vendor identification. |
Simple check | vmware.hv.hw.vendor[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. Sum of all guest VMs. |
Simple check | vmware.hv.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Used memory | Physical memory usage on the host. |
Simple check | vmware.hv.memory.used[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Number of bytes received | VMware hypervisor network input statistics (bytes per second). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
VMware: Number of bytes transmitted | VMware hypervisor network output statistics (bytes per second). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
VMware: Overall status | The overall alarm status of the host: gray - unknown, green - ok, red - it has a problem, yellow - it might have a problem. |
Simple check | vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Uptime | System uptime. |
Simple check | vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Version | Dot-separated version string. |
Simple check | vmware.hv.version[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Number of guest VMs | Number of guest virtual machines. |
Simple check | vmware.hv.vm.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Get sensors | Master item for sensors data. |
Simple check | vmware.hv.sensors.get[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: Hypervisor is down | The service is unavailable or does not accept ICMP ping. |
last(/VMware Hypervisor/icmpping[])=0 |Average |
Manual close: Yes | |
VMware: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=3 |High |
||
VMware: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=2 |Average |
Depends on:
|
|
VMware: Hypervisor has been restarted | Uptime is less than 10 minutes. |
last(/VMware Hypervisor/vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}])<10m |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Datastore discovery | Simple check | vmware.hv.datastore.discovery[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Average read latency of the datastore {#DATASTORE} | Average amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.hv.datastore.read[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},latency] |
VMware: Free space on datastore {#DATASTORE} (percentage) | VMware datastore free space in percentage from total. |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},pfree] |
VMware: Total size of datastore {#DATASTORE} | VMware datastore space in bytes. |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}] |
VMware: Average write latency of the datastore {#DATASTORE} | Average amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.hv.datastore.write[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},latency] |
VMware: Multipath count for datastore {#DATASTORE} | Number of available datastore paths. |
Simple check | vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: {#DATASTORE}: Free space is critically low | Datastore free space has fallen below critical threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},pfree])<{$VMWARE.HV.DATASTORE.SPACE.CRIT} |High |
||
VMware: {#DATASTORE}: Free space is low | Datastore free space has fallen below warning threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},pfree])<{$VMWARE.HV.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
|
VMware: The multipath count has been changed | The number of available datastore paths is less than the number registered ({#MULTIPATH.COUNT}). |
last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}],#1)<>last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}],#2) and last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}])<{#MULTIPATH.COUNT} |Average |
Manual close: Yes |
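Read literally, the multipath trigger above fires only when the path count has just changed (the two latest values differ) and the current count is below the number of paths registered at discovery time ({#MULTIPATH.COUNT}). A small Python illustration of the same logic (names and values are illustrative only):

```python
# Illustration of the multipath trigger logic, not the actual Zabbix evaluation code.
def multipath_problem(history, registered_paths):
    """history[-1] is the newest sample, history[-2] the previous one.
    Mirrors: last(#1) <> last(#2) and last(#1) < {#MULTIPATH.COUNT}."""
    if len(history) < 2:
        return False
    just_changed = history[-1] != history[-2]
    below_registered = history[-1] < registered_paths
    return just_changed and below_registered

# Example: 4 paths were registered at discovery; the newest sample reports only 3.
print(multipath_problem([4, 4, 3], registered_paths=4))  # True
```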
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck discovery | VMware Rollup Health State sensor discovery. |
Dependent item | vmware.hv.healthcheck.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Health state rollup | The host health state rollup sensor value: gray - unknown, green - ok, red - it has a problem, yellow - it might have a problem. |
Dependent item | vmware.hv.sensor.health.state[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])="Red" |High |
Depends on:
|
|
VMware: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])="Yellow" |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sensor discovery | VMware hardware sensor discovery. The data is retrieved from numeric sensor probes and provides information about the health of the physical system. |
Dependent item | vmware.hv.sensors.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Sensor [{#NAME}] health state | VMware hardware sensor health state. One of the following: Unknown, Green, Yellow, Red. |
Dependent item | vmware.hv.sensor.state["{#NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: Sensor [{#NAME}] health state is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=3 |High |
Depends on:
|
|
VMware: Sensor [{#NAME}] health state is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=2 |Average |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor Veeam Backup Enterprise Manager. It works without any external scripts and uses the script item.
NOTE: The Veeam Backup Enterprise Manager REST API may not be available for some editions; the template will only work with the following editions of Veeam Backup and Replication:
See Veeam Data Platform Feature Comparison for more details.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See Zabbix template operation for basic instructions.
Create a user with the Portal Administrator role.
> See Veeam Help Center for more details.
Set the macros {$VEEAM.MANAGER.API.URL}, {$VEEAM.MANAGER.USER}, and {$VEEAM.MANAGER.PASSWORD}.
Name | Description | Default |
---|---|---|
{$VEEAM.MANAGER.API.URL} | Veeam Backup Enterprise Manager API endpoint is a URL in the format: |
https://localhost:9398 |
{$VEEAM.MANAGER.HTTP.PROXY} | Sets the HTTP proxy to |
|
{$VEEAM.MANAGER.PASSWORD} | The |
|
{$VEEAM.MANAGER.USER} | The |
|
{$VEEAM.MANAGER.DATA.TIMEOUT} | A response timeout for API. |
10 |
{$BACKUP.TYPE.MATCHES} | This macro is used in backup discovery rule. |
.* |
{$BACKUP.TYPE.NOT_MATCHES} | This macro is used in backup discovery rule. |
CHANGE_IF_NEEDED |
{$BACKUP.NAME.MATCHES} | This macro is used in backup discovery rule. |
.* |
{$BACKUP.NAME.NOT_MATCHES} | This macro is used in backup discovery rule. |
CHANGE_IF_NEEDED |
{$VEEAM.MANAGER.JOB.MAX.WARN} | The maximum score of warning jobs (for a trigger expression). |
10 |
{$VEEAM.MANAGER.JOB.MAX.FAIL} | The maximum score of failed jobs (for a trigger expression). |
5 |
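A quick way to validate the values you plan to put into {$VEEAM.MANAGER.API.URL}, {$VEEAM.MANAGER.USER}, and {$VEEAM.MANAGER.PASSWORD} is to open a REST session manually. Below is a minimal sketch with the Python requests library; the session endpoint path and the X-RestSvcSessionId header are assumptions based on Veeam's public Enterprise Manager REST documentation and are not taken from this template.

```python
# Rough pre-flight check of the Enterprise Manager REST API (not part of the template).
import requests

api_url = "https://localhost:9398"         # intended value of {$VEEAM.MANAGER.API.URL}
user, password = "veeam_user", "secret"    # {$VEEAM.MANAGER.USER} / {$VEEAM.MANAGER.PASSWORD}

resp = requests.post(
    f"{api_url}/api/sessionMngr/?v=latest",  # assumed session endpoint, see Veeam REST docs
    auth=(user, password),
    verify=False,                            # only for self-signed certificates
    timeout=10,                              # mirrors {$VEEAM.MANAGER.DATA.TIMEOUT}
)
resp.raise_for_status()
print("Session opened, id:", resp.headers.get("X-RestSvcSessionId"))
```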
Name | Description | Type | Key and additional info |
---|---|---|---|
Veeam Manager: Get metrics | The result of API requests is expressed in JSON. |
Script | veeam.manager.get.metrics |
Veeam Manager: Get errors | The errors from API requests. |
Dependent item | veeam.manager.get.errors Preprocessing
|
Veeam Manager: Running Jobs | Informs about the running jobs. |
Dependent item | veeam.manager.running.jobs Preprocessing
|
Veeam Manager: Scheduled Jobs | Informs about the scheduled jobs. |
Dependent item | veeam.manager.scheduled.jobs Preprocessing
|
Veeam Manager: Scheduled Backup Jobs | Informs about the scheduled backup jobs. |
Dependent item | veeam.manager.scheduled.backup.jobs Preprocessing
|
Veeam Manager: Scheduled Replica Jobs | Informs about the scheduled replica jobs. |
Dependent item | veeam.manager.scheduled.replica.jobs Preprocessing
|
Veeam Manager: Total Job Runs | Informs about the total job runs. |
Dependent item | veeam.manager.scheduled.total.jobs Preprocessing
|
Veeam Manager: Warnings Job Runs | Informs about the warning job runs. |
Dependent item | veeam.manager.warning.jobs Preprocessing
|
Veeam Manager: Failed Job Runs | Informs about the failed job runs. |
Dependent item | veeam.manager.failed.jobs Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam Manager: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.get.errors))>0 |Average |
||
Veeam Manager: Warning job runs is too high | last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.warning.jobs)>{$VEEAM.MANAGER.JOB.MAX.WARN} |Warning |
Manual close: Yes | ||
Veeam Manager: Failed job runs is too high | last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.failed.jobs)>{$VEEAM.MANAGER.JOB.MAX.FAIL} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Backup Files discovery | Discovery of all backup files created on, or imported to, the backup servers connected to Veeam Backup Enterprise Manager. |
Dependent item | veeam.backup.files.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Veeam Manager: Backup Size [{#NAME}] | Gets the backup size with the name |
Dependent item | veeam.backup.file.size[{#NAME}] Preprocessing
|
Veeam Manager: Data Size [{#NAME}] | Gets the data size with the name |
Dependent item | veeam.backup.data.size[{#NAME}] Preprocessing
|
Veeam Manager: Compression ratio [{#NAME}] | Gets the data compression ratio with the name |
Dependent item | veeam.backup.compress.ratio[{#NAME}] Preprocessing
|
Veeam Manager: Deduplication Ratio [{#NAME}] | Gets the data deduplication ratio with the name |
Dependent item | veeam.backup.deduplication.ratio[{#NAME}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor Veeam Backup and Replication. It works without any external scripts and uses the script item.
NOTE: Since the RESTful API may not be available for some editions, the template will only work with the following editions of Veeam Backup and Replication:
See Veeam Data Platform Feature Comparison for more details.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the macros {$VEEAM.API.URL}, {$VEEAM.USER}, and {$VEEAM.PASSWORD}.
Name | Description | Default |
---|---|---|
{$VEEAM.API.URL} | The Veeam API endpoint is a URL in the format |
https://localhost:9419 |
{$VEEAM.HTTP.PROXY} | Sets the HTTP proxy to |
|
{$VEEAM.PASSWORD} | The |
|
{$VEEAM.USER} | The |
|
{$VEEAM.DATA.TIMEOUT} | A response timeout for the API. |
10 |
{$CREATED.AFTER} | Returns sessions that are created after chosen days. |
7 |
{$SESSION.NAME.MATCHES} | This macro is used in discovery rule to evaluate sessions. |
.* |
{$SESSION.NAME.NOT_MATCHES} | This macro is used in discovery rule to evaluate sessions. |
CHANGE_IF_NEEDED |
{$SESSION.TYPE.MATCHES} | This macro is used in discovery rule to evaluate sessions. |
.* |
{$SESSION.TYPE.NOT_MATCHES} | This macro is used in discovery rule to evaluate sessions. |
CHANGE_IF_NEEDED |
{$PROXIES.NAME.MATCHES} | This macro is used in proxies discovery rule. |
.* |
{$PROXIES.NAME.NOT_MATCHES} | This macro is used in proxies discovery rule. |
CHANGE_IF_NEEDED |
{$PROXIES.TYPE.MATCHES} | This macro is used in proxies discovery rule. |
.* |
{$PROXIES.TYPE.NOT_MATCHES} | This macro is used in proxies discovery rule. |
CHANGE_IF_NEEDED |
{$REPOSITORIES.NAME.MATCHES} | This macro is used in repositories discovery rule. |
.* |
{$REPOSITORIES.NAME.NOT_MATCHES} | This macro is used in repositories discovery rule. |
CHANGE_IF_NEEDED |
{$REPOSITORIES.TYPE.MATCHES} | This macro is used in repositories discovery rule. |
.* |
{$REPOSITORIES.TYPE.NOT_MATCHES} | This macro is used in repositories discovery rule. |
CHANGE_IF_NEEDED |
{$JOB.NAME.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.NAME.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
{$JOB.TYPE.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.TYPE.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
{$JOB.STATUS.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.STATUS.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
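Similarly, the Veeam Backup and Replication REST API can be checked with a manual token request before filling in {$VEEAM.API.URL}, {$VEEAM.USER}, and {$VEEAM.PASSWORD}. Below is a minimal sketch with the Python requests library; the token path and the x-api-version header value are assumptions taken from Veeam's public REST documentation and may differ between product versions.

```python
# Rough pre-flight check of the Veeam Backup and Replication REST API (not part of the template).
import requests

api_url = "https://localhost:9419"        # intended value of {$VEEAM.API.URL}
user, password = "veeam_user", "secret"   # {$VEEAM.USER} / {$VEEAM.PASSWORD}

resp = requests.post(
    f"{api_url}/api/oauth2/token",                 # assumed token endpoint
    headers={"x-api-version": "1.1-rev0"},         # assumption: adjust to your server version
    data={"grant_type": "password", "username": user, "password": password},
    verify=False,                                  # only for self-signed certificates
    timeout=10,                                    # mirrors {$VEEAM.DATA.TIMEOUT}
)
resp.raise_for_status()
print("Access token received:", "access_token" in resp.json())
```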
Name | Description | Type | Key and additional info |
---|---|---|---|
Veeam: Get metrics | The result of API requests is expressed in JSON. |
Script | veeam.get.metrics |
Veeam: Get errors | The errors from API requests. |
Dependent item | veeam.get.errors Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Veeam Backup and Replication by HTTP/veeam.get.errors))>0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxies discovery | Discovery of proxies. |
Dependent item | veeam.proxies.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Veeam: Server [{#NAME}]: Get data | Gets raw data collected by the proxy server. |
Dependent item | veeam.proxy.server.raw[{#NAME}] Preprocessing
|
Veeam: Proxy [{#NAME}] [{#TYPE}]: Get data | Gets raw data collected by the proxy with the name |
Dependent item | veeam.proxy.raw[{#NAME}] Preprocessing
|
Veeam: Proxy [{#NAME}] [{#TYPE}]: Max Task Count | The maximum number of concurrent tasks. |
Dependent item | veeam.proxy.maxtask[{#NAME}] Preprocessing
|
Veeam: Proxy [{#NAME}] [{#TYPE}]: Host name | The name of the proxy server. |
Dependent item | veeam.proxy.server.name[{#NAME}] Preprocessing
|
Veeam: Proxy [{#NAME}] [{#TYPE}]: Host type | The type of the proxy server. |
Dependent item | veeam.proxy.server.type[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Repositories discovery | Discovery of repositories. |
Dependent item | veeam.repositories.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Veeam: Repository [{#NAME}] [{#TYPE}]: Get data | Gets raw data from repository with the name: |
Dependent item | veeam.repositories.raw[{#NAME}] Preprocessing
|
Veeam: Repository [{#NAME}] [{#TYPE}]: Used space [{#PATH}] | Space used by repositories, expressed in gigabytes (GB). |
Dependent item | veeam.repository.capacity[{#NAME}] Preprocessing
|
Veeam: Repository [{#NAME}] [{#TYPE}]: Free space [{#PATH}] | Free space of repositories expressed in gigabytes (GB). |
Dependent item | veeam.repository.free.space[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sessions discovery | Discovery of sessions. |
Dependent item | veeam.sessions.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Veeam: Session [{#NAME}] [{#TYPE}]: Get data | Gets raw data from session with the name: |
Dependent item | veeam.sessions.raw[{#ID}] Preprocessing
|
Veeam: Session [{#NAME}] [{#TYPE}]: State | The state of the session. The enums used: |
Dependent item | veeam.sessions.state[{#ID}] Preprocessing
|
Veeam: Session [{#NAME}] [{#TYPE}]: Result | The result of the session. The enums used: |
Dependent item | veeam.sessions.result[{#ID}] Preprocessing
|
Veeam: Session [{#NAME}] [{#TYPE}]: Message | A message that explains the session result. |
Dependent item | veeam.sessions.message[{#ID}] Preprocessing
|
Veeam: Session progress percent [{#NAME}] [{#TYPE}] | The progress of the session expressed as a percentage. |
Dependent item | veeam.sessions.progress.percent[{#ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam: Last result session failed | find(/Veeam Backup and Replication by HTTP/veeam.sessions.result[{#ID}],,"like","Failed")=1 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jobs states discovery | Discovery of the jobs states. |
Dependent item | veeam.job.state.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Veeam: Job states [{#NAME}] [{#TYPE}]: Get data | Gets raw data from the job states with the name |
Dependent item | veeam.jobs.states.raw[{#ID}] Preprocessing
|
Veeam: Job states [{#NAME}] [{#TYPE}]: Status | The current status of the job. The enums used: |
Dependent item | veeam.jobs.status[{#ID}] Preprocessing
|
Veeam: Job states [{#NAME}] [{#TYPE}]: Last result | The result of the session. The enums used: |
Dependent item | veeam.jobs.last.result[{#ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam: Last result job failed | find(/Veeam Backup and Replication by HTTP/veeam.jobs.last.result[{#ID}],,"like","Failed")=1 |Average |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor HashiCorp Vault by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Vault by HTTP collects metrics by HTTP agent from the /sys/metrics API endpoint.
See https://www.vaultproject.io/api-docs/system/metrics.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See Zabbix template operation for basic instructions.
Configure Vault API. See Vault Configuration.
Create a Vault service token and set it to the macro {$VAULT.TOKEN}.
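To confirm the token works before setting {$VAULT.TOKEN}, you can request the same /sys/metrics endpoint the template polls. A minimal sketch with the Python requests library; the host, port, and token below are placeholders for the corresponding macros:

```python
# Rough check of the /sys/metrics endpoint the template collects from (not part of the template).
import requests

scheme, host, port = "http", "vault.example.com", 8200  # {$VAULT.API.SCHEME}/{$VAULT.HOST}/{$VAULT.API.PORT}
token = "s.xxxxxxxxxxxxxxxx"                             # intended value of {$VAULT.TOKEN}

resp = requests.get(
    f"{scheme}://{host}:{port}/v1/sys/metrics",
    params={"format": "prometheus"},
    headers={"X-Vault-Token": token},
    timeout=10,
)
resp.raise_for_status()
# A healthy response is Prometheus text exposition; metric names start with "vault_".
print("\n".join(resp.text.splitlines()[:5]))
```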
Name | Description | Default |
---|---|---|
{$VAULT.API.PORT} | Vault port. |
8200 |
{$VAULT.API.SCHEME} | Vault API scheme. |
http |
{$VAULT.HOST} | Vault host name. |
<PUT YOUR VAULT HOST> |
{$VAULT.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors for trigger expression. |
90 |
{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} | Maximum number of Vault leadership setup failures. |
5 |
{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} | Maximum number of Vault leadership losses. |
5 |
{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} | Maximum number of Vault leadership step downs. |
5 |
{$VAULT.LLD.FILTER.STORAGE.MATCHES} | Filter of discoverable storage backends. |
.+ |
{$VAULT.TOKEN} | Vault auth token. |
<PUT YOUR AUTH TOKEN> |
{$VAULT.TOKEN.ACCESSORS} | Vault accessors separated by spaces for monitoring token expiration time. |
|
{$VAULT.TOKEN.TTL.MIN.CRIT} | Token TTL critical threshold. |
3d |
{$VAULT.TOKEN.TTL.MIN.WARN} | Token TTL warning threshold. |
7d |
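The {$VAULT.TOKEN.ACCESSORS} macro lists the accessors whose expiration the template tracks through the "Vault: Get tokens" script item. The same kind of lookup can be reproduced manually; below is a minimal sketch using Vault's token lookup-accessor API (the accessor value is a made-up placeholder):

```python
# Sketch of a token lookup by accessor (illustrative; the template uses its own script item).
import requests

vault_addr = "http://vault.example.com:8200"  # built from {$VAULT.API.SCHEME}/{$VAULT.HOST}/{$VAULT.API.PORT}
token = "s.xxxxxxxxxxxxxxxx"                   # {$VAULT.TOKEN}; must be allowed to look up accessors
accessor = "hbH9example000accessor"            # one entry from {$VAULT.TOKEN.ACCESSORS}

resp = requests.post(
    f"{vault_addr}/v1/auth/token/lookup-accessor",
    headers={"X-Vault-Token": token},
    json={"accessor": accessor},
    timeout=10,
)
resp.raise_for_status()
data = resp.json()["data"]
print("display_name:", data.get("display_name"), "ttl:", data.get("ttl"))
```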
Name | Description | Type | Key and additional info | |||
---|---|---|---|---|---|---|
Vault: Get health | HTTP agent | vault.get_health Preprocessing
|
||||
Vault: Get leader | HTTP agent | vault.get_leader Preprocessing
|
||||
Vault: Get metrics | HTTP agent | vault.get_metrics Preprocessing
|
||||
Vault: Clear metrics | Dependent item | vault.clear_metrics Preprocessing
|
||||
Vault: Get tokens | Get information about tokens via their accessors. Accessors are defined in the macro "{$VAULT.TOKEN.ACCESSORS}". |
Script | vault.get_tokens | |||
Vault: Check WAL discovery | Dependent item | vault.checkwaldiscovery Preprocessing
|
||||
Vault: Check replication discovery | Dependent item | vault.checkreplicationdiscovery Preprocessing
|
||||
Vault: Check storage discovery | Dependent item | vault.checkstoragediscovery Preprocessing
|
put | list | delete)_count$"} ⛔️ Custom on fail: Discard value. JavaScript: The text is too long. Please see the template. Discard unchanged with heartbeat: 15m |
|
Vault: Check mountpoint discovery | Dependent item | vault.checkmountpointdiscovery Preprocessing
|
||||
Vault: Initialized | Initialization status. |
Dependent item | vault.health.initialized Preprocessing
|
|||
Vault: Sealed | Seal status. |
Dependent item | vault.health.sealed Preprocessing
|
|||
Vault: Standby | Standby status. |
Dependent item | vault.health.standby Preprocessing
|
|||
Vault: Performance standby | Performance standby status. |
Dependent item | vault.health.performance_standby Preprocessing
|
|||
Vault: Performance replication | Performance replication mode https://www.vaultproject.io/docs/enterprise/replication |
Dependent item | vault.health.replicationperformancemode Preprocessing
|
|||
Vault: Disaster Recovery replication | Disaster recovery replication mode https://www.vaultproject.io/docs/enterprise/replication |
Dependent item | vault.health.replicationdrmode Preprocessing
|
|||
Vault: Version | Server version. |
Dependent item | vault.health.version Preprocessing
|
|||
Vault: Healthcheck | Vault healthcheck. |
Dependent item | vault.health.check Preprocessing
|
|||
Vault: HA enabled | HA enabled status. |
Dependent item | vault.leader.ha_enabled Preprocessing
|
|||
Vault: Is leader | Leader status. |
Dependent item | vault.leader.is_self Preprocessing
|
|||
Vault: Get metrics error | Get metrics error. |
Dependent item | vault.get_metrics.error Preprocessing
|
|||
Vault: Process CPU seconds, total | Total user and system CPU time spent in seconds. |
Dependent item | vault.metrics.process.cpu.seconds.total Preprocessing
|
|||
Vault: Open file descriptors, max | Maximum number of open file descriptors. |
Dependent item | vault.metrics.process.max.fds Preprocessing
|
|||
Vault: Open file descriptors, current | Number of open file descriptors. |
Dependent item | vault.metrics.process.open.fds Preprocessing
|
|||
Vault: Process resident memory | Resident memory size in bytes. |
Dependent item | vault.metrics.process.resident_memory.bytes Preprocessing
|
|||
Vault: Uptime | Server uptime. |
Dependent item | vault.metrics.process.uptime Preprocessing
|
|||
Vault: Process virtual memory, current | Virtual memory size in bytes. |
Dependent item | vault.metrics.process.virtual_memory.bytes Preprocessing
|
|||
Vault: Process virtual memory, max | Maximum amount of virtual memory available in bytes. |
Dependent item | vault.metrics.process.virtual_memory.max.bytes Preprocessing
|
|||
Vault: Audit log requests, rate | Number of all audit log requests across all audit log devices. |
Dependent item | vault.metrics.audit.log.request.rate Preprocessing
|
|||
Vault: Audit log request failures, rate | Number of audit log request failures. |
Dependent item | vault.metrics.audit.log.request.failure.rate Preprocessing
|
|||
Vault: Audit log response, rate | Number of audit log responses across all audit log devices. |
Dependent item | vault.metrics.audit.log.response.rate Preprocessing
|
|||
Vault: Audit log response failures, rate | Number of audit log response failures. |
Dependent item | vault.metrics.audit.log.response.failure.rate Preprocessing
|
|||
Vault: Barrier DELETE ops, rate | Number of DELETE operations at the barrier. |
Dependent item | vault.metrics.barrier.delete.rate Preprocessing
|
|||
Vault: Barrier GET ops, rate | Number of GET operations at the barrier. |
Dependent item | vault.metrics.vault.barrier.get.rate Preprocessing
|
|||
Vault: Barrier LIST ops, rate | Number of LIST operations at the barrier. |
Dependent item | vault.metrics.barrier.list.rate Preprocessing
|
|||
Vault: Barrier PUT ops, rate | Number of PUT operations at the barrier. |
Dependent item | vault.metrics.barrier.put.rate Preprocessing
|
|||
Vault: Cache hit, rate | Number of times a value was retrieved from the LRU cache. |
Dependent item | vault.metrics.cache.hit.rate Preprocessing
|
|||
Vault: Cache miss, rate | Number of times a value was not in the LRU cache. This results in a read from the configured storage. |
Dependent item | vault.metrics.cache.miss.rate Preprocessing
|
|||
Vault: Cache write, rate | Number of times a value was written to the LRU cache. |
Dependent item | vault.metrics.cache.write.rate Preprocessing
|
|||
Vault: Check token, rate | Number of token checks handled by Vault core. |
Dependent item | vault.metrics.core.check.token.rate Preprocessing
|
|||
Vault: Fetch ACL and token, rate | Number of ACL and corresponding token entry fetches handled by Vault core. |
Dependent item | vault.metrics.core.fetch.aclandtoken Preprocessing
|
|||
Vault: Requests, rate | Number of requests handled by Vault core. |
Dependent item | vault.metrics.core.handle.request Preprocessing
|
|||
Vault: Leadership setup failed, counter | Cluster leadership setup failures which have occurred in a highly available Vault cluster. |
Dependent item | vault.metrics.core.leadership.setup_failed Preprocessing
|
|||
Vault: Leadership setup lost, counter | Cluster leadership losses which have occurred in a highly available Vault cluster. |
Dependent item | vault.metrics.core.leadership_lost Preprocessing
|
|||
Vault: Post-unseal ops, counter | Duration of time taken by post-unseal operations handled by Vault core. |
Dependent item | vault.metrics.core.post_unseal Preprocessing
|
|||
Vault: Pre-seal ops, counter | Duration of time taken by pre-seal operations. |
Dependent item | vault.metrics.core.pre_seal Preprocessing
|
|||
Vault: Requested seal ops, counter | Duration of time taken by requested seal operations. |
Dependent item | vault.metrics.core.sealwithrequest Preprocessing
|
|||
Vault: Seal ops, counter | Duration of time taken by seal operations. |
Dependent item | vault.metrics.core.seal Preprocessing
|
|||
Vault: Internal seal ops, counter | Duration of time taken by internal seal operations. |
Dependent item | vault.metrics.core.seal_internal Preprocessing
|
|||
Vault: Leadership step downs, counter | Cluster leadership step downs. |
Dependent item | vault.metrics.core.step_down Preprocessing
|
|||
Vault: Unseal ops, counter | Duration of time taken by unseal operations. |
Dependent item | vault.metrics.core.unseal Preprocessing
|
|||
Vault: Fetch lease times, counter | Time taken to fetch lease times. |
Dependent item | vault.metrics.expire.fetch.lease.times Preprocessing
|
|||
Vault: Fetch lease times by token, counter | Time taken to fetch lease times by token. |
Dependent item | vault.metrics.expire.fetch.lease.times.by_token Preprocessing
|
|||
Vault: Number of expiring leases | Number of all leases which are eligible for eventual expiry. |
Dependent item | vault.metrics.expire.num_leases Preprocessing
|
|||
Vault: Expire revoke, count | Time taken to revoke a token. |
Dependent item | vault.metrics.expire.revoke Preprocessing
|
|||
Vault: Expire revoke force, count | Time taken to forcibly revoke a token. |
Dependent item | vault.metrics.expire.revoke.force Preprocessing
|
|||
Vault: Expire revoke prefix, count | Time taken to revoke tokens on a prefix. |
Dependent item | vault.metrics.expire.revoke.prefix Preprocessing
|
|||
Vault: Revoke secrets by token, count | Time taken to revoke all secrets issued with a given token. |
Dependent item | vault.metrics.expire.revoke.by_token Preprocessing
|
|||
Vault: Expire renew, count | Time taken to renew a lease. |
Dependent item | vault.metrics.expire.renew Preprocessing
|
|||
Vault: Renew token, count | Time taken to renew a token which does not need to invoke a logical backend. |
Dependent item | vault.metrics.expire.renew_token Preprocessing
|
|||
Vault: Register ops, count | Time taken for register operations. |
Dependent item | vault.metrics.expire.register Preprocessing
|
|||
Vault: Register auth ops, count | Time taken for register authentication operations which create lease entries without lease ID. |
Dependent item | vault.metrics.expire.register.auth Preprocessing
|
|||
Vault: Policy GET ops, rate | Number of operations to get a policy. |
Dependent item | vault.metrics.policy.get_policy.rate Preprocessing
|
|||
Vault: Policy LIST ops, rate | Number of operations to list policies. |
Dependent item | vault.metrics.policy.list_policies.rate Preprocessing
|
|||
Vault: Policy DELETE ops, rate | Number of operations to delete a policy. |
Dependent item | vault.metrics.policy.delete_policy.rate Preprocessing
|
|||
Vault: Policy SET ops, rate | Number of operations to set a policy. |
Dependent item | vault.metrics.policy.set_policy.rate Preprocessing
|
|||
Vault: Token create, count | The time taken to create a token. |
Dependent item | vault.metrics.token.create Preprocessing
|
|||
Vault: Token createAccessor, count | The time taken to create a token accessor. |
Dependent item | vault.metrics.token.createAccessor Preprocessing
|
|||
Vault: Token lookup, rate | Number of token lookups. |
Dependent item | vault.metrics.token.lookup.rate Preprocessing
|
|||
Vault: Token revoke, count | The time taken to revoke a token. |
Dependent item | vault.metrics.token.revoke Preprocessing
|
|||
Vault: Token revoke tree, count | Time taken to revoke a token tree. |
Dependent item | vault.metrics.token.revoke.tree Preprocessing
|
|||
Vault: Token store, count | Time taken to store an updated token entry without writing to the secondary index. |
Dependent item | vault.metrics.token.store Preprocessing
|
|||
Vault: Runtime allocated bytes | Number of bytes allocated by the Vault process. This could burst from time to time, but should return to a steady state value. |
Dependent item | vault.metrics.runtime.alloc.bytes Preprocessing
|
|||
Vault: Runtime freed objects | Number of freed objects. |
Dependent item | vault.metrics.runtime.free.count Preprocessing
|
|||
Vault: Runtime heap objects | Number of objects on the heap. This is a good general memory pressure indicator worth establishing a baseline and thresholds for alerting. |
Dependent item | vault.metrics.runtime.heap.objects Preprocessing
|
|||
Vault: Runtime malloc count | Cumulative count of allocated heap objects. |
Dependent item | vault.metrics.runtime.malloc.count Preprocessing
|
|||
Vault: Runtime num goroutines | Number of goroutines. This serves as a general system load indicator worth establishing a baseline and thresholds for alerting. |
Dependent item | vault.metrics.runtime.num_goroutines Preprocessing
|
|||
Vault: Runtime sys bytes | Number of bytes allocated to Vault. This includes what is being used by Vault's heap and what has been reclaimed but not given back to the operating system. |
Dependent item | vault.metrics.runtime.sys.bytes Preprocessing
|
|||
Vault: Runtime GC pause, total | The total garbage collector pause time since Vault was last started. |
Dependent item | vault.metrics.total.gc.pause Preprocessing
|
|||
Vault: Runtime GC runs, total | Total number of garbage collection runs since Vault was last started. |
Dependent item | vault.metrics.runtime.total.gc.runs Preprocessing
|
|||
Vault: Token count, total | Total number of service tokens available for use; counts all un-expired and un-revoked tokens in Vault's token store. This measurement is performed every 10 minutes. |
Dependent item | vault.metrics.token Preprocessing
|
|||
Vault: Token count by auth, total | Total number of service tokens that were created by an auth method. |
Dependent item | vault.metrics.token.by_auth Preprocessing
|
|||
Vault: Token count by policy, total | Total number of service tokens that have a policy attached. |
Dependent item | vault.metrics.token.by_policy Preprocessing
|
|||
Vault: Token count by ttl, total | Number of service tokens, grouped by the TTL range they were assigned at creation. |
Dependent item | vault.metrics.token.by_ttl Preprocessing
|
|||
Vault: Token creation, rate | Number of service or batch tokens created. |
Dependent item | vault.metrics.token.creation.rate Preprocessing
|
|||
Vault: Secret kv entries | Number of entries in each key-value secret engine. |
Dependent item | vault.metrics.secret.kv.count Preprocessing
|
|||
Vault: Token secret lease creation, rate | Counts the number of leases created by secret engines. |
Dependent item | vault.metrics.secret.lease.creation.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Vault: Vault server is sealed | https://www.vaultproject.io/docs/concepts/seal |
last(/HashiCorp Vault by HTTP/vault.health.sealed)=1 |Average |
||
Vault: Version has changed | Vault version has changed. Acknowledge to close the problem manually. |
last(/HashiCorp Vault by HTTP/vault.health.version,#1)<>last(/HashiCorp Vault by HTTP/vault.health.version,#2) and length(last(/HashiCorp Vault by HTTP/vault.health.version))>0 |Info |
Manual close: Yes | |
Vault: Vault server is not responding | last(/HashiCorp Vault by HTTP/vault.health.check)=0 |High |
|||
Vault: Failed to get metrics | length(last(/HashiCorp Vault by HTTP/vault.get_metrics.error))>0 |Warning |
Depends on:
|
||
Vault: Current number of open files is too high | min(/HashiCorp Vault by HTTP/vault.metrics.process.open.fds,5m)/last(/HashiCorp Vault by HTTP/vault.metrics.process.max.fds)*100>{$VAULT.OPEN.FDS.MAX.WARN} |Warning |
|||
Vault: Vault has been restarted | Uptime is less than 10 minutes. |
last(/HashiCorp Vault by HTTP/vault.metrics.process.uptime)<10m |Info |
Manual close: Yes | |
Vault: High frequency of leadership setup failures | There have been more than {$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} Vault leadership setup failures in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h))>{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} |Average |
||
Vault: High frequency of leadership losses | There have been more than {$VAULT.LEADERSHIP.LOSSES.MAX.WARN} Vault leadership losses in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h))>{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} |Average |
||
Vault: High frequency of leadership step downs | There have been more than {$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} Vault leadership step downs in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h))>{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} |Average |
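The three leadership triggers above work on cumulative counters: max(...,1h)-min(...,1h) approximates how many setup failures, losses, or step downs occurred during the last hour, and the trigger fires when that delta exceeds the corresponding macro. A tiny illustration of the arithmetic (the sample values are made up):

```python
# Illustration only: how the leadership triggers turn a cumulative counter into "events per hour".
samples_last_hour = [17, 17, 18, 21, 23]  # hypothetical readings of vault.metrics.core.leadership_lost
delta = max(samples_last_hour) - min(samples_last_hour)
threshold = 5                              # {$VAULT.LEADERSHIP.LOSSES.MAX.WARN}
print(f"losses in the last hour: {delta}; trigger fires: {delta > threshold}")
```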
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage metrics discovery | Storage backend metrics discovery. |
Dependent item | vault.storage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Vault: Storage [{#STORAGE}] {#OPERATION} ops, rate | Number of {#OPERATION} operations against the {#STORAGE} storage backend. |
Dependent item | vault.metrics.storage.rate[{#STORAGE}, {#OPERATION}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Mountpoint metrics discovery | Mountpoint metrics discovery. |
Dependent item | vault.mountpoint.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Vault: Rollback attempt [{#MOUNTPOINT}] ops, rate | Number of operations to perform a rollback operation on the given mount point. |
Dependent item | vault.metrics.rollback.attempt.rate[{#MOUNTPOINT}] Preprocessing
|
Vault: Route rollback [{#MOUNTPOINT}] ops, rate | Number of operations to dispatch a rollback operation to a backend, and for that backend to process it. Rollback operations are automatically scheduled to clean up partial errors. |
Dependent item | vault.metrics.route.rollback.rate[{#MOUNTPOINT}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
WAL metrics discovery | Discovery for WAL metrics. |
Dependent item | vault.wal.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Vault: Delete WALs, count{#SINGLETON} | Time taken to delete a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.deletewals[{#SINGLETON}] Preprocessing
|
Vault: GC deleted WAL{#SINGLETON} | Number of Write Ahead Logs (WAL) deleted during each garbage collection run. |
Dependent item | vault.metrics.wal.gc.deleted[{#SINGLETON}] Preprocessing
|
Vault: WALs on disk, total{#SINGLETON} | Total number of Write Ahead Logs (WAL) on disk. |
Dependent item | vault.metrics.wal.gc.total[{#SINGLETON}] Preprocessing
|
Vault: Load WALs, count{#SINGLETON} | Time taken to load a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.loadWAL[{#SINGLETON}] Preprocessing
|
Vault: Persist WALs, count{#SINGLETON} | Time taken to persist a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.persistwals[{#SINGLETON}] Preprocessing
|
Vault: Flush ready WAL, count{#SINGLETON} | Time taken to flush a ready Write Ahead Log (WAL) to storage. |
Dependent item | vault.metrics.wal.flushready[{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication metrics discovery | Discovery for replication metrics. |
Dependent item | vault.replication.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Vault: Stream WAL missing guard, count{#SINGLETON} | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is not matched/found. |
Dependent item | vault.metrics.logshipper.streamWALs.missing_guard[{#SINGLETON}] Preprocessing
|
Vault: Stream WAL guard found, count{#SINGLETON} | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is matched/found. |
Dependent item | vault.metrics.logshipper.streamWALs.guard_found[{#SINGLETON}] Preprocessing
|
Vault: Merkle commit index{#SINGLETON} | The last committed index in the Merkle Tree. |
Dependent item | vault.metrics.replication.merkle.commit_index[{#SINGLETON}] Preprocessing
|
Vault: Last WAL{#SINGLETON} | The index of the last WAL. |
Dependent item | vault.metrics.replication.wal.last_wal[{#SINGLETON}] Preprocessing
|
Vault: Last DR WAL{#SINGLETON} | The index of the last DR WAL. |
Dependent item | vault.metrics.replication.wal.lastdrwal[{#SINGLETON}] Preprocessing
|
Vault: Last performance WAL{#SINGLETON} | The index of the last Performance WAL. |
Dependent item | vault.metrics.replication.wal.lastperformancewal[{#SINGLETON}] Preprocessing
|
Vault: Last remote WAL{#SINGLETON} | The index of the last remote WAL. |
Dependent item | vault.metrics.replication.fsm.lastremotewal[{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Token metrics discovery | Tokens metrics discovery. |
Dependent item | vault.tokens.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Vault: Token [{#TOKEN_NAME}] error | Token lookup error text. |
Dependent item | vault.token_via_accessor.error["{#ACCESSOR}"] Preprocessing
|
Vault: Token [{#TOKEN_NAME}] has TTL | The Token has TTL. |
Dependent item | vault.token_via_accessor.has_ttl["{#ACCESSOR}"] Preprocessing
|
Vault: Token [{#TOKEN_NAME}] TTL | The TTL period of the token. |
Dependent item | vault.token_via_accessor.ttl["{#ACCESSOR}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Vault: Token [{#TOKEN_NAME}] lookup error occurred | length(last(/HashiCorp Vault by HTTP/vault.token_via_accessor.error["{#ACCESSOR}"]))>0 |Warning |
Depends on:
|
||
Vault: Token [{#TOKEN_NAME}] will expire soon | last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.CRIT} |Average |
|||
Vault: Token [{#TOKEN_NAME}] will expire soon | last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.WARN} |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Template for monitoring TrueNAS CORE by SNMP.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$CPU.UTIL.CRIT} | Threshold of CPU utilization for warning trigger in %. |
90 |
{$ICMP_LOSS_WARN} | Threshold of ICMP packet loss for the warning trigger in %. |
20 |
{$ICMP_RESPONSE_TIME_WARN} | Threshold of average ICMP response time for the warning trigger in seconds. |
0.15 |
{$IF.ERRORS.WARN} | Threshold of error packets rate for warning trigger. Can be used with interface name as context. |
2 |
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for warning trigger in %. Can be used with interface name as context. |
90 |
{$IFCONTROL} | Macro for operational state of the interface for link down trigger. Can be used with interface name as context. |
1 |
{$LOAD_AVG_PER_CPU.MAX.WARN} | Load per CPU considered sustainable. Tune if needed. |
1.5 |
{$MEMORY.AVAILABLE.MIN} | Threshold of available memory for trigger in bytes. |
20M |
{$MEMORY.UTIL.MAX} | Threshold of memory utilization for trigger in % |
90 |
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.* |
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status |
^2$ |
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.*$ |
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6) |
^6$ |
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$SNMP.TIMEOUT} | The time interval for SNMP availability trigger. |
5m |
{$SWAP.PFREE.MIN.WARN} | Threshold of free swap space for warning trigger in %. |
50 |
{$VFS.DEV.DEVNAME.MATCHES} | This macro is used in block devices discovery. Can be overridden on the host or linked template level |
.+ |
{$VFS.DEV.DEVNAME.NOT_MATCHES} | This macro is used in block devices discovery. Can be overridden on the host or linked template level |
Macro too long. Please see the template. |
{$DATASET.NAME.MATCHES} | This macro is used in datasets discovery. Can be overridden on the host or linked template level |
.+ |
{$DATASET.NAME.NOT_MATCHES} | This macro is used in datasets discovery. Can be overridden on the host or linked template level |
^(boot|.+\.system(.+)?$) |
{$ZPOOL.PUSED.MAX.WARN} | Threshold of used pool space for warning trigger in %. |
80 |
{$ZPOOL.FREE.MIN.WARN} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$ZPOOL.PUSED.MAX.CRIT} | Threshold of used pool space for average severity trigger in %. |
90 |
{$ZPOOL.FREE.MIN.CRIT} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$DATASET.PUSED.MAX.WARN} | Threshold of used dataset space for warning trigger in %. |
80 |
{$DATASET.FREE.MIN.WARN} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$DATASET.PUSED.MAX.CRIT} | Threshold of used dataset space for average severity trigger in %. |
90 |
{$DATASET.FREE.MIN.CRIT} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$TEMPERATURE.MAX.WARN} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
50 |
{$TEMPERATURE.MAX.CRIT} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
65 |
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: ICMP ping | Host accessibility by ICMP. 0 - ICMP ping fails. 1 - ICMP ping successful. |
Simple check | icmpping |
TrueNAS CORE: ICMP loss | Percentage of lost packets. |
Simple check | icmppingloss |
TrueNAS CORE: ICMP response time | ICMP ping response time (in seconds). |
Simple check | icmppingsec |
TrueNAS CORE: System contact details | MIB: SNMPv2-MIB The textual identification of the contact person for this managed node, together with information on how to contact this person. If no contact information is known, the value is the zero-length string. |
SNMP agent | system.contact Preprocessing
|
TrueNAS CORE: System description | MIB: SNMPv2-MIB System description of the host. |
SNMP agent | system.descr Preprocessing
|
TrueNAS CORE: System location | MIB: SNMPv2-MIB The physical location of this node. If the location is unknown, the value is the zero-length string. |
SNMP agent | system.location Preprocessing
|
TrueNAS CORE: System name | MIB: SNMPv2-MIB The host name of the system. |
SNMP agent | system.name Preprocessing
|
TrueNAS CORE: System object ID | MIB: SNMPv2-MIB The vendor's authoritative identification of the network management subsystem contained in the entity. This value is allocated within the SMI enterprises subtree (1.3.6.1.4.1) and provides an easy and unambiguous means for determining what kind of box is being managed. |
SNMP agent | system.objectid Preprocessing
|
TrueNAS CORE: Uptime | MIB: HOST-RESOURCES-MIB The amount of time since this host was last initialized. Note that this is different from sysUpTime in the SNMPv2-MIB [RFC1907] because sysUpTime is the uptime of the network management portion of the system. |
SNMP agent | system.uptime Preprocessing
|
TrueNAS CORE: SNMP traps (fallback) | The item is used to collect all SNMP traps unmatched by other snmptrap items. |
SNMP trap | snmptrap.fallback |
TrueNAS CORE: SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - not available, 1 - available, 2 - unknown. |
Zabbix internal | zabbix[host,snmp,available] |
TrueNAS CORE: Interrupts per second | MIB: UCD-SNMP-MIB Number of interrupts processed. |
SNMP agent | system.cpu.intr Preprocessing
|
TrueNAS CORE: Context switches per second | MIB: UCD-SNMP-MIB Number of context switches. |
SNMP agent | system.cpu.switches Preprocessing
|
TrueNAS CORE: Load average (1m avg) | MIB: UCD-SNMP-MIB The 1 minute load average. |
SNMP agent | system.cpu.load.avg1 |
TrueNAS CORE: Load average (5m avg) | MIB: UCD-SNMP-MIB The 5 minute load average. |
SNMP agent | system.cpu.load.avg5 |
TrueNAS CORE: Load average (15m avg) | MIB: UCD-SNMP-MIB The 15 minute load average. |
SNMP agent | system.cpu.load.avg15 |
TrueNAS CORE: Number of CPUs | MIB: HOST-RESOURCES-MIB The number of CPU cores, counted from the entries discovered in hrProcessorTable using LLD. |
SNMP agent | system.cpu.num Preprocessing
|
TrueNAS CORE: Free memory | MIB: UCD-SNMP-MIB The amount of real/physical memory currently unused or available. |
SNMP agent | vm.memory.free Preprocessing
|
TrueNAS CORE: Memory (buffers) | MIB: UCD-SNMP-MIB The total amount of real or virtual memory currently allocated for use as memory buffers. |
SNMP agent | vm.memory.buffers Preprocessing
|
TrueNAS CORE: Memory (cached) | MIB: UCD-SNMP-MIB The total amount of real or virtual memory currently allocated for use as cached memory. |
SNMP agent | vm.memory.cached Preprocessing
|
TrueNAS CORE: Total memory | MIB: UCD-SNMP-MIB The total memory expressed in bytes. |
SNMP agent | vm.memory.total Preprocessing
|
TrueNAS CORE: Available memory | Please note that memory utilization is a rough estimate, since memory available is calculated as free+buffers+cached, which is not 100% accurate, but the best we can get using SNMP. |
Calculated | vm.memory.available |
TrueNAS CORE: Memory utilization | Please note that memory utilization is a rough estimate, since memory available is calculated as free+buffers+cached, which is not 100% accurate, but the best we can get using SNMP (see the sketch after this items table). |
Calculated | vm.memory.util |
TrueNAS CORE: Total swap space | MIB: UCD-SNMP-MIB The total amount of swap space configured for this host. |
SNMP agent | system.swap.total Preprocessing
|
TrueNAS CORE: Free swap space | MIB: UCD-SNMP-MIB The amount of swap space currently unused or available. |
SNMP agent | system.swap.free Preprocessing
|
TrueNAS CORE: Free swap space in % | The free space of the swap volume/file expressed in %. |
Calculated | system.swap.pfree Preprocessing
|
TrueNAS CORE: ARC size | MIB: FREENAS-MIB ARC size in bytes. |
SNMP agent | truenas.zfs.arc.size Preprocessing
|
TrueNAS CORE: ARC metadata size | MIB: FREENAS-MIB ARC metadata size used in bytes. |
SNMP agent | truenas.zfs.arc.meta Preprocessing
|
TrueNAS CORE: ARC data size | MIB: FREENAS-MIB ARC data size used in bytes. |
SNMP agent | truenas.zfs.arc.data Preprocessing
|
TrueNAS CORE: ARC hits | MIB: FREENAS-MIB Total amount of cache hits in the ARC per second. |
SNMP agent | truenas.zfs.arc.hits Preprocessing
|
TrueNAS CORE: ARC misses | MIB: FREENAS-MIB Total amount of cache misses in the ARC per second. |
SNMP agent | truenas.zfs.arc.misses Preprocessing
|
TrueNAS CORE: ARC target size of cache | MIB: FREENAS-MIB ARC target size of cache in bytes. |
SNMP agent | truenas.zfs.arc.c Preprocessing
|
TrueNAS CORE: ARC target size of MRU | MIB: FREENAS-MIB ARC target size of MRU in bytes. |
SNMP agent | truenas.zfs.arc.p Preprocessing
|
TrueNAS CORE: ARC cache hit ratio | MIB: FREENAS-MIB ARC cache hit ratio percentage. |
SNMP agent | truenas.zfs.arc.hit.ratio |
TrueNAS CORE: ARC cache miss ratio | MIB: FREENAS-MIB ARC cache miss ratio percentage. |
SNMP agent | truenas.zfs.arc.miss.ratio |
TrueNAS CORE: L2ARC hits | MIB: FREENAS-MIB Hits to the L2 cache per second. |
SNMP agent | truenas.zfs.l2arc.hits Preprocessing
|
TrueNAS CORE: L2ARC misses | MIB: FREENAS-MIB Misses to the L2 cache per second. |
SNMP agent | truenas.zfs.l2arc.misses Preprocessing
|
TrueNAS CORE: L2ARC read rate | MIB: FREENAS-MIB Read rate from L2 cache in bytes per second. |
SNMP agent | truenas.zfs.l2arc.read Preprocessing
|
TrueNAS CORE: L2ARC write rate | MIB: FREENAS-MIB Write rate from L2 cache in bytes per second. |
SNMP agent | truenas.zfs.l2arc.write Preprocessing
|
TrueNAS CORE: L2ARC size | MIB: FREENAS-MIB L2ARC size in bytes. |
SNMP agent | truenas.zfs.l2arc.size Preprocessing
|
TrueNAS CORE: ZIL operations 1 second | MIB: FREENAS-MIB The ops column parsed from the command zilstat 1 1. |
SNMP agent | truenas.zfs.zil.ops1 |
TrueNAS CORE: ZIL operations 5 seconds | MIB: FREENAS-MIB The ops column parsed from the command zilstat 5 1. |
SNMP agent | truenas.zfs.zil.ops5 |
TrueNAS CORE: ZIL operations 10 seconds | MIB: FREENAS-MIB The ops column parsed from the command zilstat 10 1. |
SNMP agent | truenas.zfs.zil.ops10 |
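As noted for the "Available memory" and "Memory utilization" items above, both values are derived from the free, buffers, and cached figures reported over SNMP. Below is a small sketch of the arithmetic behind those calculated items; the byte values are made up, and the utilization formula follows the usual free+buffers+cached convention described in the item, so check the template for the exact expression.

```python
# Illustration of the calculated items vm.memory.available and vm.memory.util (values are made up).
free, buffers, cached = 2_000_000_000, 500_000_000, 6_000_000_000  # bytes, from UCD-SNMP-MIB
total = 16_000_000_000                                             # vm.memory.total

available = free + buffers + cached                 # vm.memory.available
utilization = (total - available) * 100 / total     # vm.memory.util, in %
print(f"available={available} bytes, utilization={utilization:.1f}%")
```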
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Unavailable by ICMP ping | Last three attempts returned timeout. Please check device connectivity. |
max(/TrueNAS CORE by SNMP/icmpping,#3)=0 |High |
||
TrueNAS CORE: High ICMP ping loss | ICMP packets loss detected. |
min(/TrueNAS CORE by SNMP/icmppingloss,5m)>{$ICMP_LOSS_WARN} and min(/TrueNAS CORE by SNMP/icmppingloss,5m)<100 |Warning |
Depends on:
|
|
TrueNAS CORE: High ICMP ping response time | Average ICMP response time is too high. |
avg(/TrueNAS CORE by SNMP/icmppingsec,5m)>{$ICMP_RESPONSE_TIME_WARN} |Warning |
Depends on:
|
|
TrueNAS CORE: System name has changed | The name of the system has changed. Acknowledge to close the problem manually. |
last(/TrueNAS CORE by SNMP/system.name,#1)<>last(/TrueNAS CORE by SNMP/system.name,#2) and length(last(/TrueNAS CORE by SNMP/system.name))>0 |Info |
Manual close: Yes | |
TrueNAS CORE: Host has been restarted | Uptime is less than 10 minutes. |
last(/TrueNAS CORE by SNMP/system.uptime)<10m |Info |
Manual close: Yes | |
TrueNAS CORE: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/TrueNAS CORE by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |Warning |
Depends on:
|
|
TrueNAS CORE: Load average is too high | The load average per CPU is too high. The system may be slow to respond. |
min(/TrueNAS CORE by SNMP/system.cpu.load.avg1,5m)/last(/TrueNAS CORE by SNMP/system.cpu.num)>{$LOAD_AVG_PER_CPU.MAX.WARN} and last(/TrueNAS CORE by SNMP/system.cpu.load.avg5)>0 and last(/TrueNAS CORE by SNMP/system.cpu.load.avg15)>0 |Average |
||
TrueNAS CORE: Lack of available memory | The system is running out of memory. |
min(/TrueNAS CORE by SNMP/vm.memory.available,5m)<{$MEMORY.AVAILABLE.MIN} and last(/TrueNAS CORE by SNMP/vm.memory.total)>0 |Average |
||
TrueNAS CORE: High memory utilization | The system is running out of free memory. |
min(/TrueNAS CORE by SNMP/vm.memory.util,5m)>{$MEMORY.UTIL.MAX} |Average |
Depends on:
|
|
TrueNAS CORE: High swap space usage | If there is no swap configured, this trigger is ignored. |
min(/TrueNAS CORE by SNMP/system.swap.pfree,5m)<{$SWAP.PFREE.MIN.WARN} and last(/TrueNAS CORE by SNMP/system.swap.total)>0 |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
CPU discovery | This discovery creates a set of per-core CPU metrics from UCD-SNMP-MIB, using {#CPU.COUNT} in preprocessing. That is the only reason LLD is used. |
Dependent item | cpu.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: CPU idle time | MIB: UCD-SNMP-MIB The time the CPU has spent doing nothing. |
SNMP agent | system.cpu.idle[{#SNMPINDEX}] |
TrueNAS CORE: CPU system time | MIB: UCD-SNMP-MIB The time the CPU has spent running the kernel and its processes. |
SNMP agent | system.cpu.system[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: CPU user time | MIB: UCD-SNMP-MIB The time the CPU has spent running users' processes that are not niced. |
SNMP agent | system.cpu.user[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: CPU nice time | MIB: UCD-SNMP-MIB The time the CPU has spent running users' processes that have been niced. |
SNMP agent | system.cpu.nice[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: CPU iowait time | MIB: UCD-SNMP-MIB The amount of time the CPU has been waiting for I/O to complete. |
SNMP agent | system.cpu.iowait[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: CPU interrupt time | MIB: UCD-SNMP-MIB The amount of time the CPU has been servicing hardware interrupts. |
SNMP agent | system.cpu.interrupt[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: CPU utilization | The CPU utilization expressed in %. |
Dependent item | system.cpu.util[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. |
min(/TrueNAS CORE by SNMP/system.cpu.util[{#SNMPINDEX}],5m)>{$CPU.UTIL.CRIT} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Block devices discovery | Block devices are discovered from UCD-DISKIO-MIB::diskIOTable (http://net-snmp.sourceforge.net/docs/mibs/ucdDiskIOMIB.html#diskIOTable). |
SNMP agent | vfs.dev.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: [{#DEVNAME}]: Disk read rate | MIB: UCD-DISKIO-MIB The number of read accesses from this device since boot. |
SNMP agent | vfs.dev.read.rate[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: [{#DEVNAME}]: Disk write rate | MIB: UCD-DISKIO-MIB The number of write accesses from this device since boot. |
SNMP agent | vfs.dev.write.rate[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: [{#DEVNAME}]: Disk utilization | MIB: UCD-DISKIO-MIB The 1-minute average disk load (%). |
SNMP agent | vfs.dev.util[{#SNMPINDEX}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP agent | net.if.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.discards[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.errors[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.discards[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.errors[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of |
SNMP agent | net.if.speed[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP agent | net.if.status[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP agent | net.if.type[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | It recovers when it is below 80% of the |
min(/TrueNAS CORE by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/TrueNAS CORE by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | It recovers when it is below 80% of the |
min(/TrueNAS CORE by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/TrueNAS CORE by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually. |
change(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/TrueNAS CORE by SNMP/net.if.status[{#SNMPINDEX}])<>2) |Info |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/TrueNAS CORE by SNMP/net.if.status[{#SNMPINDEX}])=2) |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
ZFS pools discovery | ZFS pools discovery from FREENAS-MIB. |
SNMP agent | truenas.zfs.pools.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: Pool [{#POOLNAME}]: Total space | MIB: FREENAS-MIB The size of the storage pool in bytes. |
SNMP agent | truenas.zpool.size.total[{#POOLNAME}] Preprocessing
|
TrueNAS CORE: Pool [{#POOLNAME}]: Used space | MIB: FREENAS-MIB The used size of the storage pool in bytes. |
SNMP agent | truenas.zpool.used[{#POOLNAME}] Preprocessing
|
TrueNAS CORE: Pool [{#POOLNAME}]: Available space | MIB: FREENAS-MIB The available size of the storage pool in bytes. |
SNMP agent | truenas.zpool.avail[{#POOLNAME}] Preprocessing
|
TrueNAS CORE: Pool [{#POOLNAME}]: Usage in % | The used size of the storage pool in %. |
Calculated | truenas.zpool.pused[{#POOLNAME}] |
TrueNAS CORE: Pool [{#POOLNAME}]: Health | MIB: FREENAS-MIB The current health of the containing pool, as reported by zpool status. |
SNMP agent | truenas.zpool.health[{#POOLNAME}] Preprocessing
|
TrueNAS CORE: Pool [{#POOLNAME}]: Read operations rate | MIB: FREENAS-MIB The number of read I/O operations sent to the pool or device, including metadata requests (averaged since system booted). |
SNMP agent | truenas.zpool.read.ops[{#POOLNAME}] Preprocessing
|
TrueNAS CORE: Pool [{#POOLNAME}]: Write operations rate | MIB: FREENAS-MIB The number of write I/O operations sent to the pool or device (averaged since system booted). |
SNMP agent | truenas.zpool.write.ops[{#POOLNAME}] Preprocessing
|
TrueNAS CORE: Pool [{#POOLNAME}]: Read rate | MIB: FREENAS-MIB The bandwidth of all read operations (including metadata), expressed as units per second (averaged since system booted). |
SNMP agent | truenas.zpool.read.bytes[{#POOLNAME}] Preprocessing
|
TrueNAS CORE: Pool [{#POOLNAME}]: Write rate | MIB: FREENAS-MIB The bandwidth of all write operations, expressed as units per second (averaged since system booted). |
SNMP agent | truenas.zpool.write.bytes[{#POOLNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Pool [{#POOLNAME}]: Very high space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.zpool.pused[{#POOLNAME}],5m) > {$ZPOOL.PUSED.MAX.CRIT:"{#POOLNAME}"} and last(/TrueNAS CORE by SNMP/truenas.zpool.avail[{#POOLNAME}]) < {$ZPOOL.FREE.MIN.CRIT:"{#POOLNAME}"} |Average |
||
TrueNAS CORE: Pool [{#POOLNAME}]: High space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.zpool.pused[{#POOLNAME}],5m) > {$ZPOOL.PUSED.MAX.WARN:"{#POOLNAME}"} and last(/TrueNAS CORE by SNMP/truenas.zpool.avail[{#POOLNAME}]) < {$ZPOOL.FREE.MIN.WARN:"{#POOLNAME}"} |Warning |
Depends on:
|
|
TrueNAS CORE: Pool [{#POOLNAME}]: Status is not online | Please check pool status. |
last(/TrueNAS CORE by SNMP/truenas.zpool.health[{#POOLNAME}]) <> 0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
ZFS datasets discovery | ZFS datasets discovery from FREENAS-MIB. |
SNMP agent | truenas.zfs.dataset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: Dataset [{#DATASET_NAME}]: Total space | MIB: FREENAS-MIB The size of the dataset in bytes. |
SNMP agent | truenas.dataset.size.total[{#DATASET_NAME}] Preprocessing
|
TrueNAS CORE: Dataset [{#DATASET_NAME}]: Used space | MIB: FREENAS-MIB The used size of the dataset in bytes. |
SNMP agent | truenas.dataset.used[{#DATASET_NAME}] Preprocessing
|
TrueNAS CORE: Dataset [{#DATASET_NAME}]: Available space | MIB: FREENAS-MIB The available size of the dataset in bytes. |
SNMP agent | truenas.dataset.avail[{#DATASET_NAME}] Preprocessing
|
TrueNAS CORE: Dataset [{#DATASET_NAME}]: Usage in % | The used size of the dataset in %. |
Calculated | truenas.dataset.pused[{#DATASET_NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Dataset [{#DATASET_NAME}]: Very high space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.dataset.pused[{#DATASET_NAME}],5m) > {$DATASET.PUSED.MAX.CRIT:"{#DATASET_NAME}"} and last(/TrueNAS CORE by SNMP/truenas.dataset.avail[{#DATASET_NAME}]) < {$DATASET.FREE.MIN.CRIT:"{#POOLNAME}"} |Average |
||
TrueNAS CORE: Dataset [{#DATASET_NAME}]: High space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.dataset.pused[{#DATASET_NAME}],5m) > {$DATASET.PUSED.MAX.WARN:"{#DATASET_NAME}"} and last(/TrueNAS CORE by SNMP/truenas.dataset.avail[{#DATASET_NAME}]) < {$DATASET.FREE.MIN.WARN:"{#POOLNAME}"} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
ZFS volumes discovery | ZFS volumes discovery from FREENAS-MIB. |
SNMP agent | truenas.zfs.zvols.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: ZFS volume [{#ZVOL_NAME}]: Total space | MIB: FREENAS-MIB The size of the ZFS volume in bytes. |
SNMP agent | truenas.zvol.size.total[{#ZVOL_NAME}] Preprocessing
|
TrueNAS CORE: ZFS volume [{#ZVOL_NAME}]: Used space | MIB: FREENAS-MIB The used size of the ZFS volume in bytes. |
SNMP agent | truenas.zvol.used[{#ZVOL_NAME}] Preprocessing
|
TrueNAS CORE: ZFS volume [{#ZVOL_NAME}]: Available space | MIB: FREENAS-MIB The available space of the ZFS volume in bytes. |
SNMP agent | truenas.zvol.avail[{#ZVOL_NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Disks temperature discovery | Disks temperature discovery from FREENAS-MIB. |
SNMP agent | truenas.disk.temp.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: Disk [{#DISK_NAME}]: Temperature | MIB: FREENAS-MIB The temperature of this HDD in mC (millidegrees Celsius). |
SNMP agent | truenas.disk.temp[{#DISK_NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Disk [{#DISK_NAME}]: Average disk temperature is too high | Disk temperature is high. |
avg(/TrueNAS CORE by SNMP/truenas.disk.temp[{#DISK_NAME}],5m) > {$TEMPERATURE.MAX.CRIT:"{#DISK_NAME}"} |Average |
||
TrueNAS CORE: Disk [{#DISK_NAME}]: Average disk temperature is too high | Disk temperature is high. |
avg(/TrueNAS CORE by SNMP/truenas.disk.temp[{#DISK_NAME}],5m) > {$TEMPERATURE.MAX.WARN:"{#DISK_NAME}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Travis CI by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You must set the {$TRAVIS.API.TOKEN} and {$TRAVIS.API.URL} macros. {$TRAVIS.API.TOKEN} is a Travis API authentication token located in User -> Settings -> API authentication. {$TRAVIS.API.URL} can be set in two different variations, depending on which Travis CI endpoint your repositories use.
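An optional way to verify the token and URL before linking the template is to query the Travis API v3 directly. This is only a sketch: the header names are the standard Travis API v3 request headers, and the token and URL placeholders stand for the values you put into the macros:
curl -s -H "Travis-API-Version: 3" -H "Authorization: token <TRAVIS.API.TOKEN>" "https://<TRAVIS.API.URL>/repos"
A 200 response with a JSON list of repositories indicates that the credentials and endpoint are usable.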
Name | Description | Default |
---|---|---|
{$TRAVIS.API.TOKEN} | Travis API Token |
|
{$TRAVIS.API.URL} | Travis API URL |
api.travis-ci.com |
{$TRAVIS.BUILDS.SUCCESS.PERCENT} | Percent of successful builds in the repo (for trigger expression) |
80 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Travis: Get repos | Getting repos using Travis API. |
HTTP agent | travis.get_repos |
Travis: Get builds | Getting builds using Travis API. |
HTTP agent | travis.get_builds |
Travis: Get jobs | Getting jobs using Travis API. |
HTTP agent | travis.get_jobs |
Travis: Get health | Getting home JSON using Travis API. |
HTTP agent | travis.get_health Preprocessing
|
Travis: Jobs passed | Total count of passed jobs in all repos. |
Dependent item | travis.jobs.total Preprocessing
|
Travis: Jobs active | Active jobs in all repos. |
Dependent item | travis.jobs.active Preprocessing
|
Travis: Jobs in queue | Jobs in queue in all repos. |
Dependent item | travis.jobs.queue Preprocessing
|
Travis: Builds | Total count of builds in all repos. |
Dependent item | travis.builds.total Preprocessing
|
Travis: Builds duration | Sum of all builds durations in all repos. |
Dependent item | travis.builds.duration Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Travis: Service is unavailable | Travis API is unavailable. Please check if the correct macros are set. |
last(/Travis CI by HTTP/travis.get_health)=0 |High |
Manual close: Yes | |
Travis: Failed to fetch home page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Travis CI by HTTP/travis.get_health,30m)=1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Repos metrics discovery | Metrics for Repos statistics. |
Dependent item | travis.repos.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Travis: Repo [{#SLUG}]: Get builds | Getting builds of {#SLUG} using Travis API. |
HTTP agent | travis.repo.get_builds[{#SLUG}] |
Travis: Repo [{#SLUG}]: Get caches | Getting caches of {#SLUG} using Travis API. |
HTTP agent | travis.repo.get_caches[{#SLUG}] |
Travis: Repo [{#SLUG}]: Cache files | Count of cache files in {#SLUG} repo. |
Dependent item | travis.repo.caches.files[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Cache size | Total size of cache files in {#SLUG} repo. |
Dependent item | travis.repo.caches.size[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Builds passed | Count of all passed builds in {#SLUG} repo. |
Dependent item | travis.repo.builds.passed[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Builds failed | Count of all failed builds in {#SLUG} repo. |
Dependent item | travis.repo.builds.failed[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Builds total | Count of total builds in {#SLUG} repo. |
Dependent item | travis.repo.builds.total[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Builds passed, % | Percent of passed builds in {#SLUG} repo. |
Calculated | travis.repo.builds.passed.pct[{#SLUG}] |
Travis: Repo [{#SLUG}]: Description | Description of Travis repo (git project description). |
Dependent item | travis.repo.description[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Last build duration | Last build duration in {#SLUG} repo. |
Dependent item | travis.repo.last_build.duration[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Last build state | Last build state in {#SLUG} repo. |
Dependent item | travis.repo.last_build.state[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Last build number | Last build number in {#SLUG} repo. |
Dependent item | travis.repo.last_build.number[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Last build id | Last build id in {#SLUG} repo. |
Dependent item | travis.repo.last_build.id[{#SLUG}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Travis: Repo [{#SLUG}]: Percent of successful builds | The rate of successful builds is low. |
last(/Travis CI by HTTP/travis.repo.builds.passed.pct[{#SLUG}])<{$TRAVIS.BUILDS.SUCCESS.PERCENT} |Warning |
Manual close: Yes | |
Travis: Repo [{#SLUG}]: Last build status is 'errored' | Last build status is errored. |
find(/Travis CI by HTTP/travis.repo.last_build.state[{#SLUG}],,"like","errored")=1 |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache Tomcat monitoring by Zabbix via JMX and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
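A minimal sketch of enabling remote JMX access on Tomcat, assuming a Unix installation where extra JVM options are placed in bin/setenv.sh (the port number and the password/access file locations are placeholders; align the JMX credentials with the {$TOMCAT.USER} and {$TOMCAT.PASSWORD} macros and make sure the Zabbix Java gateway can reach the chosen port):
CATALINA_OPTS="$CATALINA_OPTS \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=12345 \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=true \
  -Dcom.sun.management.jmxremote.password.file=$CATALINA_BASE/conf/jmxremote.password \
  -Dcom.sun.management.jmxremote.access.file=$CATALINA_BASE/conf/jmxremote.access"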
Name | Description | Default |
---|---|---|
{$TOMCAT.USER} | User for JMX |
|
{$TOMCAT.PASSWORD} | Password for JMX |
|
{$TOMCAT.LLD.FILTER.REQUEST_PROCESSOR.MATCHES} | Filter for discoverable global request processors. |
.* |
{$TOMCAT.LLD.FILTER.REQUEST_PROCESSOR.NOT_MATCHES} | Filter to exclude global request processors. |
CHANGE_IF_NEEDED |
{$TOMCAT.LLD.FILTER.MANAGER.MATCHES} | Filter for discoverable managers. |
.* |
{$TOMCAT.LLD.FILTER.MANAGER.NOT_MATCHES} | Filter to exclude managers. |
CHANGE_IF_NEEDED |
{$TOMCAT.LLD.FILTER.THREAD_POOL.MATCHES} | Filter for discoverable thread pools. |
.* |
{$TOMCAT.LLD.FILTER.THREAD_POOL.NOT_MATCHES} | Filter to exclude thread pools. |
CHANGE_IF_NEEDED |
{$TOMCAT.THREADS.MAX.PCT} | Threshold for busy worker threads trigger. Can be used with {#JMXNAME} as context. |
75 |
{$TOMCAT.THREADS.MAX.TIME} | The time during which the number of busy threads can exceed the threshold. Can be used with {#JMXNAME} as context. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Tomcat: Version | The version of Tomcat. |
JMX agent | jmx["Catalina:type=Server",serverInfo] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Tomcat: Version has been changed | The Tomcat version has changed. Acknowledge to close the problem manually. |
last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo],#1)<>last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo],#2) and length(last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo]))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Global request processors discovery | Discovery for GlobalRequestProcessor |
JMX agent | jmx.discovery[beans,"Catalina:type=GlobalRequestProcessor,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXNAME}: Bytes received per second | Bytes received rate by processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},bytesReceived] Preprocessing
|
{#JMXNAME}: Bytes sent per second | Bytes sent rate by processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},bytesSent] Preprocessing
|
{#JMXNAME}: Errors per second | Error rate of request processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},errorCount] Preprocessing
|
{#JMXNAME}: Requests per second | Rate of requests served by request processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},requestCount] Preprocessing
|
{#JMXNAME}: Requests processing time | The total time to process all incoming requests of request processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},processingTime] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Protocol handlers discovery | Discovery for ProtocolHandler |
JMX agent | jmx.discovery[attributes,"Catalina:type=ProtocolHandler,port=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXVALUE}: Gzip compression status | Gzip compression status on {#JMXNAME}. Enabling gzip compression may save server bandwidth. |
JMX agent | jmx[{#JMXOBJ},compression] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#JMXVALUE}: Gzip compression is disabled | gzip compression is disabled for connector {#JMXVALUE}. |
find(/Apache Tomcat by JMX/jmx[{#JMXOBJ},compression],,"like","off") = 1 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Thread pools discovery | Discovery for ThreadPool |
JMX agent | jmx.discovery[beans,"Catalina:type=ThreadPool,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXNAME}: Threads count | The number of threads the thread pool currently has, both busy and free. |
JMX agent | jmx[{#JMXOBJ},currentThreadCount] Preprocessing
|
{#JMXNAME}: Threads limit | Limit of the thread count. When the currentThreadsBusy counter reaches the maxThreads limit, no more requests can be handled, and the application chokes. |
JMX agent | jmx[{#JMXOBJ},maxThreads] Preprocessing
|
{#JMXNAME}: Threads busy | Number of the requests that are being currently handled. |
JMX agent | jmx[{#JMXOBJ},currentThreadsBusy] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#JMXNAME}: Busy worker threads count is high | When the busy threads counter reaches the limit, no more requests can be handled, and the application chokes. |
min(/Apache Tomcat by JMX/jmx[{#JMXOBJ},currentThreadsBusy],{$TOMCAT.THREADS.MAX.TIME:"{#JMXNAME}"})>last(/Apache Tomcat by JMX/jmx[{#JMXOBJ},maxThreads])*{$TOMCAT.THREADS.MAX.PCT:"{#JMXNAME}"}/100 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Contexts discovery | Discovery for contexts |
JMX agent | jmx.discovery[beans,"Catalina:type=Manager,host=*,context=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXHOST}{#JMXCONTEXT}: Sessions active | Active sessions of the application. |
JMX agent | jmx[{#JMXOBJ},activeSessions] |
{#JMXHOST}{#JMXCONTEXT}: Sessions active maximum so far | Maximum number of active sessions so far. |
JMX agent | jmx[{#JMXOBJ},maxActive] |
{#JMXHOST}{#JMXCONTEXT}: Sessions created per second | Rate of sessions created by this application per second. |
JMX agent | jmx[{#JMXOBJ},sessionCounter] Preprocessing
|
{#JMXHOST}{#JMXCONTEXT}: Sessions rejected per second | Rate of sessions we rejected due to maxActive being reached. |
JMX agent | jmx[{#JMXOBJ},rejectedSessions] Preprocessing
|
{#JMXHOST}{#JMXCONTEXT}: Sessions allowed maximum | The maximum number of active Sessions allowed, or -1 for no limit. |
JMX agent | jmx[{#JMXOBJ},maxActiveSessions] |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Systemd monitoring by Zabbix via Zabbix agent 2 and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
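An optional sanity check before linking the template, assuming Zabbix agent 2 with its built-in Systemd plugin runs on the monitored host and accepts passive checks: query the keys this template uses with zabbix_get. The host address and the unit name below are placeholders:
zabbix_get -s 127.0.0.1 -k 'systemd.unit.discovery[service]'
zabbix_get -s 127.0.0.1 -k 'systemd.unit.get["sshd.service"]'
Both calls should return JSON; an error usually means the agent 2 Systemd plugin is unavailable or the agent cannot talk to D-Bus.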
Name | Description | Default |
---|---|---|
{$SYSTEMD.NAME.SOCKET.MATCHES} | Filter of systemd socket units by name |
.* |
{$SYSTEMD.NAME.SOCKET.NOT_MATCHES} | Filter of systemd socket units by name |
CHANGE_IF_NEEDED |
{$SYSTEMD.ACTIVESTATE.SOCKET.MATCHES} | Filter of systemd socket units by active state |
active |
{$SYSTEMD.ACTIVESTATE.SOCKET.NOT_MATCHES} | Filter of systemd socket units by active state |
CHANGE_IF_NEEDED |
{$SYSTEMD.UNITFILESTATE.SOCKET.MATCHES} | Filter of systemd socket units by unit file state |
enabled |
{$SYSTEMD.UNITFILESTATE.SOCKET.NOT_MATCHES} | Filter of systemd socket units by unit file state |
CHANGE_IF_NEEDED |
{$SYSTEMD.NAME.SERVICE.MATCHES} | Filter of systemd service units by name |
.* |
{$SYSTEMD.NAME.SERVICE.NOT_MATCHES} | Filter of systemd service units by name |
CHANGE_IF_NEEDED |
{$SYSTEMD.ACTIVESTATE.SERVICE.MATCHES} | Filter of systemd service units by active state |
active |
{$SYSTEMD.ACTIVESTATE.SERVICE.NOT_MATCHES} | Filter of systemd service units by active state |
CHANGE_IF_NEEDED |
{$SYSTEMD.UNITFILESTATE.SERVICE.MATCHES} | Filter of systemd service units by unit file state |
enabled |
{$SYSTEMD.UNITFILESTATE.SERVICE.NOT_MATCHES} | Filter of systemd service units by unit file state |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Service units discovery | Discover systemd service units and their details. |
Zabbix agent | systemd.unit.discovery[service] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#UNIT.NAME}: Get unit info | Returns all properties of a systemd service unit. Unit description: {#UNIT.DESCRIPTION}. |
Zabbix agent | systemd.unit.get["{#UNIT.NAME}"] |
{#UNIT.NAME}: Active state | State value that reflects whether the unit is currently active or not. The following states are currently defined: "active", "reloading", "inactive", "failed", "activating", and "deactivating". |
Dependent item | systemd.service.active_state["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Load state | State value that reflects whether the configuration file of this unit has been loaded. The following states are currently defined: "loaded", "error", and "masked". |
Dependent item | systemd.service.load_state["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Unit file state | Encodes the install state of the unit file of FragmentPath. It currently knows the following states: "enabled", "enabled-runtime", "linked", "linked-runtime", "masked", "masked-runtime", "static", "disabled", and "invalid". |
Dependent item | systemd.service.unitfile_state["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Active time | Number of seconds since the unit entered the active state. |
Dependent item | systemd.service.uptime["{#UNIT.NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#UNIT.NAME}: Service is not running | last(/Systemd by Zabbix agent 2/systemd.service.active_state["{#UNIT.NAME}"])<>1 |Warning |
Manual close: Yes | ||
{#UNIT.NAME}: has been restarted | Uptime is less than 10 minutes. |
last(/Systemd by Zabbix agent 2/systemd.service.uptime["{#UNIT.NAME}"])<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Socket units discovery | Discover systemd socket units and their details. |
Zabbix agent | systemd.unit.discovery[socket] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#UNIT.NAME}: Get unit info | Returns all properties of a systemd socket unit. Unit description: {#UNIT.DESCRIPTION}. |
Zabbix agent | systemd.unit.get["{#UNIT.NAME}",Socket] |
{#UNIT.NAME}: Connections accepted per sec | The number of accepted socket connections (NAccepted) per second. |
Dependent item | systemd.socket.conn_accepted.rate["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Connections connected | The current number of socket connections (NConnections). |
Dependent item | systemd.socket.conn_count["{#UNIT.NAME}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Squid monitoring by Zabbix via SNMP and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable SNMP support following the official documentation. Required parameters in squid.conf:
snmp_port <port_number>
acl <zbx_acl_name> snmp_community <community_name>
snmp_access allow <zbx_acl_name> <zabbix_server_ip>
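For example, a minimal sketch with placeholder values (choose your own ACL name, community, and Zabbix server address, and keep the port and community consistent with the {$SQUID.SNMP.PORT} and {$SQUID.SNMP.COMMUNITY} macros below):
snmp_port 3401
acl zbx_snmp snmp_community public
snmp_access allow zbx_snmp 192.0.2.10
snmp_access deny all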
1. Import the template template_app_squid_snmp.yaml into Zabbix.
2. Set values for {$SQUID.SNMP.COMMUNITY}, {$SQUID.SNMP.PORT} and {$SQUID.HTTP.PORT} as configured in squid.conf.
3. Link the imported template to a host with Squid.
4. Add SNMPv2 interface to Squid host. Set Port as {$SQUID.SNMP.PORT} and SNMP community as {$SQUID.SNMP.COMMUNITY}.
Name | Description | Default |
---|---|---|
{$SQUID.SNMP.PORT} | snmp_port configured in squid.conf (Default: 3401) |
3401 |
{$SQUID.HTTP.PORT} | http_port configured in squid.conf (Default: 3128) |
3128 |
{$SQUID.SNMP.COMMUNITY} | SNMP community allowed by ACL in squid.conf |
public |
{$SQUID.FILE.DESC.WARN.MIN} | The threshold for minimum number of available file descriptors |
100 |
{$SQUID.PAGE.FAULT.WARN} | The threshold for sys page faults rate in percent of received HTTP requests |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Squid: Service ping | Simple check | net.tcp.service[tcp,,{$SQUID.HTTP.PORT}] Preprocessing
|
|
Squid: Uptime | The Uptime of the cache in timeticks (in hundredths of a second) with preprocessing |
SNMP agent | squid[cacheUptime] Preprocessing
|
Squid: Version | Cache Software Version |
SNMP agent | squid[cacheVersionId] Preprocessing
|
Squid: CPU usage | The percentage use of the CPU |
SNMP agent | squid[cacheCpuUsage] |
Squid: Memory maximum resident size | Maximum Resident Size |
SNMP agent | squid[cacheMaxResSize] Preprocessing
|
Squid: Memory maximum cache size | The value of the cache_mem parameter |
SNMP agent | squid[cacheMemMaxSize] Preprocessing
|
Squid: Memory cache usage | Total accounted memory |
SNMP agent | squid[cacheMemUsage] Preprocessing
|
Squid: Cache swap low water mark | Cache Swap Low Water Mark |
SNMP agent | squid[cacheSwapLowWM] |
Squid: Cache swap high water mark | Cache Swap High Water Mark |
SNMP agent | squid[cacheSwapHighWM] |
Squid: Cache swap directory size | The total of the cache_dir space allocated |
SNMP agent | squid[cacheSwapMaxSize] Preprocessing
|
Squid: Cache swap current size | Storage Swap Size |
SNMP agent | squid[cacheCurrentSwapSize] |
Squid: File descriptor count - current used | Number of file descriptors in use |
SNMP agent | squid[cacheCurrentFileDescrCnt] |
Squid: File descriptor count - current maximum | Highest number of file descriptors in use |
SNMP agent | squid[cacheCurrentFileDescrMax] |
Squid: File descriptor count - current reserved | Reserved number of file descriptors |
SNMP agent | squid[cacheCurrentResFileDescrCnt] |
Squid: File descriptor count - current available | Available number of file descriptors |
SNMP agent | squid[cacheCurrentUnusedFDescrCnt] |
Squid: Byte hit ratio per 1 minute | Byte Hit Ratios |
SNMP agent | squid[cacheRequestByteRatio.1] |
Squid: Byte hit ratio per 5 minutes | Byte Hit Ratios |
SNMP agent | squid[cacheRequestByteRatio.5] |
Squid: Byte hit ratio per 1 hour | Byte Hit Ratios |
SNMP agent | squid[cacheRequestByteRatio.60] |
Squid: Request hit ratio per 1 minute | Request Hit Ratios |
SNMP agent | squid[cacheRequestHitRatio.1] |
Squid: Request hit ratio per 5 minutes | Request Hit Ratios |
SNMP agent | squid[cacheRequestHitRatio.5] |
Squid: Request hit ratio per 1 hour | Request Hit Ratios |
SNMP agent | squid[cacheRequestHitRatio.60] |
Squid: Sys page faults per second | Page faults with physical I/O |
SNMP agent | squid[cacheSysPageFaults] Preprocessing
|
Squid: HTTP requests received per second | Number of HTTP requests received |
SNMP agent | squid[cacheProtoClientHttpRequests] Preprocessing
|
Squid: HTTP traffic received per second | Amount of HTTP traffic received from clients |
SNMP agent | squid[cacheHttpInKb] Preprocessing
|
Squid: HTTP traffic sent per second | Amount of HTTP traffic sent to clients |
SNMP agent | squid[cacheHttpOutKb] Preprocessing
|
Squid: HTTP Hits sent from cache per second | Number of HTTP Hits sent to clients from cache |
SNMP agent | squid[cacheHttpHits] Preprocessing
|
Squid: HTTP Errors sent per second | Number of HTTP Errors sent to clients |
SNMP agent | squid[cacheHttpErrors] Preprocessing
|
Squid: ICP messages sent per second | Number of ICP messages sent |
SNMP agent | squid[cacheIcpPktsSent] Preprocessing
|
Squid: ICP messages received per second | Number of ICP messages received |
SNMP agent | squid[cacheIcpPktsRecv] Preprocessing
|
Squid: ICP traffic transmitted per second | Amount of ICP traffic transmitted |
SNMP agent | squid[cacheIcpKbSent] Preprocessing
|
Squid: ICP traffic received per second | Amount of ICP traffic received |
SNMP agent | squid[cacheIcpKbRecv] Preprocessing
|
Squid: DNS server requests per second | Number of external DNS server requests |
SNMP agent | squid[cacheDnsRequests] Preprocessing
|
Squid: DNS server replies per second | Number of external DNS server replies |
SNMP agent | squid[cacheDnsReplies] Preprocessing
|
Squid: FQDN cache requests per second | Number of FQDN Cache requests |
SNMP agent | squid[cacheFqdnRequests] Preprocessing
|
Squid: FQDN cache hits per second | Number of FQDN Cache hits |
SNMP agent | squid[cacheFqdnHits] Preprocessing
|
Squid: FQDN cache misses per second | Number of FQDN Cache misses |
SNMP agent | squid[cacheFqdnMisses] Preprocessing
|
Squid: IP cache requests per second | Number of IP Cache requests |
SNMP agent | squid[cacheIpRequests] Preprocessing
|
Squid: IP cache hits per second | Number of IP Cache hits |
SNMP agent | squid[cacheIpHits] Preprocessing
|
Squid: IP cache misses per second | Number of IP Cache misses |
SNMP agent | squid[cacheIpMisses] Preprocessing
|
Squid: Objects count | Number of objects stored by the cache |
SNMP agent | squid[cacheNumObjCount] |
Squid: Objects LRU expiration age | Storage LRU Expiration Age |
SNMP agent | squid[cacheCurrentLRUExpiration] Preprocessing
|
Squid: Objects unlinkd requests | Requests given to unlinkd |
SNMP agent | squid[cacheCurrentUnlinkRequests] |
Squid: HTTP all service time per 5 minutes | HTTP all service time per 5 minutes |
SNMP agent | squid[cacheHttpAllSvcTime.5] Preprocessing
|
Squid: HTTP all service time per hour | HTTP all service time per hour |
SNMP agent | squid[cacheHttpAllSvcTime.60] Preprocessing
|
Squid: HTTP miss service time per 5 minutes | HTTP miss service time per 5 minutes |
SNMP agent | squid[cacheHttpMissSvcTime.5] Preprocessing
|
Squid: HTTP miss service time per hour | HTTP miss service time per hour |
SNMP agent | squid[cacheHttpMissSvcTime.60] Preprocessing
|
Squid: HTTP hit service time per 5 minutes | HTTP hit service time per 5 minutes |
SNMP agent | squid[cacheHttpHitSvcTime.5] Preprocessing
|
Squid: HTTP hit service time per hour | HTTP hit service time per hour |
SNMP agent | squid[cacheHttpHitSvcTime.60] Preprocessing
|
Squid: ICP query service time per 5 minutes | ICP query service time per 5 minutes |
SNMP agent | squid[cacheIcpQuerySvcTime.5] Preprocessing
|
Squid: ICP query service time per hour | ICP query service time per hour |
SNMP agent | squid[cacheIcpQuerySvcTime.60] Preprocessing
|
Squid: ICP reply service time per 5 minutes | ICP reply service time per 5 minutes |
SNMP agent | squid[cacheIcpReplySvcTime.5] Preprocessing
|
Squid: ICP reply service time per hour | ICP reply service time per hour |
SNMP agent | squid[cacheIcpReplySvcTime.60] Preprocessing
|
Squid: DNS service time per 5 minutes | DNS service time per 5 minutes |
SNMP agent | squid[cacheDnsSvcTime.5] Preprocessing
|
Squid: DNS service time per hour | DNS service time per hour |
SNMP agent | squid[cacheDnsSvcTime.60] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Squid: Port {$SQUID.HTTP.PORT} is down | last(/Squid by SNMP/net.tcp.service[tcp,,{$SQUID.HTTP.PORT}])=0 |Average |
Manual close: Yes | ||
Squid: Squid has been restarted | Uptime is less than 10 minutes. |
last(/Squid by SNMP/squid[cacheUptime])<10m |Info |
Manual close: Yes | |
Squid: Squid version has been changed | Squid version has changed. Acknowledge to close the problem manually. |
last(/Squid by SNMP/squid[cacheVersionId],#1)<>last(/Squid by SNMP/squid[cacheVersionId],#2) and length(last(/Squid by SNMP/squid[cacheVersionId]))>0 |Info |
Manual close: Yes | |
Squid: Swap usage is more than low watermark | last(/Squid by SNMP/squid[cacheCurrentSwapSize])>last(/Squid by SNMP/squid[cacheSwapLowWM])*last(/Squid by SNMP/squid[cacheSwapMaxSize])/100 |Warning |
|||
Squid: Swap usage is more than high watermark | last(/Squid by SNMP/squid[cacheCurrentSwapSize])>last(/Squid by SNMP/squid[cacheSwapHighWM])*last(/Squid by SNMP/squid[cacheSwapMaxSize])/100 |High |
|||
Squid: Squid is running out of file descriptors | last(/Squid by SNMP/squid[cacheCurrentUnusedFDescrCnt])<{$SQUID.FILE.DESC.WARN.MIN} |Warning |
|||
Squid: High sys page faults rate | avg(/Squid by SNMP/squid[cacheSysPageFaults],5m)>avg(/Squid by SNMP/squid[cacheProtoClientHttpRequests],5m)/100*{$SQUID.PAGE.FAULT.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Microsoft SharePoint monitoring by Zabbix via HTTP and doesn't require any external scripts.
SharePoint includes a Representational State Transfer (REST) service. Developers can perform read operations from their SharePoint Add-ins, solutions, and client applications, using REST web technologies and standard Open Data Protocol (OData) syntax. Details in https://docs.microsoft.com/ru-ru/sharepoint/dev/sp-add-ins/get-to-know-the-sharepoint-rest-service?tabs=csom
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create a new host. Define the macros according to your SharePoint web portal. It is recommended to fill in the values of the filter macros to avoid collecting redundant data.
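As an optional pre-check, SharePoint reports its current load in the X-SharePointHealthScore response header, so a single request against the portal can confirm that the URL and credentials are usable before the template is linked. This is only a rough sketch assuming NTLM authentication; the URL, user, and password placeholders correspond to the macro values (a DOMAIN\ prefix on the user may be required in your environment):
curl -I --ntlm -u "<SHAREPOINT.USER>:<SHAREPOINT.PASSWORD>" "http://sharepoint.companyname.local/_api/web"
Look for the X-SharePointHealthScore header in the response; a value between 0 and 10 is the same health score this template monitors.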
Name | Description | Default |
---|---|---|
{$SHAREPOINT.USER} | ||
{$SHAREPOINT.PASSWORD} | ||
{$SHAREPOINT.URL} | Portal page URL. For example http://sharepoint.companyname.local/ |
|
{$SHAREPOINT.LLD.FILTER.NAME.MATCHES} | Filter of discoverable dictionaries by name. |
.* |
{$SHAREPOINT.LLD.FILTER.FULL_PATH.MATCHES} | Filter of discoverable dictionaries by full path. |
^/ |
{$SHAREPOINT.LLD.FILTER.TYPE.MATCHES} | Filter of discoverable types. |
FOLDER |
{$SHAREPOINT.LLD.FILTER.NAME.NOT_MATCHES} | Filter to exclude discovered dictionaries by name. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.LLD.FILTER.FULL_PATH.NOT_MATCHES} | Filter to exclude discovered dictionaries by full path. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.LLD.FILTER.TYPE.NOT_MATCHES} | Filter to exclude discovered types. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.ROOT} | /Shared Documents |
|
{$SHAREPOINT.LLD_INTERVAL} | 3h |
|
{$SHAREPOINT.GET_INTERVAL} | 1m |
|
{$SHAREPOINT.MAX_HEALTH_SCORE} | Must be in the range from 0 to 10. Details: https://docs.microsoft.com/en-us/openspecs/sharepoint_protocols/ms-wsshp/c60ddeb6-4113-4a73-9e97-26b5c3907d33 |
5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Sharepoint: Get directory structure | Used to get directory structure information |
Script | sharepoint.get_dir Preprocessing
|
Sharepoint: Get directory structure: Status | HTTP response (status) code. Indicates whether the HTTP request was successfully completed. Additional information is available in the server log file. |
Dependent item | sharepoint.get_dir.status Preprocessing
|
Sharepoint: Get directory structure: Exec time | The time taken to execute the script for obtaining the data structure (in ms). Less is better. |
Dependent item | sharepoint.get_dir.time Preprocessing
|
Sharepoint: Health score | This item specifies a value between 0 and 10, where 0 represents a low load and a high ability to process requests and 10 represents a high load and that the server is throttling requests to maintain adequate throughput. |
HTTP agent | sharepoint.health_score Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Sharepoint: Error getting directory structure. | Error getting directory structure. Check the Zabbix server log for more details. |
last(/Microsoft SharePoint by HTTP/sharepoint.get_dir.status)<>200 |Warning |
Manual close: Yes | |
Sharepoint: Server responds slowly to API request | last(/Microsoft SharePoint by HTTP/sharepoint.get_dir.time)>2000 |Warning |
Manual close: Yes | ||
Sharepoint: Bad health score | last(/Microsoft SharePoint by HTTP/sharepoint.health_score)>"{$SHAREPOINT.MAX_HEALTH_SCORE}" |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Directory discovery | Script | sharepoint.directory.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sharepoint: Size ({#SHAREPOINT.LLD.FULL_PATH}) | Size of: {#SHAREPOINT.LLD.FULL_PATH} |
Dependent item | sharepoint.size["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing
|
Sharepoint: Modified ({#SHAREPOINT.LLD.FULL_PATH}) | Date of change: {#SHAREPOINT.LLD.FULL_PATH} |
Dependent item | sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing
|
Sharepoint: Created ({#SHAREPOINT.LLD.FULL_PATH}) | Date of creation: {#SHAREPOINT.LLD.FULL_PATH} |
Dependent item | sharepoint.created["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Sharepoint: Sharepoint object is changed | The modification date of the folder/file has been updated. |
last(/Microsoft SharePoint by HTTP/sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"],#1)<>last(/Microsoft SharePoint by HTTP/sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"],#2) |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the messaging broker RabbitMQ cluster by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by polling RabbitMQ management plugin with HTTP agent remotely.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See the RabbitMQ documentation
for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
Set the hostname or IP address of the RabbitMQ cluster host in the {$RABBITMQ.API.CLUSTER_HOST}
macro. You can also change the port in the {$RABBITMQ.API.PORT}
macro and the scheme in the {$RABBITMQ.API.SCHEME}
macro if necessary.
Set the user name and password in the macros {$RABBITMQ.API.USER}
and {$RABBITMQ.API.PASSWORD}
.
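An optional way to confirm the credentials and the management API endpoint before linking the template (the host, port, user, and password placeholders are the macro values you set above; /api/overview is the endpoint this template polls for cluster-wide metrics):
curl -u zbx_monitor:<PASSWORD> http://<CLUSTER_HOST>:15672/api/overview
A JSON document with cluster totals confirms that the management plugin is reachable and the monitoring user has sufficient permissions.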
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.API.CLUSTER_HOST} | The hostname or IP of the API endpoint for the RabbitMQ cluster. |
<SET CLUSTER API HOST> |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.LLD.FILTER.EXCHANGE.MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.EXCHANGE.NOT_MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Get overview | The HTTP API endpoint that returns cluster-wide metrics. |
HTTP agent | rabbitmq.get_overview |
RabbitMQ: Get exchanges | The HTTP API endpoint that returns exchanges metrics. |
HTTP agent | rabbitmq.get_exchanges |
RabbitMQ: Connections total | The total number of connections. |
Dependent item | rabbitmq.overview.object_totals.connections Preprocessing
|
RabbitMQ: Channels total | The total number of channels. |
Dependent item | rabbitmq.overview.object_totals.channels Preprocessing
|
RabbitMQ: Queues total | The total number of queues. |
Dependent item | rabbitmq.overview.object_totals.queues Preprocessing
|
RabbitMQ: Consumers total | The total number of consumers. |
Dependent item | rabbitmq.overview.object_totals.consumers Preprocessing
|
RabbitMQ: Exchanges total | The total number of exchanges. |
Dependent item | rabbitmq.overview.object_totals.exchanges Preprocessing
|
RabbitMQ: Messages total | The total number of messages (ready, plus unacknowledged). |
Dependent item | rabbitmq.overview.queue_totals.messages Preprocessing
|
RabbitMQ: Messages ready for delivery | The number of messages ready for delivery. |
Dependent item | rabbitmq.overview.queue_totals.messages.ready Preprocessing
|
RabbitMQ: Messages unacknowledged | The number of unacknowledged messages. |
Dependent item | rabbitmq.overview.queue_totals.messages.unacknowledged Preprocessing
|
RabbitMQ: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack Preprocessing
|
RabbitMQ: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack.rate Preprocessing
|
RabbitMQ: Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.overview.messages.confirm Preprocessing
|
RabbitMQ: Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.overview.messages.confirm.rate Preprocessing
|
RabbitMQ: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.overview.messages.deliver_get Preprocessing
|
RabbitMQ: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.overview.messages.deliver_get.rate Preprocessing
|
RabbitMQ: Messages published | The count of published messages. |
Dependent item | rabbitmq.overview.messages.publish Preprocessing
|
RabbitMQ: Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.overview.messages.publish.rate Preprocessing
|
RabbitMQ: Messages publish_in | The count of messages published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in Preprocessing
|
RabbitMQ: Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in.rate Preprocessing
|
RabbitMQ: Messages publish_out | The count of messages published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out Preprocessing
|
RabbitMQ: Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out.rate Preprocessing
|
RabbitMQ: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable Preprocessing
|
RabbitMQ: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable.rate Preprocessing
|
RabbitMQ: Messages returned redeliver | The count of subset of messages in the |
Dependent item | rabbitmq.overview.messages.redeliver Preprocessing
|
RabbitMQ: Messages returned redeliver per second | The rate of subset of messages (per second) in the |
Dependent item | rabbitmq.overview.messages.redeliver.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Failed to fetch overview data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ cluster by HTTP/rabbitmq.get_overview,30m)=1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Healthcheck: alarms in effect in the cluster{#SINGLETON} | Responds with a 200 OK if there are no alarms in effect in the cluster, otherwise responds with a 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.alarms[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: There are active alarms in the cluster | This is the default API endpoint path: http://{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ cluster by HTTP/rabbitmq.healthcheck.alarms[{#SINGLETON}])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Exchanges discovery | The metrics for an individual exchange. |
Dependent item | rabbitmq.exchanges.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#EXCHANGE}][{#TYPE}] exchanges metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.exchange.messages.confirm["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.exchange.messages.confirm.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.exchange.messages.deliver_get["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.exchange.messages.deliver_get.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.exchange.messages.publish["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.exchange.messages.publish.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in | The count of messages published from the channels into this exchange. |
Dependent item | rabbitmq.exchange.messages.publish_in["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in per second | The rate of messages (per second) published from the channels into this exchange. |
Dependent item | rabbitmq.exchange.messages.publish_in.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out | The count of messages published from this exchange into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out per second | The rate of messages (per second) published from this exchange into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered | The count of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.exchange.messages.redeliver["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered per second | The rate (per second) of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.exchange.messages.redeliver.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
This template is developed to monitor a RabbitMQ messaging broker node by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics remotely by polling the RabbitMQ management plugin with the HTTP agent.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See the RabbitMQ documentation for the instructions.
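If the management plugin is not yet enabled, it can typically be switched on with the standard plugin tool (shown here only as an illustrative sketch; run it on the RabbitMQ node with sufficient privileges):
rabbitmq-plugins enable rabbitmq_management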
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
Set the hostname or IP address of the RabbitMQ node host in the {$RABBITMQ.API.HOST} macro. You can also change the port in the {$RABBITMQ.API.PORT} macro and the scheme in the {$RABBITMQ.API.SCHEME} macro if necessary.
Set the user name and password in the macros {$RABBITMQ.API.USER} and {$RABBITMQ.API.PASSWORD}.
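Before linking the template, it can be useful to confirm that the monitoring user can actually reach the management API from the Zabbix server or proxy. A minimal manual check, with the host and password below as placeholders for your own values, could look like this:
curl -u zbx_monitor:<PASSWORD> http://<RABBITMQ_NODE_HOST>:15672/api/overview
A JSON document with cluster-wide statistics indicates that the credentials and the API-related macros are set correctly.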
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.CLUSTER.NAME} | The name of the RabbitMQ cluster. |
rabbit |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.API.HOST} | The hostname or IP of the API endpoint for the RabbitMQ. |
<SET NODE API HOST> |
{$RABBITMQ.LLD.FILTER.QUEUE.MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.QUEUE.NOT_MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
{$RABBITMQ.RESPONSE_TIME.MAX.WARN} | The maximum response time by the RabbitMQ expressed in seconds for a trigger expression. |
10 |
{$RABBITMQ.MESSAGES.MAX.WARN} | The maximum number of messages in the queue for a trigger expression. |
1000 |
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Service ping | Simple check | net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] Preprocessing
|
|
RabbitMQ: Get node overview | The HTTP API endpoint that returns cluster-wide metrics. |
HTTP agent | rabbitmq.get_node_overview Preprocessing
|
RabbitMQ: Get nodes | The HTTP API endpoint that returns metrics of the nodes. |
HTTP agent | rabbitmq.get_nodes |
RabbitMQ: Get queues | The HTTP API endpoint that returns metrics of the queues. |
HTTP agent | rabbitmq.get_queues |
RabbitMQ: Management plugin version | The version of the management plugin in use. |
Dependent item | rabbitmq.node.overview.management_version Preprocessing
|
RabbitMQ: RabbitMQ version | The version of the RabbitMQ on the node, which processed this request. |
Dependent item | rabbitmq.node.overview.rabbitmq_version Preprocessing
|
RabbitMQ: Used file descriptors | The number of used file descriptors. |
Dependent item | rabbitmq.node.fd_used Preprocessing
|
RabbitMQ: Free disk space | The current free disk space. |
Dependent item | rabbitmq.node.disk_free Preprocessing
|
RabbitMQ: Disk free limit | The free space limit of a disk expressed in bytes. |
Dependent item | rabbitmq.node.disk_free_limit Preprocessing
|
RabbitMQ: Memory used | The memory usage expressed in bytes. |
Dependent item | rabbitmq.node.mem_used Preprocessing
|
RabbitMQ: Memory limit | The memory usage with high watermark properties expressed in bytes. |
Dependent item | rabbitmq.node.mem_limit Preprocessing
|
RabbitMQ: Runtime run queue | The average number of Erlang processes waiting to run. |
Dependent item | rabbitmq.node.run_queue Preprocessing
|
RabbitMQ: Sockets used | The number of file descriptors used as sockets. |
Dependent item | rabbitmq.node.sockets_used Preprocessing
|
RabbitMQ: Sockets available | The file descriptors available for use as sockets. |
Dependent item | rabbitmq.node.sockets_total Preprocessing
|
RabbitMQ: Number of network partitions | The number of network partitions, which this node "sees". |
Dependent item | rabbitmq.node.partitions Preprocessing
|
RabbitMQ: Is running | Shows whether the node is running or not. |
Dependent item | rabbitmq.node.running Preprocessing
|
RabbitMQ: Memory alarm | It checks whether the host has a memory alarm or not. |
Dependent item | rabbitmq.node.mem_alarm Preprocessing
|
RabbitMQ: Disk free alarm | It checks whether the node has a disk alarm or not. |
Dependent item | rabbitmq.node.disk_free_alarm Preprocessing
|
RabbitMQ: Uptime | Uptime expressed in milliseconds. |
Dependent item | rabbitmq.node.uptime Preprocessing
|
RabbitMQ: Service response time | Simple check | net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Service is down | last(/RabbitMQ node by HTTP/net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"])=0 |Average |
Manual close: Yes | ||
RabbitMQ: Failed to fetch nodes data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ node by HTTP/rabbitmq.get_nodes,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
RabbitMQ: Version has changed | RabbitMQ version has changed. Acknowledge to close the problem manually. |
last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version,#1)<>last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version,#2) and length(last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version))>0 |Info |
Manual close: Yes | |
RabbitMQ: Number of network partitions is too high | For more details see Detecting Network Partitions. |
min(/RabbitMQ node by HTTP/rabbitmq.node.partitions,5m)>0 |Warning |
||
RabbitMQ: Node is not running | RabbitMQ node is not running. |
max(/RabbitMQ node by HTTP/rabbitmq.node.running,5m)=0 |Average |
Depends on:
|
|
RabbitMQ: Memory alarm | For more details see Memory Alarms. |
last(/RabbitMQ node by HTTP/rabbitmq.node.mem_alarm)=1 |Average |
||
RabbitMQ: Free disk space alarm | For more details see Free Disk Space Alarms. |
last(/RabbitMQ node by HTTP/rabbitmq.node.disk_free_alarm)=1 |Average |
||
RabbitMQ: Host has been restarted | Uptime is less than 10 minutes. |
last(/RabbitMQ node by HTTP/rabbitmq.node.uptime)<10m |Info |
Manual close: Yes | |
RabbitMQ: Service response time is too high | min(/RabbitMQ node by HTTP/net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"],5m)>{$RABBITMQ.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Healthcheck: local alarms in effect on this node{#SINGLETON} | It responds with a status code 200 OK if there are no local alarms in effect on the target node. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.local_alarms[{#SINGLETON}] Preprocessing
|
RabbitMQ: Healthcheck: expiration date on the certificates{#SINGLETON} | It checks the expiration date on the certificates for every listener configured to use the Transport Layer Security (TLS). It responds with a status code 200 OK if all the certificates are valid (have not expired). Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.certificate_expiration[{#SINGLETON}] Preprocessing
|
RabbitMQ: Healthcheck: virtual hosts on this node{#SINGLETON} | It responds with a status code 200 OK if all virtual hosts are running on the target node. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.virtual_hosts[{#SINGLETON}] Preprocessing
|
RabbitMQ: Healthcheck: classic mirrored queues without synchronized mirrors online{#SINGLETON} | It checks if there are classic mirrored queues without synchronized mirrors online (queues that would potentially lose data if the target node is shut down). It responds with a status code 200 OK if there are no such classic mirrored queues. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.mirror_sync[{#SINGLETON}] Preprocessing
|
RabbitMQ: Healthcheck: queues with minimum online quorum{#SINGLETON} | It checks if there are quorum queues with minimum online quorum (queues that would lose their quorum and availability if the target node is shut down). It responds with a status code 200 OK if there are no such quorum queues. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.quorum[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: There are active alarms in the node | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.local_alarms[{#SINGLETON}])=0 |Average |
||
RabbitMQ: There are valid TLS certificates expiring in the next month | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.certificate_expiration[{#SINGLETON}])=0 |Average |
||
RabbitMQ: There are not running virtual hosts | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.virtual_hosts[{#SINGLETON}])=0 |Average |
||
RabbitMQ: There are queues that could potentially lose data if this node goes offline. | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.mirror_sync[{#SINGLETON}])=0 |Average |
||
RabbitMQ: There are queues that would lose their quorum and availability if this node is shut down. | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.quorum[{#SINGLETON}])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.9- discovery | Specific metrics for versions up to and including 3.8.9. |
Dependent item | rabbitmq.healthcheck.v389.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Healthcheck{#SINGLETON} | It checks whether the RabbitMQ application is running, whether the channels and queues can be listed successfully, and that no alarms are in effect. |
HTTP agent | rabbitmq.healthcheck[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Node healthcheck failed | For more details see Health Checks. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck[{#SINGLETON}])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Queues discovery | The metrics for an individual queue. |
Dependent item | rabbitmq.queues.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#QUEUE}] queue metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages total | The count of total messages in the queue. |
Dependent item | rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages per second | The count of total messages per second in the queue. |
Dependent item | rabbitmq.queue.messages.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Consumers | The number of consumers. |
Dependent item | rabbitmq.queue.consumers["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Memory | The bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures. |
Dependent item | rabbitmq.queue.memory["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages ready | The number of messages ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages ready per second | The number of messages per second ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged | The number of messages delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged per second | The number of messages per second delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged per second | The number of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages delivered | The count of messages delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages delivered per second | The count of messages (per second) delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver_get["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered per second | The rate of delivery per second. The sum of messages delivered (per second) to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver_get.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.queue.messages.publish["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages published per second | The rate of published messages per second. |
Dependent item | rabbitmq.queue.messages.publish.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages redelivered | The count of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.queue.messages.redeliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages redelivered per second | The rate of messages redelivered per second. |
Dependent item | rabbitmq.queue.messages.redeliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Too many messages in queue [{#VHOST}][{#QUEUE}] | min(/RabbitMQ node by HTTP/rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"],5m)>{$RABBITMQ.MESSAGES.MAX.WARN:"{#QUEUE}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the RabbitMQ messaging broker by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template RabbitMQ Cluster collects metrics by polling the RabbitMQ management plugin with Zabbix agent.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See RabbitMQ documentation for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
A login name and password are also supported in the {$RABBITMQ.API.USER} and {$RABBITMQ.API.PASSWORD} macros.
If your cluster consists of several nodes, it is recommended to assign the cluster template to a separate balancing host. In the case of a single-node installation, you can assign the cluster template to one host together with a node template.
If you use another API endpoint, then don't forget to change the {$RABBITMQ.API.CLUSTER_HOST} macro.
Install and set up Zabbix agent.
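Since this template collects data through the Zabbix agent web.page.get item, a quick agent-side check with zabbix_get can confirm the setup once the agent is installed. The agent host name and credentials below are placeholders; the key mirrors the one used by the template:
zabbix_get -s <AGENT_HOST> -k 'web.page.get["http://zbx_monitor:<PASSWORD>@127.0.0.1:15672/api/overview"]'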
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.API.CLUSTER_HOST} | The hostname or IP of the API endpoint for the RabbitMQ cluster. |
127.0.0.1 |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.LLD.FILTER.EXCHANGE.MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.EXCHANGE.NOT_MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Get overview | The HTTP API endpoint that returns cluster-wide metrics. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/overview"] Preprocessing
|
RabbitMQ: Get exchanges | The HTTP API endpoint that returns exchanges metrics. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/exchanges"] Preprocessing
|
RabbitMQ: Connections total | The total number of connections. |
Dependent item | rabbitmq.overview.object_totals.connections Preprocessing
|
RabbitMQ: Channels total | The total number of channels. |
Dependent item | rabbitmq.overview.object_totals.channels Preprocessing
|
RabbitMQ: Queues total | The total number of queues. |
Dependent item | rabbitmq.overview.object_totals.queues Preprocessing
|
RabbitMQ: Consumers total | The total number of consumers. |
Dependent item | rabbitmq.overview.object_totals.consumers Preprocessing
|
RabbitMQ: Exchanges total | The total number of exchanges. |
Dependent item | rabbitmq.overview.object_totals.exchanges Preprocessing
|
RabbitMQ: Messages total | The total number of messages (ready, plus unacknowledged). |
Dependent item | rabbitmq.overview.queue_totals.messages Preprocessing
|
RabbitMQ: Messages ready for delivery | The number of messages ready for delivery. |
Dependent item | rabbitmq.overview.queue_totals.messages.ready Preprocessing
|
RabbitMQ: Messages unacknowledged | The number of unacknowledged messages. |
Dependent item | rabbitmq.overview.queue_totals.messages.unacknowledged Preprocessing
|
RabbitMQ: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack Preprocessing
|
RabbitMQ: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack.rate Preprocessing
|
RabbitMQ: Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.overview.messages.confirm Preprocessing
|
RabbitMQ: Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.overview.messages.confirm.rate Preprocessing
|
RabbitMQ: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.overview.messages.deliver_get Preprocessing
|
RabbitMQ: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.overview.messages.deliver_get.rate Preprocessing
|
RabbitMQ: Messages published | The count of published messages. |
Dependent item | rabbitmq.overview.messages.publish Preprocessing
|
RabbitMQ: Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.overview.messages.publish.rate Preprocessing
|
RabbitMQ: Messages publish_in | The count of messages published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in Preprocessing
|
RabbitMQ: Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in.rate Preprocessing
|
RabbitMQ: Messages publish_out | The count of messages published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out Preprocessing
|
RabbitMQ: Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out.rate Preprocessing
|
RabbitMQ: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable Preprocessing
|
RabbitMQ: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable.rate Preprocessing
|
RabbitMQ: Messages returned redeliver | The count of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.overview.messages.redeliver Preprocessing
|
RabbitMQ: Messages returned redeliver per second | The rate (per second) of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.overview.messages.redeliver.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Failed to fetch overview data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ cluster by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/overview"],30m)=1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Healthcheck: alarms in effect in the cluster{#SINGLETON} | It responds with a status code 200 OK if there are no alarms in effect in the cluster. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/health/checks/alarms{#SINGLETON}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: There are active alarms in the cluster | This is the default API endpoint path: http://{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ cluster by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/health/checks/alarms{#SINGLETON}"])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Exchanges discovery | The metrics for an individual exchange. |
Dependent item | rabbitmq.exchanges.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#EXCHANGE}][{#TYPE}] exchanges metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.exchange.messages.confirm["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.exchange.messages.confirm.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.exchange.messages.deliver_get["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.exchange.messages.deliver_get.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.exchange.messages.publish["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.exchange.messages.publish.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in | The count of messages published from the channels into this exchange. |
Dependent item | rabbitmq.exchange.messages.publish_in["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in per second | The rate of messages (per second) published from the channels into this exchange. |
Dependent item | rabbitmq.exchange.messages.publish_in.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out | The count of messages published from this exchange into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out per second | The rate of messages (per second) published from this exchange into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered | The count of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.exchange.messages.redeliver["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered per second | The rate (per second) of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.exchange.messages.redeliver.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
This template is developed to monitor RabbitMQ by Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template RabbitMQ Node (Zabbix version >= 4.2) collects metrics by polling the RabbitMQ management plugin with Zabbix agent.
It also uses Zabbix agent to collect RabbitMQ Linux process statistics, such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See RabbitMQ documentation for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
A login name and password are also supported in the {$RABBITMQ.API.USER} and {$RABBITMQ.API.PASSWORD} macros.
If you use another API endpoint, then don't forget to change the {$RABBITMQ.API.HOST} macro.
Install and set up Zabbix agent.
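Besides the management API items, this template discovers the local RabbitMQ server process through the agent, so a simple agent-side process check can help verify that part of the setup. The agent host name below is a placeholder; beam.smp is the default value of {$RABBITMQ.PROCESS_NAME}:
zabbix_get -s <AGENT_HOST> -k 'proc.num[beam.smp]'
A value greater than zero means the agent can see the RabbitMQ server process.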
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.CLUSTER.NAME} | The name of the RabbitMQ cluster. |
rabbit |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.API.HOST} | The hostname or IP of the API endpoint for the RabbitMQ. |
127.0.0.1 |
{$RABBITMQ.PROCESS_NAME} | The process name filter for the RabbitMQ process discovery. |
beam.smp |
{$RABBITMQ.PROCESS.NAME.PARAMETER} | The process name of the RabbitMQ server used in the item key |
|
{$RABBITMQ.LLD.FILTER.QUEUE.MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.QUEUE.NOT_MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
{$RABBITMQ.RESPONSE_TIME.MAX.WARN} | The maximum response time by the RabbitMQ expressed in seconds for a trigger expression. |
10 |
{$RABBITMQ.MESSAGES.MAX.WARN} | The maximum number of messages in the queue for a trigger expression. |
1000 |
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Service ping | Zabbix agent | net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] Preprocessing
|
|
RabbitMQ: Get node overview | The HTTP API endpoint that returns cluster-wide metrics. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/overview"] Preprocessing
|
RabbitMQ: Get nodes | The HTTP API endpoint that returns metrics of the nodes. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/nodes/{$RABBITMQ.CLUSTER.NAME}@{HOST.NAME}?memory=true"] Preprocessing
|
RabbitMQ: Get queues | The HTTP API endpoint that returns metrics of the queues. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/queues"] Preprocessing
|
RabbitMQ: Management plugin version | The version of the management plugin in use. |
Dependent item | rabbitmq.node.overview.management_version Preprocessing
|
RabbitMQ: RabbitMQ version | The version of the RabbitMQ on the node, which processed this request. |
Dependent item | rabbitmq.node.overview.rabbitmq_version Preprocessing
|
RabbitMQ: Used file descriptors | The number of used file descriptors. |
Dependent item | rabbitmq.node.fd_used Preprocessing
|
RabbitMQ: Free disk space | The current free disk space. |
Dependent item | rabbitmq.node.disk_free Preprocessing
|
RabbitMQ: Memory used | The memory usage expressed in bytes. |
Dependent item | rabbitmq.node.mem_used Preprocessing
|
RabbitMQ: Memory limit | The memory usage with high watermark properties expressed in bytes. |
Dependent item | rabbitmq.node.mem_limit Preprocessing
|
RabbitMQ: Disk free limit | The free space limit of a disk expressed in bytes. |
Dependent item | rabbitmq.node.disk_free_limit Preprocessing
|
RabbitMQ: Runtime run queue | The average number of Erlang processes waiting to run. |
Dependent item | rabbitmq.node.run_queue Preprocessing
|
RabbitMQ: Sockets used | The number of file descriptors used as sockets. |
Dependent item | rabbitmq.node.sockets_used Preprocessing
|
RabbitMQ: Sockets available | The file descriptors available for use as sockets. |
Dependent item | rabbitmq.node.sockets_total Preprocessing
|
RabbitMQ: Number of network partitions | The number of network partitions, which this node "sees". |
Dependent item | rabbitmq.node.partitions Preprocessing
|
RabbitMQ: Is running | Shows whether the node is running or not. |
Dependent item | rabbitmq.node.running Preprocessing
|
RabbitMQ: Memory alarm | It checks whether the host has a memory alarm or not. |
Dependent item | rabbitmq.node.mem_alarm Preprocessing
|
RabbitMQ: Disk free alarm | It checks whether the node has a disk alarm or not. |
Dependent item | rabbitmq.node.disk_free_alarm Preprocessing
|
RabbitMQ: Uptime | Uptime expressed in milliseconds. |
Dependent item | rabbitmq.node.uptime Preprocessing
|
RabbitMQ: Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$RABBITMQ.PROCESS.NAME.PARAMETER},,,summary] |
RabbitMQ: Service response time | Zabbix agent | net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Version has changed | RabbitMQ version has changed. Acknowledge to close the problem manually. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version,#1)<>last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version,#2) and length(last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version))>0 |Info |
Manual close: Yes | |
RabbitMQ: Number of network partitions is too high | For more details see Detecting Network Partitions. |
min(/RabbitMQ node by Zabbix agent/rabbitmq.node.partitions,5m)>0 |Warning |
||
RabbitMQ: Memory alarm | For more details see Memory Alarms. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.mem_alarm)=1 |Average |
||
RabbitMQ: Free disk space alarm | For more details see Free Disk Space Alarms. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.disk_free_alarm)=1 |Average |
||
RabbitMQ: Host has been restarted | Uptime is less than 10 minutes. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.uptime)<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ process discovery | The discovery of the RabbitMQ summary processes. |
Dependent item | rabbitmq.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Get process data | The summary metrics aggregated by a process {#RABBITMQ.NAME}. |
Dependent item | rabbitmq.proc.get[{#RABBITMQ.NAME}] Preprocessing
|
RabbitMQ: Number of running processes | The number of running processes {#RABBITMQ.NAME}. |
Dependent item | rabbitmq.proc.num[{#RABBITMQ.NAME}] Preprocessing
|
RabbitMQ: Memory usage (rss) | The summary of resident set size memory used by a process {#RABBITMQ.NAME} expressed in bytes. |
Dependent item | rabbitmq.proc.rss[{#RABBITMQ.NAME}] Preprocessing
|
RabbitMQ: Memory usage (vsize) | The summary of virtual memory used by a process {#RABBITMQ.NAME} expressed in bytes. |
Dependent item | rabbitmq.proc.vmem[{#RABBITMQ.NAME}] Preprocessing
|
RabbitMQ: Memory usage, % | The percentage of real memory used by a process {#RABBITMQ.NAME}. |
Dependent item | rabbitmq.proc.pmem[{#RABBITMQ.NAME}] Preprocessing
|
RabbitMQ: CPU utilization | The percentage of the CPU utilization by a process {#RABBITMQ.NAME}. |
Zabbix agent | proc.cpu.util[{#RABBITMQ.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Process is not running | last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])=0 |High |
|||
RabbitMQ: Failed to fetch nodes data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/nodes/{$RABBITMQ.CLUSTER.NAME}@{HOST.NAME}?memory=true"],30m)=1 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
|
RabbitMQ: Service is down | last(/RabbitMQ node by Zabbix agent/net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"])=0 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Average |
Manual close: Yes | ||
RabbitMQ: Node is not running | RabbitMQ node is not running. |
max(/RabbitMQ node by Zabbix agent/rabbitmq.node.running,5m)=0 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Average |
Depends on:
|
|
RabbitMQ: Service response time is too high | min(/RabbitMQ node by Zabbix agent/net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"],5m)>{$RABBITMQ.RESPONSE_TIME.MAX.WARN} and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Healthcheck: local alarms in effect on this node{#SINGLETON} | It responds with a status code 200 OK if there are no local alarms in effect on the target node. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/local-alarms{#SINGLETON}"] Preprocessing
|
RabbitMQ: Healthcheck: expiration date on the certificates{#SINGLETON} | It checks the expiration date on the certificates for every listener configured to use the Transport Layer Security (TLS). It responds with a status code 200 OK if all the certificates are valid (have not expired). Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/certificate-expiration/1/months{#SINGLETON}"] Preprocessing
|
RabbitMQ: Healthcheck: virtual hosts on this node{#SINGLETON} | It responds with a status code 200 OK if all virtual hosts are running on the target node. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/virtual-hosts{#SINGLETON}"] Preprocessing
|
RabbitMQ: Healthcheck: classic mirrored queues without synchronized mirrors online{#SINGLETON} | It checks if there are classic mirrored queues without synchronized mirrors online (queues that would potentially lose data if the target node is shut down). It responds with a status code 200 OK if there are no such classic mirrored queues. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-mirror-sync-critical{#SINGLETON}"] Preprocessing
|
RabbitMQ: Healthcheck: queues with minimum online quorum{#SINGLETON} | It checks if there are quorum queues with minimum online quorum (queues that would lose their quorum and availability if the target node is shut down). It responds with a status code 200 OK if there are no such quorum queues. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-quorum-critical{#SINGLETON}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: There are active alarms in the node | It checks the active alarms in the nodes via API. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/local-alarms{#SINGLETON}"])=0 |Average |
||
RabbitMQ: There are valid TLS certificates expiring in the next month | It checks if there are valid TLS certificates expiring in the next month. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/certificate-expiration/1/months{#SINGLETON}"])=0 |Average |
||
RabbitMQ: There are not running virtual hosts | It checks if there are not running virtual hosts via API. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/virtual-hosts{#SINGLETON}"])=0 |Average |
||
RabbitMQ: There are queues that could potentially lose data if this node goes offline. | It checks whether there are queues that could potentially lose data if this node goes offline via API. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-mirror-sync-critical{#SINGLETON}"])=0 |Average |
||
RabbitMQ: There are queues that would lose their quorum and availability if this node is shut down. | It checks via API whether there are queues that would lose their quorum and availability if this node is shut down. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-quorum-critical{#SINGLETON}"])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.9- discovery | Specific metrics for versions up to and including 3.8.9. |
Dependent item | rabbitmq.healthcheck.v389.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Healthcheck{#SINGLETON} | It checks whether the RabbitMQ application is running, whether the channels and queues can be listed successfully, and that no alarms are in effect. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/healthchecks/node{#SINGLETON}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Node healthcheck failed | For more details see Health Checks. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/healthchecks/node{#SINGLETON}"])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Queues discovery | The metrics for an individual queue. |
Dependent item | rabbitmq.queues.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#QUEUE}] queue metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages total | The count of total messages in the queue. |
Dependent item | rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages per second | The count of total messages per second in the queue. |
Dependent item | rabbitmq.queue.messages.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Consumers | The number of consumers. |
Dependent item | rabbitmq.queue.consumers["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Memory | The bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures. |
Dependent item | rabbitmq.queue.memory["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages ready | The number of messages ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages ready per second | The number of messages per second ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged | The number of messages delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged per second | The number of messages per second delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged per second | The number of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages delivered | The count of messages delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages delivered per second | The count of messages (per second) delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver_get["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered per second | The rate of delivery per second. The sum of messages delivered (per second) to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver_get.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.queue.messages.publish["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages published per second | The rate of published messages per second. |
Dependent item | rabbitmq.queue.messages.publish.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages redelivered | The count of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.queue.messages.redeliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages redelivered per second | The rate of messages redelivered per second. |
Dependent item | rabbitmq.queue.messages.redeliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Too many messages in queue [{#VHOST}][{#QUEUE}] | min(/RabbitMQ node by Zabbix agent/rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"],5m)>{$RABBITMQ.MESSAGES.MAX.WARN:"{#QUEUE}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Proxmox VE monitoring by Zabbix via HTTP and doesn't require any external scripts.
Proxmox VE uses a REST-like API. The concept is described in Resource Oriented Architecture (ROA).
Check the API documentation for details.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Please provide the necessary access levels for both the User and the Token:
Copy the resulting Token ID and Secret into the host macros {$PVE.TOKEN.ID} and {$PVE.TOKEN.SECRET}.
Set the hostname or IP address of the Proxmox VE API host in the {$PVE.URL.HOST} macro. You can also change the API port in the {$PVE.URL.PORT} macro if necessary.
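To confirm that the token works before the template is assigned, the API can be queried directly. Proxmox VE accepts API tokens in an Authorization header of the form PVEAPIToken=USER@REALM!TOKENID=SECRET; the host and token values below are placeholders, and -k skips certificate verification for self-signed setups:
curl -k -H 'Authorization: PVEAPIToken=USER@REALM!TOKENID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' https://<PVE_HOST>:8006/api2/json/cluster/resources
A JSON list of cluster resources indicates that the token has sufficient access for the template's items.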
Name | Description | Default |
---|---|---|
{$PVE.URL.HOST} | The hostname or IP address of the Proxmox VE API host. |
<SET PVE HOST> |
{$PVE.URL.PORT} | The API uses the HTTPS protocol and the server listens on port 8006 by default. |
8006 |
{$PVE.TOKEN.ID} | API tokens allow stateless access to most parts of the REST API by another system, software or API client. |
USER@REALM!TOKENID |
{$PVE.TOKEN.SECRET} | Secret key. |
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
{$PVE.ROOT.PUSE.MAX.WARN} | Maximum used root space in percentage. |
90 |
{$PVE.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
{$PVE.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.SWAP.PUSE.MAX.WARN} | Maximum used swap space in percentage. |
90 |
{$PVE.VM.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
{$PVE.VM.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.LXC.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
{$PVE.LXC.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.STORAGE.PUSE.MAX.WARN} | Maximum used storage space in percentage. |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxmox: Get cluster resources | Resources index. |
HTTP agent | proxmox.cluster.resources Preprocessing
|
Proxmox: Get cluster status | Get cluster status information. |
HTTP agent | proxmox.cluster.status Preprocessing
|
Proxmox: API service status | Get API service status. |
Script | proxmox.api.available Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox: API service not available | The API service is not available. Check your network and authorization settings. |
last(/Proxmox VE by HTTP/proxmox.api.available) <> 200 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster discovery | Dependent item | proxmox.cluster.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxmox: Cluster [{#RESOURCE.NAME}]: Quorate | Indicates if there is a majority of nodes online to make decisions. |
Dependent item | proxmox.cluster.quorate[{#RESOURCE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox: Cluster [{#RESOURCE.NAME}] not quorum | Proxmox VE uses a quorum-based technique to provide a consistent state among all cluster nodes. |
last(/Proxmox VE by HTTP/proxmox.cluster.quorate[{#RESOURCE.NAME}]) <> 1 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node discovery | Dependent item | proxmox.node.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxmox: Node [{#NODE.NAME}]: Status | Indicates if the node is online or offline. |
Dependent item | proxmox.node.online[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Status | Read node status. |
HTTP agent | proxmox.node.status[{#NODE.NAME}] |
Proxmox: Node [{#NODE.NAME}]: RRD statistics | Read node RRD statistics. |
HTTP agent | proxmox.node.rrd[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Time | Read server time and time zone settings. |
HTTP agent | proxmox.node.time[{#NODE.NAME}] |
Proxmox: Node [{#NODE.NAME}]: Uptime | The system uptime expressed in the following format: "N days, hh:mm:ss". |
Dependent item | proxmox.node.uptime[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: PVE version | PVE manager version. |
Dependent item | proxmox.node.pveversion[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Kernel version | Kernel version info. |
Dependent item | proxmox.node.kernelversion[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Root filesystem, used | Root filesystem usage. |
Dependent item | proxmox.node.rootused[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Root filesystem, total | Root filesystem total. |
Dependent item | proxmox.node.roottotal[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Memory, used | Memory usage. |
Dependent item | proxmox.node.memused[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Memory, total | Memory total. |
Dependent item | proxmox.node.memtotal[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: CPU, usage | CPU usage. |
Dependent item | proxmox.node.cpu[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Outgoing data, rate | Network usage. |
Dependent item | proxmox.node.netout[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Incoming data, rate | Network usage. |
Dependent item | proxmox.node.netin[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: CPU, loadavg | CPU average load. |
Dependent item | proxmox.node.loadavg[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: CPU, iowait | CPU iowait time. |
Dependent item | proxmox.node.iowait[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Swap filesystem, total | Swap total. |
Dependent item | proxmox.node.swaptotal[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Swap filesystem, used | Swap used. |
Dependent item | proxmox.node.swapused[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Time zone | Time zone. |
Dependent item | proxmox.node.timezone[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Localtime | Seconds since 1970-01-01 00:00:00 (local time). |
Dependent item | proxmox.node.localtime[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Time | Seconds since 1970-01-01 00:00:00 UTC. |
Dependent item | proxmox.node.utctime[{#NODE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox: Node [{#NODE.NAME}] offline | Node offline. |
last(/Proxmox VE by HTTP/proxmox.node.online[{#NODE.NAME}]) <> 1 |High |
||
Proxmox: Node [{#NODE.NAME}]: has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.node.uptime[{#NODE.NAME}])<10m |Info |
Manual close: Yes Depends on:
|
|
Proxmox: Node [{#NODE.NAME}]: PVE manager has changed | The PVE manager version has changed. Acknowledge to close the problem manually. |
last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}],#1)<>last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}],#2) and length(last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}]))>0 |Info |
Manual close: Yes | |
Proxmox: Node [{#NODE.NAME}]: Kernel version has changed | The kernel version has changed. Acknowledge to close the problem manually. |
last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}],#1)<>last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}],#2) and length(last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}]))>0 |Info |
Manual close: Yes | |
Proxmox: Node [{#NODE.NAME}] high root filesystem space usage | Root filesystem space usage. |
min(/Proxmox VE by HTTP/proxmox.node.rootused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.roottotal[{#NODE.NAME}]) * 100 >{$PVE.ROOT.PUSE.MAX.WARN:"{#NODE.NAME}"} |Warning |
||
Proxmox: Node [{#NODE.NAME}] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.node.memused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.memtotal[{#NODE.NAME}]) * 100 >{$PVE.MEMORY.PUSE.MAX.WARN:"{#NODE.NAME}"} |Warning |
||
Proxmox: Node [{#NODE.NAME}] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.node.cpu[{#NODE.NAME}],5m) > {$PVE.CPU.PUSE.MAX.WARN:"{#NODE.NAME}"} |Warning |
||
Proxmox: Node [{#NODE.NAME}] high swap space usage | If there is no swap configured, this trigger is ignored. |
min(/Proxmox VE by HTTP/proxmox.node.swapused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.swaptotal[{#NODE.NAME}]) * 100 > {$PVE.SWAP.PUSE.MAX.WARN:"{#NODE.NAME}"} and last(/Proxmox VE by HTTP/proxmox.node.swaptotal[{#NODE.NAME}]) > 0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage discovery | Dependent item | proxmox.storage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Type | More specific type, if available. |
Dependent item | proxmox.node.plugintype[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Size | Storage size in bytes. |
Dependent item | proxmox.node.maxdisk[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Content | Allowed storage content types. |
Dependent item | proxmox.node.content[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Used | Used disk space in bytes. |
Dependent item | proxmox.node.disk[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}] high filesystem space usage | Storage space usage is high. |
min(/Proxmox VE by HTTP/proxmox.node.disk[{#NODE.NAME},{#STORAGE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.maxdisk[{#NODE.NAME},{#STORAGE.NAME}]) * 100 >{$PVE.STORAGE.PUSE.MAX.WARN:"{#NODE.NAME}/{#STORAGE.NAME}"} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
QEMU discovery | Dependent item | proxmox.qemu.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Disk write, rate | Disk write. |
Dependent item | proxmox.qemu.diskwrite[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Disk read, rate | Disk read. |
Dependent item | proxmox.qemu.diskread[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Memory usage | Used memory in bytes. |
Dependent item | proxmox.qemu.mem[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Memory total | The total memory expressed in bytes. |
Dependent item | proxmox.qemu.maxmem[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Incoming data, rate | Incoming data rate. |
Dependent item | proxmox.qemu.netin[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Outgoing data, rate | Outgoing data rate. |
Dependent item | proxmox.qemu.netout[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: CPU usage | CPU load. |
Dependent item | proxmox.qemu.cpu[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME}]: Get data | Get VM status data. |
HTTP agent | proxmox.qemu.get.data[{#QEMU.ID}] |
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Uptime | The system uptime expressed in the following format: "N days, hh:mm:ss". |
Dependent item | proxmox.qemu.uptime[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Status | Status of Virtual Machine. |
Dependent item | proxmox.qemu.vmstatus[{#QEMU.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.qemu.mem[{#QEMU.ID}],5m) / last(/Proxmox VE by HTTP/proxmox.qemu.maxmem[{#QEMU.ID}]) * 100 >{$PVE.VM.MEMORY.PUSE.MAX.WARN:"{#QEMU.ID}"} |Warning |
||
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.qemu.cpu[{#QEMU.ID}],5m) > {$PVE.VM.CPU.PUSE.MAX.WARN:"{#QEMU.ID}"} |Warning |
||
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME}]: has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.qemu.uptime[{#QEMU.ID}])<10m |Info |
Manual close: Yes Depends on:
|
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Not running | VM state is not "running". |
last(/Proxmox VE by HTTP/proxmox.qemu.vmstatus[{#QEMU.ID}])<>"running" |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
LXC discovery | Dependent item | proxmox.lxc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME}]: Get data | Get LXC status data. |
HTTP agent | proxmox.lxc.get.data[{#LXC.ID}] |
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Uptime | The system uptime expressed in the following format: "N days, hh:mm:ss". |
Dependent item | proxmox.lxc.uptime[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Status | Status of LXC container. |
Dependent item | proxmox.lxc.vmstatus[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Disk write, rate | Disk write. |
Dependent item | proxmox.lxc.diskwrite[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Disk read, rate | Disk read. |
Dependent item | proxmox.lxc.diskread[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Memory usage | Used memory in bytes. |
Dependent item | proxmox.lxc.mem[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Memory total | The total memory expressed in bytes. |
Dependent item | proxmox.lxc.maxmem[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Incoming data, rate | Incoming data rate. |
Dependent item | proxmox.lxc.netin[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Outgoing data, rate | Outgoing data rate. |
Dependent item | proxmox.lxc.netout[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: CPU usage | CPU load. |
Dependent item | proxmox.lxc.cpu[{#LXC.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME}]: has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.lxc.uptime[{#LXC.ID}])<10m |Info |
Manual close: Yes Depends on:
|
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Not running | LXC state is not "running". |
last(/Proxmox VE by HTTP/proxmox.lxc.vmstatus[{#LXC.ID}])<>"running" |Average |
||
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.lxc.mem[{#LXC.ID}],5m) / last(/Proxmox VE by HTTP/proxmox.lxc.maxmem[{#LXC.ID}]) * 100 >{$PVE.LXC.MEMORY.PUSE.MAX.WARN:"{#LXC.ID}"} |Warning |
||
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.lxc.cpu[{#LXC.ID}],5m) > {$PVE.LXC.CPU.PUSE.MAX.WARN:"{#LXC.ID}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor processes with Zabbix; it works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. For example, by specifying "zabbix" as the macro value, you can monitor all Zabbix processes.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Install and setup Zabbix agent.
Custom processes set in macros:
Name | Description | Default |
---|---|---|
{$PROC.NAME.MATCHES} | This macro is used in the discovery of processes. It can be overridden on a host-level or on a linked template-level. |
<CHANGE VALUE> |
{$PROC.NAME.NOT_MATCHES} | This macro is used in the discovery of processes. It can be overridden on a host-level or on a linked template-level. |
<CHANGE VALUE> |
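For example, to discover only Zabbix processes as mentioned above, the macros could be set like this (illustrative values; ^$ matches nothing and therefore excludes nothing):
{$PROC.NAME.MATCHES} = zabbix
{$PROC.NAME.NOT_MATCHES} = ^$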
Name | Description | Type | Key and additional info |
---|---|---|---|
OS: Get process summary | The summary of data metrics for all processes. |
Zabbix agent | proc.get[,,,summary] |
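If needed, the master item can be checked manually from the Zabbix server or proxy with zabbix_get; a quick sketch (the agent address is illustrative):
$ zabbix_get -s 127.0.0.1 -k 'proc.get[,,,summary]'
The key returns a JSON array with one summary object per process name, which the discovery rule and the dependent items then parse.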
Name | Description | Type | Key and additional info |
---|---|---|---|
Processes discovery | Discovery of OS summary processes. |
Dependent item | custom.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Process [{#NAME}]: Get data | Summary metrics collected for the process {#NAME}. |
Dependent item | custom.proc.get[{#NAME}] Preprocessing
|
Process [{#NAME}]: Memory usage (rss) | The summary of Resident Set Size (RSS) memory used by the process {#NAME} in bytes. |
Dependent item | custom.proc.rss[{#NAME}] Preprocessing
|
Process [{#NAME}]: Memory usage (vsize) | The summary of virtual memory used by process {#NAME} in bytes. |
Dependent item | custom.proc.vmem[{#NAME}] Preprocessing
|
Process [{#NAME}]: Memory usage, % | The percentage of real memory used by the process {#NAME}. |
Dependent item | custom.proc.pmem[{#NAME}] Preprocessing
|
Process [{#NAME}]: Number of running processes | The number of running processes {#NAME}. |
Dependent item | custom.proc.num[{#NAME}] Preprocessing
|
Process [{#NAME}]: Number of threads | The number of threads {#NAME}. |
Dependent item | custom.proc.thread[{#NAME}] Preprocessing
|
Process [{#NAME}]: Number of page faults | The number of page faults {#NAME}. |
Dependent item | custom.proc.page[{#NAME}] Preprocessing
|
Process [{#NAME}]: Size of locked memory | The size of locked memory {#NAME}. |
Dependent item | custom.proc.mem.locked[{#NAME}] Preprocessing
|
Process [{#NAME}]: Swap space used | The swap space used by {#NAME}. |
Dependent item | custom.proc.swap[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Process [{#NAME}]: Process is not running | last(/OS processes by Zabbix agent/custom.proc.num[{#NAME}])=0 |High |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the FastCGI Process Manager (PHP-FPM) by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template PHP-FPM by Zabbix agent
- collects metrics by polling PHP-FPM status-page with HTTP agent remotely.
Note that this solution supports HTTPS and redirects.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Note that depending on your OS distribution, the PHP-FPM executable/service name can vary. RHEL-like distributions usually name both process and service as php-fpm
, while for Debian/Ubuntu based distributions it may include the version, for example: executable name - php-fpm8.2
, systemd service name - php8.2-fpm
. Adjust the following instructions accordingly if needed.
Open the PHP-FPM configuration file and enable the status page as shown.
pm.status_path = /status
ping.path = /ping
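These directives belong in the PHP-FPM pool configuration file. The paths below are typical defaults and are given here only as assumed examples; they may differ on your system:
/etc/php-fpm.d/www.conf              (RHEL-like distributions)
/etc/php/8.2/fpm/pool.d/www.conf     (Debian/Ubuntu; adjust the version)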
Validate the syntax to ensure it is correct before you reload the service. Replace the <version>
in the command if needed.
$ php-fpm -t
or
$ php-fpm<version> -t
Reload the php-fpm
service to make the change active. Replace the <version>
in the command if needed.
$ systemctl reload php-fpm
or
$ systemctl reload php<version>-fpm
Next, edit the configuration of your web server.
If you use Nginx, edit the configuration file of your Nginx server block (virtual host) and add the location block below it.
# Enable php-fpm status page
location ~ ^/(status|ping)$ {
## disable access logging for request if you prefer
access_log off;
## Only allow trusted IPs for security, deny everyone else
# allow 127.0.0.1;
# allow 1.2.3.4; # your IP here
# deny all;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_index index.php;
include fastcgi_params;
## Now the port or socket of the php-fpm pool we want the status of
fastcgi_pass 127.0.0.1:9000;
# fastcgi_pass unix:/run/php-fpm/your_socket.sock;
}
If you use Apache, edit the configuration file of the virtual host and add the following location blocks.
<LocationMatch "/status">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/status"
</LocationMatch>
<LocationMatch "/ping">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/ping"
</LocationMatch>
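Note that the ProxyPass directives above rely on mod_proxy and mod_proxy_fcgi. On Debian/Ubuntu-based systems they can usually be enabled as shown below; on RHEL-like systems they are typically loaded by default.
$ a2enmod proxy proxy_fcgi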
$ nginx -t
or
$ httpd -t
or
$ apachectl configtest
Reload the web server configuration. The command may vary depending on the OS distribution and web server.
$ systemctl reload nginx
or
$ systemctl reload httpd
or
$ systemctl reload apache2
Verify that the pages are available with these commands.
curl -L 127.0.0.1/status
curl -L 127.0.0.1/ping
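If everything is configured correctly, the ping page returns the value expected by {$PHP_FPM.PING.REPLY} and the status page returns the pool statistics. The output below is only an approximate illustration:
$ curl -L 127.0.0.1/ping
pong
$ curl -L "127.0.0.1/status?json"
{"pool":"www","process manager":"dynamic","listen queue":0,"idle processes":4,"active processes":1,"accepted conn":128, ...}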
If you use another location of the status/ping pages, don't forget to change the {$PHP_FPM.STATUS.PAGE}/{$PHP_FPM.PING.PAGE}
macro.
If you use another web server port or scheme for the location of the PHP-FPM status/ping pages, don't forget to change the macros {$PHP_FPM.SCHEME}
and {$PHP_FPM.PORT}
.
Name | Description | Default |
---|---|---|
{$PHP_FPM.PORT} | The port of the PHP-FPM status host or container. |
80 |
{$PHP_FPM.SCHEME} | Request scheme which may be http or https |
http |
{$PHP_FPM.HOST} | The hostname or IP address of the PHP-FPM status for a host or container. |
localhost |
{$PHP_FPM.STATUS.PAGE} | The path of the PHP-FPM status page. |
status |
{$PHP_FPM.PING.PAGE} | The path of the PHP-FPM ping page. |
ping |
{$PHP_FPM.PING.REPLY} | The expected reply to the ping. |
pong |
{$PHP_FPM.QUEUE.WARN.MAX} | The maximum percent of the PHP-FPM queue usage for a trigger expression. |
80 |
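Together these macros presumably form the request URLs used by the HTTP agent items, roughly in the following shape (a sketch, not the literal item configuration):
{$PHP_FPM.SCHEME}://{$PHP_FPM.HOST}:{$PHP_FPM.PORT}/{$PHP_FPM.STATUS.PAGE}?json
{$PHP_FPM.SCHEME}://{$PHP_FPM.HOST}:{$PHP_FPM.PORT}/{$PHP_FPM.PING.PAGE}
With the defaults this resolves to http://localhost:80/status?json and http://localhost:80/ping.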
Name | Description | Type | Key and additional info | |
---|---|---|---|---|
PHP-FPM: Get ping page | HTTP agent | php-fpm.get_ping | ||
PHP-FPM: Get status page | HTTP agent | php-fpm.get_status | ||
PHP-FPM: Ping | Dependent item | php-fpm.ping Preprocessing
|
\r?\n) 1 ⛔️ Custom on fail: Set value to: 0 |
|
PHP-FPM: Processes, active | The total number of active processes. |
Dependent item | php-fpm.processes_active Preprocessing
|
|
PHP-FPM: Version | The current version of PHP. It is taken from the HTTP header "X-Powered-By"; this may not work if you have changed the default HTTP headers. |
Dependent item | php-fpm.version Preprocessing
|
|
PHP-FPM: Pool name | The name of the current pool. |
Dependent item | php-fpm.name Preprocessing
|
|
PHP-FPM: Uptime | Indicates how long this pool has been running. |
Dependent item | php-fpm.uptime Preprocessing
|
|
PHP-FPM: Start time | The time when this pool was started. |
Dependent item | php-fpm.start_time Preprocessing
|
|
PHP-FPM: Processes, total | The total number of server processes currently running. |
Dependent item | php-fpm.processes_total Preprocessing
|
|
PHP-FPM: Processes, idle | The total number of idle processes. |
Dependent item | php-fpm.processes_idle Preprocessing
|
|
PHP-FPM: Process manager | The method used by the process manager to control the number of child processes for this pool. |
Dependent item | php-fpm.process_manager Preprocessing
|
|
PHP-FPM: Processes, max active | The highest value of "active processes" since the PHP-FPM server was started. |
Dependent item | php-fpm.processes_max_active Preprocessing
|
|
PHP-FPM: Accepted connections per second | The number of accepted requests per second. |
Dependent item | php-fpm.conn_accepted.rate Preprocessing
|
|
PHP-FPM: Slow requests | The number of requests that have exceeded your |
Dependent item | php-fpm.slow_requests Preprocessing
|
|
PHP-FPM: Listen queue | The current number of connections that have been initiated but not yet accepted. |
Dependent item | php-fpm.listen_queue Preprocessing
|
|
PHP-FPM: Listen queue, max | The maximum number of requests in the queue of pending connections since this FPM pool was started. |
Dependent item | php-fpm.listen_queue_max Preprocessing
|
|
PHP-FPM: Listen queue, len | The size of the socket queue of pending connections. |
Dependent item | php-fpm.listen_queue_len Preprocessing
|
|
PHP-FPM: Queue usage | The utilization of the queue. |
Calculated | php-fpm.listen_queue_usage | |
PHP-FPM: Max children reached | The number of times that |
Dependent item | php-fpm.max_children Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Service is down | last(/PHP-FPM by HTTP/php-fpm.ping)=0 or nodata(/PHP-FPM by HTTP/php-fpm.ping,3m)=1 |High |
Manual close: Yes | ||
PHP-FPM: Version has changed | The PHP-FPM version has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by HTTP/php-fpm.version,#1)<>last(/PHP-FPM by HTTP/php-fpm.version,#2) and length(last(/PHP-FPM by HTTP/php-fpm.version))>0 |Info |
Manual close: Yes | |
PHP-FPM: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/PHP-FPM by HTTP/php-fpm.uptime,30m)=1 |Info |
Manual close: Yes Depends on:
|
|
PHP-FPM: Pool has been restarted | Uptime is less than 10 minutes. |
last(/PHP-FPM by HTTP/php-fpm.uptime)<10m |Info |
Manual close: Yes | |
PHP-FPM: Manager changed | The PHP-FPM manager has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by HTTP/php-fpm.process_manager,#1)<>last(/PHP-FPM by HTTP/php-fpm.process_manager,#2) |Info |
Manual close: Yes | |
PHP-FPM: Detected slow requests | The PHP-FPM has detected a slow request. |
min(/PHP-FPM by HTTP/php-fpm.slow_requests,#3)>0 |Warning |
||
PHP-FPM: Queue utilization is high | The queue for this pool has reached |
min(/PHP-FPM by HTTP/php-fpm.listen_queue_usage,15m) > {$PHP_FPM.QUEUE.WARN.MAX} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the FastCGI Process Manager (PHP-FPM) by Zabbix agent that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template PHP-FPM by Zabbix agent
- collects metrics by polling the PHP-FPM status-page locally with Zabbix agent.
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get
).
It also uses Zabbix agent to collect php-fpm
Linux process statistics, such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Note that depending on your OS distribution, the PHP-FPM executable/service name can vary. RHEL-like distributions usually name both process and service as php-fpm
, while for Debian/Ubuntu based distributions it may include the version, for example: executable name - php-fpm8.2
, systemd service name - php8.2-fpm
. Adjust the following instructions accordingly if needed.
Open the PHP-FPM configuration file and enable the status page as shown.
pm.status_path = /status
ping.path = /ping
Validate the syntax to ensure it is correct before you reload the service. Replace the <version>
in the command if needed.
$ php-fpm -t
or
$ php-fpm<version> -t
Reload the php-fpm
service to make the change active. Replace the <version>
in the command if needed.
$ systemctl reload php-fpm
or
$ systemctl reload php<version>-fpm
Next, edit the configuration of your web server.
If you use Nginx, edit the configuration file of your Nginx server block (virtual host) and add the location block below it.
# Enable php-fpm status page
location ~ ^/(status|ping)$ {
## disable access logging for request if you prefer
access_log off;
## Only allow trusted IPs for security, deny everyone else
# allow 127.0.0.1;
# allow 1.2.3.4; # your IP here
# deny all;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_index index.php;
include fastcgi_params;
## Now the port or socket of the php-fpm pool we want the status of
fastcgi_pass 127.0.0.1:9000;
# fastcgi_pass unix:/run/php-fpm/your_socket.sock;
}
If you use Apache, edit the configuration file of the virtual host and add the following location blocks.
<LocationMatch "/status">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/status"
</LocationMatch>
<LocationMatch "/ping">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/ping"
</LocationMatch>
$ nginx -t
or
$ httpd -t
or
$ apachectl configtest
Reload the web server configuration. The command may vary depending on the OS distribution and web server.
$ systemctl reload nginx
or
$ systemctl reload httpd
or
$ systemctl reload apache2
Verify that the pages are available with these commands.
curl -L 127.0.0.1/status
curl -L 127.0.0.1/ping
Depending on your OS distribution, the PHP-FPM process name may vary as well. Please check the actual name in the "Name" line of the /proc/<pid>/status file (https://www.zabbix.com/documentation/6.4/manual/appendix/items/proc_mem_num_notes) and change the {$PHP_FPM.PROCESS.NAME.PARAMETER} macro if needed.
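A quick way to check the actual process name is to read it from the status file of a running PHP-FPM process, for example:
$ grep '^Name:' /proc/"$(pgrep -o php-fpm)"/status
Name:   php-fpm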
If you use another location of the status/ping pages, don't forget to change the {$PHP_FPM.STATUS.PAGE}/{$PHP_FPM.PING.PAGE}
macro.
If you use another web server port for the location of the PHP-FPM status/ping pages, don't forget to change the macro {$PHP_FPM.PORT}
.
Name | Description | Default |
---|---|---|
{$PHP_FPM.PORT} | The port of the PHP-FPM status host or container. |
80 |
{$PHP_FPM.HOST} | The hostname or IP address of the PHP-FPM status for a host or container. |
localhost |
{$PHP_FPM.STATUS.PAGE} | The path of the PHP-FPM status page. |
status |
{$PHP_FPM.PING.PAGE} | The path of the PHP-FPM ping page. |
ping |
{$PHP_FPM.PING.REPLY} | The expected reply to the ping. |
pong |
{$PHP_FPM.QUEUE.WARN.MAX} | The maximum percent of the PHP-FPM queue usage for a trigger expression. |
80 |
{$PHP_FPM.PROCESS_NAME} | The process name filter for the PHP-FPM process discovery. May vary depending on your OS distribution. |
php-fpm |
{$PHP_FPM.PROCESS.NAME.PARAMETER} | The process name of the PHP-FPM used in the item key |
Name | Description | Type | Key and additional info | |
---|---|---|---|---|
PHP-FPM: Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$PHP_FPM.PROCESS.NAME.PARAMETER},,,summary] | |
PHP-FPM: php-fpm_ping | Zabbix agent | web.page.get["{$PHP_FPM.HOST}","{$PHP_FPM.PING.PAGE}","{$PHP_FPM.PORT}"] | ||
PHP-FPM: Get status page | Zabbix agent | web.page.get["{$PHP_FPM.HOST}","{$PHP_FPM.STATUS.PAGE}?json","{$PHP_FPM.PORT}"] Preprocessing
|
||
PHP-FPM: Ping | Dependent item | php-fpm.ping Preprocessing
|
\r?\n) 1 ⛔️ Custom on fail: Set value to: 0 |
|
PHP-FPM: Processes, active | The total number of active processes. |
Dependent item | php-fpm.processes_active Preprocessing
|
|
PHP-FPM: Version | The current version of PHP. It is taken from the HTTP header "X-Powered-By"; this may not work if you have changed the default HTTP headers. |
Dependent item | php-fpm.version Preprocessing
|
|
PHP-FPM: Pool name | The name of the current pool. |
Dependent item | php-fpm.name Preprocessing
|
|
PHP-FPM: Uptime | Indicates how long this pool has been running. |
Dependent item | php-fpm.uptime Preprocessing
|
|
PHP-FPM: Start time | The time when this pool was started. |
Dependent item | php-fpm.start_time Preprocessing
|
|
PHP-FPM: Processes, total | The total number of server processes currently running. |
Dependent item | php-fpm.processes_total Preprocessing
|
|
PHP-FPM: Processes, idle | The total number of idle processes. |
Dependent item | php-fpm.processes_idle Preprocessing
|
|
PHP-FPM: Queue usage | The utilization of the queue. |
Calculated | php-fpm.listen_queue_usage | |
PHP-FPM: Process manager | The method used by the process manager to control the number of child processes for this pool. |
Dependent item | php-fpm.process_manager Preprocessing
|
|
PHP-FPM: Processes, max active | The highest value of "active processes" since the PHP-FPM server was started. |
Dependent item | php-fpm.processes_max_active Preprocessing
|
|
PHP-FPM: Accepted connections per second | The number of accepted requests per second. |
Dependent item | php-fpm.conn_accepted.rate Preprocessing
|
|
PHP-FPM: Slow requests | The number of requests that have exceeded your |
Dependent item | php-fpm.slow_requests Preprocessing
|
|
PHP-FPM: Listen queue | The current number of connections that have been initiated but not yet accepted. |
Dependent item | php-fpm.listen_queue Preprocessing
|
|
PHP-FPM: Listen queue, max | The maximum number of requests in the queue of pending connections since this FPM pool was started. |
Dependent item | php-fpm.listen_queue_max Preprocessing
|
|
PHP-FPM: Listen queue, len | The size of the socket queue of pending connections. |
Dependent item | php-fpm.listen_queue_len Preprocessing
|
|
PHP-FPM: Max children reached | The number of times that |
Dependent item | php-fpm.max_children Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Version has changed | The PHP-FPM version has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by Zabbix agent/php-fpm.version,#1)<>last(/PHP-FPM by Zabbix agent/php-fpm.version,#2) and length(last(/PHP-FPM by Zabbix agent/php-fpm.version))>0 |Info |
Manual close: Yes | |
PHP-FPM: Pool has been restarted | Uptime is less than 10 minutes. |
last(/PHP-FPM by Zabbix agent/php-fpm.uptime)<10m |Info |
Manual close: Yes | |
PHP-FPM: Queue utilization is high | The queue for this pool has reached |
min(/PHP-FPM by Zabbix agent/php-fpm.listen_queue_usage,15m) > {$PHP_FPM.QUEUE.WARN.MAX} |Warning |
||
PHP-FPM: Manager changed | The PHP-FPM manager has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by Zabbix agent/php-fpm.process_manager,#1)<>last(/PHP-FPM by Zabbix agent/php-fpm.process_manager,#2) |Info |
Manual close: Yes | |
PHP-FPM: Detected slow requests | The PHP-FPM has detected a slow request. |
min(/PHP-FPM by Zabbix agent/php-fpm.slow_requests,#3)>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PHP-FPM process discovery | The discovery of the PHP-FPM summary processes. |
Dependent item | php-fpm.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
PHP-FPM: Get process data | The summary metrics aggregated by a process |
Dependent item | php-fpm.proc.get[{#PHP_FPM.NAME}] Preprocessing
|
PHP-FPM: Memory usage (rss) | The summary of resident set size memory used by a process |
Dependent item | php-fpm.proc.rss[{#PHP_FPM.NAME}] Preprocessing
|
PHP-FPM: Memory usage (vsize) | The summary of virtual memory used by a process |
Dependent item | php-fpm.proc.vmem[{#PHP_FPM.NAME}] Preprocessing
|
PHP-FPM: Memory usage, % | The percentage of real memory used by a process |
Dependent item | php-fpm.proc.pmem[{#PHP_FPM.NAME}] Preprocessing
|
PHP-FPM: Number of running processes | The number of running processes |
Dependent item | php-fpm.proc.num[{#PHP_FPM.NAME}] Preprocessing
|
PHP-FPM: CPU utilization | The percentage of the CPU utilization by a process |
Zabbix agent | proc.cpu.util[{#PHP_FPM.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Process is not running | last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#PHP_FPM.NAME}])=0 |High |
|||
PHP-FPM: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/PHP-FPM by Zabbix agent/php-fpm.uptime,30m)=1 and last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#PHP_FPM.NAME}])>0 |Info |
Manual close: Yes | |
PHP-FPM: Service is down | (last(/PHP-FPM by Zabbix agent/php-fpm.ping)=0 or nodata(/PHP-FPM by Zabbix agent/php-fpm.ping,3m)=1) and last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#PHP_FPM.NAME}])>0 |High |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Template for monitoring pfSense by SNMP
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$IF.ERRORS.WARN} | Threshold of error packets rate for warning trigger. Can be used with interface name as context. |
2 |
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for warning trigger in %. Can be used with interface name as context. |
90 |
{$IFCONTROL} | Macro for operational state of the interface for link down trigger. Can be used with interface name as context. |
1 |
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.* |
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status. |
^2$ |
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
(^pflog[0-9.]*$|^pfsync[0-9.]*$) |
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.*$ |
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6). |
^6$ |
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$SNMP.TIMEOUT} | The time interval for SNMP availability trigger. |
5m |
{$STATE.TABLE.UTIL.MAX} | Threshold of state table utilization trigger in %. |
90 |
{$SOURCE.TRACKING.TABLE.UTIL.MAX} | Threshold of source tracking table utilization trigger in %. |
90 |
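Most of the firewall metrics in this template come from BEGEMOT-PF-MIB, so it is worth confirming that the pfSense SNMP service exposes that subtree before linking the template. A sketch assuming SNMP v2c with the community "public"; verify the OID against your MIB files:
$ snmpwalk -v2c -c public <pfsense-ip> 1.3.6.1.4.1.12325.1.200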
Name | Description | Type | Key and additional info |
---|---|---|---|
PFSense: SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - not available, 1 - available, 2 - unknown. |
Zabbix internal | zabbix[host,snmp,available] |
PFSense: Packet filter running status | MIB: BEGEMOT-PF-MIB True if packet filter is currently enabled. |
SNMP agent | pfsense.pf.status |
PFSense: States table current | MIB: BEGEMOT-PF-MIB Number of entries in the state table. |
SNMP agent | pfsense.state.table.count |
PFSense: States table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'keep state' rules in the ruleset. |
SNMP agent | pfsense.state.table.limit |
PFSense: States table utilization in % | Utilization of state table in %. A calculation sketch is shown after this item table. |
Calculated | pfsense.state.table.pused |
PFSense: Source tracking table current | MIB: BEGEMOT-PF-MIB Number of entries in the source tracking table. |
SNMP agent | pfsense.source.tracking.table.count |
PFSense: Source tracking table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'sticky-address' or 'source-track' rules in the ruleset. |
SNMP agent | pfsense.source.tracking.table.limit |
PFSense: Source tracking table utilization in % | Utilization of source tracking table in %. |
Calculated | pfsense.source.tracking.table.pused |
PFSense: DHCP server status | MIB: HOST-RESOURCES-MIB The status of DHCP server process. |
SNMP agent | pfsense.dhcpd.status Preprocessing
|
PFSense: DNS server status | MIB: HOST-RESOURCES-MIB The status of DNS server process. |
SNMP agent | pfsense.dns.status Preprocessing
|
PFSense: State of nginx process | MIB: HOST-RESOURCES-MIB The status of nginx process. |
SNMP agent | pfsense.nginx.status Preprocessing
|
PFSense: Packets matched a filter rule | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.match Preprocessing
|
PFSense: Packets with bad offset | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.bad.offset Preprocessing
|
PFSense: Fragmented packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.fragment Preprocessing
|
PFSense: Short packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.short Preprocessing
|
PFSense: Normalized packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.normalize Preprocessing
|
PFSense: Packets dropped due to memory limitation | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.mem.drop Preprocessing
|
PFSense: Firewall rules count | MIB: BEGEMOT-PF-MIB The number of labeled filter rules on this system. |
SNMP agent | pfsense.rules.count |
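The two utilization items above (pfsense.state.table.pused and pfsense.source.tracking.table.pused) are calculated items; this document does not show their formulas, but assuming they simply divide the current count by the limit, the state table formula would look roughly like this in Zabbix calculated-item syntax:
last(//pfsense.state.table.count) / last(//pfsense.state.table.limit) * 100
The source tracking variant presumably substitutes the corresponding source tracking keys.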
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PFSense: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/PFSense by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |Warning |
||
PFSense: Packet filter is not running | Please check PF status. |
last(/PFSense by SNMP/pfsense.pf.status)<>1 |High |
||
PFSense: State table usage is high | Please check the number of connections https://docs.netgate.com/pfsense/en/latest/config/advanced-firewall-nat.html#config-advanced-firewall-maxstates |
min(/PFSense by SNMP/pfsense.state.table.pused,#3)>{$STATE.TABLE.UTIL.MAX} |Warning |
||
PFSense: Source tracking table usage is high | Please check the number of sticky connections https://docs.netgate.com/pfsense/en/latest/monitoring/status/firewall-states-sources.html |
min(/PFSense by SNMP/pfsense.source.tracking.table.pused,#3)>{$SOURCE.TRACKING.TABLE.UTIL.MAX} |Warning |
||
PFSense: DHCP server is not running | Please check DHCP server settings https://docs.netgate.com/pfsense/en/latest/services/dhcp/index.html |
last(/PFSense by SNMP/pfsense.dhcpd.status)=0 |Average |
||
PFSense: DNS server is not running | Please check DNS server settings https://docs.netgate.com/pfsense/en/latest/services/dns/index.html |
last(/PFSense by SNMP/pfsense.dns.status)=0 |Average |
||
PFSense: Web server is not running | Please check nginx service status. |
last(/PFSense by SNMP/pfsense.nginx.status)=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP agent | pfsense.net.if.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.discards[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.errors[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.discards[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.errors[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of |
SNMP agent | net.if.speed[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP agent | net.if.status[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP agent | net.if.type[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Rules references count | MIB: BEGEMOT-PF-MIB The number of rules referencing this interface. |
SNMP agent | net.if.rules.refs[{#SNMPINDEX}] |
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | It recovers when it is below 80% of the |
min(/PFSense by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/PFSense by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | It recovers when it is below 80% of the |
min(/PFSense by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/PFSense by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually. |
change(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/PFSense by SNMP/net.if.status[{#SNMPINDEX}])<>2) |Info |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/PFSense by SNMP/net.if.status[{#SNMPINDEX}])=2) |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Template for monitoring OPNsense by SNMP
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$IF.ERRORS.WARN} | Threshold of error packets rate for warning trigger. Can be used with interface name as context. |
2 |
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for warning trigger in %. Can be used with interface name as context. |
90 |
{$IFCONTROL} | Macro for operational state of the interface for link down trigger. Can be used with interface name as context. |
1 |
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.* |
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status. |
^2$ |
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
(^pflog[0-9.]*$|^pfsync[0-9.]*$) |
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.*$ |
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6). |
^6$ |
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$SNMP.TIMEOUT} | The time interval for SNMP availability trigger. |
5m |
{$STATE.TABLE.UTIL.MAX} | Threshold of state table utilization trigger in %. |
90 |
{$SOURCE.TRACKING.TABLE.UTIL.MAX} | Threshold of source tracking table utilization trigger in %. |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
OPNsense: SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - not available, 1 - available, 2 - unknown. |
Zabbix internal | zabbix[host,snmp,available] |
OPNsense: Packet filter running status | MIB: BEGEMOT-PF-MIB True if packet filter is currently enabled. |
SNMP agent | opnsense.pf.status |
OPNsense: States table current | MIB: BEGEMOT-PF-MIB Number of entries in the state table. |
SNMP agent | opnsense.state.table.count |
OPNsense: States table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'keep state' rules in the ruleset. |
SNMP agent | opnsense.state.table.limit |
OPNsense: States table utilization in % | Utilization of state table in %. |
Calculated | opnsense.state.table.pused |
OPNsense: Source tracking table current | MIB: BEGEMOT-PF-MIB Number of entries in the source tracking table. |
SNMP agent | opnsense.source.tracking.table.count |
OPNsense: Source tracking table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'sticky-address' or 'source-track' rules in the ruleset. |
SNMP agent | opnsense.source.tracking.table.limit |
OPNsense: Source tracking table utilization in % | Utilization of source tracking table in %. |
Calculated | opnsense.source.tracking.table.pused |
OPNsense: DHCP server status | MIB: HOST-RESOURCES-MIB The status of DHCP server process. |
SNMP agent | opnsense.dhcpd.status Preprocessing
|
OPNsense: DNS server status | MIB: HOST-RESOURCES-MIB The status of DNS server process. |
SNMP agent | opnsense.dns.status Preprocessing
|
OPNsense: Web server status | MIB: HOST-RESOURCES-MIB The status of lighttpd process. |
SNMP agent | opnsense.lighttpd.status Preprocessing
|
OPNsense: Packets matched a filter rule | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.match Preprocessing
|
OPNsense: Packets with bad offset | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.bad.offset Preprocessing
|
OPNsense: Fragmented packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.fragment Preprocessing
|
OPNsense: Short packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.short Preprocessing
|
OPNsense: Normalized packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.normalize Preprocessing
|
OPNsense: Packets dropped due to memory limitation | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.mem.drop Preprocessing
|
OPNsense: Firewall rules count | MIB: BEGEMOT-PF-MIB The number of labeled filter rules on this system. |
SNMP agent | opnsense.rules.count |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OPNsense: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/OPNsense by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |Warning |
||
OPNsense: Packet filter is not running | Please check PF status. |
last(/OPNsense by SNMP/opnsense.pf.status)<>1 |High |
||
OPNsense: State table usage is high | Please check the number of connections. |
min(/OPNsense by SNMP/opnsense.state.table.pused,#3)>{$STATE.TABLE.UTIL.MAX} |Warning |
||
OPNsense: Source tracking table usage is high | Please check the number of sticky connections. |
min(/OPNsense by SNMP/opnsense.source.tracking.table.pused,#3)>{$SOURCE.TRACKING.TABLE.UTIL.MAX} |Warning |
||
OPNsense: DHCP server is not running | Please check DHCP server settings. |
last(/OPNsense by SNMP/opnsense.dhcpd.status)=0 |Average |
||
OPNsense: DNS server is not running | Please check DNS server settings. |
last(/OPNsense by SNMP/opnsense.dns.status)=0 |Average |
||
OPNsense: Web server is not running | Please check lighttpd service status. |
last(/OPNsense by SNMP/opnsense.lighttpd.status)=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP agent | opnsense.net.if.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.discards[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.errors[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.discards[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.errors[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of |
SNMP agent | net.if.speed[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP agent | net.if.status[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP agent | net.if.type[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Rules references count | MIB: BEGEMOT-PF-MIB The number of rules referencing this interface. |
SNMP agent | net.if.rules.refs[{#SNMPINDEX}] |
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | It recovers when it is below 80% of the |
min(/OPNsense by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/OPNsense by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | It recovers when it is below 80% of the |
min(/OPNsense by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/OPNsense by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually. |
change(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/OPNsense by SNMP/net.if.status[{#SNMPINDEX}])<>2) |Info |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/OPNsense by SNMP/net.if.status[{#SNMPINDEX}])=2) |Average |
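For reference, a minimal sketch of the logic behind the "High inbound/outbound bandwidth usage" triggers above: they fire only when the interface reports a non-zero speed and the 15-minute traffic average exceeds {$IF.UTIL.MAX} percent of that speed (the 90 used below is an assumed threshold, not a value taken from this template):

```python
def high_bandwidth_usage(avg_bps_15m: float, if_speed_bps: float, util_max_pct: float = 90.0) -> bool:
    # Mirrors: avg(net.if.in[...],15m) > ({$IF.UTIL.MAX}/100) * last(net.if.speed[...])
    #          and last(net.if.speed[...]) > 0
    return if_speed_bps > 0 and avg_bps_15m > (util_max_pct / 100.0) * if_speed_bps

# Example: ~950 Mbps average on a 1 Gbps link with a 90% threshold -> True
print(high_bandwidth_usage(950e6, 1e9))
```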
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of OpenWeatherMap monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create a host.
Link the template to the host.
Customize the values of {$OPENWEATHERMAP.API.TOKEN} and {$LOCATION} macros.
OpenWeatherMap API Tokens are available in your OpenWeatherMap account https://home.openweathermap.org/api_keys.
Locations can be set in a few ways:
1. by geo coordinates (for example: 56.95,24.0833)
2. by location name (for example: Riga)
3. by location ID (see the list of city IDs: http://bulk.openweathermap.org/sample/city.list.json.gz)
4. by zip/post code with a country code (for example: 94040,us)
Several locations can be added to the macro at the same time, separated by the | delimiter.
For example: 43.81821,7.76115|Riga|2643743|94040,us
Please note that API requests by city name, zip code and city ID will be deprecated soon. The language and units macros can also be customized if necessary. List of available languages: https://openweathermap.org/current#multi. Available units of measurement are: standard, metric and imperial https://openweathermap.org/current#data.
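To illustrate how the macros described above combine into a request, here is a minimal sketch of one OpenWeatherMap call per configured location. It assumes a city-name location and the public q, units, lang and appid query parameters; the exact request built by the template's Script item (including how coordinate, ID and zip-code locations are handled) is not shown in this document and may differ:

```python
import requests  # illustration only; the template itself uses a Zabbix Script item

def get_weather(token: str, location: str, units: str = "metric", lang: str = "en") -> dict:
    # One request per {$LOCATION} entry; 'q' covers the city-name form only.
    resp = requests.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={"q": location, "units": units, "lang": lang, "appid": token},
        timeout=3,  # {$OPENWEATHERMAP.DATA.TIMEOUT} default is 3s
    )
    resp.raise_for_status()
    return resp.json()

# get_weather("<YOUR_API_TOKEN>", "Riga")
```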
Name | Description | Default |
---|---|---|
{$OPENWEATHERMAP.API.TOKEN} | Specify the OpenWeatherMap API key. |
|
{$LANG} | List of available languages https://openweathermap.org/current#multi. |
en |
{$LOCATION} | Locations can be set in a few ways: 1. by geo coordinates (for example: 56.95,24.0833) 2. by location name (for example: Riga) 3. by location ID (see the list of city IDs: http://bulk.openweathermap.org/sample/city.list.json.gz) 4. by zip/post code with a country code (for example: 94040,us). Several locations can be added to the macro at the same time, separated by the | delimiter (for example: 43.81821,7.76115|Riga|2643743|94040,us). Please note that API requests by city name, zip code and city ID will be deprecated soon. |
Riga |
{$OPENWEATHERMAP.API.ENDPOINT} | OpenWeatherMap API endpoint. |
api.openweathermap.org/data/2.5/weather? |
{$UNITS} | Available units of measurement are standard, metric and imperial https://openweathermap.org/current#data. |
metric |
{$OPENWEATHERMAP.DATA.TIMEOUT} | Response timeout for OpenWeatherMap API. |
3s |
{$TEMP.CRIT.HIGH} | Threshold for high temperature trigger. |
30 |
{$TEMP.CRIT.LOW} | Threshold for low temperature trigger. |
-20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Openweathermap: Get data | JSON array with result of OpenWeatherMap API requests. |
Script | openweathermap.get.data |
Openweathermap: Get data collection errors | Errors from get data requests by script item. |
Dependent item | openweathermap.get.errors Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Openweathermap: There are errors in requests to OpenWeatherMap API | Zabbix has received errors in requests to OpenWeatherMap API. |
length(last(/OpenWeatherMap by HTTP/openweathermap.get.errors))>0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Locations discovery | Weather metrics discovery by location. |
Dependent item | openweathermap.locations.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#LOCATION}, {#COUNTRY}]: Data | JSON with result of OpenWeatherMap API request by location. |
Dependent item | openweathermap.location.data[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Atmospheric pressure | Atmospheric pressure in Pa. |
Dependent item | openweathermap.pressure[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Cloudiness | Cloudiness in %. |
Dependent item | openweathermap.clouds[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Humidity | Humidity in %. |
Dependent item | openweathermap.humidity[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Rain volume for the last one hour | Rain volume for the last one hour in m. |
Dependent item | openweathermap.rain[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Short weather status | Short weather status description. |
Dependent item | openweathermap.description[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Snow volume for the last one hour | Snow volume for the last one hour in m. |
Dependent item | openweathermap.snow[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Temperature | Atmospheric temperature value. |
Dependent item | openweathermap.temp[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Visibility | Visibility in m. |
Dependent item | openweathermap.visibility[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Wind direction | Wind direction in degrees. |
Dependent item | openweathermap.wind.direction[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Wind speed | Wind speed value. |
Dependent item | openweathermap.wind.speed[{#ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
[{#LOCATION}, {#COUNTRY}]: Temperature is too high | Temperature value is too high. |
min(/OpenWeatherMap by HTTP/openweathermap.temp[{#ID}],#3)>{$TEMP.CRIT.HIGH} |Average |
Manual close: Yes | |
[{#LOCATION}, {#COUNTRY}]: Temperature is too low | Temperature value is too low. |
max(/OpenWeatherMap by HTTP/openweathermap.temp[{#ID}],#3)<{$TEMP.CRIT.LOW} |Average |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor HashiCorp Nomad by Zabbix. It works without any external scripts. Currently, the template supports Nomad server and client discovery.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Define the {$NOMAD.ENDPOINT.API.URL} macro value with the correct web protocol, host and port.
2. Prepare an authentication token with the node:read, namespace:read-job, agent:read and management permissions applied, and define it in the {$NOMAD.TOKEN} macro value.
> Refer to the vendor documentation about Nomad native ACL or Nomad Vault-generated tokens if you have the HashiCorp Vault integration configured.
Additional information:
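As additional context, a minimal sketch of the kind of authenticated call the template's HTTP agent and Script items make against {$NOMAD.ENDPOINT.API.URL}. The X-Nomad-Token header and the /v1/nodes and /v1/agent/members endpoints are standard Nomad API features; whether the template queries exactly these paths is an assumption:

```python
import requests  # illustration only; the template uses Zabbix HTTP agent and Script items

NOMAD_API_URL = "http://localhost:4646"  # {$NOMAD.ENDPOINT.API.URL}
NOMAD_TOKEN = "<PUT YOUR AUTH TOKEN>"    # {$NOMAD.TOKEN}

def nomad_get(path: str):
    # Authenticated Nomad API call with the ACL token passed via X-Nomad-Token.
    resp = requests.get(f"{NOMAD_API_URL}{path}",
                        headers={"X-Nomad-Token": NOMAD_TOKEN},
                        timeout=15)  # {$NOMAD.DATA.TIMEOUT} default is 15s
    resp.raise_for_status()
    return resp.json()

# nomad_get("/v1/nodes")          # client nodes; requires node:read
# nomad_get("/v1/agent/members")  # server members; requires agent:read
```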
Useful links
Name | Description | Default |
---|---|---|
{$NOMAD.ENDPOINT.API.URL} | API endpoint URL for one of the Nomad cluster members. |
http://localhost:4646 |
{$NOMAD.TOKEN} | Nomad authentication token. |
<PUT YOUR AUTH TOKEN> |
{$NOMAD.DATA.TIMEOUT} | Response timeout for an API. |
15s |
{$NOMAD.HTTP.PROXY} | Sets the HTTP proxy for script and HTTP agent items. If this parameter is empty, then no proxy is used. |
|
{$NOMAD.API.RESPONSE.SUCCESS} | HTTP API successful response code. Availability triggers threshold. Change, if needed. |
200 |
{$NOMAD.SERVER.NAME.MATCHES} | The filter to include HashiCorp Nomad servers by name. |
.* |
{$NOMAD.SERVER.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad servers by name. |
CHANGE_IF_NEEDED |
{$NOMAD.SERVER.DC.MATCHES} | The filter to include HashiCorp Nomad servers by datacenter belonging. |
.* |
{$NOMAD.SERVER.DC.NOT_MATCHES} | The filter to exclude HashiCorp Nomad servers by datacenter belonging. |
CHANGE_IF_NEEDED |
{$NOMAD.CLIENT.NAME.MATCHES} | The filter to include HashiCorp Nomad clients by name. |
.* |
{$NOMAD.CLIENT.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad clients by name. |
CHANGE_IF_NEEDED |
{$NOMAD.CLIENT.DC.MATCHES} | The filter to include HashiCorp Nomad clients by datacenter belonging. |
.* |
{$NOMAD.CLIENT.DC.NOT_MATCHES} | The filter to exclude HashiCorp Nomad clients by datacenter belonging. |
CHANGE_IF_NEEDED |
{$NOMAD.CLIENT.SCHEDULE.ELIGIBILITY.MATCHES} | The filter to include HashiCorp Nomad clients by scheduling eligibility. |
.* |
{$NOMAD.CLIENT.SCHEDULE.ELIGIBILITY.NOT_MATCHES} | The filter to exclude HashiCorp Nomad clients by scheduling eligibility. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
HashiCorp Nomad: Nomad clients get | Nomad clients data in raw format. |
HTTP agent | nomad.client.nodes.get Preprocessing
|
HashiCorp Nomad: Client nodes API response | Client nodes API response message. |
Dependent item | nomad.client.nodes.api.response Preprocessing
|
HashiCorp Nomad: Nomad servers get | Nomad servers data in raw format. |
Script | nomad.server.nodes.get |
HashiCorp Nomad: Server-related APIs response | Server-related ( |
Dependent item | nomad.server.api.response Preprocessing
|
HashiCorp Nomad: Region | Current cluster region. |
Dependent item | nomad.region Preprocessing
|
HashiCorp Nomad: Nomad servers count | Nomad servers count. |
Dependent item | nomad.servers.count Preprocessing
|
HashiCorp Nomad: Nomad clients count | Nomad clients count. |
Dependent item | nomad.clients.count Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad: Client nodes API connection has failed | Client nodes API connection has failed. |
find(/HashiCorp Nomad by HTTP/nomad.client.nodes.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes | |
HashiCorp Nomad: Server-related API connection has failed | Server-related API connection has failed. |
find(/HashiCorp Nomad by HTTP/nomad.server.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Clients discovery | Client nodes discovery. |
Dependent item | nomad.clients.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Servers discovery | Server nodes discovery. |
Dependent item | nomad.servers.discovery Preprocessing
|
This template is designed to monitor HashiCorp Nomad clients by Zabbix. It works without any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Set up the Nomad client according to the vendor documentation.
2. Prepare an authentication token with the node:read and namespace:read-job permissions applied, and define it in the {$NOMAD.TOKEN} macro value.
> Refer to the vendor documentation about Nomad native ACL or Nomad Vault-generated tokens if you're using integration with HashiCorp Vault.
3. Set up the {$NOMAD.CLIENT.API.SCHEME} and {$NOMAD.CLIENT.API.PORT} macros to define the common Nomad API web schema and connection port.
Additional information:
You have to prepare an additional ACL token only if you wish to monitor Nomad clients as separate entities. If you're using clients discovery, the token will be inherited from the master host linked to the HashiCorp Nomad by HTTP template.
If you're not using ACLs, skip the second setup step.
The Nomad clients use the default web schema (HTTP) and the default API port (4646). If you're using clients discovery and need to redefine macros for a particular host created from a prototype, use context macros such as {$NOMAD.CLIENT.API.SCHEME:NECESSARY.IP} and/or {$NOMAD.CLIENT.API.PORT:NECESSARY.IP} at the master host or template level.
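A minimal sketch of fetching the client telemetry that the "Telemetry get" item collects, assuming the standard Nomad /v1/metrics endpoint (which returns JSON by default) and the scheme/port macros described above; the exact path and parameters used by the template are an assumption:

```python
import requests  # illustration only; the template's 'Telemetry get' item is a Zabbix HTTP agent item

def client_metrics(host: str, scheme: str = "http", port: int = 4646, token: str = "") -> dict:
    # scheme/port defaults mirror {$NOMAD.CLIENT.API.SCHEME} / {$NOMAD.CLIENT.API.PORT}.
    headers = {"X-Nomad-Token": token} if token else {}
    resp = requests.get(f"{scheme}://{host}:{port}/v1/metrics",
                        headers=headers,
                        timeout=15)  # {$NOMAD.DATA.TIMEOUT} default is 15s
    resp.raise_for_status()
    return resp.json()

# client_metrics("192.0.2.10")
```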
Useful links:
Name | Description | Default |
---|---|---|
{$NOMAD.CLIENT.API.SCHEME} | Nomad client API scheme. |
http |
{$NOMAD.CLIENT.API.PORT} | Nomad client API port. |
4646 |
{$NOMAD.TOKEN} | Nomad authentication token. |
<PUT YOUR AUTH TOKEN> |
{$NOMAD.DATA.TIMEOUT} | Response timeout for an API. |
15s |
{$NOMAD.HTTP.PROXY} | Sets the HTTP proxy for HTTP agent item. If this parameter is empty, then no proxy is used. |
|
{$NOMAD.API.RESPONSE.SUCCESS} | HTTP API successful response code. Availability triggers threshold. Change, if needed. |
200 |
{$NOMAD.CLIENT.RPC.PORT} | Nomad RPC service port. |
4647 |
{$NOMAD.CLIENT.SERF.PORT} | Nomad serf service port. |
4648 |
{$NOMAD.CLIENT.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. |
90 |
{$NOMAD.DISK.NAME.MATCHES} | The filter to include HashiCorp Nomad client disks by name. |
.* |
{$NOMAD.DISK.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client disks by name. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.NAME.MATCHES} | The filter to include HashiCorp Nomad client jobs by name. |
.* |
{$NOMAD.JOB.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by name. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.NAMESPACE.MATCHES} | The filter to include HashiCorp Nomad client jobs by namespace. |
.* |
{$NOMAD.JOB.NAMESPACE.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by namespace. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.TYPE.MATCHES} | The filter to include HashiCorp Nomad client jobs by type. |
.* |
{$NOMAD.JOB.TYPE.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by type. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.TASK.GROUP.MATCHES} | The filter to include HashiCorp Nomad client jobs by task group belonging. |
.* |
{$NOMAD.JOB.TASK.GROUP.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by task group belonging. |
CHANGE_IF_NEEDED |
{$NOMAD.DRIVER.NAME.MATCHES} | The filter to include HashiCorp Nomad client drivers by name. |
.* |
{$NOMAD.DRIVER.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client drivers by name. |
CHANGE_IF_NEEDED |
{$NOMAD.DRIVER.DETECT.MATCHES} | The filter to include HashiCorp Nomad client drivers by detection state. Possible filtering values: |
.* |
{$NOMAD.DRIVER.DETECT.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client drivers by detection state. Possible filtering values: |
CHANGE_IF_NEEDED |
{$NOMAD.CPU.UTIL.MIN} | CPU utilization threshold. Measured as a percentage. |
90 |
{$NOMAD.RAM.AVAIL.MIN} | Minimum available RAM threshold. Measured as a percentage. |
5 |
{$NOMAD.INODES.FREE.MIN.WARN} | Warning threshold of the filesystem metadata utilization. Measured as a percentage. |
20 |
{$NOMAD.INODES.FREE.MIN.CRIT} | Critical threshold of the filesystem metadata utilization. Measured as a percentage. |
10 |
Name | Description | Type | Key and additional info |
---|---|---|---|
HashiCorp Nomad Client: Telemetry get | Telemetry data in raw format. |
HTTP agent | nomad.client.data.get Preprocessing
|
HashiCorp Nomad Client: Metrics | Nomad client metrics in raw format. |
Dependent item | nomad.client.metrics.get Preprocessing
|
HashiCorp Nomad Client: Monitoring API response | Monitoring API response message. |
Dependent item | nomad.client.data.api.response Preprocessing
|
HashiCorp Nomad Client: Service [rpc] state | Current [rpc] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.CLIENT.RPC.PORT}] Preprocessing
|
HashiCorp Nomad Client: Service [serf] state | Current [serf] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.CLIENT.SERF.PORT}] Preprocessing
|
HashiCorp Nomad Client: CPU allocated | Total amount of CPU shares the scheduler has allocated to tasks. |
Dependent item | nomad.client.allocated.cpu Preprocessing
|
HashiCorp Nomad Client: CPU unallocated | Total amount of CPU shares free for the scheduler to allocate to tasks. |
Dependent item | nomad.client.unallocated.cpu Preprocessing
|
HashiCorp Nomad Client: Memory allocated | Total amount of memory the scheduler has allocated to tasks. |
Dependent item | nomad.client.allocated.memory Preprocessing
|
HashiCorp Nomad Client: Memory unallocated | Total amount of memory free for the scheduler to allocate to tasks. |
Dependent item | nomad.client.unallocated.memory Preprocessing
|
HashiCorp Nomad Client: Disk allocated | Total amount of disk space the scheduler has allocated to tasks. |
Dependent item | nomad.client.allocated.disk Preprocessing
|
HashiCorp Nomad Client: Disk unallocated | Total amount of disk space free for the scheduler to allocate to tasks. |
Dependent item | nomad.client.unallocated.disk Preprocessing
|
HashiCorp Nomad Client: Allocations blocked | Number of allocations waiting for previous versions. |
Dependent item | nomad.client.allocations.blocked Preprocessing
|
HashiCorp Nomad Client: Allocations migrating | Number of allocations migrating data from previous versions. |
Dependent item | nomad.client.allocations.migrating Preprocessing
|
HashiCorp Nomad Client: Allocations pending | Number of allocations pending (received by the client but not yet running). |
Dependent item | nomad.client.allocations.pending Preprocessing
|
HashiCorp Nomad Client: Allocations starting | Number of allocations starting. |
Dependent item | nomad.client.allocations.start Preprocessing
|
HashiCorp Nomad Client: Allocations running | Number of allocations running. |
Dependent item | nomad.client.allocations.running Preprocessing
|
HashiCorp Nomad Client: Allocations terminal | Number of allocations terminal. |
Dependent item | nomad.client.allocations.terminal Preprocessing
|
HashiCorp Nomad Client: Allocations failed, rate | Number of allocations failed. |
Dependent item | nomad.client.allocations.failed Preprocessing
|
HashiCorp Nomad Client: Allocations completed, rate | Number of allocations completed. |
Dependent item | nomad.client.allocations.complete Preprocessing
|
HashiCorp Nomad Client: Allocations restarted, rate | Number of allocations restarted. |
Dependent item | nomad.client.allocations.restart Preprocessing
|
HashiCorp Nomad Client: Allocations OOM killed | Number of allocations OOM killed. |
Dependent item | nomad.client.allocations.oom_killed Preprocessing
|
HashiCorp Nomad Client: CPU idle utilization | CPU utilization in idle state. |
Dependent item | nomad.client.cpu.idle Preprocessing
|
HashiCorp Nomad Client: CPU system utilization | CPU utilization in system space. |
Dependent item | nomad.client.cpu.system Preprocessing
|
HashiCorp Nomad Client: CPU total utilization | Total CPU utilization. |
Dependent item | nomad.client.cpu.total Preprocessing
|
HashiCorp Nomad Client: CPU user utilization | CPU utilization in user space. |
Dependent item | nomad.client.cpu.user Preprocessing
|
HashiCorp Nomad Client: Memory available | Total amount of memory available to processes which includes free and cached memory. |
Dependent item | nomad.client.memory.available Preprocessing
|
HashiCorp Nomad Client: Memory free | Amount of memory which is free. |
Dependent item | nomad.client.memory.free Preprocessing
|
HashiCorp Nomad Client: Memory size | Total amount of physical memory on the node. |
Dependent item | nomad.client.memory.total Preprocessing
|
HashiCorp Nomad Client: Memory used | Amount of memory used by processes. |
Dependent item | nomad.client.memory.used Preprocessing
|
HashiCorp Nomad Client: Uptime | Uptime of the host running the Nomad client. |
Dependent item | nomad.client.uptime Preprocessing
|
HashiCorp Nomad Client: Node info get | Node info data in raw format. |
HTTP agent | nomad.client.node.info.get Preprocessing
|
HashiCorp Nomad Client: Nomad client version | Nomad client version. |
Dependent item | nomad.client.version Preprocessing
|
HashiCorp Nomad Client: Nodes API response | Nodes API response message. |
Dependent item | nomad.client.node.info.api.response Preprocessing
|
HashiCorp Nomad Client: Allocated jobs get | Allocated jobs data in raw format. |
HTTP agent | nomad.client.job.allocs.get Preprocessing
|
HashiCorp Nomad Client: Allocations API response | Allocations API response message. |
Dependent item | nomad.client.job.allocs.api.response Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Client: Monitoring API connection has failed | Monitoring API connection has failed. |
find(/HashiCorp Nomad Client by HTTP/nomad.client.data.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes | |
HashiCorp Nomad Client: Service [rpc] is down | Cannot establish the connection to [rpc] service port {$NOMAD.CLIENT.RPC.PORT}. |
last(/HashiCorp Nomad Client by HTTP/net.tcp.service[tcp,,{$NOMAD.CLIENT.RPC.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Client: Service [serf] is down | Cannot establish the connection to [serf] service port {$NOMAD.CLIENT.SERF.PORT}. |
last(/HashiCorp Nomad Client by HTTP/net.tcp.service[tcp,,{$NOMAD.CLIENT.SERF.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Client: OOM killed allocations found | OOM killed allocations found. |
last(/HashiCorp Nomad Client by HTTP/nomad.client.allocations.oom_killed) > 0 |Warning |
Manual close: Yes | |
HashiCorp Nomad Client: High CPU utilization | CPU utilization is too high. The system might be slow to respond. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.cpu.total, 10m) >= {$NOMAD.CPU.UTIL.MIN} |Average |
||
HashiCorp Nomad Client: High memory utilization | RAM utilization is too high. The system might be slow to respond. |
(min(/HashiCorp Nomad Client by HTTP/nomad.client.memory.available, 10m) / last(/HashiCorp Nomad Client by HTTP/nomad.client.memory.total))*100 <= {$NOMAD.RAM.AVAIL.MIN} |Average |
||
HashiCorp Nomad Client: The host has been restarted | The host uptime is less than 10 minutes. |
last(/HashiCorp Nomad Client by HTTP/nomad.client.uptime) < 10m |Warning |
Manual close: Yes | |
HashiCorp Nomad Client: Nomad client version has changed | Nomad client version has changed. |
change(/HashiCorp Nomad Client by HTTP/nomad.client.version)<>0 |Info |
Manual close: Yes | |
HashiCorp Nomad Client: Nodes API connection has failed | Nodes API connection has failed. |
find(/HashiCorp Nomad Client by HTTP/nomad.client.node.info.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Client: Allocations API connection has failed | Allocations API connection has failed. |
find(/HashiCorp Nomad Client by HTTP/nomad.client.job.allocs.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Drivers discovery | Client drivers discovery. |
Dependent item | nomad.client.drivers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] state | Driver [{#DRIVER.NAME}] state. |
Dependent item | nomad.client.driver.state["{#DRIVER.NAME}"] Preprocessing
|
HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] detection state | Driver [{#DRIVER.NAME}] detection state. |
Dependent item | nomad.client.driver.detected["{#DRIVER.NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] is in unhealthy state | The [{#DRIVER.NAME}] driver detected, but its state is unhealthy. |
last(/HashiCorp Nomad Client by HTTP/nomad.client.driver.state["{#DRIVER.NAME}"]) = 0 and last(/HashiCorp Nomad Client by HTTP/nomad.client.driver.detected["{#DRIVER.NAME}"]) = 1 |Warning |
Manual close: Yes | |
HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] detection state has changed | The [{#DRIVER.NAME}] driver detection state has changed. |
change(/HashiCorp Nomad Client by HTTP/nomad.client.driver.detected["{#DRIVER.NAME}"]) <> 0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Physical disks discovery | Physical disks discovery. |
Dependent item | nomad.client.disk.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] space available | Amount of space which is available on ["{#DEV.NAME}"] disk. |
Dependent item | nomad.client.disk.available["{#DEV.NAME}"] Preprocessing
|
HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] inodes utilization | Disk space consumed by the inodes on ["{#DEV.NAME}"] disk. |
Dependent item | nomad.client.disk.inodes_percent["{#DEV.NAME}"] Preprocessing
|
HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] size | Total size of the ["{#DEV.NAME}"] device. |
Dependent item | nomad.client.disk.size["{#DEV.NAME}"] Preprocessing
|
HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] space utilization | Percentage of disk ["{#DEV.NAME}"] space used. |
Dependent item | nomad.client.disk.used_percent["{#DEV.NAME}"] Preprocessing
|
HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] space used | Amount of disk ["{#DEV.NAME}"] space which has been used. |
Dependent item | nomad.client.disk.used["{#DEV.NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Client: Running out of free inodes on [{#DEV.NAME}] device | It may become impossible to write to a disk if there are no index nodes left. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.inodes_percent["{#DEV.NAME}"],5m) >= {$NOMAD.INODES.FREE.MIN.WARN:"{#DEV.NAME}"} |Warning |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Client: Running out of free inodes on [{#DEV.NAME}] device | It may become impossible to write to a disk if there are no index nodes left. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.inodes_percent["{#DEV.NAME}"],5m) >= {$NOMAD.INODES.FREE.MIN.CRIT:"{#DEV.NAME}"} |Average |
Manual close: Yes | |
HashiCorp Nomad Client: High disk [{#DEV.NAME}] utilization | High disk [{#DEV.NAME}] utilization. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.used_percent["{#DEV.NAME}"],5m) >= {$NOMAD.DISK.UTIL.MIN.WARN:"{#DEV.NAME}"} |Warning |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Client: High disk [{#DEV.NAME}] utilization | High disk [{#DEV.NAME}] utilization. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.used_percent["{#DEV.NAME}"],5m) >= {$NOMAD.DISK.UTIL.MIN.CRIT:"{#DEV.NAME}"} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Allocated jobs discovery | Allocated jobs discovery. |
Dependent item | nomad.client.alloc.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU allocated | Total CPU resources allocated by the ["{#JOB.NAME}"] job across all cores. |
Dependent item | nomad.client.allocs.cpu.allocated["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU system utilization | Total CPU resources consumed by the ["{#JOB.NAME}"] job in system space. |
Dependent item | nomad.client.allocs.cpu.system["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU user utilization | Total CPU resources consumed by the ["{#JOB.NAME}"] job in user space. |
Dependent item | nomad.client.allocs.cpu.user["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU total utilization | Total CPU resources consumed by the ["{#JOB.NAME}"] job across all cores. |
Dependent item | nomad.client.allocs.cpu.total_percent["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU throttled periods time | Total number of CPU periods that the ["{#JOB.NAME}"] job was throttled. |
Dependent item | nomad.client.allocs.cpu.throttled_periods["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU throttled time | Total time that the ["{#JOB.NAME}"] job was throttled. |
Dependent item | nomad.client.allocs.cpu.throttled_time["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU ticks | CPU ticks consumed by the process for the ["{#JOB.NAME}"] job in the last collection interval. |
Dependent item | nomad.client.allocs.cpu.total_ticks["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] Memory allocated | Amount of memory allocated by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.allocated["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] Memory cached | Amount of memory cached by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.cache["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] Memory used | Total amount of memory used by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.usage["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] Memory swapped | Amount of memory swapped by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.swap["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
This template is designed to monitor HashiCorp Nomad servers by Zabbix. It works without any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Set up the Nomad server according to the vendor documentation.
2. Set up the {$NOMAD.SERVER.API.SCHEME} and {$NOMAD.SERVER.API.PORT} macros to define the common Nomad API web schema and connection port.
Additional information:
The Nomad servers use the default web schema (HTTP) and the default API port (4646). If you're using servers discovery and need to redefine macros for a particular host created from a prototype, use context macros such as {$NOMAD.SERVER.API.SCHEME:NECESSARY.IP} and/or {$NOMAD.SERVER.API.PORT:NECESSARY.IP} at the master host or template level.
Adjust the {$NOMAD.REDUNDANCY.MIN} macro value, based on the number of nodes in your cluster, to configure the failure tolerance triggers correctly.
Useful links:
Name | Description | Default |
---|---|---|
{$NOMAD.SERVER.API.SCHEME} | Nomad SERVER API scheme. |
http |
{$NOMAD.SERVER.API.PORT} | Nomad SERVER API port. |
4646 |
{$NOMAD.TOKEN} | Nomad authentication token. |
<PUT YOUR AUTH TOKEN> |
{$NOMAD.DATA.TIMEOUT} | Response timeout for an API. |
15s |
{$NOMAD.HTTP.PROXY} | Sets the HTTP proxy for HTTP agent item. If this parameter is empty, then no proxy is used. |
|
{$NOMAD.API.RESPONSE.SUCCESS} | HTTP API successful response code. Availability triggers threshold. Change, if needed. |
200 |
{$NOMAD.SERVER.RPC.PORT} | Nomad RPC service port. |
4647 |
{$NOMAD.SERVER.SERF.PORT} | Nomad serf service port. |
4648 |
{$NOMAD.REDUNDANCY.MIN} | Number of redundant servers required to keep the cluster safe. The default value is '1' for a 3-node cluster. Change if needed. |
1 |
{$NOMAD.OPEN.FDS.MAX} | Maximum percentage of used file descriptors. |
90 |
{$NOMAD.SERVER.LEADER.LATENCY} | Leader last contact latency threshold. |
0.3s |
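The {$NOMAD.REDUNDANCY.MIN} macro above should reflect how many servers the cluster can lose while still keeping a Raft quorum. A minimal sketch of that arithmetic (general Raft math, offered only as a guide for choosing the value, not taken from the template):

```python
def raft_failure_tolerance(raft_peers: int) -> int:
    # A Raft cluster needs floor(n/2) + 1 voting peers for quorum;
    # everything beyond that is the failure tolerance.
    quorum = raft_peers // 2 + 1
    return raft_peers - quorum

# A 3-node cluster tolerates 1 failure, a 5-node cluster tolerates 2:
print(raft_failure_tolerance(3), raft_failure_tolerance(5))
```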
Name | Description | Type | Key and additional info |
---|---|---|---|
HashiCorp Nomad Server: Telemetry get | Telemetry data in raw format. |
HTTP agent | nomad.server.data.get Preprocessing
|
HashiCorp Nomad Server: Metrics | Nomad server metrics in raw format. |
Dependent item | nomad.server.metrics.get Preprocessing
|
HashiCorp Nomad Server: Monitoring API response | Monitoring API response message. |
Dependent item | nomad.server.data.api.response Preprocessing
|
HashiCorp Nomad Server: Internal stats get | Internal stats data in raw format. |
HTTP agent | nomad.server.stats.get Preprocessing
|
HashiCorp Nomad Server: Internal stats API response | Internal stats API response message. |
Dependent item | nomad.server.stats.api.response Preprocessing
|
HashiCorp Nomad Server: Nomad server version | Nomad server version. |
Dependent item | nomad.server.version Preprocessing
|
HashiCorp Nomad Server: Nomad raft version | Nomad raft version. |
Dependent item | nomad.raft.version Preprocessing
|
HashiCorp Nomad Server: Raft peers | Current cluster raft peers amount. |
Dependent item | nomad.server.raft.peers Preprocessing
|
HashiCorp Nomad Server: Cluster role | Current role in the cluster. |
Dependent item | nomad.server.raft.cluster_role Preprocessing
|
HashiCorp Nomad Server: CPU time, rate | Total user and system CPU time spent in seconds. |
Dependent item | nomad.server.cpu.time Preprocessing
|
HashiCorp Nomad Server: Memory used | Memory utilization in bytes. |
Dependent item | nomad.server.runtime.alloc_bytes Preprocessing
|
HashiCorp Nomad Server: Virtual memory size | Virtual memory size in bytes. |
Dependent item | nomad.server.virtualmemorybytes Preprocessing
|
HashiCorp Nomad Server: Resident memory size | Resident memory size in bytes. |
Dependent item | nomad.server.residentmemorybytes Preprocessing
|
HashiCorp Nomad Server: Heap objects | Number of objects on the heap. General memory pressure indicator. |
Dependent item | nomad.server.runtime.heap_objects Preprocessing
|
HashiCorp Nomad Server: Open file descriptors | Number of open file descriptors. |
Dependent item | nomad.server.processopenfds Preprocessing
|
HashiCorp Nomad Server: Open file descriptors, max | Maximum number of open file descriptors. |
Dependent item | nomad.server.processmaxfds Preprocessing
|
HashiCorp Nomad Server: Goroutines | Number of goroutines and general load pressure indicator. |
Dependent item | nomad.server.runtime.num_goroutines Preprocessing
|
HashiCorp Nomad Server: Evaluations pending | Evaluations that are pending until an existing evaluation for the same job completes. |
Dependent item | nomad.server.broker.total_pending Preprocessing
|
HashiCorp Nomad Server: Evaluations ready | Number of evaluations ready to be processed. |
Dependent item | nomad.server.broker.total_ready Preprocessing
|
HashiCorp Nomad Server: Evaluations unacked | Evaluations dispatched for processing but incomplete. |
Dependent item | nomad.server.broker.total_unacked Preprocessing
|
HashiCorp Nomad Server: CPU shares for blocked evaluations | Amount of CPU shares requested by blocked evals. |
Dependent item | nomad.server.blocked_evals.cpu Preprocessing
|
HashiCorp Nomad Server: Memory shares by blocked evaluations | Amount of memory requested by blocked evals. |
Dependent item | nomad.server.blocked_evals.memory Preprocessing
|
HashiCorp Nomad Server: CPU shares for blocked job evaluations | Amount of CPU shares requested by blocked evals of a job. |
Dependent item | nomad.server.blocked_evals.job.cpu Preprocessing
|
HashiCorp Nomad Server: Memory shares for blocked job evaluations | Amount of memory requested by blocked evals of a job. |
Dependent item | nomad.server.blocked_evals.job.memory Preprocessing
|
HashiCorp Nomad Server: Evaluations blocked | Count of evals in the blocked state for any reason (cluster resource exhaustion or quota limits). |
Dependent item | nomad.server.blockedevals.totalblocked Preprocessing
|
HashiCorp Nomad Server: Evaluations escaped | Count of evals that have escaped computed node classes. This indicates a scheduler optimization was skipped and is not usually a source of concern. |
Dependent item | nomad.server.blockedevals.totalescaped Preprocessing
|
HashiCorp Nomad Server: Evaluations waiting | Count of evals waiting to be enqueued. |
Dependent item | nomad.server.broker.total_waiting Preprocessing
|
HashiCorp Nomad Server: Evaluations blocked due to quota limit | Count of blocked evals due to quota limits (the resources for these jobs are not counted in other blockedevals metrics, except for totalblocked). |
Dependent item | nomad.server.blockedevals.totalquota_limit Preprocessing
|
HashiCorp Nomad Server: Evaluations enqueue time | Average time elapsed with evaluations waiting to be enqueued. |
Dependent item | nomad.server.broker.eval_waiting Preprocessing
|
HashiCorp Nomad Server: RPC evaluation acknowledgement time | Time elapsed for Eval.Ack RPC call. |
Dependent item | nomad.server.eval.ack Preprocessing
|
HashiCorp Nomad Server: RPC job summary time | Time elapsed for Job.Summary RPC call. |
Dependent item | nomad.server.jobsummary.getjob_summary Preprocessing
|
HashiCorp Nomad Server: Heartbeats active | Number of active heartbeat timers. Each timer represents a Nomad client connection. |
Dependent item | nomad.server.heartbeat.active Preprocessing
|
HashiCorp Nomad Server: RPC requests, rate | Number of RPC requests being handled. |
Dependent item | nomad.server.rpc.request Preprocessing
|
HashiCorp Nomad Server: RPC error requests, rate | Number of RPC requests being handled that result in an error. |
Dependent item | nomad.server.rpc.request_error Preprocessing
|
HashiCorp Nomad Server: RPC queries, rate | Number of RPC queries. |
Dependent item | nomad.server.rpc.query Preprocessing
|
HashiCorp Nomad Server: RPC job allocations time | Time elapsed for Job.Allocations RPC call. |
Dependent item | nomad.server.job.allocations Preprocessing
|
HashiCorp Nomad Server: RPC job evaluations time | Time elapsed for Job.Evaluations RPC call. |
Dependent item | nomad.server.job.evaluations Preprocessing
|
HashiCorp Nomad Server: RPC get job time | Time elapsed for Job.GetJob RPC call. |
Dependent item | nomad.server.job.get_job Preprocessing
|
HashiCorp Nomad Server: Plan apply time | Time elapsed to apply a plan. |
Dependent item | nomad.server.plan.apply Preprocessing
|
HashiCorp Nomad Server: Plan evaluate time | Time elapsed to evaluate a plan. |
Dependent item | nomad.server.plan.evaluate Preprocessing
|
HashiCorp Nomad Server: RPC plan submit time | Time elapsed for Plan.Submit RPC call. |
Dependent item | nomad.server.plan.submit Preprocessing
|
HashiCorp Nomad Server: Plan raft index processing time | Time elapsed that planner waits for the raft index of the plan to be processed. |
Dependent item | nomad.server.plan.waitforindex Preprocessing
|
HashiCorp Nomad Server: RPC list time | Time elapsed for Node.List RPC call. |
Dependent item | nomad.server.client.list Preprocessing
|
HashiCorp Nomad Server: RPC update allocations time | Time elapsed for Node.UpdateAlloc RPC call. |
Dependent item | nomad.server.client.update_alloc Preprocessing
|
HashiCorp Nomad Server: RPC update status time | Time elapsed for Node.UpdateStatus RPC call. |
Dependent item | nomad.server.client.update_status Preprocessing
|
HashiCorp Nomad Server: RPC get client allocs time | Time elapsed for Node.GetClientAllocs RPC call. |
Dependent item | nomad.server.client.getclientallocs Preprocessing
|
HashiCorp Nomad Server: RPC eval dequeue time | Time elapsed for Eval.Dequeue RPC call. |
Dependent item | nomad.server.client.dequeue Preprocessing
|
HashiCorp Nomad Server: Vault token last renewal | Time since last successful Vault token renewal. |
Dependent item | nomad.server.vault.tokenlastrenewal Preprocessing
|
HashiCorp Nomad Server: Vault token next renewal | Time until next Vault token renewal attempt. |
Dependent item | nomad.server.vault.tokennextrenewal Preprocessing
|
HashiCorp Nomad Server: Vault token TTL | Time to live for Vault token. |
Dependent item | nomad.server.vault.token_ttl Preprocessing
|
HashiCorp Nomad Server: Vault tokens revoked | Count of revoked tokens. |
Dependent item | nomad.server.vault.distributedtokensrevoked Preprocessing
|
HashiCorp Nomad Server: Jobs dead | Number of dead jobs. |
Dependent item | nomad.server.job_status.dead Preprocessing
|
HashiCorp Nomad Server: Jobs pending | Number of pending jobs. |
Dependent item | nomad.server.job_status.pending Preprocessing
|
HashiCorp Nomad Server: Jobs running | Number of running jobs. |
Dependent item | nomad.server.job_status.running Preprocessing
|
HashiCorp Nomad Server: Job allocations completed | Number of complete allocations for a job. |
Dependent item | nomad.server.job_summary.complete Preprocessing
|
HashiCorp Nomad Server: Job allocations failed | Number of failed allocations for a job. |
Dependent item | nomad.server.job_summary.failed Preprocessing
|
HashiCorp Nomad Server: Job allocations lost | Number of lost allocations for a job. |
Dependent item | nomad.server.job_summary.lost Preprocessing
|
HashiCorp Nomad Server: Job allocations unknown | Number of unknown allocations for a job. |
Dependent item | nomad.server.job_summary.unknown Preprocessing
|
HashiCorp Nomad Server: Job allocations queued | Number of queued allocations for a job. |
Dependent item | nomad.server.job_summary.queued Preprocessing
|
HashiCorp Nomad Server: Job allocations running | Number of running allocations for a job. |
Dependent item | nomad.server.job_summary.running Preprocessing
|
HashiCorp Nomad Server: Job allocations starting | Number of starting allocations for a job. |
Dependent item | nomad.server.job_summary.starting Preprocessing
|
HashiCorp Nomad Server: Gossip time | Time elapsed to broadcast gossip messages. |
Dependent item | nomad.server.memberlist.gossip Preprocessing
|
HashiCorp Nomad Server: Leader barrier time | Time elapsed to establish a raft barrier during leader transition. |
Dependent item | nomad.server.leader.barrier Preprocessing
|
HashiCorp Nomad Server: Reconcile peer time | Time elapsed to reconcile a serf peer with state store. |
Dependent item | nomad.server.leader.reconcile_member Preprocessing
|
HashiCorp Nomad Server: Total reconcile time | Time elapsed to reconcile all serf peers with state store. |
Dependent item | nomad.server.leader.reconcile Preprocessing
|
HashiCorp Nomad Server: Leader last contact | Time since last contact to leader. General indicator of Raft latency. |
Dependent item | nomad.server.raft.leader.lastContact Preprocessing
|
HashiCorp Nomad Server: Plan queue | Count of evals in the plan queue. |
Dependent item | nomad.server.plan.queue_depth Preprocessing
|
HashiCorp Nomad Server: Worker evaluation create time | Time elapsed for worker to create an eval. |
Dependent item | nomad.server.worker.create_eval Preprocessing
|
HashiCorp Nomad Server: Worker evaluation dequeue time | Time elapsed for worker to dequeue an eval. |
Dependent item | nomad.server.worker.dequeue_eval Preprocessing
|
HashiCorp Nomad Server: Worker invoke scheduler time | Time elapsed for worker to invoke the scheduler. |
Dependent item | nomad.server.worker.invokeschedulerservice Preprocessing
|
HashiCorp Nomad Server: Worker acknowledgement send time | Time elapsed for worker to send acknowledgement. |
Dependent item | nomad.server.worker.send_ack Preprocessing
|
HashiCorp Nomad Server: Worker submit plan time | Time elapsed for worker to submit plan. |
Dependent item | nomad.server.worker.submit_plan Preprocessing
|
HashiCorp Nomad Server: Worker update evaluation time | Time elapsed for worker to submit updated eval. |
Dependent item | nomad.server.worker.update_eval Preprocessing
|
HashiCorp Nomad Server: Worker log replication time | Time elapsed that worker waits for the raft index of the eval to be processed. |
Dependent item | nomad.server.worker.waitforindex Preprocessing
|
HashiCorp Nomad Server: Raft calls blocked, rate | Count of blocking raft API calls. |
Dependent item | nomad.server.raft.barrier Preprocessing
|
HashiCorp Nomad Server: Raft commit logs enqueued | Count of logs enqueued. |
Dependent item | nomad.server.raft.commitnumlogs Preprocessing
|
HashiCorp Nomad Server: Raft transactions, rate | Number of Raft transactions. |
Dependent item | nomad.server.raft.apply Preprocessing
|
HashiCorp Nomad Server: Raft commit time | Time elapsed to commit writes. |
Dependent item | nomad.server.raft.commit_time Preprocessing
|
HashiCorp Nomad Server: Raft transaction commit time | Raft transaction commit time. |
Dependent item | nomad.server.raft.replication.appendEntries Preprocessing
|
HashiCorp Nomad Server: FSM apply time | Time elapsed to apply write to FSM. |
Dependent item | nomad.server.raft.fsm.apply Preprocessing
|
HashiCorp Nomad Server: FSM enqueue time | Time elapsed to enqueue write to FSM. |
Dependent item | nomad.server.raft.fsm.enqueue Preprocessing
|
HashiCorp Nomad Server: FSM autopilot time | Time elapsed to apply Autopilot raft entry. |
Dependent item | nomad.server.raft.fsm.autopilot Preprocessing
|
HashiCorp Nomad Server: FSM register node time | Time elapsed to apply RegisterNode raft entry. |
Dependent item | nomad.server.raft.fsm.register_node Preprocessing
|
HashiCorp Nomad Server: FSM index | Current index applied to FSM. |
Dependent item | nomad.server.raft.applied_index Preprocessing
|
HashiCorp Nomad Server: Raft last index | Most recent index seen. |
Dependent item | nomad.server.raft.last_index Preprocessing
|
HashiCorp Nomad Server: Dispatch log time | Time elapsed to write log, mark in flight, and start replication. |
Dependent item | nomad.server.raft.leader.dispatch_log Preprocessing
|
HashiCorp Nomad Server: Logs dispatched | Count of logs dispatched. |
Dependent item | nomad.server.raft.leader.dispatchnumlogs Preprocessing
|
HashiCorp Nomad Server: Heartbeat fails | Count of failing to heartbeat and starting election. |
Dependent item | nomad.server.raft.transition.heartbeat_timeout Preprocessing
|
HashiCorp Nomad Server: Objects freed, rate | Count of objects freed from heap by go runtime GC. |
Dependent item | nomad.server.runtime.free_count Preprocessing
|
HashiCorp Nomad Server: GC pause time | Go runtime GC pause times. |
Dependent item | nomad.server.runtime.gcpausens Preprocessing
|
HashiCorp Nomad Server: GC metadata size | Go runtime GC metadata size in bytes. |
Dependent item | nomad.server.runtime.sys_bytes Preprocessing
|
HashiCorp Nomad Server: GC runs | Count of go runtime GC runs. |
Dependent item | nomad.server.runtime.totalgcruns Preprocessing
|
HashiCorp Nomad Server: Memberlist events | Count of memberlist events received. |
Dependent item | nomad.server.serf.queue.event Preprocessing
|
HashiCorp Nomad Server: Memberlist changes | Count of memberlist changes. |
Dependent item | nomad.server.serf.queue.intent Preprocessing
|
HashiCorp Nomad Server: Memberlist queries | Count of memberlist queries. |
Dependent item | nomad.server.serf.queue.queries Preprocessing
|
HashiCorp Nomad Server: Snapshot index | Current snapshot index. |
Dependent item | nomad.server.state.snapshot.index Preprocessing
|
HashiCorp Nomad Server: Services ready to schedule | Count of service evals ready to be scheduled. |
Dependent item | nomad.server.broker.service_ready Preprocessing
|
HashiCorp Nomad Server: Services unacknowledged | Count of unacknowledged service evals. |
Dependent item | nomad.server.broker.service_unacked Preprocessing
|
HashiCorp Nomad Server: System evaluations ready to schedule | Count of system evals ready to be scheduled. |
Dependent item | nomad.server.broker.system_ready Preprocessing
|
HashiCorp Nomad Server: System evaluations unacknowledged | Count of unacknowledged system evals. |
Dependent item | nomad.server.broker.system_unacked Preprocessing
|
HashiCorp Nomad Server: BoltDB free pages | Number of BoltDB free pages. |
Dependent item | nomad.server.raft.boltdb.num_free_pages Preprocessing
|
HashiCorp Nomad Server: BoltDB pending pages | Number of BoltDB pending pages. |
Dependent item | nomad.server.raft.boltdb.num_pending_pages Preprocessing
|
HashiCorp Nomad Server: BoltDB free page bytes | Number of free page bytes. |
Dependent item | nomad.server.raft.boltdb.free_page_bytes Preprocessing
|
HashiCorp Nomad Server: BoltDB freelist bytes | Number of freelist bytes. |
Dependent item | nomad.server.raft.boltdb.freelist_bytes Preprocessing
|
HashiCorp Nomad Server: BoltDB read transactions, rate | Count of total read transactions. |
Dependent item | nomad.server.raft.boltdb.total_read_txn Preprocessing
|
HashiCorp Nomad Server: BoltDB open read transactions | Number of current open read transactions. |
Dependent item | nomad.server.raft.boltdb.open_read_txn Preprocessing
|
HashiCorp Nomad Server: BoltDB pages in use | Number of pages in use. |
Dependent item | nomad.server.raft.boltdb.txstats.page_count Preprocessing
|
HashiCorp Nomad Server: BoltDB page allocations, rate | Number of page allocations. |
Dependent item | nomad.server.raft.boltdb.txstats.page_alloc Preprocessing
|
HashiCorp Nomad Server: BoltDB cursors | Count of total database cursors. |
Dependent item | nomad.server.raft.boltdb.txstats.cursor_count Preprocessing
|
HashiCorp Nomad Server: BoltDB nodes, rate | Count of total database nodes. |
Dependent item | nomad.server.raft.boltdb.txstats.node_count Preprocessing
|
HashiCorp Nomad Server: BoltDB node dereferences, rate | Count of total database node dereferences. |
Dependent item | nomad.server.raft.boltdb.txstats.node_deref Preprocessing
|
HashiCorp Nomad Server: BoltDB rebalance operations, rate | Count of total rebalance operations. |
Dependent item | nomad.server.raft.boltdb.txstats.rebalance Preprocessing
|
HashiCorp Nomad Server: BoltDB split operations, rate | Count of total split operations. |
Dependent item | nomad.server.raft.boltdb.txstats.split Preprocessing
|
HashiCorp Nomad Server: BoltDB spill operations, rate | Count of total spill operations. |
Dependent item | nomad.server.raft.boltdb.txstats.spill Preprocessing
|
HashiCorp Nomad Server: BoltDB write operations, rate | Count of total write operations. |
Dependent item | nomad.server.raft.boltdb.txstats.write Preprocessing
|
HashiCorp Nomad Server: BoltDB rebalance time | Sample of rebalance operation times. |
Dependent item | nomad.server.raft.boltdb.txstats.rebalance_time Preprocessing
|
HashiCorp Nomad Server: BoltDB spill time | Sample of spill operation times. |
Dependent item | nomad.server.raft.boltdb.txstats.spill_time Preprocessing
|
HashiCorp Nomad Server: BoltDB write time | Sample of write operation times. |
Dependent item | nomad.server.raft.boltdb.txstats.write_time Preprocessing
|
HashiCorp Nomad Server: Service [rpc] state | Current [rpc] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.SERVER.RPC.PORT}] Preprocessing
|
HashiCorp Nomad Server: Service [serf] state | Current [serf] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.SERVER.SERF.PORT}] Preprocessing
|
HashiCorp Nomad Server: Namespace list time | Time elapsed for Namespace.ListNamespaces. |
Dependent item | nomad.server.namespace.list_namespace Preprocessing
|
HashiCorp Nomad Server: Autopilot state | Current autopilot state. |
Dependent item | nomad.server.autopilot.state Preprocessing
|
HashiCorp Nomad Server: Autopilot failure tolerance | The number of redundant healthy servers that can fail without causing an outage. |
Dependent item | nomad.server.autopilot.failure_tolerance Preprocessing
|
HashiCorp Nomad Server: FSM allocation client update time | Time elapsed to apply AllocClientUpdate raft entry. |
Dependent item | nomad.server.alloc_client_update Preprocessing
|
HashiCorp Nomad Server: FSM apply plan results time | Time elapsed to apply ApplyPlanResults raft entry. |
Dependent item | nomad.server.fsm.apply_plan_results Preprocessing
|
HashiCorp Nomad Server: FSM update evaluation time | Time elapsed to apply UpdateEval raft entry. |
Dependent item | nomad.server.fsm.update_eval Preprocessing
|
HashiCorp Nomad Server: FSM job registration time | Time elapsed to apply RegisterJob raft entry. |
Dependent item | nomad.server.fsm.register_job Preprocessing
|
HashiCorp Nomad Server: Allocation reschedule attempts | Count of attempts to reschedule an allocation. |
Dependent item | nomad.server.scheduler.allocs.rescheduled.attempted Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Server: Monitoring API connection has failed | Monitoring API connection has failed. |
find(/HashiCorp Nomad Server by HTTP/nomad.server.data.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Internal stats API connection has failed | Internal stats API connection has failed. |
find(/HashiCorp Nomad Server by HTTP/nomad.server.stats.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Server: Nomad server version has changed | Nomad server version has changed. |
change(/HashiCorp Nomad Server by HTTP/nomad.server.version)<>0 |Info |
Manual close: Yes | |
HashiCorp Nomad Server: Cluster role has changed | Cluster role has changed. |
change(/HashiCorp Nomad Server by HTTP/nomad.server.raft.cluster_role) <> 0 |Info |
Manual close: Yes | |
HashiCorp Nomad Server: Current number of open files is too high | Heavy file descriptor usage (i.e., near the process file descriptor limit) indicates a potential file descriptor exhaustion issue. |
min(/HashiCorp Nomad Server by HTTP/nomad.server.process_open_fds,5m)/last(/HashiCorp Nomad Server by HTTP/nomad.server.process_max_fds)*100>{$NOMAD.OPEN.FDS.MAX} |Warning |
||
HashiCorp Nomad Server: Dead jobs found | Jobs with the "dead" status have been found. |
last(/HashiCorp Nomad Server by HTTP/nomad.server.job_status.dead) > 0 and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.job_status.dead,5m) = 0 |Warning |
Manual close: Yes | |
HashiCorp Nomad Server: Leader last contact timeout exceeded | The nomad.raft.leader.lastContact metric is a general indicator of Raft latency which can be used to observe how Raft timing is performing and guide infrastructure provisioning. |
min(/HashiCorp Nomad Server by HTTP/nomad.server.raft.leader.lastContact,5m) >= {$NOMAD.SERVER.LEADER.LATENCY} and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.raft.leader.lastContact,5m) = 0 |Warning |
||
HashiCorp Nomad Server: Service [rpc] is down | Cannot establish the connection to [rpc] service port {$NOMAD.SERVER.RPC.PORT}. |
last(/HashiCorp Nomad Server by HTTP/net.tcp.service[tcp,,{$NOMAD.SERVER.RPC.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Service [serf] is down | Cannot establish the connection to [serf] service port {$NOMAD.SERVER.SERF.PORT}. |
last(/HashiCorp Nomad Server by HTTP/net.tcp.service[tcp,,{$NOMAD.SERVER.SERF.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Autopilot is unhealthy | The autopilot is in unhealthy state. The successful failover probability is extremely low. |
last(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.state) = 0 and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.state,5m) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Autopilot redundancy is low | The autopilot redundancy is low. |
last(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.failure_tolerance) < {$NOMAD.REDUNDANCY.MIN} and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.failure_tolerance,5m) = 0 |Warning |
Manual close: Yes |
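The items and triggers above rely on Nomad's HTTP API (the monitoring and internal stats endpoints). As a quick connectivity check, you can query the metrics endpoint directly (a minimal sketch, assuming the default HTTP port 4646 and placeholder host and token values; omit the token header if ACLs are disabled):
# Agent telemetry in JSON
curl -s -H "X-Nomad-Token: <token>" "http://<nomad-server>:4646/v1/metrics"
# The same data in Prometheus text format
curl -s "http://<nomad-server>:4646/v1/metrics?format=prometheus"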
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Nginx Plus monitoring by Zabbix via HTTP and doesn't require any external scripts.
The monitoring data of the live activity is generated by the NGINX Plus API.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the {$NGINX.API.ENDPOINT} macro to the NGINX Plus API URL in the format <scheme>://<host>:<port>/<location>/.
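For example, you can confirm that the endpoint is reachable before linking the template to it (an illustrative check only; the host, API version, and location below are placeholders for your own configuration):
curl -s "http://localhost/api/9/nginx"
curl -s "http://localhost/api/9/http/server_zones"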
Note that, depending on the number of zones and upstreams, the discovery operation may be expensive. Therefore, use the following filters with these macros:
Name | Description | Default |
---|---|---|
{$NGINX.API.ENDPOINT} | NGINX Plus API URL in the format <scheme>://<host>:<port>/<location>/. |
|
{$NGINX.LLD.FILTER.HTTP.ZONE.MATCHES} | The filter to include the necessary discovered HTTP server zones. |
.* |
{$NGINX.LLD.FILTER.HTTP.ZONE.NOT_MATCHES} | The filter to exclude discovered HTTP server zones. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.HTTP.LOCATION.ZONE.MATCHES} | The filter to include the necessary discovered HTTP location zones. |
.* |
{$NGINX.LLD.FILTER.HTTP.LOCATION.ZONE.NOT_MATCHES} | The filter to exclude discovered HTTP location zones. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.HTTP.UPSTREAM.MATCHES} | The filter to include the necessary discovered HTTP upstreams. |
.* |
{$NGINX.LLD.FILTER.HTTP.UPSTREAM.NOT_MATCHES} | The filter to exclude discovered HTTP upstreams. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.STREAM.ZONE.MATCHES} | The filter to include discovered server zones of the "stream" directive. |
.* |
{$NGINX.LLD.FILTER.STREAM.ZONE.NOT_MATCHES} | The filter to exclude discovered server zones of the "stream" directive. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.STREAM.UPSTREAM.MATCHES} | The filter to include the necessary discovered upstreams of the "stream" directive. |
.* |
{$NGINX.LLD.FILTER.STREAM.UPSTREAM.NOT_MATCHES} | The filter to exclude discovered upstreams of the "stream" directive |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.RESOLVER.MATCHES} | The filter to include the necessary discovered resolvers. |
.* |
{$NGINX.LLD.FILTER.RESOLVER.NOT_MATCHES} | The filter to exclude discovered resolvers. |
CHANGE_IF_NEEDED |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
{$NGINX.HTTP.UPSTREAM.4XX.MAX.WARN} | The maximum percentage of errors with the status code 4xx for a trigger expression. |
5 |
{$NGINX.HTTP.UPSTREAM.5XX.MAX.WARN} | The maximum percentage of errors with the status code 5xx for a trigger expression. |
5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: Get info | Returns the status of the running NGINX instance. |
HTTP agent | nginx.info |
Nginx: Get connections | Returns the statistics of client connections. |
HTTP agent | nginx.connections |
Nginx: Get SSL | Returns the SSL statistics. |
HTTP agent | nginx.ssl |
Nginx: Get requests | Returns the status of the client's HTTP requests. |
HTTP agent | nginx.requests |
Nginx: Get HTTP zones | Returns the status information for each HTTP server zone. |
HTTP agent | nginx.http.server_zones |
Nginx: Get HTTP location zones | Returns the status information for each HTTP location zone. |
HTTP agent | nginx.http.location_zones |
Nginx: Get HTTP upstreams | Returns the status of each HTTP upstream server group and its servers. |
HTTP agent | nginx.http.upstreams |
Nginx: Get Stream server zones | Returns the status information for each server zone configured in the "stream" directive. |
HTTP agent | nginx.stream.server_zones |
Nginx: Get Stream upstreams | Returns status of each stream upstream server group and its servers. |
HTTP agent | nginx.stream.upstreams |
Nginx: Get resolvers | Returns the status information for each Resolver zone. |
HTTP agent | nginx.resolvers |
Nginx: Get info error | The description of NGINX errors. |
Dependent item | nginx.info.error Preprocessing
|
Nginx: Version | A version number of NGINX. |
Dependent item | nginx.info.version Preprocessing
|
Nginx: Address | The address of the server that accepted the status request. |
Dependent item | nginx.info.address Preprocessing
|
Nginx: Generation | The total number of configuration reloads. |
Dependent item | nginx.info.generation Preprocessing
|
Nginx: Uptime | The server uptime. |
Dependent item | nginx.info.uptime Preprocessing
|
Nginx: Connections accepted, rate | The total number of accepted client connections per second. |
Dependent item | nginx.connections.accepted.rate Preprocessing
|
Nginx: Connections dropped | The total number of dropped client connections. |
Dependent item | nginx.connections.dropped Preprocessing
|
Nginx: Connections active | The current number of active client connections. |
Dependent item | nginx.connections.active Preprocessing
|
Nginx: Connections idle | The current number of idle client connections. |
Dependent item | nginx.connections.idle Preprocessing
|
Nginx: SSL handshakes, rate | The total number of successful SSL handshakes per second. |
Dependent item | nginx.ssl.handshakes.rate Preprocessing
|
Nginx: SSL handshakes failed, rate | The total number of failed SSL handshakes per second. |
Dependent item | nginx.ssl.handshakes_failed.rate Preprocessing
|
Nginx: SSL session reuses, rate | The total number of session reuses during SSL handshake per second. |
Dependent item | nginx.ssl.session_reuses.rate Preprocessing
|
Nginx: Requests total, rate | The total number of client requests per second. |
Dependent item | nginx.requests.total.rate Preprocessing
|
Nginx: Requests current | The current number of client requests. |
Dependent item | nginx.requests.current Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Server response error | length(last(/NGINX Plus by HTTP/nginx.info.error))>0 |High |
|||
Nginx: Version has changed | The Nginx version has changed. Acknowledge to close the problem manually. |
last(/NGINX Plus by HTTP/nginx.info.version,#1)<>last(/NGINX Plus by HTTP/nginx.info.version,#2) and length(last(/NGINX Plus by HTTP/nginx.info.version))>0 |Info |
Manual close: Yes | |
Nginx: Host has been restarted | Uptime is less than 10 minutes. |
last(/NGINX Plus by HTTP/nginx.info.uptime)<10m |Info |
Manual close: Yes | |
Nginx: Failed to fetch info data | Zabbix has not received any data for metrics for the last 30 minutes |
nodata(/NGINX Plus by HTTP/nginx.info.uptime,30m)=1 |Warning |
Manual close: Yes | |
Nginx: High connections drop rate | The rate of dropped connections is greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/NGINX Plus by HTTP/nginx.connections.dropped,5m) > {$NGINX.DROP_RATE.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP server zones discovery | Dependent item | nginx.http.server_zones.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: HTTP server zone [{#NAME}]: Raw data | The raw data of the HTTP server zone with the name {#NAME}. |
Dependent item | nginx.http.server_zones.raw[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Processing | The number of client requests that are currently being processed. |
Dependent item | nginx.http.server_zones.processing[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Requests, rate | The total number of client requests received from clients per second. |
Dependent item | nginx.http.server_zones.requests.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Responses 1xx, rate | The number of responses with 1xx status codes per second. |
Dependent item | nginx.http.server_zones.responses.1xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Responses 2xx, rate | The number of responses with 2xx status codes per second. |
Dependent item | nginx.http.server_zones.responses.2xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Responses 3xx, rate | The number of responses with 3xx status codes per second. |
Dependent item | nginx.http.server_zones.responses.3xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Responses 4xx, rate | The number of responses with 4xx status codes per second. |
Dependent item | nginx.http.server_zones.responses.4xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Responses 5xx, rate | The number of responses with 5xx status codes per second. |
Dependent item | nginx.http.server_zones.responses.5xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Responses total, rate | The total number of responses sent to clients per second. |
Dependent item | nginx.http.server_zones.responses.total.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Discarded, rate | The total number of requests completed without sending a response per second. |
Dependent item | nginx.http.server_zones.discarded.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
Dependent item | nginx.http.server_zones.received.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
Dependent item | nginx.http.server_zones.sent.rate[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP location zones discovery | Dependent item | nginx.http.location_zones.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: HTTP location zone [{#NAME}]: Raw data | The raw data of the location zone with the name {#NAME}. |
Dependent item | nginx.http.location_zones.raw[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Requests, rate | The total number of client requests received from clients per second. |
Dependent item | nginx.http.location_zones.requests.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Responses 1xx, rate | The number of responses with 1xx status codes per second. |
Dependent item | nginx.http.location_zones.responses.1xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Responses 2xx, rate | The number of responses with 2xx status codes per second. |
Dependent item | nginx.http.location_zones.responses.2xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Responses 3xx, rate | The number of responses with 3xx status codes per second. |
Dependent item | nginx.http.location_zones.responses.3xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Responses 4xx, rate | The number of responses with 4xx status codes per second. |
Dependent item | nginx.http.location_zones.responses.4xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Responses 5xx, rate | The number of responses with 5xx status codes per second. |
Dependent item | nginx.http.location_zones.responses.5xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Responses total, rate | The total number of responses sent to clients per second. |
Dependent item | nginx.http.location_zones.responses.total.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Discarded, rate | The total number of requests completed without sending a response per second. |
Dependent item | nginx.http.location_zones.discarded.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
Dependent item | nginx.http.location_zones.received.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
Dependent item | nginx.http.location_zones.sent.rate[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP upstreams discovery | Dependent item | nginx.http.upstreams.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: HTTP upstream [{#NAME}]: Raw data | The raw data of the HTTP upstream with the name {#NAME}. |
Dependent item | nginx.http.upstreams.raw[{#NAME}] Preprocessing
|
Nginx: HTTP upstream [{#NAME}]: Keepalive | The current number of idle keepalive connections. |
Dependent item | nginx.http.upstreams.keepalive[{#NAME}] Preprocessing
|
Nginx: HTTP upstream [{#NAME}]: Zombies | The current number of servers removed from the group but still processing active client requests. |
Dependent item | nginx.http.upstreams.zombies[{#NAME}] Preprocessing
|
Nginx: HTTP upstream [{#NAME}]: Zone | The name of the shared memory zone that keeps the group's configuration and run-time state. |
Dependent item | nginx.http.upstreams.zone[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP upstream peers discovery | Dependent item | nginx.http.upstream.peers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Raw data | The raw data of the HTTP upstream with the name {#UPSTREAM} and the peer {#PEER}. |
Dependent item | nginx.http.upstream.peer.raw[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: State | The current state, which may be one of “up”, “draining”, “down”, “unavail”, “checking”, and “unhealthy”. |
Dependent item | nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Active | The current number of active connections. |
Dependent item | nginx.http.upstream.peer.active[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Requests, rate | The total number of client requests forwarded to this server per second. |
Dependent item | nginx.http.upstream.peer.requests.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 1xx, rate | The number of responses with 1xx status codes per second. |
Dependent item | nginx.http.upstream.peer.responses.1xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 2xx, rate | The number of responses with 2xx status codes per second. |
Dependent item | nginx.http.upstream.peer.responses.2xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 3xx, rate | The number of responses with 3xx status codes per second. |
Dependent item | nginx.http.upstream.peer.responses.3xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 4xx, rate | The number of responses with 4xx status codes per second. |
Dependent item | nginx.http.upstream.peer.responses.4xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 5xx, rate | The number of responses with 5xx status codes per second. |
Dependent item | nginx.http.upstream.peer.responses.5xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses total, rate | The total number of responses obtained from this server. |
Dependent item | nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Sent, rate | The total number of bytes sent to this server per second. |
Dependent item | nginx.http.upstream.peer.sent.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Received, rate | The total number of bytes received from this server per second. |
Dependent item | nginx.http.upstream.peer.received.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Fails, rate | The total number of unsuccessful attempts to communicate with the server per second. |
Dependent item | nginx.http.upstream.peer.fails.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Unavail | Displays how many times the server has become unavailable for client requests (the state - “unavail”) due to the number of unsuccessful attempts reaching the max_fails threshold. |
Dependent item | nginx.http.upstream.peer.unavail.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Header time | The average time to get the response header from the server. |
Dependent item | nginx.http.upstream.peer.header_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Response time | The average time to get the full response from the server. |
Dependent item | nginx.http.upstream.peer.response_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, check | The total number of health check requests made. |
Dependent item | nginx.http.upstream.peer.health_checks.checks[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, fails | The number of failed health checks. |
Dependent item | nginx.http.upstream.peer.health_checks.fails[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, unhealthy | Displays how many times the server has become unhealthy (the state - “unhealthy”). |
Dependent item | nginx.http.upstream.peer.health_checks.unhealthy[{#UPSTREAM},{#PEER}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: HTTP upstream server is not in UP or DOWN state. | find(/NGINX Plus by HTTP/nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","up")=0 and find(/NGINX Plus by HTTP/nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","down")=0 |Warning |
|||
Nginx: Too many HTTP requests with code 4xx | sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.4xx.rate[{#UPSTREAM},{#PEER}],5m) > (sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}],5m)*({$NGINX.HTTP.UPSTREAM.4XX.MAX.WARN}/100)) |Warning |
|||
Nginx: Too many HTTP requests with code 5xx | sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.5xx.rate[{#UPSTREAM},{#PEER}],5m) > (sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}],5m)*({$NGINX.HTTP.UPSTREAM.5XX.MAX.WARN}/100)) |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream server zones discovery | Dependent item | nginx.stream.server_zones.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: Stream server zone [{#NAME}]: Raw data | The raw data of the server zone with the name {#NAME}. |
Dependent item | nginx.stream.server_zones.raw[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Processing | The number of client connections that are currently being processed. |
Dependent item | nginx.stream.server_zones.processing[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Connections, rate | The total number of connections accepted from clients per second. |
Dependent item | nginx.stream.server_zones.connections.rate[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Sessions 2xx, rate | The total number of sessions completed with status code 2xx per second. |
Dependent item | nginx.stream.server_zones.sessions.2xx.rate[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Sessions 4xx, rate | The total number of sessions completed with status code 4xx per second. |
Dependent item | nginx.stream.server_zones.sessions.4xx.rate[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Sessions 5xx, rate | The total number of sessions completed with status code 5xx per second. |
Dependent item | nginx.stream.server_zones.sessions.5xx.rate[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Sessions total, rate | The total number of completed client sessions per second. |
Dependent item | nginx.stream.server_zones.sessions.total.rate[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Discarded, rate | The total number of connections completed without creating a session per second. |
Dependent item | nginx.stream.server_zones.discarded.rate[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
Dependent item | nginx.stream.server_zones.received.rate[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
Dependent item | nginx.stream.server_zones.sent.rate[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream upstreams discovery | Dependent item | nginx.stream.upstreams.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: Stream upstream [{#NAME}]: Raw data | The raw data of the upstream with the name {#NAME}. |
Dependent item | nginx.stream.upstreams.raw[{#NAME}] Preprocessing
|
Nginx: Stream upstream [{#NAME}]: Zombies | Dependent item | nginx.stream.upstreams.zombies[{#NAME}] Preprocessing
|
|
Nginx: Stream upstream [{#NAME}]: Zone | Dependent item | nginx.stream.upstreams.zone[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream upstream peers discovery | Dependent item | nginx.stream.upstream.peers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Raw data | The raw data of the upstream with the name {#UPSTREAM} and the peer {#PEER}. |
Dependent item | nginx.stream.upstream.peer.raw[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: State | The current state, which may be one of “up”, “draining”, “down”, “unavail”, “checking”, and “unhealthy”. |
Dependent item | nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Active | The current number of connections. |
Dependent item | nginx.stream.upstream.peer.active[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Sent, rate | The total number of bytes sent to this server per second. |
Dependent item | nginx.stream.upstream.peer.sent.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Received, rate | The total number of bytes received from this server per second. |
Dependent item | nginx.stream.upstream.peer.received.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Fails, rate | The total number of unsuccessful attempts to communicate with the server per second. |
Dependent item | nginx.stream.upstream.peer.fails.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Unavail | Displays how many times the server has become unavailable for client requests (the state - “unavail”) due to the number of unsuccessful attempts reaching the max_fails threshold. |
Dependent item | nginx.stream.upstream.peer.unavail.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Connections | The total number of client connections forwarded to this server. |
Dependent item | nginx.stream.upstream.peer.connections.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Connect time | The average time to connect to the upstream server. |
Dependent item | nginx.stream.upstream.peer.connect_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: First byte time | The average time to receive the first byte of data. |
Dependent item | nginx.stream.upstream.peer.first_byte_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Response time | The average time to receive the last byte of data. |
Dependent item | nginx.stream.upstream.peer.response_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, check | The total number of health check requests made. |
Dependent item | nginx.stream.upstream.peer.health_checks.checks[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, fails | The number of failed health checks. |
Dependent item | nginx.stream.upstream.peer.health_checks.fails[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, unhealthy | Displays how many times the server has become unhealthy (the state - “unhealthy”). |
Dependent item | nginx.stream.upstream.peer.health_checks.unhealthy[{#UPSTREAM},{#PEER}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Stream upstream server is not in UP or DOWN state. | find(/NGINX Plus by HTTP/nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","up")=0 and find(/NGINX Plus by HTTP/nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","down")=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Resolvers discovery | Dependent item | nginx.resolvers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: Resolver [{#NAME}]: Raw data | The raw data of the resolver with the name {#NAME}. |
Dependent item | nginx.resolvers.raw[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Requests name, rate | The total number of requests to resolve names to addresses per second. |
Dependent item | nginx.resolvers.requests.name.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Requests srv, rate | The total number of requests to resolve SRV records per second. |
Dependent item | nginx.resolvers.requests.srv.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Requests addr, rate | The total number of requests to resolve addresses to names per second. |
Dependent item | nginx.resolvers.requests.addr.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses noerror, rate | The total number of successful responses per second. |
Dependent item | nginx.resolvers.responses.noerror.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses formerr, rate | The total number of FORMERR (format error) responses per second. |
Dependent item | nginx.resolvers.responses.formerr.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses servfail, rate | The total number of SERVFAIL (server failure) responses per second. |
Dependent item | nginx.resolvers.responses.servfail.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses nxdomain, rate | The total number of NXDOMAIN (host not found) responses per second. |
Dependent item | nginx.resolvers.responses.nxdomain.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses notimp, rate | The total number of NOTIMP (unimplemented) responses per second. |
Dependent item | nginx.resolvers.responses.notimp.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses refused, rate | The total number of REFUSED (operation refused) responses per second. |
Dependent item | nginx.resolvers.responses.refused.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses timedout, rate | The total number of timed out requests per second. |
Dependent item | nginx.resolvers.responses.timedout.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses unknown, rate | The total number of requests completed with an unknown error per second. |
Dependent item | nginx.resolvers.responses.unknown.rate[{#NAME}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor Nginx with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by polling the module ngx_http_stub_status_module
with HTTP agent remotely:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up the ngx_http_stub_status_module. Test the availability of the http_stub_status_module with:
nginx -V 2>&1 | grep -o with-http_stub_status_module
Example configuration of Nginx:
location = /basic_status {
stub_status;
allow <IP of your Zabbix server/proxy>;
deny all;
}
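With this location configured, you can verify that the status page responds before applying the template (an illustrative check; replace the host and path with your own values):
curl -s http://<nginx-host>/basic_status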
Set the hostname or IP address of the Nginx stub_status host or container in the {$NGINX.STUB_STATUS.HOST} macro. You can also change the status page port in the {$NGINX.STUB_STATUS.PORT} macro, the status page scheme in the {$NGINX.STUB_STATUS.SCHEME} macro, and the status page path in the {$NGINX.STUB_STATUS.PATH} macro if necessary.
Example answer from Nginx:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Name | Description | Default |
---|---|---|
{$NGINX.STUB_STATUS.HOST} | The hostname or IP address of the Nginx host or Nginx container of a stub_status. |
<SET STUB_STATUS HOST> |
{$NGINX.STUB_STATUS.SCHEME} | The protocol http or https of Nginx stub_status host or container. |
http |
{$NGINX.STUB_STATUS.PATH} | The path of the stub_status page. |
basic_status |
{$NGINX.STUB_STATUS.PORT} | The port of the stub_status host or container. |
80 |
{$NGINX.RESPONSE_TIME.MAX.WARN} | The maximum response time of Nginx expressed in seconds for a trigger expression. |
10 |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: Get stub status page | The following status information is provided:
See also Module ngx_http_stub_status_module. |
HTTP agent | nginx.get_stub_status |
Nginx: Service status | Simple check | net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] Preprocessing
|
|
Nginx: Service response time | Simple check | net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] | |
Nginx: Requests total | The total number of client requests. |
Dependent item | nginx.requests.total Preprocessing
|
Nginx: Requests per second | The total number of client requests. |
Dependent item | nginx.requests.total.rate Preprocessing
|
Nginx: Connections accepted per second | The total number of accepted client connections. |
Dependent item | nginx.connections.accepted.rate Preprocessing
|
Nginx: Connections dropped per second | The total number of dropped client connections. |
Dependent item | nginx.connections.dropped.rate Preprocessing
|
Nginx: Connections handled per second | The total number of handled connections. Generally, the parameter value is the same as for the accepted connections, unless some resource limits have been reached (for example, the worker_connections limit). |
Dependent item | nginx.connections.handled.rate Preprocessing
|
Nginx: Connections active | The current number of active client connections including waiting connections. |
Dependent item | nginx.connections.active Preprocessing
|
Nginx: Connections reading | The current number of connections where Nginx is reading the request header. |
Dependent item | nginx.connections.reading Preprocessing
|
Nginx: Connections waiting | The current number of idle client connections waiting for a request. |
Dependent item | nginx.connections.waiting Preprocessing
|
Nginx: Connections writing | The current number of connections where Nginx is writing a response back to the client. |
Dependent item | nginx.connections.writing Preprocessing
|
Nginx: Version | Dependent item | nginx.version Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Failed to fetch stub status page | Zabbix has not received any data for items for the last 30 minutes. |
find(/Nginx by HTTP/nginx.get_stub_status,,"iregexp","HTTP\/[\d.]+\s+200")=0 or nodata(/Nginx by HTTP/nginx.get_stub_status,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Nginx: Service is down | last(/Nginx by HTTP/net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"])=0 |Average |
Manual close: Yes | ||
Nginx: Service response time is too high | min(/Nginx by HTTP/net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"],5m)>{$NGINX.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
Nginx: High connections drop rate | The rate of dropping connections has been greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/Nginx by HTTP/nginx.connections.dropped.rate,5m) > {$NGINX.DROP_RATE.MAX.WARN} |Warning |
Depends on:
|
|
Nginx: Version has changed | The Nginx version has changed. Acknowledge to close the problem manually. |
last(/Nginx by HTTP/nginx.version,#1)<>last(/Nginx by HTTP/nginx.version,#2) and length(last(/Nginx by HTTP/nginx.version))>0 |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor Nginx with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Nginx by Zabbix agent collects metrics by polling the module ngx_http_stub_status_module locally with Zabbix agent:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get).
It also uses Zabbix agent to collect Nginx
Linux process statistics, such as CPU usage, memory usage and whether the process is running or not.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See the setup instructions for ngx_http_stub_status_module.
Test the availability of the http_stub_status_module with:
nginx -V 2>&1 | grep -o with-http_stub_status_module
Example configuration of Nginx:
location = /basic_status {
stub_status;
allow 127.0.0.1;
allow ::1;
deny all;
}
If you use another location, then don't forget to change the {$NGINX.STUB_STATUS.PATH} macro.
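Since access in the example above is limited to localhost, a quick test from the Nginx host itself could look like this (an illustrative check; adjust the path if you use another location):
curl -s http://127.0.0.1/basic_status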
Example answer from Nginx:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Note that this template doesn't support https and redirects (limitations of web.page.get).
Install and setup Zabbix agent.
Name | Description | Default |
---|---|---|
{$NGINX.STUB_STATUS.HOST} | The hostname or IP address of the Nginx host or Nginx container of stub_status. |
localhost |
{$NGINX.STUB_STATUS.PATH} | The path of the stub_status page. |
basic_status |
{$NGINX.STUB_STATUS.PORT} | The port of the stub_status host or container. |
80 |
{$NGINX.RESPONSE_TIME.MAX.WARN} | The maximum response time of Nginx expressed in seconds for a trigger expression. |
10 |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
{$NGINX.PROCESS_NAME} | The process name filter for the Nginx process discovery. |
nginx |
{$NGINX.PROCESS.NAME.PARAMETER} | The process name of the Nginx server used in the proc.get item key. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: Get stub status page | The following status information is provided:
See also Module ngx_http_stub_status_module. |
Zabbix agent | web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"] |
Nginx: Service status | Zabbix agent | net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] Preprocessing
|
|
Nginx: Service response time | Zabbix agent | net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] | |
Nginx: Requests total | The total number of client requests. |
Dependent item | nginx.requests.total Preprocessing
|
Nginx: Requests per second | The total number of client requests. |
Dependent item | nginx.requests.total.rate Preprocessing
|
Nginx: Connections accepted per second | The total number of accepted client connections. |
Dependent item | nginx.connections.accepted.rate Preprocessing
|
Nginx: Connections dropped per second | The total number of dropped client connections. |
Dependent item | nginx.connections.dropped.rate Preprocessing
|
Nginx: Connections handled per second | The total number of handled connections. Generally, the parameter value is the same as for the accepted connections, unless some resource limits have been reached (for example, the worker_connections limit). |
Dependent item | nginx.connections.handled.rate Preprocessing
|
Nginx: Connections active | The current number of active client connections including waiting connections. |
Dependent item | nginx.connections.active Preprocessing
|
Nginx: Connections reading | The current number of connections where Nginx is reading the request header. |
Dependent item | nginx.connections.reading Preprocessing
|
Nginx: Connections waiting | The current number of idle client connections waiting for a request. |
Dependent item | nginx.connections.waiting Preprocessing
|
Nginx: Connections writing | The current number of connections where Nginx is writing a response back to the client. |
Dependent item | nginx.connections.writing Preprocessing
|
Nginx: Version | Dependent item | nginx.version Preprocessing
|
|
Nginx: Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$NGINX.PROCESS.NAME.PARAMETER},,,summary] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Version has changed | The Nginx version has changed. Acknowledge to close the problem manually. |
last(/Nginx by Zabbix agent/nginx.version,#1)<>last(/Nginx by Zabbix agent/nginx.version,#2) and length(last(/Nginx by Zabbix agent/nginx.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx process discovery | The discovery of Nginx process summary. |
Dependent item | nginx.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: CPU utilization | The percentage of the CPU utilization by a process {#NGINX.NAME}. |
Zabbix agent | proc.cpu.util[{#NGINX.NAME}] |
Nginx: Get process data | The summary metrics aggregated by a process {#NGINX.NAME}. |
Dependent item | nginx.proc.get[{#NGINX.NAME}] Preprocessing
|
Nginx: Memory usage (vsize) | The summary of virtual memory used by a process {#NGINX.NAME} expressed in bytes. |
Dependent item | nginx.proc.vmem[{#NGINX.NAME}] Preprocessing
|
Nginx: Memory usage (rss) | The summary of resident set size memory used by a process {#NGINX.NAME} expressed in bytes. |
Dependent item | nginx.proc.rss[{#NGINX.NAME}] Preprocessing
|
Nginx: Memory usage, % | The percentage of real memory used by a process {#NGINX.NAME}. |
Dependent item | nginx.proc.pmem[{#NGINX.NAME}] Preprocessing
|
Nginx: Number of running processes | The number of running processes {#NGINX.NAME}. |
Dependent item | nginx.proc.num[{#NGINX.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Process is not running | last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])=0 |High |
|||
Nginx: Service is down | last(/Nginx by Zabbix agent/net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"])=0 and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Average |
Manual close: Yes | ||
Nginx: High connections drop rate | The rate of dropping connections has been greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/Nginx by Zabbix agent/nginx.connections.dropped.rate,5m) > {$NGINX.DROP_RATE.MAX.WARN} and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Depends on:
|
|
Nginx: Service response time is too high | min(/Nginx by Zabbix agent/net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"],5m)>{$NGINX.RESPONSE_TIME.MAX.WARN} and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
||
Nginx: Failed to fetch stub status page | Zabbix has not received any data for items for the last 30 minutes. |
(find(/Nginx by Zabbix agent/web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"],,"iregexp","HTTP\/[\d.]+\s+200")=0 or nodata(/Nginx by Zabbix agent/web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"],30m)) and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for monitoring Nextcloud by HTTP via Zabbix, and it works without any external scripts.
Nextcloud is a suite of client-server software for creating and using file hosting services.
For more information, see the official documentation
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the {$NEXTCLOUD.USER.NAME}, {$NEXTCLOUD.USER.PASSWORD}, and {$NEXTCLOUD.ADDRESS} macros.
The user must be included in the Administrators group.
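The template collects its data from the serverinfo app's OCS API. To confirm that the address and credentials are set correctly, you can query the endpoint manually (a hedged example; the host is a placeholder, and the OCS-APIRequest header is required for OCS calls):
curl -u "<user>:<password>" -H "OCS-APIRequest: true" "https://<nextcloud-address>/ocs/v2.php/apps/serverinfo/api/v1/info?format=json"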
Name | Description | Default |
---|---|---|
{$NEXTCLOUD.SCHEMA} | HTTP or HTTPS protocol of Nextcloud. |
https |
{$NEXTCLOUD.USER.NAME} | Nextcloud username. |
root |
{$NEXTCLOUD.USER.PASSWORD} | Nextcloud user password. |
<Put the password here> |
{$NEXTCLOUD.ADDRESS} | IP or DNS name of Nextcloud server. |
127.0.0.1 |
{$NEXTCLOUD.LLD.FILTER.USER.MATCHES} | Filter of discoverable users by name. |
.* |
{$NEXTCLOUD.LLD.FILTER.USER.NOT_MATCHES} | Filter to exclude discovered users by name. |
CHANGE_IF_NEEDED |
{$NEXTCLOUD.USER.QUOTA.PUSED.MAX} | Storage utilization threshold. |
90 |
{$NEXTCLOUD.USER.MAX.INACTIVE} | How many days a user can be inactive. |
30 |
{$NEXTCLOUD.CPU.LOAD.MAX} | CPU load threshold (the number of processes in the system run queue). |
95 |
{$NEXTCLOUD.MEM.PUSED.MAX} | Memory utilization threshold. |
90 |
{$NEXTCLOUD.SWAP.PUSED.MAX} | Swap utilization threshold. |
90 |
{$NEXTCLOUD.PHP.MEM.PUSED.MAX} | PHP memory utilization threshold. |
90 |
{$NEXTCLOUD.STORAGE.FREE.MIN} | Free space threshold. |
1G |
{$NEXTCLOUD.PROXY} | Proxy HTTP(S) address. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nextcloud: Get server information | This item provides useful server information, such as CPU load, RAM usage, disk usage, number of users, etc. https://github.com/nextcloud/serverinfo |
HTTP agent | nextcloud.serverinfo.get_data Preprocessing
|
Nextcloud: Server information status | Server information API status |
Dependent item | nextcloud.serverinfo.status Preprocessing
|
Nextcloud: Version | Nextcloud service version. |
Dependent item | nextcloud.serverinfo.version Preprocessing
|
Nextcloud: Free space | The amount of free disk space. |
Dependent item | nextcloud.serverinfo.freespace Preprocessing
|
Nextcloud: CPU load, avg 1m | The average system load (the number of processes in the system run queue), last 1 minute. |
Dependent item | nextcloud.serverinfo.cpu.avg.1m Preprocessing
|
Nextcloud: CPU load, avg 5m | The average system load (the number of processes in the system run queue), last 5 minutes. |
Dependent item | nextcloud.serverinfo.cpu.avg.5m Preprocessing
|
Nextcloud: CPU load, avg 15m | The average system load (the number of processes in the system run queue), last 15 minutes. |
Dependent item | nextcloud.serverinfo.cpu.avg.15m Preprocessing
|
Nextcloud: Memory total | The size of the RAM. |
Dependent item | nextcloud.serverinfo.mem.total Preprocessing
|
Nextcloud: Memory free | The amount of free RAM. |
Dependent item | nextcloud.serverinfo.mem.free Preprocessing
|
Nextcloud: Memory used, in % | RAM usage, in percent. |
Dependent item | nextcloud.serverinfo.mem.pused Preprocessing
|
Nextcloud: Swap total | The size of the swap memory. |
Dependent item | nextcloud.serverinfo.swap.total Preprocessing
|
Nextcloud: Swap free | The amount of free swap. |
Dependent item | nextcloud.serverinfo.swap.free Preprocessing
|
Nextcloud: Swap used, in % | Swap usage, in percent. |
Dependent item | nextcloud.serverinfo.swap.pused Preprocessing
|
Nextcloud: Apps installed | The number of installed applications. |
Dependent item | nextcloud.serverinfo.apps.installed Preprocessing
|
Nextcloud: Apps update available | The number of applications for which an update is available. |
Dependent item | nextcloud.serverinfo.apps.update Preprocessing
|
Nextcloud: Web server | Web server description. |
Dependent item | nextcloud.serverinfo.apps.webserver Preprocessing
|
Nextcloud: PHP version | PHP version |
Dependent item | nextcloud.serverinfo.php.version Preprocessing
|
Nextcloud: PHP memory limit | By default, the PHP memory limit is generally set to 128 MB, but it can be customized based on the application's specific needs. The php.ini file is usually the standard location to set the PHP memory limit. |
Dependent item | nextcloud.serverinfo.php.memory.limit Preprocessing
|
Nextcloud: PHP memory used | PHP memory used |
Dependent item | nextcloud.serverinfo.php.memory.used Preprocessing
|
Nextcloud: PHP memory free | PHP free memory size. |
Dependent item | nextcloud.serverinfo.php.memory.free Preprocessing
|
Nextcloud: PHP memory wasted | Memory allocated to the service but not in use. |
Dependent item | nextcloud.serverinfo.php.memory.wasted Preprocessing
|
Nextcloud: PHP memory wasted, in % | Memory allocated to the service but not in use, in percent. |
Dependent item | nextcloud.serverinfo.php.memory.wasted_percentage Preprocessing
|
Nextcloud: PHP memory used, in % | PHP memory used percentage |
Dependent item | nextcloud.serverinfo.php.memory.pused Preprocessing
|
Nextcloud: PHP maximum execution time | By default, the maximum execution time for PHP scripts is set to 30 seconds. If a script runs for longer than 30 seconds, PHP stops the script and reports an error. You can control the amount of time PHP allows scripts to run by changing the 'max_execution_time' directive in your php.ini file. |
Dependent item | nextcloud.serverinfo.php.max_execution_time Preprocessing
|
Nextcloud: PHP maximum upload file size | By default, the maximum upload file size for PHP scripts is set to 128 megabytes. However, you may want to change this limit. For example, you can set a lower limit to prevent users from uploading large files to your site. To do this, change the 'upload_max_filesize' and 'post_max_size' directives. |
Dependent item | nextcloud.serverinfo.php.upload_max_filesize Preprocessing
|
Nextcloud: Database type | Database type. |
Dependent item | nextcloud.serverinfo.db.type Preprocessing
|
Nextcloud: Database version | Database description. |
Dependent item | nextcloud.serverinfo.db.version Preprocessing
|
Nextcloud: Database size | Size of database. |
Dependent item | nextcloud.serverinfo.db.size Preprocessing
|
Nextcloud: Active users, last 5 minutes | The number of active users in the last 5 minutes. |
Dependent item | nextcloud.serverinfo.active_users.last5m Preprocessing
|
Nextcloud: Active users, last 1 hour | The number of active users in the last 1 hour. |
Dependent item | nextcloud.serverinfo.active_users.last1h Preprocessing
|
Nextcloud: Active users, last 24 hours | The number of active users in the last day. |
Dependent item | nextcloud.serverinfo.active_users.last24hours Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nextcloud: Server information unavailable | Failed to get server information. |
last(/Nextcloud by HTTP/nextcloud.serverinfo.status)<>"OK" |High |
||
Nextcloud: Version has changed | Nextcloud version has changed. Acknowledge to close the problem manually. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.version)=1 and length(last(/Nextcloud by HTTP/nextcloud.serverinfo.version))>0 |Info |
Manual close: Yes | |
Nextcloud: Disk space is low | Condition should be the following: |
last(/Nextcloud by HTTP/nextcloud.serverinfo.freespace)<{$NEXTCLOUD.STORAGE.FREE.MIN} |Average |
Manual close: Yes | |
Nextcloud: CPU load is too high | High CPU load. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.cpu.avg.1m,5m) > {$NEXTCLOUD.CPU.LOAD.MAX} |Average |
||
Nextcloud: High memory utilization | The system is running out of free memory. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.mem.pused,5m) > {$NEXTCLOUD.MEM.PUSED.MAX} |Average |
||
Nextcloud: High swap utilization | The system is running out of free swap. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.swap.pused,5m) > {$NEXTCLOUD.SWAP.PUSED.MAX} |Average |
||
Nextcloud: Number of installed apps has been changed | Applications have been installed or removed. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.apps.installed)<>0 |Info |
Manual close: Yes | |
Nextcloud: Application updates are available | Updates are available for some of the installed applications. |
last(/Nextcloud by HTTP/nextcloud.serverinfo.apps.update)<>0 |Warning |
Manual close: Yes | |
Nextcloud: PHP version has changed | The PHP version has changed. Acknowledge to close the problem manually. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.php.version)=1 and length(last(/Nextcloud by HTTP/nextcloud.serverinfo.php.version))>0 |Info |
Manual close: Yes | |
Nextcloud: High PHP memory utilization | PHP is running out of free memory. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.php.memory.pused,5m) > {$NEXTCLOUD.PHP.MEM.PUSED.MAX} |Average |
||
Nextcloud: Database version has changed | The Database version has changed. Acknowledge to close the problem manually. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.db.version)=1 and length(last(/Nextcloud by HTTP/nextcloud.serverinfo.db.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nextcloud: User discovery | User discovery. |
HTTP agent | nextcloud.user.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nextcloud: User "{#NEXTCLOUD.USER}": Get data | Get common information about user |
HTTP agent | nextcloud.user.get_data[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Status | User account status. |
Dependent item | nextcloud.user.enabled[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Storage location | The location of the user's store. |
Dependent item | nextcloud.user.storageLocation[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Last login | The time the user has last logged in. |
Dependent item | nextcloud.user.lastLogin[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Last login, days ago | The number of days since the user has last logged in. |
Dependent item | nextcloud.user.inactive[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Quota free space | The size of the free available space in the user's storage. |
Dependent item | nextcloud.user.quota.free[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Quota used space | The size of the used available space in the user storage. |
Dependent item | nextcloud.user.quota.used[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Quota total space | The size of space available in the user's storage. |
Dependent item | nextcloud.user.quota.total[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Quota used space, in % | Usage of the allocated storage space, in percent. |
Dependent item | nextcloud.user.quota.pused[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Quota | The size of space available in the user's storage. |
Dependent item | nextcloud.user.quota[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Display name | User visible name. |
Dependent item | nextcloud.user.displayname[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Language | User language. |
Dependent item | nextcloud.user.language[{#NEXTCLOUD.USER}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nextcloud: User "{#NEXTCLOUD.USER}" status changed | User account status has changed. |
change(/Nextcloud by HTTP/nextcloud.user.enabled[{#NEXTCLOUD.USER}]) = 1 |Info |
||
Nextcloud: User "{#NEXTCLOUD.USER}": inactive | The user has not logged in for more than {$NEXTCLOUD.USER.MAX.INACTIVE:"{#NEXTCLOUD.USER}"} days. |
last(/Nextcloud by HTTP/nextcloud.user.inactive[{#NEXTCLOUD.USER}]) > {$NEXTCLOUD.USER.MAX.INACTIVE:"{#NEXTCLOUD.USER}"} |Info |
||
Nextcloud: User "{#NEXTCLOUD.USER}": High quota utilization | More than {$NEXTCLOUD.USER.QUOTA.PUSED.MAX:"{#NEXTCLOUD.USER}"} percent of the allocated storage space has been used. |
min(/Nextcloud by HTTP/nextcloud.user.quota.pused[{#NEXTCLOUD.USER}],5m) > {$NEXTCLOUD.USER.QUOTA.PUSED.MAX:"{#NEXTCLOUD.USER}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Memcached monitoring by Zabbix via Zabbix agent 2 and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up and configure zabbix-agent2 compiled with the Memcached monitoring plugin.
Test availability: zabbix_get -s memcached-host -k memcached.ping
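A minimal configuration sketch, assuming a default zabbix-agent2 setup (the option name matches the "Plugins.Memcached.Uri" plugin option described in the macros below; adjust the host and URI to your environment):
Plugins.Memcached.Uri=tcp://localhost:11211
zabbix_get -s memcached-host -k 'memcached.ping["tcp://localhost:11211"]'
The ping key returns 1 when the instance responds; the "Memcached: Service is down" trigger fires when it returns 0.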
Name | Description | Default |
---|---|---|
{$MEMCACHED.CONN.URI} | Connection string in the URI format (password is not used). This param overwrites a value configured in the "Plugins.Memcached.Uri" option of the configuration file (if it's set), otherwise, the plugin's default value is used: "tcp://localhost:11211" |
tcp://localhost:11211 |
{$MEMCACHED.CONN.THROTTLED.MAX.WARN} | Maximum number of throttled connections per second |
1 |
{$MEMCACHED.CONN.QUEUED.MAX.WARN} | Maximum number of queued connections per second |
1 |
{$MEMCACHED.CONN.PRC.MAX.WARN} | Maximum percentage of connected clients |
80 |
{$MEMCACHED.MEM.PUSED.MAX.WARN} | Maximum percentage of memory used |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Memcached: Get status | Zabbix agent | memcached.stats["{$MEMCACHED.CONN.URI}"] | |
Memcached: Ping | Zabbix agent | memcached.ping["{$MEMCACHED.CONN.URI}"] Preprocessing
|
|
Memcached: Max connections | Max number of concurrent connections |
Dependent item | memcached.connections.max Preprocessing
|
Memcached: Maximum number of bytes | Maximum number of bytes allowed in cache. You can adjust this setting via a config file or the command line while starting your Memcached server. |
Dependent item | memcached.config.limit_maxbytes Preprocessing
|
Memcached: CPU sys | System CPU consumed by the Memcached server |
Dependent item | memcached.cpu.sys Preprocessing
|
Memcached: CPU user | User CPU consumed by the Memcached server |
Dependent item | memcached.cpu.user Preprocessing
|
Memcached: Queued connections per second | Number of times that memcached has hit its connections limit and disabled its listener |
Dependent item | memcached.connections.queued.rate Preprocessing
|
Memcached: New connections per second | Number of connections opened per second |
Dependent item | memcached.connections.rate Preprocessing
|
Memcached: Throttled connections | Number of times a client connection was throttled. When sending GETs in batch mode and the connection contains too many requests (limited by -R parameter) the connection might be throttled to prevent starvation. |
Dependent item | memcached.connections.throttled.rate Preprocessing
|
Memcached: Connection structures | Number of connection structures allocated by the server |
Dependent item | memcached.connections.structures Preprocessing
|
Memcached: Open connections | The number of clients presently connected |
Dependent item | memcached.connections.current Preprocessing
|
Memcached: Commands: FLUSH per second | The flush_all command invalidates all items in the database. This operation incurs a performance penalty and shouldn't take place in production, so check your debug scripts. |
Dependent item | memcached.commands.flush.rate Preprocessing
|
Memcached: Commands: GET per second | Number of GET requests received by server per second. |
Dependent item | memcached.commands.get.rate Preprocessing
|
Memcached: Commands: SET per second | Number of SET requests received by server per second. |
Dependent item | memcached.commands.set.rate Preprocessing
|
Memcached: Process id | PID of the server process |
Dependent item | memcached.process_id Preprocessing
|
Memcached: Memcached version | Version of the Memcached server |
Dependent item | memcached.version Preprocessing
|
Memcached: Uptime | Number of seconds since Memcached server start |
Dependent item | memcached.uptime Preprocessing
|
Memcached: Bytes used | Current number of bytes used to store items. |
Dependent item | memcached.stats.bytes Preprocessing
|
Memcached: Written bytes per second | The network's write rate, in bytes per second. |
Dependent item | memcached.stats.bytes_written.rate Preprocessing
|
Memcached: Read bytes per second | The network's read rate, in bytes per second. |
Dependent item | memcached.stats.bytes_read.rate Preprocessing
|
Memcached: Hits per second | Number of successful GET requests (items requested and found) per second. |
Dependent item | memcached.stats.hits.rate Preprocessing
|
Memcached: Misses per second | Number of missed GET requests (items requested but not found) per second. |
Dependent item | memcached.stats.misses.rate Preprocessing
|
Memcached: Evictions per second | "An eviction is when an item that still has time to live is removed from the cache because a brand new item needs to be allocated. The item is selected with a pseudo-LRU mechanism. A high number of evictions coupled with a low hit rate means your application is setting a large number of keys that are never used again." |
Dependent item | memcached.stats.evictions.rate Preprocessing
|
Memcached: New items per second | Number of new items stored per second. |
Dependent item | memcached.stats.total_items.rate Preprocessing
|
Memcached: Current number of items stored | Current number of items stored by this instance. |
Dependent item | memcached.stats.curr_items Preprocessing
|
Memcached: Threads | Number of worker threads requested |
Dependent item | memcached.stats.threads Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Memcached: Service is down | last(/Memcached by Zabbix agent 2/memcached.ping["{$MEMCACHED.CONN.URI}"])=0 |Average |
Manual close: Yes | ||
Memcached: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Memcached by Zabbix agent 2/memcached.cpu.sys,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Memcached: Too many queued connections | The max number of connections is reached and a new connection had to wait in the queue as a result. |
min(/Memcached by Zabbix agent 2/memcached.connections.queued.rate,5m)>{$MEMCACHED.CONN.QUEUED.MAX.WARN} |Warning |
||
Memcached: Too many throttled connections | Number of times a client connection was throttled is too high. |
min(/Memcached by Zabbix agent 2/memcached.connections.throttled.rate,5m)>{$MEMCACHED.CONN.THROTTLED.MAX.WARN} |Warning |
||
Memcached: Total number of connected clients is too high | When the number of connections reaches the value of the "max_connections" parameter, new connections will be rejected. |
min(/Memcached by Zabbix agent 2/memcached.connections.current,5m)/last(/Memcached by Zabbix agent 2/memcached.connections.max)*100>{$MEMCACHED.CONN.PRC.MAX.WARN} |Warning |
||
Memcached: Version has changed | The Memcached version has changed. Acknowledge to close the problem manually. |
last(/Memcached by Zabbix agent 2/memcached.version,#1)<>last(/Memcached by Zabbix agent 2/memcached.version,#2) and length(last(/Memcached by Zabbix agent 2/memcached.version))>0 |Info |
Manual close: Yes | |
Memcached: has been restarted | Uptime is less than 10 minutes. |
last(/Memcached by Zabbix agent 2/memcached.uptime)<10m |Info |
Manual close: Yes | |
Memcached: Memory usage is too high | min(/Memcached by Zabbix agent 2/memcached.stats.bytes,5m)/last(/Memcached by Zabbix agent 2/memcached.config.limit_maxbytes)*100>{$MEMCACHED.MEM.PUSED.MAX.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Mantis BT monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
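Before linking the template, you may want to check the URL and token manually. A minimal sketch, assuming the standard MantisBT REST API path (substitute your own URL and token):
curl -H "Authorization: <MantisBT token>" "<MantisBT URL>/api/rest/projects"
A successful response is a JSON list of projects, the same data collected by the "Mantis BT: Get projects" item.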
Name | Description | Default |
---|---|---|
{$MANTIS.URL} | MantisBT URL. |
|
{$MANTIS.TOKEN} | MantisBT Token. |
|
{$MANTIS.LLD.FILTER.PROJECTS.MATCHES} | Filter of discoverable projects. |
.* |
{$MANTIS.LLD.FILTER.PROJECTS.NOT_MATCHES} | Filter to exclude discovered projects. |
CHANGE_IF_NEEDED |
{$MANTIS.HTTP.PROXY} | Proxy for http requests. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mantis BT: Get projects | Get projects from Mantis BT. |
HTTP agent | mantisbt.get.projects |
Name | Description | Type | Key and additional info |
---|---|---|---|
Projects discovery | Discovery rule for Mantis BT projects. |
Dependent item | mantisbt.projects.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Project [{#NAME}]: Get issues | Getting project issues. |
HTTP agent | mantisbt.get.issues[{#NAME}] |
Project [{#NAME}]: Total issues | Count of issues in project. |
Dependent item | mantis.project.total_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: New issues | Count of issues with 'new' status. |
Dependent item | mantis.project.status.new_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Resolved issues | Count of issues with 'resolved' status. |
Dependent item | mantis.project.status.resolved_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Closed issues | Count of issues with 'closed' status. |
Dependent item | mantis.project.status.closed_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Assigned issues | Count of issues with 'assigned' status. |
Dependent item | mantis.project.status.assigned_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Feedback issues | Count of issues with 'feedback' status. |
Dependent item | mantis.project.status.feedback_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Acknowledged issues | Count of issues with 'acknowledged' status. |
Dependent item | mantis.project.status.acknowledged_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Confirmed issues | Count of issues with 'confirmed' status. |
Dependent item | mantis.project.status.confirmed_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Open issues | Count of "open" resolution issues. |
Dependent item | mantis.project.resolution.open_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Fixed issues | Count of "fixed" resolution issues. |
Dependent item | mantis.project.resolution.fixed_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Reopened issues | Count of "reopened" resolution issues. |
Dependent item | mantis.project.resolution.reopened_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Unable to reproduce issues | Count of "unable to reproduce" resolution issues. |
Dependent item | mantis.project.resolution.unabletoreproduce_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Not fixable issues | Count of "not fixable" resolution issues. |
Dependent item | mantis.project.resolution.notfixableissues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Duplicate issues | Count of "duplicate" resolution issues. |
Dependent item | mantis.project.resolution.duplicate_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: No change required issues | Count of "no change required" resolution issues. |
Dependent item | mantis.project.resolution.nochangerequired_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Suspended issues | Count of "suspended" resolution issues. |
Dependent item | mantis.project.resolution.suspended_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Will not fix issues | Count of "wont fix" resolution issues. |
Dependent item | mantis.project.resolution.wontfixissues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Feature severity issues | Count of "feature" severity issues. |
Dependent item | mantis.project.severity.feature_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Trivial severity issues | Count of "trivial" severity issues. |
Dependent item | mantis.project.severity.trivial_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Text severity issues | Count of "text" severity issues. |
Dependent item | mantis.project.severity.text_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Tweak severity issues | Count of "tweak" severity issues. |
Dependent item | mantis.project.severity.tweak_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Minor severity issues | Count of "minor" severity issues. |
Dependent item | mantis.project.severity.minor_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Major severity issues | Count of "major" severity issues. |
Dependent item | mantis.project.severity.major_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Crash severity issues | Count of "crash" severity issues. |
Dependent item | mantis.project.severity.crash_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Block severity issues | Count of "block" severity issues. |
Dependent item | mantis.project.severity.block_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: None priority issues | Count of "none" priority issues. |
Dependent item | mantis.project.priority.none_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Low priority issues | Count of "low" priority issues. |
Dependent item | mantis.project.priority.low_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Normal priority issues | Count of "normal" priority issues. |
Dependent item | mantis.project.priority.normal_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: High priority issues | Count of "high" priority issues. |
Dependent item | mantis.project.priority.high_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Urgent priority issues | Count of "urgent" priority issues. |
Dependent item | mantis.project.priority.urgent_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Immediate priority issues | Count of "immediate" priority issues. |
Dependent item | mantis.project.priority.immediate_issues[{#NAME}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes state. It works without external scripts and uses the script item to make HTTP requests to the Kubernetes API.
Template Kubernetes cluster state by HTTP
- collects metrics by HTTP agent from kube-state-metrics endpoint and Kubernetes API.
Don't forget to change macros {$KUBE.API.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Kubernetes version and configuration.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Install the Zabbix Helm Chart in your Kubernetes cluster. Internal service metrics are collected from kube-state-metrics endpoint.
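A minimal installation sketch with placeholder repository and chart names (only the helm flags are standard; substitute the actual Zabbix Helm Chart repository; the monitoring namespace matches the kubectl examples below):
helm repo add <zabbix-repo> <zabbix-chart-repository-url>
helm install zabbix <zabbix-repo>/<zabbix-chart> --namespace monitoring --create-namespace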
Template needs to use authorization via API token.
Set the {$KUBE.API.URL} macro, such as <scheme>://<host>:<port>.
Get the generated service account token using the command:
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
Then set it to the macro {$KUBE.API.TOKEN}.
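To confirm that the URL and token work from the Zabbix server or proxy host, you can query the readyz endpoint used by the template (a sketch; substitute your own endpoint):
TOKEN=$(kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d)
curl -k -H "Authorization: Bearer $TOKEN" <scheme>://<host>:<port>/readyz
A plain "ok" response indicates the API server accepted the request.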
Set {$KUBE.STATE.ENDPOINT.NAME} with the Kube state metrics endpoint name (see kubectl -n monitoring get ep). Default: zabbix-kube-state-metrics.
NOTE. If you wish to monitor Controller Manager and Scheduler components, you might need to set the --binding-address option for them to the address where Zabbix proxy can reach them.
For example, for clusters created with kubeadm, it can be set in the following manifest files (changes will be applied immediately):
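Assuming a standard kubeadm layout, the static pod manifests are typically:
/etc/kubernetes/manifests/kube-controller-manager.yaml
/etc/kubernetes/manifests/kube-scheduler.yaml
Paths may differ in other distributions.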
Depending on your Kubernetes distribution, you might need to adjust the {$KUBE.CONTROL_PLANE.TAINT} macro (for example, set it to node-role.kubernetes.io/master for OpenShift).
NOTE. Some metrics may not be collected depending on your Kubernetes version and configuration.
Also, see the Macros section for a list of macros used to set trigger values.
Set up the macros to filter the metrics of discovered Kubelets by node names: {$KUBE.LLD.FILTER.KUBELET_NODE.MATCHES} and {$KUBE.LLD.FILTER.KUBELET_NODE.NOT_MATCHES}.
Set up macros to filter metrics by namespace: {$KUBE.LLD.FILTER.NAMESPACE.MATCHES} and {$KUBE.LLD.FILTER.NAMESPACE.NOT_MATCHES}.
Set up macros to filter node metrics by nodename: {$KUBE.LLD.FILTER.NODE.MATCHES} and {$KUBE.LLD.FILTER.NODE.NOT_MATCHES}.
Note: If you have a large cluster, it is highly recommended to set a filter for discoverable namespaces.
You can use the {$KUBE.KUBELET.FILTER.LABELS} and {$KUBE.KUBELET.FILTER.ANNOTATIONS} macros for advanced filtering of kubelets by node labels and annotations.
Notes about labels and annotations filters:
Filter values are set as comma-separated key: value pairs, where values can be regular expressions (key1: value, key2: regexp). Use an exclamation mark (!) to invert a filter (!key: value).
For example: kubernetes.io/hostname: kubernetes-node[5-25], !node-role.kubernetes.io/ingress: .*. As a result, the kubelets on nodes 5-25 without the "ingress" role will be discovered.
See the Kubernetes documentation for details about labels and annotations.
You can also set up evaluation periods for replica mismatch triggers (Deployments, ReplicaSets, StatefulSets) with the macro {$KUBE.REPLICA.MISMATCH.EVAL_PERIOD}, which supports context and regular expressions. For example, you can create the following macros:
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"deployment:default:nginx-deployment"} = #3
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:regex:"deployment:.*:.*"} = #10
or {$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:regex:"^deployment.*"} = #10
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:regex:".*:default:.*"} = 15m
Note that different context macros with regular expressions matching the same string can be applied in an undefined order, and simple context macros (without regular expressions) have higher priority. Read the Important notes section in the Zabbix documentation for details.
Name | Description | Default |
---|---|---|
{$KUBE.API.URL} | Kubernetes API endpoint URL in the format <scheme>://<host>:<port>. |
https://kubernetes.default.svc.cluster.local:443 |
{$KUBE.API.READYZ.ENDPOINT} | Kubernetes API readyz endpoint /readyz |
/readyz |
{$KUBE.API.LIVEZ.ENDPOINT} | Kubernetes API livez endpoint /livez |
/livez |
{$KUBE.API.COMPONENTSTATUSES.ENDPOINT} | Kubernetes API componentstatuses endpoint /api/v1/componentstatuses |
/api/v1/componentstatuses |
{$KUBE.API.TOKEN} | Service account bearer token. |
|
{$KUBE.HTTP.PROXY} | Sets the HTTP proxy for requests. If this macro is empty, then no proxy is used. |
|
{$KUBE.STATE.ENDPOINT.NAME} | Kubernetes state endpoint name. |
zabbix-kube-state-metrics |
{$OPENSHIFT.STATE.ENDPOINT.NAME} | OpenShift state endpoint name. |
openshift-state-metrics |
{$KUBE.API_SERVER.SCHEME} | Kubernetes API servers metrics endpoint scheme. Used in ControlPlane LLD. |
https |
{$KUBE.API_SERVER.PORT} | Kubernetes API servers metrics endpoint port. Used in ControlPlane LLD. |
6443 |
{$KUBE.CONTROL_PLANE.TAINT} | Taint that applies to control plane nodes. Change if needed. Used in ControlPlane LLD. |
node-role.kubernetes.io/control-plane |
{$KUBE.CONTROLLER_MANAGER.SCHEME} | Kubernetes Controller manager metrics endpoint scheme. Used in ControlPlane LLD. |
https |
{$KUBE.CONTROLLER_MANAGER.PORT} | Kubernetes Controller manager metrics endpoint port. Used in ControlPlane LLD. |
10257 |
{$KUBE.SCHEDULER.SCHEME} | Kubernetes Scheduler metrics endpoint scheme. Used in ControlPlane LLD. |
https |
{$KUBE.SCHEDULER.PORT} | Kubernetes Scheduler metrics endpoint port. Used in ControlPlane LLD. |
10259 |
{$KUBE.KUBELET.SCHEME} | Kubernetes Kubelet metrics endpoint scheme. Used in Kubelet LLD. |
https |
{$KUBE.KUBELET.PORT} | Kubernetes Kubelet metrics endpoint port. Used in Kubelet LLD. |
10250 |
{$KUBE.LLD.FILTER.NAMESPACE.MATCHES} | Filter of discoverable metrics by namespace. |
.* |
{$KUBE.LLD.FILTER.NAMESPACE.NOT_MATCHES} | Filter to exclude discovered metrics by namespace. |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.NODE.MATCHES} | Filter of discoverable nodes by nodename. |
.* |
{$KUBE.LLD.FILTER.NODE.NOT_MATCHES} | Filter to exclude discovered nodes by nodename. |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.KUBELET_NODE.MATCHES} | Filter of discoverable Kubelets by nodename. |
.* |
{$KUBE.LLD.FILTER.KUBELET_NODE.NOT_MATCHES} | Filter to exclude discovered Kubelets by nodename. |
CHANGE_IF_NEEDED |
{$KUBE.KUBELET.FILTER.ANNOTATIONS} | Node annotations to filter Kubelets (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.KUBELET.FILTER.LABELS} | Node labels to filter Kubelets (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.LLD.FILTER.PV.MATCHES} | Filter of discoverable persistent volumes by name. |
.* |
{$KUBE.LLD.FILTER.PV.NOT_MATCHES} | Filter to exclude discovered persistent volumes by name. |
CHANGE_IF_NEEDED |
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD} | The evaluation period range which is used for calculation of expressions in trigger prototypes (time period or value range). Can be used with context. |
#5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Get state metrics | Collecting Kubernetes metrics from kube-state-metrics. |
Script | kube.state.metrics |
Kubernetes: Control plane LLD | Generation of data for Control plane discovery rules. |
Script | kube.control_plane.lld Preprocessing
|
Kubernetes: Node LLD | Generation of data for Kubelet discovery rules. |
Script | kube.node.lld Preprocessing
|
Kubernetes: Get component statuses | HTTP agent | kube.componentstatuses Preprocessing
|
|
Kubernetes: Get readyz | HTTP agent | kube.readyz Preprocessing
|
|
Kubernetes: Get livez | HTTP agent | kube.livez Preprocessing
|
|
Kubernetes: Namespace count | The number of namespaces. |
Dependent item | kube.namespace.count Preprocessing
|
Kubernetes: CronJob count | Number of cronjobs. |
Dependent item | kube.cronjob.count Preprocessing
|
Kubernetes: Job count | Number of jobs (generated by cronjob + job). |
Dependent item | kube.job.count Preprocessing
|
Kubernetes: Endpoint count | Number of endpoints. |
Dependent item | kube.endpoint.count Preprocessing
|
Kubernetes: Deployment count | The number of deployments. |
Dependent item | kube.deployment.count Preprocessing
|
Kubernetes: Service count | The number of services. |
Dependent item | kube.service.count Preprocessing
|
Kubernetes: StatefulSet count | The number of statefulsets. |
Dependent item | kube.statefulset.count Preprocessing
|
Kubernetes: Node count | The number of nodes. |
Dependent item | kube.node.count Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
API servers discovery | Dependent item | kube.api_servers.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Controller manager nodes discovery | Dependent item | kube.controller_manager.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Scheduler servers nodes discovery | Dependent item | kube.scheduler.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubelet discovery | Dependent item | kube.kubelet.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Daemonset discovery | Dependent item | kube.daemonset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Ready | The number of nodes that should be running the daemon pod and have one or more running and ready. |
Dependent item | kube.daemonset.ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Scheduled | The number of nodes that run at least one daemon pod and are supposed to. |
Dependent item | kube.daemonset.scheduled[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Desired | The number of nodes that should be running the daemon pod. |
Dependent item | kube.daemonset.desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Misscheduled | The number of nodes that run a daemon pod but are not supposed to. |
Dependent item | kube.daemonset.misscheduled[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Updated number scheduled | The total number of nodes that are running an updated daemon pod. |
Dependent item | kube.daemonset.updated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
PVC discovery | Dependent item | kube.pvc.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] PVC [{#NAME}] Status phase | The current status phase of the persistent volume claim. |
Dependent item | kube.pvc.status_phase[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] PVC [{#NAME}] Requested storage | The capacity of storage requested by the persistent volume claim. |
Dependent item | kube.pvc.requested.storage[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] PVC status phase: Bound, sum | The total amount of persistent volume claims in the Bound phase. |
Dependent item | kube.pvc.status_phase.bound.sum[{#NAMESPACE}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] PVC status phase: Lost, sum | The total amount of persistent volume claims in the Lost phase. |
Dependent item | kube.pvc.status_phase.lost.sum[{#NAMESPACE}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] PVC status phase: Pending, sum | The total amount of persistent volume claims in the Pending phase. |
Dependent item | kube.pvc.status_phase.pending.sum[{#NAMESPACE}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: NS [{#NAMESPACE}] PVC [{#NAME}]: PVC is pending | count(/Kubernetes cluster state by HTTP/kube.pvc.status_phase[{#NAMESPACE}/{#NAME}],2m,,5)>=2 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PV discovery | Dependent item | kube.pv.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: PV [{#NAME}] Status phase | The current status phase of the persistent volume. |
Dependent item | kube.pv.status_phase[{#NAME}] Preprocessing
|
Kubernetes: PV [{#NAME}] Capacity bytes | The capacity of the persistent volume in bytes. |
Dependent item | kube.pv.capacity.bytes[{#NAME}] Preprocessing
|
Kubernetes: PV status phase: Pending, sum | The total amount of persistent volumes in the Pending phase. |
Dependent item | kube.pv.status_phase.pending.sum[{#SINGLETON}] Preprocessing
|
Kubernetes: PV status phase: Available, sum | The total amount of persistent volumes in the Available phase. |
Dependent item | kube.pv.status_phase.available.sum[{#SINGLETON}] Preprocessing
|
Kubernetes: PV status phase: Bound, sum | The total amount of persistent volumes in the Bound phase. |
Dependent item | kube.pv.status_phase.bound.sum[{#SINGLETON}] Preprocessing
|
Kubernetes: PV status phase: Released, sum | The total amount of persistent volumes in the Released phase. |
Dependent item | kube.pv.status_phase.released.sum[{#SINGLETON}] Preprocessing
|
Kubernetes: PV status phase: Failed, sum | The total amount of persistent volumes in the Failed phase. |
Dependent item | kube.pv.status_phase.failed.sum[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: PV [{#NAME}]: PV has failed | count(/Kubernetes cluster state by HTTP/kube.pv.status_phase[{#NAME}],2m,,3)>=2 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployment discovery | Dependent item | kube.deployment.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Paused | Whether the deployment is paused and will not be processed by the deployment controller. |
Dependent item | kube.deployment.spec_paused[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas desired | Number of desired pods for a deployment. |
Dependent item | kube.deployment.replicas_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Rollingupdate max unavailable | Maximum number of unavailable replicas during a rolling update of a deployment. |
Dependent item | kube.deployment.rollingupdate.max_unavailable[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas | The number of replicas per deployment. |
Dependent item | kube.deployment.replicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas available | The number of available replicas per deployment. |
Dependent item | kube.deployment.replicas_available[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas unavailable | The number of unavailable replicas per deployment. |
Dependent item | kube.deployment.replicas_unavailable[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas updated | The number of updated replicas per deployment. |
Dependent item | kube.deployment.replicas_updated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas mismatched | The number of available replicas not matching the desired number of replicas. |
Dependent item | kube.deployment.replicas_mismatched[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Deployment replicas mismatch | Deployment has not matched the expected number of replicas during the specified trigger evaluation period. |
min(/Kubernetes cluster state by HTTP/kube.deployment.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"deployment:{#NAMESPACE}:{#NAME}"})>0 and last(/Kubernetes cluster state by HTTP/kube.deployment.replicas_desired[{#NAMESPACE}/{#NAME}])>=0 and last(/Kubernetes cluster state by HTTP/kube.deployment.replicas_available[{#NAMESPACE}/{#NAME}])>=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Endpoint discovery | Dependent item | kube.endpoint.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Address available | Number of addresses available in endpoint. |
Dependent item | kube.endpoint.address_available[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Address not ready | Number of addresses not ready in endpoint. |
Dependent item | kube.endpoint.addressnotready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Age | Endpoint age (number of seconds since creation). |
Dependent item | kube.endpoint.age[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node discovery | Dependent item | kube.node.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Node [{#NAME}]: CPU allocatable | The CPU resources of a node that are available for scheduling. |
Dependent item | kube.node.cpu_allocatable[{#NAME}] Preprocessing
|
Kubernetes: Node [{#NAME}]: Memory allocatable | The memory resources of a node that are available for scheduling. |
Dependent item | kube.node.memory_allocatable[{#NAME}] Preprocessing
|
Kubernetes: Node [{#NAME}]: Pods allocatable | The pods resources of a node that are available for scheduling. |
Dependent item | kube.node.pods_allocatable[{#NAME}] Preprocessing
|
Kubernetes: Node [{#NAME}]: Ephemeral storage allocatable | The allocatable ephemeral storage of a node that is available for scheduling. |
Dependent item | kube.node.ephemeralstorageallocatable[{#NAME}] Preprocessing
|
Kubernetes: Node [{#NAME}]: CPU capacity | The capacity for CPU resources of a node. |
Dependent item | kube.node.cpu_capacity[{#NAME}] Preprocessing
|
Kubernetes: Node [{#NAME}]: Memory capacity | The capacity for memory resources of a node. |
Dependent item | kube.node.memory_capacity[{#NAME}] Preprocessing
|
Kubernetes: Node [{#NAME}]: Ephemeral storage capacity | The ephemeral storage capacity of a node. |
Dependent item | kube.node.ephemeralstoragecapacity[{#NAME}] Preprocessing
|
Kubernetes: Node [{#NAME}]: Pods capacity | The capacity for pods resources of a node. |
Dependent item | kube.node.pods_capacity[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pod discovery | Dependent item | kube.pod.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Pending | Pod is in pending state. |
Dependent item | kube.pod.phase.pending[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Succeeded | Pod is in succeeded state. |
Dependent item | kube.pod.phase.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Failed | Pod is in failed state. |
Dependent item | kube.pod.phase.failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Unknown | Pod is in unknown state. |
Dependent item | kube.pod.phase.unknown[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Running | Pod is in running state. |
Dependent item | kube.pod.phase.running[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers terminated | Describes whether the container is currently in terminated state. |
Dependent item | kube.pod.containers_terminated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers waiting | Describes whether the container is currently in waiting state. |
Dependent item | kube.pod.containers_waiting[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers ready | Describes whether the container's readiness check succeeded. |
Dependent item | kube.pod.containers_ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers restarts | The number of container restarts. |
Dependent item | kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers running | Describes whether the container is currently in running state. |
Dependent item | kube.pod.containers_running[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Ready | Describes whether the pod is ready to serve requests. |
Dependent item | kube.pod.ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Scheduled | Describes the status of the scheduling process for the pod. |
Dependent item | kube.pod.scheduled[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Unschedulable | Describes the unschedulable status for the pod. |
Dependent item | kube.pod.unschedulable[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers CPU limits | The limit on CPU cores to be used by a container. |
Dependent item | kube.pod.containers.limits.cpu[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers memory limits | The limit on memory to be used by a container. |
Dependent item | kube.pod.containers.limits.memory[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers CPU requests | The number of requested CPU cores by a container. |
Dependent item | kube.pod.containers.requests.cpu[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers memory requests | The number of requested memory bytes by a container. |
Dependent item | kube.pod.containers.requests.memory[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Pod is not healthy | min(/Kubernetes cluster state by HTTP/kube.pod.phase.failed[{#NAMESPACE}/{#NAME}],10m)>0 or min(/Kubernetes cluster state by HTTP/kube.pod.phase.pending[{#NAMESPACE}/{#NAME}],10m)>0 or min(/Kubernetes cluster state by HTTP/kube.pod.phase.unknown[{#NAMESPACE}/{#NAME}],10m)>0 |High |
|||
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Pod is crash looping | Containers of the pod keep restarting. This most likely indicates that the pod is in the CrashLoopBackOff state. |
(last(/Kubernetes cluster state by HTTP/kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}])-min(/Kubernetes cluster state by HTTP/kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}],15m))>1 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
ReplicaSet discovery | Dependent item | kube.replicaset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Replicas | The number of replicas per ReplicaSet. |
Dependent item | kube.replicaset.replicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Desired replicas | Number of desired pods for a ReplicaSet. |
Dependent item | kube.replicaset.replicas_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Fully labeled replicas | The number of fully labeled replicas per ReplicaSet. |
Dependent item | kube.replicaset.fullylabeledreplicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Ready | The number of ready replicas per ReplicaSet. |
Dependent item | kube.replicaset.ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Replicas mismatched | The number of ready replicas not matching the desired number of replicas. |
Dependent item | kube.replicaset.replicas_mismatched[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] RS [{#NAME}]: ReplicaSet mismatch | ReplicaSet has not matched the expected number of replicas during the specified trigger evaluation period. |
min(/Kubernetes cluster state by HTTP/kube.replicaset.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"replicaset:{#NAMESPACE}:{#NAME}"})>0 and last(/Kubernetes cluster state by HTTP/kube.replicaset.replicas_desired[{#NAMESPACE}/{#NAME}])>=0 and last(/Kubernetes cluster state by HTTP/kube.replicaset.ready[{#NAMESPACE}/{#NAME}])>=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
StatefulSet discovery | Dependent item | kube.statefulset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Replicas | The number of replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Desired replicas | Number of desired pods for a StatefulSet. |
Dependent item | kube.statefulset.replicas_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Current replicas | The number of current replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas_current[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Ready replicas | The number of ready replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Updated replicas | The number of updated replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas_updated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Replicas mismatched | The number of ready replicas not matching the number of replicas. |
Dependent item | kube.statefulset.replicas_mismatched[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: StatefulSet is down | (last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}]) / last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_current[{#NAMESPACE}/{#NAME}]))<>1 |High |
|||
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: StatefulSet replicas mismatch | StatefulSet has not matched the number of replicas during the specified trigger evaluation period. |
min(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"statefulset:{#NAMESPACE}:{#NAME}"})>0 and last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas[{#NAMESPACE}/{#NAME}])>=0 and last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}])>=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PodDisruptionBudget discovery | Dependent item | kube.pdb.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods healthy | Current number of healthy pods. |
Dependent item | kube.pdb.pods_healthy[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods desired | Minimum desired number of healthy pods. |
Dependent item | kube.pdb.pods_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Disruptions allowed | Number of pod disruptions that are allowed. |
Dependent item | kube.pdb.disruptions_allowed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods total | Total number of pods counted by this disruption budget. |
Dependent item | kube.pdb.pods_total[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
CronJob discovery | Dependent item | kube.cronjob.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Suspend | Suspend flag tells the controller to suspend subsequent executions. |
Dependent item | kube.cronjob.spec_suspend[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Active | Active holds pointers to currently running jobs. |
Dependent item | kube.cronjob.status_active[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Last schedule | LastScheduleTime keeps information about when the job was last successfully scheduled. |
Dependent item | kube.cronjob.lastscheduletime[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Next schedule | Next time the cronjob should be scheduled. The time after lastScheduleTime or after the cron job's creation time if it's never been scheduled. Use this to determine if the job is delayed. |
Dependent item | kube.cronjob.nextscheduletime[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Failed | The number of pods which reached the Failed phase and the reason for failure. |
Dependent item | kube.cronjob.status_failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Succeeded | The number of pods which reached the Succeeded phase. |
Dependent item | kube.cronjob.status_succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Completion succeeded | Number of jobs whose execution has completed. |
Dependent item | kube.cronjob.completion.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Completion failed | Number of jobs whose execution has failed. |
Dependent item | kube.cronjob.completion.failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job discovery | Dependent item | kube.job.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Job [{#NAME}]: Failed | The number of pods which reached the Failed phase and the reason for failure. |
Dependent item | kube.job.status_failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Job [{#NAME}]: Succeeded | The number of pods which reached the Succeeded phase. |
Dependent item | kube.job.status_succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Job [{#NAME}]: Completion succeeded | Number of jobs whose execution has completed. |
Dependent item | kube.job.completion.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Job [{#NAME}]: Completion failed | Number of jobs whose execution has failed. |
Dependent item | kube.job.completion.failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Component statuses discovery | Dependent item | kube.componentstatuses.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Component [{#NAME}]: Healthy | Cluster component healthy. |
Dependent item | kube.componentstatuses.healthy[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Component [{#NAME}] is unhealthy | count(/Kubernetes cluster state by HTTP/kube.componentstatuses.healthy[{#NAME}],#3,,"True")<2 and length(last(/Kubernetes cluster state by HTTP/kube.componentstatuses.healthy[{#NAME}]))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Readyz discovery | Dependent item | kube.readyz.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Readyz [{#NAME}]: Healthcheck | Result of readyz healthcheck for component. |
Dependent item | kube.readyz.healthcheck[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Readyz [{#NAME}] is unhealthy | count(/Kubernetes cluster state by HTTP/kube.readyz.healthcheck[{#NAME}],#3,,"ok")<2 and length(last(/Kubernetes cluster state by HTTP/kube.readyz.healthcheck[{#NAME}]))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Livez discovery | Dependent item | kube.livez.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Livez [{#NAME}]: Healthcheck | Result of livez healthcheck for component. |
Dependent item | kube.livez.healthcheck[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Livez [{#NAME}] is unhealthy | count(/Kubernetes cluster state by HTTP/kube.livez.healthcheck[{#NAME}],#3,,"ok")<2 and length(last(/Kubernetes cluster state by HTTP/kube.livez.healthcheck[{#NAME}]))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift BuildConfig discovery | Dependent item | openshift.buildconfig.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift: Namespace [{#NAMESPACE}] BuildConfig [{#NAME}]: Created | OpenShift BuildConfig Unix creation timestamp. |
Dependent item | openshift.buildconfig.created.time[{#NAMESPACE}/{#NAME}] Preprocessing
|
OpenShift: Namespace [{#NAMESPACE}] BuildConfig [{#NAME}]: Generation | Sequence number representing a specific generation of the desired state. |
Dependent item | openshift.buildconfig.generation[{#NAMESPACE}/{#NAME}] Preprocessing
|
OpenShift: Namespace [{#NAMESPACE}] BuildConfig [{#NAME}]: Latest version | The latest version of BuildConfig. |
Dependent item | openshift.buildconfig.status[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift Build discovery | Dependent item | openshift.build.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift: Namespace [{#NAMESPACE}] Build [{#NAME}]: Created | OpenShift Build Unix creation timestamp. |
Dependent item | openshift.build.created.time[{#NAMESPACE}/{#NAME}] Preprocessing
|
OpenShift: Namespace [{#NAMESPACE}] Build [{#NAME}]: Generation | Sequence number representing a specific generation of the desired state. |
Dependent item | openshift.build.sequence.number[{#NAMESPACE}/{#NAME}] Preprocessing
|
OpenShift: Namespace [{#NAMESPACE}] Build [{#NAME}]: Status phase | The Build phase. |
Dependent item | openshift.build.status_phase[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OpenShift: Build [{#NAME}]: Build has failed | count(/Kubernetes cluster state by HTTP/openshift.build.status_phase[{#NAMESPACE}/{#NAME}],2m,"ge",6)>=2 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift ClusterResourceQuota discovery | Dependent item | openshift.cluster.resource.quota.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift: Quota [{#NAME}] Resource [{#RESOURCE}]: Type [{#TYPE}] | Usage of the resource quota. |
Dependent item | openshift.cluster.resource.quota[{#RESOURCE}/{#NAME}/{#TYPE}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift Route discovery | Dependent item | openshift.route.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift: Namespace [{#NAMESPACE}] Route [{#NAME}]: Created | OpenShift Route Unix creation timestamp. |
Dependent item | openshift.route.created.time[{#NAMESPACE}/{#NAME}] Preprocessing
|
OpenShift: Namespace [{#NAMESPACE}] Route [{#NAME}]: Status | Information about route status. |
Dependent item | openshift.route.status[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OpenShift: Route [{#NAME}] with issue: Status is false | count(/Kubernetes cluster state by HTTP/openshift.route.status[{#NAMESPACE}/{#NAME}],2m,,0)>=2 |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes Scheduler by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Scheduler by HTTP
- collects metrics by HTTP agent from Scheduler /metrics endpoint.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template needs to use authorization via API token.
Don't forget to change the macros {$KUBE.SCHEDULER.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
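To verify the endpoint and token before linking the template, you can query the default metrics URL (a sketch; substitute your own URL and token):
curl -k -H "Authorization: Bearer <token>" https://localhost:10259/metrics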
NOTE. You might need to set the --binding-address option for Scheduler to the address where Zabbix proxy can reach it.
For example, for clusters created with kubeadm, it can be set in the following manifest file (changes will be applied immediately):
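For a standard kubeadm installation, this is typically /etc/kubernetes/manifests/kube-scheduler.yaml; paths may differ in other distributions.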
NOTE. Some metrics may not be collected depending on your Kubernetes Scheduler instance version and configuration.
Name | Description | Default |
---|---|---|
{$KUBE.SCHEDULER.SERVER.URL} | Kubernetes Scheduler metrics endpoint URL. |
https://localhost:10259/metrics |
{$KUBE.API.TOKEN} | API Authorization Token. |
|
{$KUBE.SCHEDULER.HTTP.CLIENT.ERROR} | Maximum number of HTTP client requests failures used for trigger. |
2 |
{$KUBE.SCHEDULER.UNSCHEDULABLE} | Maximum number of scheduling failures with 'unschedulable' used for trigger. |
2 |
{$KUBE.SCHEDULER.ERROR} | Maximum number of scheduling failures with 'error' used for trigger. |
2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Scheduler: Get Scheduler metrics | Get raw metrics from Scheduler instance /metrics endpoint. |
HTTP agent | kubernetes.scheduler.get_metrics Preprocessing
|
Kubernetes Scheduler: Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kubernetes.scheduler.processvirtualmemory_bytes Preprocessing
|
Kubernetes Scheduler: Resident memory, bytes | Resident memory size in bytes. |
Dependent item | kubernetes.scheduler.processresidentmemory_bytes Preprocessing
|
Kubernetes Scheduler: CPU | Total user and system CPU usage ratio. |
Dependent item | kubernetes.scheduler.cpu.util Preprocessing
|
Kubernetes Scheduler: Goroutines | Number of goroutines that currently exist. |
Dependent item | kubernetes.scheduler.go_goroutines Preprocessing
|
Kubernetes Scheduler: Go threads | Number of OS threads created. |
Dependent item | kubernetes.scheduler.go_threads Preprocessing
|
Kubernetes Scheduler: Fds open | Number of open file descriptors. |
Dependent item | kubernetes.scheduler.open_fds Preprocessing
|
Kubernetes Scheduler: Fds max | Maximum allowed open file descriptors. |
Dependent item | kubernetes.scheduler.max_fds Preprocessing
|
Kubernetes Scheduler: REST Client requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
Dependent item | kubernetes.scheduler.clienthttprequests_200.rate Preprocessing
|
Kubernetes Scheduler: REST Client requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
Dependent item | kubernetes.scheduler.clienthttprequests_300.rate Preprocessing
|
Kubernetes Scheduler: REST Client requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
Dependent item | kubernetes.scheduler.clienthttprequests_400.rate Preprocessing
|
Kubernetes Scheduler: REST Client requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
Dependent item | kubernetes.scheduler.clienthttprequests_500.rate Preprocessing
|
Kubernetes Scheduler: Schedule attempts: scheduled | Number of attempts to schedule pods with result "scheduled" per second. |
Dependent item | kubernetes.scheduler.schedulerscheduleattempts.scheduled.rate Preprocessing
|
Kubernetes Scheduler: Schedule attempts: unschedulable | Number of attempts to schedule pods with result "unschedulable" per second. |
Dependent item | kubernetes.scheduler.schedulerscheduleattempts.unschedulable.rate Preprocessing
|
Kubernetes Scheduler: Schedule attempts: error | Number of attempts to schedule pods with result "error" per second. |
Dependent item | kubernetes.scheduler.schedulerscheduleattempts.error.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes Scheduler: Too many REST Client errors | Kubernetes Scheduler REST Client is experiencing a high error rate of requests (with 5xx HTTP code). |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.client_http_requests_500.rate,5m)>{$KUBE.SCHEDULER.HTTP.CLIENT.ERROR} |Warning |
||
Kubernetes Scheduler: Too many unschedulable pods | Number of attempts to schedule pods with 'unschedulable' result is too high. 'unschedulable' means a pod could not be scheduled. |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.scheduler_schedule_attempts.unschedulable.rate,5m)>{$KUBE.SCHEDULER.UNSCHEDULABLE} |Warning |
||
Kubernetes Scheduler: Too many schedule attempts with errors | Number of attempts to schedule pods with 'error' result is too high. 'error' means an internal scheduler problem. |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.scheduler_schedule_attempts.error.rate,5m)>{$KUBE.SCHEDULER.ERROR} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Scheduling algorithm histogram | Discovery raw data of scheduling algorithm latency. |
Dependent item | kubernetes.scheduler.scheduling_algorithm.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Scheduler: Scheduling algorithm duration bucket, {#LE} | Scheduling algorithm latency in seconds. |
Dependent item | kubernetes.scheduler.schedulingalgorithmduration[{#LE}] Preprocessing
|
Kubernetes Scheduler: Scheduling algorithm duration, p90 | 90 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p90[{#SINGLETON}] |
Kubernetes Scheduler: Scheduling algorithm duration, p95 | 95 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p95[{#SINGLETON}] |
Kubernetes Scheduler: Scheduling algorithm duration, p99 | 99 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p99[{#SINGLETON}] |
Kubernetes Scheduler: Scheduling algorithm duration, p50 | 50 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p50[{#SINGLETON}] |
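For reference, the calculated percentile items above are typically built from the histogram bucket item; in Zabbix 6.0 and later this is done with the bucket_percentile() function. A hedged sketch of what the p90 formula can look like (the exact item key and evaluation period in the shipped template may differ):

```
bucket_percentile(//kubernetes.scheduler.scheduling_algorithm_duration[*],5m,90)
```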
Name | Description | Type | Key and additional info |
---|---|---|---|
Binding histogram | Discovery raw data of binding latency. |
Dependent item | kubernetes.scheduler.binding.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Scheduler: Binding duration bucket, {#LE} | Binding latency in seconds. |
Dependent item | kubernetes.scheduler.binding_duration[{#LE}] Preprocessing
|
Kubernetes Scheduler: Binding duration, p90 | 90 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp90[{#SINGLETON}] |
Kubernetes Scheduler: Binding duration, p95 | 95 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp95[{#SINGLETON}] |
Kubernetes Scheduler: Binding duration, p99 | 99 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp99[{#SINGLETON}] |
Kubernetes Scheduler: Binding duration, p50 | 50 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp50[{#SINGLETON}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
e2e scheduling histogram | Discovery raw data and percentile items of e2e scheduling latency. |
Dependent item | kubernetes.controller.e2e_scheduling.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling seconds bucket, {#LE} | E2e scheduling latency in seconds (scheduling algorithm + binding) |
Dependent item | kubernetes.scheduler.e2eschedulingbucket[{#LE},"{#RESULT}"] Preprocessing
|
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p50 | 50 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp50["{#RESULT}"] |
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p90 | 90 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp90["{#RESULT}"] |
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p95 | 95 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp95["{#RESULT}"] |
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p99 | 95 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp99["{#RESULT}"] |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes nodes that works without any external scripts. It uses the script item to make HTTP requests to the Kubernetes API.
Install the Zabbix Helm Chart (https://git.zabbix.com/projects/ZT/repos/kubernetes-helm/browse?at=refs%2Fheads%2Frelease%2F6.4) in your Kubernetes cluster.
Change the values according to the environment in the file $HOME/zabbix_values.yaml.
For example:
Set the {$KUBE.API.URL} macro, such as <scheme>://<host>:<port>.
Get the generated service account token using the command
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
and then set it to the {$KUBE.API.TOKEN} macro.
Set up the macros to filter the metrics of discovered nodes
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Install the Zabbix Helm Chart in your Kubernetes cluster.
Set the {$KUBE.API.URL} macro, such as <scheme>://<host>:<port>.
Get the generated service account token using the command
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
and then set it to the {$KUBE.API.TOKEN} macro.
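Before assigning the macros, you can verify that the token and the API URL work (a hedged sketch; replace <scheme>://<host>:<port> with your API server address):

```
# Store the service account token in a shell variable and query the API version.
TOKEN=$(kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d)
curl -sk -H "Authorization: Bearer $TOKEN" <scheme>://<host>:<port>/version
```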
Set {$KUBE.NODES.ENDPOINT.NAME} to the Zabbix agent's endpoint name (see kubectl -n monitoring get ep). Default: zabbix-zabbix-helm-chrt-agent.
Set up the macros to filter the metrics of discovered nodes and host creation based on host prototypes:
Set up macros to filter pod metrics by namespace:
Note: if you have a large cluster, it is highly recommended to set a filter for discoverable pods.
You can use the {$KUBE.NODE.FILTER.LABELS}, {$KUBE.POD.FILTER.LABELS}, {$KUBE.NODE.FILTER.ANNOTATIONS}, and {$KUBE.POD.FILTER.ANNOTATIONS} macros for advanced filtering of nodes and pods by labels and annotations.
Notes about labels and annotations filters:
- Macro values should be specified as comma-separated key: value pairs; regular expressions are supported in the values (key1: value, key2: regexp).
- An exclamation mark (!) can be used to invert the filter (!key: value).
For example: kubernetes.io/hostname: kubernetes-node[5-25], !node-role.kubernetes.io/ingress: .*. As a result, the nodes 5-25 without the "ingress" role will be discovered.
See the Kubernetes documentation for details about labels and annotations:
Note, the discovered nodes will be created as separate hosts in Zabbix with the Linux template automatically assigned to them.
Name | Description | Default |
---|---|---|
{$KUBE.API.URL} | Kubernetes API endpoint URL in the format <scheme>://<host>:<port>. |
https://kubernetes.default.svc.cluster.local:443 |
{$KUBE.API.TOKEN} | Service account bearer token. |
|
{$KUBE.HTTP.PROXY} | Sets the HTTP proxy to |
|
{$KUBE.NODES.ENDPOINT.NAME} | Kubernetes nodes endpoint name. See "kubectl -n monitoring get ep". |
zabbix-zabbix-helm-chrt-agent |
{$KUBE.LLD.FILTER.NODE.MATCHES} | Filter of discoverable nodes. |
.* |
{$KUBE.LLD.FILTER.NODE.NOT_MATCHES} | Filter to exclude discovered nodes. |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.NODE.ROLE.MATCHES} | Filter of discoverable nodes by role. |
.* |
{$KUBE.LLD.FILTER.NODE.ROLE.NOT_MATCHES} | Filter to exclude discovered node by role. |
CHANGE_IF_NEEDED |
{$KUBE.NODE.FILTER.ANNOTATIONS} | Annotations to filter nodes (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.NODE.FILTER.LABELS} | Labels to filter nodes (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.POD.FILTER.ANNOTATIONS} | Annotations to filter pods (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.POD.FILTER.LABELS} | Labels to filter Pods (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.LLD.FILTER.POD.NAMESPACE.MATCHES} | Filter of discoverable pods by namespace. |
.* |
{$KUBE.LLD.FILTER.POD.NAMESPACE.NOT_MATCHES} | Filter to exclude discovered pods by namespace. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Get nodes | Collecting and processing cluster nodes data via Kubernetes API. |
Script | kube.nodes |
Get nodes check | Data collection check. |
Dependent item | kube.nodes.check Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Failed to get nodes | length(last(/Kubernetes nodes by HTTP/kube.nodes.check))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node discovery | Dependent item | kube.node.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node [{#NAME}]: Get data | Collecting and processing cluster by node [{#NAME}] data via Kubernetes API. |
Dependent item | kube.node.get[{#NAME}] Preprocessing
|
Node [{#NAME}] Addresses: External IP | Typically the IP address of the node that is externally routable (available from outside the cluster). |
Dependent item | kube.node.addresses.external_ip[{#NAME}] Preprocessing
|
Node [{#NAME}] Addresses: Internal IP | Typically the IP address of the node that is routable only within the cluster. |
Dependent item | kube.node.addresses.internal_ip[{#NAME}] Preprocessing
|
Node [{#NAME}] Allocatable: CPU | Allocatable CPU. 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now. |
Dependent item | kube.node.allocatable.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Allocatable: Memory | Allocatable Memory. 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now. |
Dependent item | kube.node.allocatable.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Allocatable: Pods | https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/ |
Dependent item | kube.node.allocatable.pods[{#NAME}] Preprocessing
|
Node [{#NAME}] Capacity: CPU | CPU resource capacity. https://kubernetes.io/docs/concepts/architecture/nodes/#capacity |
Dependent item | kube.node.capacity.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Capacity: Memory | Memory resource capacity. https://kubernetes.io/docs/concepts/architecture/nodes/#capacity |
Dependent item | kube.node.capacity.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Capacity: Pods | https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/ |
Dependent item | kube.node.capacity.pods[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Disk pressure | True if pressure exists on the disk size - that is, if the disk capacity is low; otherwise False. |
Dependent item | kube.node.conditions.diskpressure[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Memory pressure | True if pressure exists on the node memory - that is, if the node memory is low; otherwise False. |
Dependent item | kube.node.conditions.memorypressure[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Network unavailable | True if the network for the node is not correctly configured, otherwise False. |
Dependent item | kube.node.conditions.networkunavailable[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: PID pressure | True if pressure exists on the processes - that is, if there are too many processes on the node; otherwise False. |
Dependent item | kube.node.conditions.pidpressure[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Ready | True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds). |
Dependent item | kube.node.conditions.ready[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Architecture | Node architecture. |
Dependent item | kube.node.info.architecture[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Container runtime | Container runtime. https://kubernetes.io/docs/setup/production-environment/container-runtimes/ |
Dependent item | kube.node.info.containerruntime[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Kernel version | Node kernel version. |
Dependent item | kube.node.info.kernelversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Kubelet version | Version of Kubelet. |
Dependent item | kube.node.info.kubeletversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: KubeProxy version | Version of KubeProxy. |
Dependent item | kube.node.info.kubeproxyversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Operating system | Node operating system. |
Dependent item | kube.node.info.operatingsystem[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: OS image | Node OS image. |
Dependent item | kube.node.info.osversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Roles | Node roles. |
Dependent item | kube.node.info.roles[{#NAME}] Preprocessing
|
Node [{#NAME}] Limits: CPU | Node CPU limits. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.limits.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Limits: Memory | Node Memory limits. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.limits.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Requests: CPU | Node CPU requests. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.requests.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Requests: Memory | Node Memory requests. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.requests.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Uptime | Node uptime. |
Dependent item | kube.node.uptime[{#NAME}] Preprocessing
|
Node [{#NAME}] Used: Pods | Current number of pods on the node. |
Dependent item | kube.node.used.pods[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Node [{#NAME}] Conditions: Pressure exists on the disk size | True - pressure exists on the disk size - that is, if the disk capacity is low; otherwise False. |
last(/Kubernetes nodes by HTTP/kube.node.conditions.diskpressure[{#NAME}])=1 |Warning |
||
Node [{#NAME}] Conditions: Pressure exists on the node memory | True - pressure exists on the node memory - that is, if the node memory is low; otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.memorypressure[{#NAME}])=1 |Warning |
||
Node [{#NAME}] Conditions: Network is not correctly configured | True - the network for the node is not correctly configured, otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.networkunavailable[{#NAME}])=1 |Warning |
||
Node [{#NAME}] Conditions: Pressure exists on the processes | True - pressure exists on the processes - that is, if there are too many processes on the node; otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.pidpressure[{#NAME}])=1 |Warning |
||
Node [{#NAME}] Conditions: Is not in Ready state | False - if the node is not healthy and is not accepting pods. |
last(/Kubernetes nodes by HTTP/kube.node.conditions.ready[{#NAME}])<>1 |Warning |
||
Node [{#NAME}] Limits: Total CPU limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.9 |Warning |
Depends on:
|
||
Node [{#NAME}] Limits: Total CPU limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 1 |Average |
|||
Node [{#NAME}] Limits: Total memory limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.9 |Warning |
Depends on:
|
||
Node [{#NAME}] Limits: Total memory limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 1 |Average |
|||
Node [{#NAME}] Requests: Total CPU requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.5 |Warning |
Depends on:
|
||
Node [{#NAME}] Requests: Total CPU requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.8 |Average |
|||
Node [{#NAME}] Requests: Total memory requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.5 |Warning |
Depends on:
|
||
Node [{#NAME}] Requests: Total memory requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.8 |Average |
|||
Node [{#NAME}]: Has been restarted | Uptime is less than 10 minutes. |
last(/Kubernetes nodes by HTTP/kube.node.uptime[{#NAME}])<10 |Info |
||
Node [{#NAME}] Used: Kubelet too many pods | Kubelet is running at capacity. |
last(/Kubernetes nodes by HTTP/kube.node.used.pods[{#NAME}])/ last(/Kubernetes nodes by HTTP/kube.node.capacity.pods[{#NAME}]) > 0.9 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Pod discovery | Dependent item | kube.pod.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node [{#NODE}] Pod [{#POD}]: Get data | Collecting and processing cluster by node [{#NODE}] data via Kubernetes API. |
Dependent item | kube.pod.get[{#POD}] Preprocessing
|
Node [{#NODE}] Pod [{#POD}] Conditions: Containers ready | All containers in the Pod are ready. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.containers_ready[{#POD}] Preprocessing
|
Node [{#NODE}] Pod [{#POD}] Conditions: Initialized | All init containers have started successfully. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.initialized[{#POD}] Preprocessing
|
Node [{#NODE}] Pod [{#POD}] Conditions: Ready | The Pod is able to serve requests and should be added to the load balancing pools of all matching Services. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.ready[{#POD}] Preprocessing
|
Node [{#NODE}] Pod [{#POD}] Conditions: Scheduled | The Pod has been scheduled to a node. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.scheduled[{#POD}] Preprocessing
|
Node [{#NODE}] Pod [{#POD}] Containers: Restarts | The number of times the container has been restarted, currently based on the number of dead containers that have not yet been removed. Note that this is calculated from dead containers. But those containers are subject to garbage collection. |
Dependent item | kube.pod.containers.restartcount[{#POD}] Preprocessing
|
Node [{#NODE}] Pod [{#POD}] Status: Phase | The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-phase |
Dependent item | kube.pod.status.phase[{#POD}] Preprocessing
|
Node [{#NODE}] Pod [{#POD}] Uptime | Pod uptime. |
Dependent item | kube.pod.uptime[{#POD}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Node [{#NODE}] Pod [{#POD}]: Pod is crash looping | Containers of the pod keep restarting. This most likely indicates that the pod is in the CrashLoopBackOff state. |
(last(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#POD}])-min(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#POD}],15m))>1 |Warning |
||
Node [{#NODE}] Pod [{#POD}] Status: Kubernetes Pod not healthy | Pod has been in a non-ready state for longer than 10 minutes. |
count(/Kubernetes nodes by HTTP/kube.pod.status.phase[{#POD}],10m, "regexp","^(1|4|5)$")>=9 |High |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes Kubelet by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Kubelet by HTTP
- collects metrics by HTTP agent from Kubelet /metrics endpoint.
Don't forget to change the macros {$KUBE.KUBELET.URL} and {$KUBE.API.TOKEN}.
NOTE. Some metrics may not be collected depending on your Kubernetes instance version and configuration.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template uses authorization via an API token.
Don't forget to change the macros {$KUBE.KUBELET.URL} and {$KUBE.API.TOKEN}.
NOTE. Some metrics may not be collected depending on your Kubernetes instance version and configuration.
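A hedged connectivity check against the Kubelet endpoints used by this template (assumes the service account token is stored in $TOKEN, the default port 10250, and a placeholder node address):

```
curl -sk -H "Authorization: Bearer $TOKEN" https://<node-address>:10250/metrics | head
curl -sk -H "Authorization: Bearer $TOKEN" https://<node-address>:10250/metrics/cadvisor | head
curl -sk -H "Authorization: Bearer $TOKEN" https://<node-address>:10250/pods | head
```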
Name | Description | Default |
---|---|---|
{$KUBE.API.TOKEN} | Service account bearer token. |
|
{$KUBE.KUBELET.URL} | Kubernetes Kubelet instance URL. |
https://localhost:10250 |
{$KUBE.KUBELET.METRIC.ENDPOINT} | Kubelet /metrics endpoint. |
/metrics |
{$KUBE.KUBELET.CADVISOR.ENDPOINT} | cAdvisor metrics from Kubelet /metrics/cadvisor endpoint. |
/metrics/cadvisor |
{$KUBE.KUBELET.PODS.ENDPOINT} | Kubelet /pods endpoint. |
/pods |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Get kubelet metrics | Collecting raw Kubelet metrics from /metrics endpoint. |
HTTP agent | kube.kubelet.metrics |
Kubernetes: Get cadvisor metrics | Collecting raw Kubelet metrics from /metrics/cadvisor endpoint. |
HTTP agent | kube.cadvisor.metrics |
Kubernetes: Get pods | Collecting raw Kubelet metrics from /pods endpoint. |
HTTP agent | kube.pods |
Kubernetes: Pods running | The number of running pods. |
Dependent item | kube.kubelet.pods.running Preprocessing
|
Kubernetes: Containers running | The number of running containers. |
Dependent item | kube.kubelet.containers.running Preprocessing
|
Kubernetes: Containers last state terminated | The number of containers that were previously terminated. |
Dependent item | kube.kublet.containers.terminated Preprocessing
|
Kubernetes: Containers restarts | The number of times the container has been restarted. |
Dependent item | kube.kubelet.containers.restarts Preprocessing
|
Kubernetes: CPU cores, total | The number of cores in this machine (available until kubernetes v1.18). |
Dependent item | kube.kubelet.cpu.cores Preprocessing
|
Kubernetes: Machine memory, bytes | Resident memory size in bytes. |
Dependent item | kube.kubelet.machine.memory Preprocessing
|
Kubernetes: Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kube.kubelet.virtual.memory Preprocessing
|
Kubernetes: File descriptors, max | Maximum number of open file descriptors. |
Dependent item | kube.kubelet.processmaxfds Preprocessing
|
Kubernetes: File descriptors, open | Number of open file descriptors. |
Dependent item | kube.kubelet.processopenfds Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Runtime operations discovery | Dependent item | kube.kubelet.runtimeoperationsbucket.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: [{#OP_TYPE}] Runtime operations bucket: {#LE} | Duration in seconds of runtime operations. Broken down by operation type. |
Dependent item | kube.kublet.runtimeopsdurationsecondsbucket[{#LE},"{#OP_TYPE}"] Preprocessing
|
Kubernetes: [{#OP_TYPE}] Runtime operations total, rate | Cumulative number of runtime operations by operation type. |
Dependent item | kube.kublet.runtimeopstotal.rate["{#OP_TYPE}"] Preprocessing
|
Kubernetes: [{#OP_TYPE}] Operations, p90 | 90 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp90["{#OP_TYPE}"] |
Kubernetes: [{#OP_TYPE}] Operations, p95 | 95 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp95["{#OP_TYPE}"] |
Kubernetes: [{#OP_TYPE}] Operations, p99 | 99 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp99["{#OP_TYPE}"] |
Kubernetes: [{#OP_TYPE}] Operations, p50 | 50 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp50["{#OP_TYPE}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Pods discovery | Dependent item | kube.kubelet.pods.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: Load average, 10s | Pods cpu load average over the last 10 seconds. |
Dependent item | kube.pod.containercpuloadaverage10s[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: System seconds, total | System cpu time consumed. It is calculated from the cumulative value using the |
Dependent item | kube.pod.containercpusystemsecondstotal[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: Usage seconds, total | Consumed cpu time. It is calculated from the cumulative value using the |
Dependent item | kube.pod.containercpuusagesecondstotal[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: User seconds, total | User cpu time consumed. It is calculated from the cumulative value using the |
Dependent item | kube.pod.containercpuusersecondstotal[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
REST client requests discovery | Dependent item | kube.kubelet.rest.requests.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Host [{#HOST}] Request method [{#METHOD}] Code:[{#CODE}] | Number of HTTP requests, partitioned by status code, method, and host. |
Dependent item | kube.kubelet.rest.requests["{#CODE}", "{#HOST}", "{#METHOD}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Container memory discovery | Dependent item | kube.kubelet.container.memory.cache.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Memory page cache | Number of bytes of page cache memory. |
Dependent item | kube.kubelet.container.memory.cache["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Memory max usage | Maximum memory usage recorded in bytes. |
Dependent item | kube.kubelet.container.memory.max_usage["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: RSS | Size of RSS in bytes. |
Dependent item | kube.kubelet.container.memory.rss["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Swap | Container swap usage in bytes. |
Dependent item | kube.kubelet.container.memory.swap["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Usage | Current memory usage in bytes, including all memory regardless of when it was accessed. |
Dependent item | kube.kubelet.container.memory.usage["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Working set | Current working set in bytes. |
Dependent item | kube.kubelet.container.memory.working_set["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes Controller manager by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Controller manager by HTTP
- collects metrics by HTTP agent from Controller manager /metrics endpoint.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template uses authorization via an API token.
Don't forget to change the macros {$KUBE.CONTROLLER.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. You might need to set the --bind-address option for the Controller Manager to an address where the Zabbix proxy can reach it. For example, for clusters created with kubeadm it can be set in the following manifest file (changes are applied immediately):
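A minimal sketch of that change, analogous to the Scheduler case, assuming the default kubeadm manifest path and bind address (verify both in your cluster before editing):

```
sudo sed -i 's/--bind-address=127.0.0.1/--bind-address=0.0.0.0/' \
  /etc/kubernetes/manifests/kube-controller-manager.yaml
```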
NOTE. Some metrics may not be collected depending on your Kubernetes Controller manager instance version and configuration.
Name | Description | Default |
---|---|---|
{$KUBE.CONTROLLER.SERVER.URL} | Kubernetes Controller manager metrics endpoint URL. |
https://localhost:10257/metrics |
{$KUBE.API.TOKEN} | API Authorization Token |
|
{$KUBE.CONTROLLER.HTTP.CLIENT.ERROR} | Maximum number of HTTP client requests failures used for trigger. |
2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Controller: Get Controller metrics | Get raw metrics from Controller instance /metrics endpoint. |
HTTP agent | kubernetes.controller.get_metrics Preprocessing
|
Kubernetes Controller Manager: Leader election status | Gauge of if the reporting system is master of the relevant lease, 0 indicates backup, 1 indicates master. |
Dependent item | kubernetes.controller.leaderelectionmaster_status Preprocessing
|
Kubernetes Controller Manager: Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kubernetes.controller.processvirtualmemory_bytes Preprocessing
|
Kubernetes Controller Manager: Resident memory, bytes | Resident memory size in bytes. |
Dependent item | kubernetes.controller.processresidentmemory_bytes Preprocessing
|
Kubernetes Controller Manager: CPU | Total user and system CPU usage ratio. |
Dependent item | kubernetes.controller.cpu.util Preprocessing
|
Kubernetes Controller Manager: Goroutines | Number of goroutines that currently exist. |
Dependent item | kubernetes.controller.go_goroutines Preprocessing
|
Kubernetes Controller Manager: Go threads | Number of OS threads created. |
Dependent item | kubernetes.controller.go_threads Preprocessing
|
Kubernetes Controller Manager: Fds open | Number of open file descriptors. |
Dependent item | kubernetes.controller.open_fds Preprocessing
|
Kubernetes Controller Manager: Fds max | Maximum allowed open file descriptors. |
Dependent item | kubernetes.controller.max_fds Preprocessing
|
Kubernetes Controller Manager: REST Client requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
Dependent item | kubernetes.controller.clienthttprequests_200.rate Preprocessing
|
Kubernetes Controller Manager: REST Client requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
Dependent item | kubernetes.controller.clienthttprequests_300.rate Preprocessing
|
Kubernetes Controller Manager: REST Client requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
Dependent item | kubernetes.controller.clienthttprequests_400.rate Preprocessing
|
Kubernetes Controller Manager: REST Client requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
Dependent item | kubernetes.controller.clienthttprequests_500.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes Controller Manager: Too many HTTP client errors | Kubernetes Controller manager is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes Controller manager by HTTP/kubernetes.controller.client_http_requests_500.rate,5m)>{$KUBE.CONTROLLER.HTTP.CLIENT.ERROR} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Workqueue metrics discovery | Dependent item | kubernetes.controller.workqueue.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue adds total, rate | Total number of adds handled by workqueue per second. |
Dependent item | kubernetes.controller.workqueueaddstotal["{#NAME}"] Preprocessing
|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue depth | Current depth of workqueue. |
Dependent item | kubernetes.controller.workqueue_depth["{#NAME}"] Preprocessing
|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue unfinished work, sec | How many seconds of work has done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases. |
Dependent item | kubernetes.controller.workqueueunfinishedwork_seconds["{#NAME}"] Preprocessing
|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue retries, rate | Total number of retries handled by workqueue per second. |
Dependent item | kubernetes.controller.workqueueretriestotal["{#NAME}"] Preprocessing
|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue longest running processor, sec | How many seconds has the longest running processor for workqueue been running. |
Dependent item | kubernetes.controller.workqueuelongestrunningprocessorseconds["{#NAME}"] Preprocessing
|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, p90 | 90 percentile of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp90["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, p95 | 95 percentile of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp95["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, p99 | 99 percentile of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp99["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, 50p | 50 percentiles of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp50["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, p90 | 90 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp90["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, p95 | 95 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp95["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, p99 | 99 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp99["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, 50p | 50 percentile of how long in seconds an item stays in workqueue before being requested. If there are no requests for 5 minute, item value will be discarded. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp50["{#NAME}"] Preprocessing
|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue duration seconds bucket, {#LE} | How long in seconds processing an item from workqueue takes. |
Dependent item | kubernetes.controller.durationsecondsbucket[{#LE},"{#NAME}"] Preprocessing
|
Kubernetes Controller Manager: ["{#NAME}"]: Queue duration seconds bucket, {#LE} | How long in seconds an item stays in workqueue before being requested. |
Dependent item | kubernetes.controller.queuedurationseconds_bucket[{#LE},"{#NAME}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes API server that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes API server by HTTP
- collects metrics by HTTP agent from API server /metrics endpoint.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template uses authorization via an API token.
Don't forget to change the macros {$KUBE.API.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Kubernetes API server instance version and configuration.
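One way to confirm that the token can read the metrics before filling in the macros is to query the endpoint through the API server (a hedged example; the host and port are placeholders):

```
# Using kubectl with a context that has sufficient RBAC permissions:
kubectl get --raw /metrics | head
# Or directly with the token that will be set in {$KUBE.API.TOKEN}:
curl -sk -H "Authorization: Bearer $TOKEN" https://<api-server>:6443/metrics | head
```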
Name | Description | Default |
---|---|---|
{$KUBE.API.SERVER.URL} | Kubernetes API server metrics endpoint URL. |
https://localhost:6443/metrics |
{$KUBE.API.TOKEN} | API Authorization Token. |
|
{$KUBE.API.CERT.EXPIRATION} | Number of days before the client certificate expires; used for the trigger. |
7 |
{$KUBE.API.HTTP.CLIENT.ERROR} | Maximum number of HTTP client requests failures used for trigger. |
2 |
{$KUBE.API.HTTP.SERVER.ERROR} | Maximum number of HTTP server requests failures used for trigger. |
2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Get API instance metrics | Get raw metrics from API instance /metrics endpoint. |
HTTP agent | kubernetes.api.get_metrics Preprocessing
|
Kubernetes API: Audit events, total | Accumulated number of audit events generated and sent to the audit backend. |
Dependent item | kubernetes.api.auditeventtotal Preprocessing
|
Kubernetes API: Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kubernetes.api.processvirtualmemory_bytes Preprocessing
|
Kubernetes API: Resident memory, bytes | Resident memory size in bytes. |
Dependent item | kubernetes.api.processresidentmemory_bytes Preprocessing
|
Kubernetes API: CPU | Total user and system CPU usage ratio. |
Dependent item | kubernetes.api.cpu.util Preprocessing
|
Kubernetes API: Goroutines | Number of goroutines that currently exist. |
Dependent item | kubernetes.api.go_goroutines Preprocessing
|
Kubernetes API: Go threads | Number of OS threads created. |
Dependent item | kubernetes.api.go_threads Preprocessing
|
Kubernetes API: Fds open | Number of open file descriptors. |
Dependent item | kubernetes.api.open_fds Preprocessing
|
Kubernetes API: Fds max | Maximum allowed open file descriptors. |
Dependent item | kubernetes.api.max_fds Preprocessing
|
Kubernetes API: gRPCs client started, rate | Total number of RPCs started per second. |
Dependent item | kubernetes.api.grpcclientstarted.rate Preprocessing
|
Kubernetes API: gRPCs messages received, rate | Total number of gRPC stream messages received per second. |
Dependent item | kubernetes.api.grpcclientmsg_received.rate Preprocessing
|
Kubernetes API: gRPCs messages sent, rate | Total number of gRPC stream messages sent per second. |
Dependent item | kubernetes.api.grpcclientmsg_sent.rate Preprocessing
|
Kubernetes API: Request terminations, rate | Number of requests which apiserver terminated in self-defense per second. |
Dependent item | kubernetes.api.apiserverrequestterminations Preprocessing
|
Kubernetes API: TLS handshake errors, rate | Number of requests dropped with 'TLS handshake error from' error per second. |
Dependent item | kubernetes.api.apiservertlshandshakeerrorstotal.rate Preprocessing
|
Kubernetes API: API server requests: 5xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserverrequesttotal_500.rate Preprocessing
|
Kubernetes API: API server requests: 4xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserverrequesttotal_400.rate Preprocessing
|
Kubernetes API: API server requests: 3xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserverrequesttotal_300.rate Preprocessing
|
Kubernetes API: API server requests: 0, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserverrequesttotal_0.rate Preprocessing
|
Kubernetes API: API server requests: 2xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserverrequesttotal_200.rate Preprocessing
|
Kubernetes API: HTTP requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
Dependent item | kubernetes.api.restclientrequeststotal500.rate Preprocessing
|
Kubernetes API: HTTP requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
Dependent item | kubernetes.api.restclientrequeststotal400.rate Preprocessing
|
Kubernetes API: HTTP requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
Dependent item | kubernetes.api.restclientrequeststotal300.rate Preprocessing
|
Kubernetes API: HTTP requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
Dependent item | kubernetes.api.restclientrequeststotal200.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes API: Too many server errors | Kubernetes API server is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes API server by HTTP/kubernetes.api.apiserver_request_total_500.rate,5m)>{$KUBE.API.HTTP.SERVER.ERROR} |Warning |
||
Kubernetes API: Too many client errors | Kubernetes API client is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes API server by HTTP/kubernetes.api.rest_client_requests_total_500.rate,5m)>{$KUBE.API.HTTP.CLIENT.ERROR} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Long-running requests | Discovery of long-running requests by verb, resource and scope. |
Dependent item | kubernetes.api.longrunning_gauge.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Long-running ["{#VERB}"] requests ["{#RESOURCE}"]: {#SCOPE} | Gauge of all active long-running apiserver requests broken out by verb, resource and scope. Not all requests are tracked this way. |
Dependent item | kubernetes.api.longrunning_gauge["{#RESOURCE}","{#SCOPE}","{#VERB}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Request duration histogram | Discovery raw data and percentile items of request duration. |
Dependent item | kubernetes.api.requests_bucket.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: ["{#VERB}"] Requests bucket: {#LE} | Response latency distribution in seconds for each verb. |
Dependent item | kubernetes.api.requestdurationseconds_bucket[{#LE},"{#VERB}"] Preprocessing
|
Kubernetes API: ["{#VERB}"] Requests, p90 | 90 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p90["{#VERB}"] |
Kubernetes API: ["{#VERB}"] Requests, p95 | 95 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p95["{#VERB}"] |
Kubernetes API: ["{#VERB}"] Requests, p99 | 99 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p99["{#VERB}"] |
Kubernetes API: ["{#VERB}"] Requests, p50 | 50 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p50["{#VERB}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Requests inflight discovery | Discovery requests inflight by kind. |
Dependent item | kubernetes.api.inflight_requests.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Requests current: {#KIND} | Maximal number of currently used inflight request limit of this apiserver per request kind in last second. |
Dependent item | kubernetes.api.currentinflightrequests["{#KIND}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC completed requests discovery | Discovery grpc completed requests by grpc code. |
Dependent item | kubernetes.api.grpcclienthandled.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: gRPCs completed: {#GRPC_CODE}, rate | Total number of RPCs completed by the client regardless of success or failure per second. |
Dependent item | kubernetes.api.grpcclienthandledtotal.rate["{#GRPCCODE}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Authentication attempts discovery | Discovery authentication attempts by result. |
Dependent item | kubernetes.api.authentication_attempts.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Authentication attempts: {#RESULT}, rate | Authentication attempts by result per second. |
Dependent item | kubernetes.api.authentication_attempts.rate["{#RESULT}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Authentication requests discovery | Discovery authentication attempts by name. |
Dependent item | kubernetes.api.authenticateduserrequests.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Authenticated requests: {#NAME}, rate | Counter of authenticated requests broken out by username per second. |
Dependent item | kubernetes.api.authenticateduserrequests.rate["{#NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Watchers metrics discovery | Discovery watchers by kind. |
Dependent item | kubernetes.api.apiserverregisteredwatchers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Watchers: {#KIND} | Number of currently registered watchers for a given resource. |
Dependent item | kubernetes.api.apiserverregisteredwatchers["{#KIND}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd objects metrics discovery | Discovery etcd objects by resource. |
Dependent item | kubernetes.api.etcdobjectcounts.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: etcd objects: {#RESOURCE} | Number of stored objects at the time of last check split by kind. |
Dependent item | kubernetes.api.etcdobjectcounts["{#RESOURCE}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Workqueue metrics discovery | Discovery workqueue metrics by name. |
Dependent item | kubernetes.api.workqueue.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: ["{#NAME}"] Workqueue depth | Current depth of workqueue. |
Dependent item | kubernetes.api.workqueue_depth["{#NAME}"] Preprocessing
|
Kubernetes API: ["{#NAME}"] Workqueue adds total, rate | Total number of adds handled by workqueue per second. |
Dependent item | kubernetes.api.workqueueaddstotal.rate["{#NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Client certificate expiration histogram | Discovery raw data of client certificate expiration |
Dependent item | kubernetes.api.certificate_expiration.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Certificate expiration seconds bucket, {#LE} | Distribution of the remaining lifetime on the certificate used to authenticate a request. |
Dependent item | kubernetes.api.clientcertificateexpirationsecondsbucket[{#LE}] Preprocessing
|
Kubernetes API: Client certificate expiration, p1 | 1 percentile of the remaining lifetime on the certificate used to authenticate a request. |
Calculated | kubernetes.api.clientcertificateexpiration_p1[{#SINGLETON}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes API: Kubernetes client certificate is expiring | A client certificate used to authenticate to the apiserver is expiring in {$KUBE.API.CERT.EXPIRATION} days. |
last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < {$KUBE.API.CERT.EXPIRATION}*24*60*60 |Warning |
Depends on:
|
|
Kubernetes API: Kubernetes client certificate expires soon | A client certificate used to authenticate to the apiserver is expiring in less than 24.0 hours. |
last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < 24*60*60 |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache Kafka monitoring by Zabbix via JMX and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
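For JMX collection the broker has to expose a JMX port. A hedged example of enabling it with the environment variables read by the standard Apache Kafka start scripts (the port and the credential file paths are illustrative; they correspond to the {$KAFKA.USER}/{$KAFKA.PASSWORD} macros only if you configure those credentials in the files):

```
# kafka-run-class.sh picks up JMX_PORT and KAFKA_JMX_OPTS.
export JMX_PORT=9999
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.authenticate=true \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.password.file=/etc/kafka/jmxremote.password \
  -Dcom.sun.management.jmxremote.access.file=/etc/kafka/jmxremote.access"
bin/kafka-server-start.sh config/server.properties
```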
Name | Description | Default |
---|---|---|
{$KAFKA.USER} | zabbix |
|
{$KAFKA.PASSWORD} | zabbix |
|
{$KAFKA.TOPIC.MATCHES} | Filter of discoverable topics |
.* |
{$KAFKA.TOPIC.NOT_MATCHES} | Filter to exclude discovered topics |
__consumer_offsets |
{$KAFKA.NETPROCAVG_IDLE.MIN.WARN} | The minimum Network processor average idle percent for trigger expression. |
30 |
{$KAFKA.REQUESTHANDLERAVG_IDLE.MIN.WARN} | The minimum Request handler average idle percent for trigger expression. |
30 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kafka: Leader election per second | Number of leader elections per second. |
JMX agent | jmx["kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs","Count"] |
Kafka: Unclean leader election per second | Number of “unclean” elections per second. |
JMX agent | jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"] Preprocessing
|
Kafka: Controller state on broker | One indicates that the broker is the controller for the cluster. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=ActiveControllerCount","Value"] Preprocessing
|
Kafka: Ineligible pending replica deletes | The number of ineligible pending replica deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=ReplicasIneligibleToDeleteCount","Value"] |
Kafka: Pending replica deletes | The number of pending replica deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=ReplicasToDeleteCount","Value"] |
Kafka: Ineligible pending topic deletes | The number of ineligible pending topic deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=TopicsIneligibleToDeleteCount","Value"] |
Kafka: Pending topic deletes | The number of pending topic deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=TopicsToDeleteCount","Value"] |
Kafka: Offline log directory count | The number of offline log directories (for example, after a hardware failure). |
JMX agent | jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"] |
Kafka: Offline partitions count | Number of partitions that don't have an active leader. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"] |
Kafka: Bytes out per second | The rate at which data is fetched and read from the broker by consumers. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec","Count"] Preprocessing
|
Kafka: Bytes in per second | The rate at which data sent from producers is consumed by the broker. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec","Count"] Preprocessing
|
Kafka: Messages in per second | The rate at which individual messages are consumed by the broker. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec","Count"] Preprocessing
|
Kafka: Bytes rejected per second | The rate at which bytes are rejected by the broker. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec","Count"] Preprocessing
|
Kafka: Client fetch request failed per second | Number of client fetch request failures per second. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec","Count"] Preprocessing
|
Kafka: Produce requests failed per second | Number of failed produce requests per second. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec","Count"] Preprocessing
|
Kafka: Request handler average idle percent | Indicates the percentage of time that the request handler (IO) threads are not in use. |
JMX agent | jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"] Preprocessing
|
Kafka: Fetch-Consumer response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","Mean"] |
Kafka: Fetch-Consumer response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","95thPercentile"] |
Kafka: Fetch-Consumer response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","99thPercentile"] |
Kafka: Fetch-Follower response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","Mean"] |
Kafka: Fetch-Follower response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","95thPercentile"] |
Kafka: Fetch-Follower response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","99thPercentile"] |
Kafka: Produce response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","Mean"] |
Kafka: Produce response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","95thPercentile"] |
Kafka: Produce response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","99thPercentile"] |
Kafka: Fetch-Consumer request total time, mean | Average time in ms to serve the Fetch-Consumer request. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","Mean"] |
Kafka: Fetch-Consumer request total time, p95 | Time in ms to serve the Fetch-Consumer request for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","95thPercentile"] |
Kafka: Fetch-Consumer request total time, p99 | Time in ms to serve the Fetch-Consumer request for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","99thPercentile"] |
Kafka: Fetch-Follower request total time, mean | Average time in ms to serve the Fetch-Follower request. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","Mean"] |
Kafka: Fetch-Follower request total time, p95 | Time in ms to serve the Fetch-Follower request for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","95thPercentile"] |
Kafka: Fetch-Follower request total time, p99 | Time in ms to serve the Fetch-Follower request for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","99thPercentile"] |
Kafka: Produce request total time, mean | Average time in ms to serve the Produce request. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","Mean"] |
Kafka: Produce request total time, p95 | Time in ms to serve the Produce requests for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","95thPercentile"] |
Kafka: Produce request total time, p99 | Time in ms to serve the Produce requests for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","99thPercentile"] |
Kafka: UpdateMetadata request total time, mean | Average time for a request to update metadata. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","Mean"] |
Kafka: UpdateMetadata request total time, p95 | Time for update metadata requests for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","95thPercentile"] |
Kafka: UpdateMetadata request total time, p99 | Time for update metadata requests for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","99thPercentile"] |
Kafka: Temporary memory size in bytes (Fetch), max | The maximum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Max"] |
Kafka: Temporary memory size in bytes (Fetch), min | The minimum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Mean"] |
Kafka: Temporary memory size in bytes (Produce), max | The maximum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Max"] |
Kafka: Temporary memory size in bytes (Produce), avg | The amount of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Mean"] |
Kafka: Temporary memory size in bytes (Produce), min | The minimum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Min"] |
Kafka: Network processor average idle percent | The average percentage of time that the network processors are idle. |
JMX agent | jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"] Preprocessing
|
Kafka: Requests in producer purgatory | Number of requests waiting in producer purgatory. |
JMX agent | jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Fetch","Value"] |
Kafka: Requests in fetch purgatory | Number of requests waiting in fetch purgatory. |
JMX agent | jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce","Value"] |
Kafka: Replication maximum lag | The maximum lag between the time that messages are received by the leader replica and by the follower replicas. |
JMX agent | jmx["kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica","Value"] |
Kafka: Under minimum ISR partition count | The number of partitions under the minimum In-Sync Replica (ISR) count. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"] |
Kafka: Under replicated partitions | The number of partitions that have not been fully replicated in the follower replicas (the number of non-reassigning replicas - the number of ISR > 0). |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"] |
Kafka: ISR expands per second | The rate at which the number of ISRs in the broker increases. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=IsrExpandsPerSec","Count"] Preprocessing
|
Kafka: ISR shrink per second | Rate of replicas leaving the ISR pool. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=IsrShrinksPerSec","Count"] Preprocessing
|
Kafka: Leader count | The number of replicas for which this broker is the leader. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=LeaderCount","Value"] |
Kafka: Partition count | The number of partitions in the broker. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=PartitionCount","Value"] |
Kafka: Number of reassigning partitions | The number of reassigning leader partitions on a broker. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=ReassigningPartitions","Value"] |
Kafka: Request queue size | The size of the delay queue. |
JMX agent | jmx["kafka.server:type=Request","queue-size"] |
Kafka: Version | Current version of broker. |
JMX agent | jmx["kafka.server:type=app-info","version"] Preprocessing
|
Kafka: Uptime | The service uptime expressed in seconds. |
JMX agent | jmx["kafka.server:type=app-info","start-time-ms"] Preprocessing
|
Kafka: ZooKeeper client request latency | Latency in milliseconds for ZooKeeper requests from broker. |
JMX agent | jmx["kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs","Count"] |
Kafka: ZooKeeper connection status | Connection status of broker's ZooKeeper session. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"] Preprocessing
|
Kafka: ZooKeeper disconnect rate | ZooKeeper client disconnect per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperDisconnectsPerSec","Count"] Preprocessing
|
Kafka: ZooKeeper session expiration rate | ZooKeeper client session expiration per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperExpiresPerSec","Count"] Preprocessing
|
Kafka: ZooKeeper readonly rate | ZooKeeper client readonly per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperReadOnlyConnectsPerSec","Count"] Preprocessing
|
Kafka: ZooKeeper sync rate | ZooKeeper client sync per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec","Count"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kafka: Unclean leader election detected | Unclean leader elections occur when there is no qualified partition leader among Kafka brokers. If Kafka is configured to allow an unclean leader election, a leader is chosen from the out-of-sync replicas, and any messages that were not synced prior to the loss of the former leader are lost forever. Essentially, unclean leader elections sacrifice consistency for availability. |
last(/Apache Kafka by JMX/jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"])>0 |Average |
||
Kafka: There are offline log directories | The offline log directory count metric indicates the number of log directories which are offline (for example, due to a hardware failure), meaning the broker can no longer store incoming messages in them. |
last(/Apache Kafka by JMX/jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"]) > 0 |Warning |
||
Kafka: One or more partitions have no leader | Any partition without an active leader will be completely inaccessible, and both consumers and producers of that partition will be blocked until a leader becomes available. |
last(/Apache Kafka by JMX/jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"]) > 0 |Warning |
||
Kafka: Request handler average idle percent is too low | The request handler idle ratio metric indicates the percentage of time the request handlers are not in use. The lower this number, the more loaded the broker is. |
max(/Apache Kafka by JMX/jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"],15m)<{$KAFKA.REQUEST_HANDLER_AVG_IDLE.MIN.WARN} |Average |
||
Kafka: Network processor average idle percent is too low | The network processor idle ratio metric indicates the percentage of time the network processors are not in use. The lower this number, the more loaded the broker is. |
max(/Apache Kafka by JMX/jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"],15m)<{$KAFKA.NET_PROC_AVG_IDLE.MIN.WARN} |Average |
||
Kafka: Failed to fetch info data | Zabbix has not received data for items for the last 15 minutes |
nodata(/Apache Kafka by JMX/jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"],15m)=1 |Warning |
||
Kafka: There are partitions under the min ISR | The Under min ISR partitions metric displays the number of partitions, where the number of In-Sync Replicas (ISR) is less than the minimum number of in-sync replicas specified. The two most common causes of under-min ISR partitions are that one or more brokers is unresponsive, or the cluster is experiencing performance issues and one or more brokers are falling behind. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"])>0 |Average |
||
Kafka: There are under replicated partitions | The Under replicated partitions metric displays the number of partitions that do not have enough replicas to meet the desired replication factor. A partition will also be considered under-replicated if the correct number of replicas exist, but one or more of the replicas have fallen significantly behind the partition leader. The two most common causes of under-replicated partitions are that one or more brokers is unresponsive, or the cluster is experiencing performance issues and one or more brokers have fallen behind. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"])>0 |Average |
||
Kafka: Version has changed | The Kafka version has changed. Acknowledge to close the problem manually. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"],#1)<>last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"],#2) and length(last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"]))>0 |Info |
Manual close: Yes | |
Kafka: has been restarted | Uptime is less than 10 minutes. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","start-time-ms"])<10m |Info |
Manual close: Yes | |
Kafka: Broker is not connected to ZooKeeper | find(/Apache Kafka by JMX/jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"],,"regexp","CONNECTED")=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Topic Metrics (write) | JMX agent | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kafka {#JMXTOPIC}: Messages in per second | The rate at which individual messages are consumed by topic. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Kafka {#JMXTOPIC}: Bytes in per second | The rate at which data sent from producers is consumed by topic. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Topic Metrics (read) | JMX agent | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kafka {#JMXTOPIC}: Bytes out per second | The rate at which data is fetched and read from the broker by consumers (by topic). |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Topic Metrics (errors) | JMX agent | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kafka {#JMXTOPIC}: Bytes rejected per second | Rejected bytes rate by topic. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is used for monitoring Jira Data Center health. It is designed for standalone operation for on-premises Jira installations.
This template uses a single data source, JMX, which requires JMX RMI setup of your Jira application and Java Gateway setup on the Zabbix side. If you need "Garbage collector" and "Web server" monitoring, add "Generic Java JMX" and "Apache Tomcat by JMX" templates on the same host.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
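Most of the discovery rules further down (storage, mail server, and indexing discovery, for example) are jmx.discovery items built on JMX object-name patterns under the com.atlassian.jira domain. To see which metric beans your Jira instance actually exposes before relying on those rules, you can list them with the same kind of pattern through the JDK's JMX API; the service URL and the no-authentication assumption below are placeholders for your own JMX RMI setup:

```java
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JiraMBeanList {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: point it at the JMX RMI port configured for your Jira node.
        String url = "service:jmx:rmi:///jndi/rmi://jira-node:31999/jmxrmi";
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url))) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // Same object-name pattern style the template's jmx.discovery keys use.
            Set<ObjectName> beans = connection.queryNames(new ObjectName("com.atlassian.jira:type=metrics,*"), null);
            beans.forEach(name -> System.out.println(name.getCanonicalName()));
        }
    }
}
```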
If JMX authentication is required, set the credentials in the {$JMX.USER} and {$JMX.PASSWORD} macros.
Name | Description | Default |
---|---|---|
{$JMX.USER} | User for JMX. |
|
{$JMX.PASSWORD} | Password for JMX. |
|
{$JIRA_DC.LICENSE.USER.CAPACITY.WARN} | User capacity warning threshold (%). |
80 |
{$JIRA_DC.DB.CONNECTION.USAGE.WARN} | Warning threshold for database connections usage (%). |
80 |
{$JIRA_DC.ISSUE.LATENCY.WARN} | Warning threshold for issue operation latency (in seconds). |
5 |
{$JIRA_DC.STORAGE.LATENCY.WARN} | Warning threshold for storage write operation latency (in seconds). |
5 |
{$JIRA_DC.INDEXING.LATENCY.WARN} | Warning threshold for indexing operation latency (in seconds). |
5 |
{$JIRA_DC.LLD.FILTER.MATCHES.HOMEFOLDERS} | Used for storage metric discovery. |
local|share |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.HOMEFOLDERS} | Used for storage metric discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.INDEXING} | Used for indexing metric discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.INDEXING} | Used for indexing metric discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.ISSUE} | Used for issue discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.ISSUE} | Used for issue discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.MAIL} | Used for mail server connection metric discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.MAIL} | Used for mail server connection metric discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.LICENSE} | Used for license discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.LICENSE} | Used for license discovery. |
NO MATCH |
Name | Description | Type | Key and additional info |
---|---|---|---|
DB: Connections: State | The state of the database connection. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=state,name=value",Value] |
DB: Connections: Failed per minute | The count of database connection failures registered in one minute. Units: fpm - fails per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=failures,name=counter",Count] Preprocessing
|
DB: Pool: Connections: Idle | Idle connection count of the database pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=pool,category03=numIdle,name=value",Value] |
DB: Pool: Connections: Active | Active connection count of the database pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=pool,category03=numActive,name=value",Value] |
DB: Reads | Database read operations from Jira per second. Units: rps - read operations per second. |
JMX agent | jmx["com.atlassian.jira:type=db.reads",invocation.count] Preprocessing
|
DB: Writes | Database write operations from Jira per second. Units: wps - write operations per second. |
JMX agent | jmx["com.atlassian.jira:type=db.writes",invocation.count] Preprocessing
|
DB: Connections: Limit | Total allowed database connection count. |
JMX agent | jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",MaxTotal] |
DB: Connections: Active | Active database connection count. |
JMX agent | jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",NumActive] |
DB: Connections: Latency | The latest measure of latency when querying the database. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=latency,name=value",Value] |
License: Users: Get | License data for the discovery rule. |
JMX agent | jmx.discovery[attributes,"com.atlassian.jira:type=jira.license"] Preprocessing
|
HTTP: Pool: Connections: Active | The latest measure of the number of active connections in the HTTP connection pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=pool,category03=numActive,name=value",Value] |
HTTP: Pool: Connections: Idle | The latest measure of the number of idle connections in the HTTP connection pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=pool,category03=numIdle,name=value",Value] |
HTTP: Sessions: Active | The latest measure of the number of active user sessions. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=sessions,category03=active,name=value",Value] |
HTTP: Requests per minute | The latest measure of the total number of HTTP requests per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=requests,name=value",Value] |
Mail: Queue | The latest measure of the number of items in a mail queue. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItems,name=value",Value] |
Mail: Queue: Error | The latest measure of the number of items in an error mail queue. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numErrors,name=value",Value] |
Mail: Sent per minute | The latest measure of the number of emails sent by the SMTP server per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numEmailsSentPerMin,name=value",Value] |
Mail: Processed per minute | The latest measure of the number of items processed by a mail queue per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItemsProcessedPerMin,name=value",Value] |
Mail: Queue: Processing state | The latest indicator of the state of a mail queue job. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=jobRunning,name=value",Value] |
Entity: Issues | The number of issues. |
JMX agent | jmx["com.atlassian.jira:type=entity.issues.total",Value] |
Entity: Attachments | The number of attachments. |
JMX agent | jmx["com.atlassian.jira:type=entity.attachments.total",Value] |
Entity: Components | The number of components. |
JMX agent | jmx["com.atlassian.jira:type=entity.components.total",Value] |
Entity: Custom fields | The number of custom fields. |
JMX agent | jmx["com.atlassian.jira:type=entity.customfields.total",Value] |
Entity: Filters | The number of filters. |
JMX agent | jmx["com.atlassian.jira:type=entity.filters.total",Value] |
Entity: Versions created | The number of versions created. |
JMX agent | jmx["com.atlassian.jira:type=entity.versions.total",Value] |
Issue: Search per minute | Issue searches performed per minute. |
JMX agent | jmx["com.atlassian.jira:type=issue.search.count",Value] Preprocessing
|
Issue: Created per minute | Issues created per minute. |
JMX agent | jmx["com.atlassian.jira:type=issue.created.count",Value] Preprocessing
|
Issue: Updates per minute | Issue updates performed per minute. |
JMX agent | jmx["com.atlassian.jira:type=issue.updated.count",Value] Preprocessing
|
Quicksearch: Concurrent searches | The number of concurrent searches that are being performed in real-time by using the quick search. |
JMX agent | jmx["com.atlassian.jira:type=quicksearch.concurrent.search",Value] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
DB: Connection lost | Database connection lost |
max(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=state,name=value",Value],3m)=0 |Average |
Manual close: Yes | |
DB: Pool: Out of idle connections | Fires when the database pool has had no idle connections for 5 minutes. |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=pool,category03=numIdle,name=value",Value],5m)<=0 |Warning |
Manual close: Yes | |
DB: Connection usage is near the limit | 100*min(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",NumActive],5m)/last(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",MaxTotal])>{$JIRA_DC.DB.CONNECTION.USAGE.WARN} |Warning |
Manual close: Yes | ||
DB: Connection limit reached | min(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",NumActive],5m)=last(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",MaxTotal]) |Warning |
Manual close: Yes | ||
HTTP: Pool: Out of idle connections | All available connections are utilized. It can cause outages for users as the system is unable to serve their requests. |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=pool,category03=numIdle,name=value",Value],5m)<=0 |Warning |
Manual close: Yes | |
Mail: Queue: Doesn’t empty over an extended period | Might indicate SMTP performance or connection problems. |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItems,name=value",Value],30m)>0 |Warning |
Manual close: Yes Depends on:
|
|
Mail: Error queue contains one or more items | A mail queue attempts to resend items up to 10 times. If the operation fails for the 11th time, the items are put into an error mail queue. |
max(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numErrors,name=value",Value],5m)>0 |Warning |
Manual close: Yes | |
Mail: Queue job is not running | It should be running when its queue is not empty. |
max(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=jobRunning,name=value",Value],15m)=0 and min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItems,name=value",Value],15m)>0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage discovery | Discovery of the Jira storage metrics. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=home,category01=,category02=write,category03=latency,,name=value"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage [{#JMXCATEGORY01}]: Latency | The median latency of writing a small file (~30 bytes) to |
JMX agent | jmx["{#JMXOBJ}",Value] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Storage [{#JMXCATEGORY01}]: Slow performance | Fires when latency grows above the threshold: |
min(/Jira Data Center by JMX/jmx["{#JMXOBJ}",Value],5m)>{$JIRA_DC.STORAGE.LATENCY.WARN:"{#JMXCATEGORY01}"} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mail server discovery | Discovery of the Jira connected mail servers. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=mail,category01=,category02=connection,category03=state,name="] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mail [{#JMXCATEGORY01},{#JMXNAME}]: Connection state | Shows connection state of Jira to discovered mail server: |
JMX agent | jmx["{#JMXOBJ}",Connected] Preprocessing
|
Mail [{#JMXCATEGORY01},{#JMXNAME}]: Failures per minute | Count of failed connections to discovered mail server |
JMX agent | jmx["{#JMXOBJ}",TotalFailures] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Mail [{#JMXCATEGORY01}-{#JMXNAME}]: Server disconnected | Trigger is fired when discovered mail server |
max(/Jira Data Center by JMX/jmx["{#JMXOBJ}",Connected],5m)=0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Indexing latency discovery | Discovery of the Jira indexing metrics. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=indexing,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Indexing [{#JMXNAME}]: Latency | Average time spent on indexing operations. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=indexing,name={#JMXNAME}",Mean] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Indexing [{#JMXNAME}]: Slow performance | Fires when latency grows above the threshold: |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=indexing,name={#JMXNAME}",Mean],5m)>{$JIRA_DC.INDEXING.LATENCY.WARN:"{#JMXNAME}"} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Issue latency discovery | Discovery of the Jira issue latency metrics. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=issue,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Issue [{#JMXNAME}]: Latency | Average time spent on issue |
JMX agent | jmx["{#JMXOBJ}",Mean] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Issue [{#JMXNAME}]: Slow operations | Fires when latency grows above the threshold: |
min(/Jira Data Center by JMX/jmx["{#JMXOBJ}",Mean],5m)>{$JIRA_DC.ISSUE.LATENCY.WARN:"{#JMXNAME}"} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
License discovery | Discovery of the Jira licenses. |
Dependent item | jmx.license.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
License [{#LICENSE.TYPE}]: Users: Current | Current user count for |
Dependent item | jmx.license.get.user.current["{#LICENSE.TYPE}"] Preprocessing
|
License [{#LICENSE.TYPE}]: Users: Maximum | User count limit for
|
Dependent item | jmx.license.get.user.max["{#LICENSE.TYPE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
License [{#LICENSE.TYPE}]: Low user capacity | Fires when relative user quantity grows above the threshold: |
last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])>=0 * (100*last(/Jira Data Center by JMX/jmx.license.get.user.current["{#LICENSE.TYPE}"])/last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])>{$JIRA_DC.LICENSE.USER.CAPACITY.WARN:"{#LICENSE.TYPE}"}) |Warning |
Manual close: Yes Depends on:
|
|
License [{#LICENSE.TYPE}]: User count reached the limit | Fires when user quantity reaches the limit. |
last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])>=0 * ((last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])-last(/Jira Data Center by JMX/jmx.license.get.user.current["{#LICENSE.TYPE}"]))<=0) |Average |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Jenkins by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by requests to the Metrics API. For common metrics: install and configure the Metrics plugin according to its official documentation. Do not forget to configure access to the Metrics Servlet by issuing an API key and setting the {$JENKINS.API.KEY} macro.
For monitoring computers and builds: create an API token for the monitoring user according to the official documentation and set the {$JENKINS.USER} and {$JENKINS.API.TOKEN} macros. Don't forget to set the {$JENKINS.URL} macro as well.
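Both groups of items are plain HTTP requests, so the access configured above can be verified with any HTTP client before the template is linked. The sketch below assumes the standard Metrics plugin URL layout ({$JENKINS.URL}/metrics/&lt;api key&gt;/ping) and uses placeholder values for {$JENKINS.URL} and {$JENKINS.API.KEY}; the reply should match the {$JENKINS.PING.REPLY} macro ("pong" by default):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JenkinsMetricsPing {
    public static void main(String[] args) throws Exception {
        // Placeholders: substitute the values you set in {$JENKINS.URL} and {$JENKINS.API.KEY}.
        String jenkinsUrl = "http://jenkins.example.com:8080";
        String apiKey = "REPLACE_WITH_METRICS_API_KEY";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(jenkinsUrl + "/metrics/" + apiKey + "/ping"))
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // The template's jenkins.ping item expects the reply defined in {$JENKINS.PING.REPLY}.
        System.out.println(response.statusCode() + " " + response.body().trim());
    }
}
```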
Name | Description | Default |
---|---|---|
{$JENKINS.URL} | Jenkins URL in the format |
|
{$JENKINS.API.KEY} | API key to access Metrics Servlet |
|
{$JENKINS.USER} | Username for HTTP BASIC authentication |
zabbix |
{$JENKINS.API.TOKEN} | API token for HTTP BASIC authentication. |
|
{$JENKINS.PING.REPLY} | Expected reply to the ping. |
pong |
{$JENKINS.FILE_DESCRIPTORS.MAX.WARN} | Maximum percentage of file descriptors usage alert threshold (for trigger expression). |
85 |
{$JENKINS.JOB.HEALTH.SCORE.MIN.WARN} | Minimum job's health score (for trigger expression). |
50 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jenkins: Get service metrics | HTTP agent | jenkins.get_metrics Preprocessing
|
|
Jenkins: Get healthcheck | HTTP agent | jenkins.healthcheck Preprocessing
|
|
Jenkins: Get jobs info | HTTP agent | jenkins.job_info Preprocessing
|
|
Jenkins: Get computer info | HTTP agent | jenkins.computer_info Preprocessing
|
|
Jenkins: Disk space check message | The message will reference the first node which fails this check. There may be other nodes that fail the check, but this health check is designed to fail fast. |
Dependent item | jenkins.disk_space.message Preprocessing
|
Jenkins: Temporary space check message | The message will reference the first node which fails this check. There may be other nodes that fail the check, but this health check is designed to fail fast. |
Dependent item | jenkins.temporary_space.message Preprocessing
|
Jenkins: Plugins check message | The message of plugins health check. |
Dependent item | jenkins.plugins.message Preprocessing
|
Jenkins: Thread deadlock check message | The message of thread deadlock health check. |
Dependent item | jenkins.thread_deadlock.message Preprocessing
|
Jenkins: Disk space check | Returns FAIL if any of the Jenkins disk space monitors are reporting the disk space as less than the configured threshold. |
Dependent item | jenkins.disk_space Preprocessing
|
Jenkins: Plugins check | Returns FAIL if any of the Jenkins plugins failed to start. |
Dependent item | jenkins.plugins Preprocessing
|
Jenkins: Temporary space check | Returns FAIL if any of the Jenkins temporary space monitors are reporting the temporary space as less than the configured threshold. |
Dependent item | jenkins.temporary_space Preprocessing
|
Jenkins: Thread deadlock check | Returns FAIL if there are any deadlocked threads in the Jenkins master JVM. |
Dependent item | jenkins.thread_deadlock Preprocessing
|
Jenkins: Get gauges | Raw items for gauges metrics. |
Dependent item | jenkins.gauges.raw Preprocessing
|
Jenkins: Executors count | The number of executors available to Jenkins. This corresponds to the sum of all the executors of all the online nodes. |
Dependent item | jenkins.executor.count Preprocessing
|
Jenkins: Executors free | The number of executors available to Jenkins that are not currently in use. |
Dependent item | jenkins.executor.free Preprocessing
|
Jenkins: Executors in use | The number of executors available to Jenkins that are currently in use. |
Dependent item | jenkins.executor.in_use Preprocessing
|
Jenkins: Nodes count | The number of build nodes available to Jenkins, both online and offline. |
Dependent item | jenkins.node.count Preprocessing
|
Jenkins: Nodes offline | The number of build nodes available to Jenkins but currently offline. |
Dependent item | jenkins.node.offline Preprocessing
|
Jenkins: Nodes online | The number of build nodes available to Jenkins and currently online. |
Dependent item | jenkins.node.online Preprocessing
|
Jenkins: Plugins active | The number of plugins in the Jenkins instance that started successfully. |
Dependent item | jenkins.plugins.active Preprocessing
|
Jenkins: Plugins failed | The number of plugins in the Jenkins instance that failed to start. A value other than 0 is typically indicative of a potential issue within the Jenkins installation that will either be solved by explicitly disabling the plugin(s) or by resolving the plugin dependency issues. |
Dependent item | jenkins.plugins.failed Preprocessing
|
Jenkins: Plugins inactive | The number of plugins in the Jenkins instance that are not currently enabled. |
Dependent item | jenkins.plugins.inactive Preprocessing
|
Jenkins: Plugins with update | The number of plugins in the Jenkins instance that have a newer version reported as available in the current Jenkins update center metadata held by Jenkins. This value is not indicative of an issue with Jenkins but high values can be used as a trigger to review the plugins with updates with a view to seeing whether those updates potentially contain fixes for issues that could be affecting your Jenkins instance. |
Dependent item | jenkins.plugins.with_update Preprocessing
|
Jenkins: Projects count | The number of projects. |
Dependent item | jenkins.project.count Preprocessing
|
Jenkins: Jobs count | The number of jobs in Jenkins. |
Dependent item | jenkins.job.count.value Preprocessing
|
Jenkins: Get meters | Raw items for meters metrics. |
Dependent item | jenkins.meters.raw Preprocessing
|
Jenkins: Job scheduled, m1 rate | The rate at which jobs are scheduled. If a job is already in the queue and an identical request for scheduling the job is received then Jenkins will coalesce the two requests. This metric gives a reasonably pure measure of the load requirements of the Jenkins master as it is unaffected by the number of executors available to the system. |
Dependent item | jenkins.job.scheduled.m1.rate Preprocessing
|
Jenkins: Jobs scheduled, m5 rate | The rate at which jobs are scheduled. If a job is already in the queue and an identical request for scheduling the job is received then Jenkins will coalesce the two requests. This metric gives a reasonably pure measure of the load requirements of the Jenkins master as it is unaffected by the number of executors available to the system. |
Dependent item | jenkins.job.scheduled.m5.rate Preprocessing
|
Jenkins: Get timers | Raw items for timers metrics. |
Dependent item | jenkins.timers.raw Preprocessing
|
Jenkins: Job blocked, m1 rate | The rate at which jobs in the build queue enter the blocked state. |
Dependent item | jenkins.job.blocked.m1.rate Preprocessing
|
Jenkins: Job blocked, m5 rate | The rate at which jobs in the build queue enter the blocked state. |
Dependent item | jenkins.job.blocked.m5.rate Preprocessing
|
Jenkins: Job blocked duration, p95 | The amount of time which jobs spend in the blocked state. |
Dependent item | jenkins.job.blocked.duration.p95 Preprocessing
|
Jenkins: Job blocked duration, median | The amount of time which jobs spend in the blocked state. |
Dependent item | jenkins.job.blocked.duration.p50 Preprocessing
|
Jenkins: Job building, m1 rate | The rate at which jobs are built. |
Dependent item | jenkins.job.building.m1.rate Preprocessing
|
Jenkins: Job building, m5 rate | The rate at which jobs are built. |
Dependent item | jenkins.job.building.m5.rate Preprocessing
|
Jenkins: Job building duration, p95 | The amount of time which jobs spend building. |
Dependent item | jenkins.job.building.duration.p95 Preprocessing
|
Jenkins: Job building duration, median | The amount of time which jobs spend building. |
Dependent item | jenkins.job.building.duration.p50 Preprocessing
|
Jenkins: Job buildable, m1 rate | The rate at which jobs in the build queue enter the buildable state. |
Dependent item | jenkins.job.buildable.m1.rate Preprocessing
|
Jenkins: Job buildable, m5 rate | The rate at which jobs in the build queue enter the buildable state. |
Dependent item | jenkins.job.buildable.m5.rate Preprocessing
|
Jenkins: Job buildable duration, p95 | The amount of time which jobs spend in the buildable state. |
Dependent item | jenkins.job.buildable.duration.p95 Preprocessing
|
Jenkins: Job buildable duration, median | The amount of time which jobs spend in the buildable state. |
Dependent item | jenkins.job.buildable.duration.p50 Preprocessing
|
Jenkins: Job queuing, m1 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.queuing.m1.rate Preprocessing
|
Jenkins: Job queuing, m5 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.queuing.m5.rate Preprocessing
|
Jenkins: Job queuing duration, p95 | The total time which jobs spend in the build queue. |
Dependent item | jenkins.job.queuing.duration.p95 Preprocessing
|
Jenkins: Job queuing duration, median | The total time which jobs spend in the build queue. |
Dependent item | jenkins.job.queuing.duration.p50 Preprocessing
|
Jenkins: Job total, m1 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.total.m1.rate Preprocessing
|
Jenkins: Job total, m5 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.total.m5.rate Preprocessing
|
Jenkins: Job total duration, p95 | The total time which jobs spend from entering the build queue to completing building. |
Dependent item | jenkins.job.total.duration.p95 Preprocessing
|
Jenkins: Job total duration, median | The total time which jobs spend from entering the build queue to completing building. |
Dependent item | jenkins.job.total.duration.p50 Preprocessing
|
Jenkins: Job waiting, m1 rate | The rate at which jobs enter the quiet period. |
Dependent item | jenkins.job.waiting.m1.rate Preprocessing
|
Jenkins: Job waiting, m5 rate | The rate at which jobs enter the quiet period. |
Dependent item | jenkins.job.waiting.m5.rate Preprocessing
|
Jenkins: Job waiting duration, p95 | The total amount of time that jobs spend in their quiet period. |
Dependent item | jenkins.job.waiting.duration.p95 Preprocessing
|
Jenkins: Job waiting duration, median | The total amount of time that jobs spend in their quiet period. |
Dependent item | jenkins.job.waiting.duration.p50 Preprocessing
|
Jenkins: Build queue, blocked | The number of jobs that are in the Jenkins build queue and currently in the blocked state. |
Dependent item | jenkins.queue.blocked Preprocessing
|
Jenkins: Build queue, size | The number of jobs that are in the Jenkins build queue. |
Dependent item | jenkins.queue.size Preprocessing
|
Jenkins: Build queue, buildable | The number of jobs that are in the Jenkins build queue and currently in the buildable state. |
Dependent item | jenkins.queue.buildable Preprocessing
|
Jenkins: Build queue, pending | The number of jobs that are in the Jenkins build queue and currently in the pending state. |
Dependent item | jenkins.queue.pending Preprocessing
|
Jenkins: Build queue, stuck | The number of jobs that are in the Jenkins build queue and currently in the stuck state. |
Dependent item | jenkins.queue.stuck Preprocessing
|
Jenkins: HTTP active requests, rate | The number of currently active requests against the Jenkins master Web UI. |
Dependent item | jenkins.http.active_requests.rate Preprocessing
|
Jenkins: HTTP response 400, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/400 status code. |
Dependent item | jenkins.http.bad_request.rate Preprocessing
|
Jenkins: HTTP response 500, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/500 status code. |
Dependent item | jenkins.http.server_error.rate Preprocessing
|
Jenkins: HTTP response 503, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/503 status code. |
Dependent item | jenkins.http.service_unavailable.rate Preprocessing
|
Jenkins: HTTP response 200, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/200 status code. |
Dependent item | jenkins.http.ok.rate Preprocessing
|
Jenkins: HTTP response other, rate | The rate at which the Jenkins master Web UI is responding to requests with a non-informational status code that is not in the list: HTTP/200, HTTP/201, HTTP/204, HTTP/304, HTTP/400, HTTP/403, HTTP/404, HTTP/500, or HTTP/503. |
Dependent item | jenkins.http.other.rate Preprocessing
|
Jenkins: HTTP response 201, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/201 status code. |
Dependent item | jenkins.http.created.rate Preprocessing
|
Jenkins: HTTP response 204, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/204 status code. |
Dependent item | jenkins.http.no_content.rate Preprocessing
|
Jenkins: HTTP response 404, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/404 status code. |
Dependent item | jenkins.http.not_found.rate Preprocessing
|
Jenkins: HTTP response 304, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/304 status code. |
Dependent item | jenkins.http.not_modified.rate Preprocessing
|
Jenkins: HTTP response 403, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/403 status code. |
Dependent item | jenkins.http.forbidden.rate Preprocessing
|
Jenkins: HTTP requests, rate | The rate at which the Jenkins master Web UI is receiving requests. |
Dependent item | jenkins.http.requests.rate Preprocessing
|
Jenkins: HTTP requests, p95 | The time spent generating the corresponding responses. |
Dependent item | jenkins.http.requests_p95.rate Preprocessing
|
Jenkins: HTTP requests, median | The time spent generating the corresponding responses. |
Dependent item | jenkins.http.requests_p50.rate Preprocessing
|
Jenkins: Version | Version of Jenkins server. |
Dependent item | jenkins.version Preprocessing
|
Jenkins: CPU Load | The system load on the Jenkins master as reported by the JVM's Operating System JMX bean. The calculation of system load is operating system dependent. Typically this is the sum of the number of processes that are currently running plus the number that are waiting to run. This is typically comparable against the number of CPU cores. |
Dependent item | jenkins.system.cpu.load Preprocessing
|
Jenkins: Uptime | The number of seconds since the Jenkins master JVM started. |
Dependent item | jenkins.system.uptime Preprocessing
|
Jenkins: File descriptor ratio | The ratio of used to total file descriptors |
Dependent item | jenkins.descriptor.ratio Preprocessing
|
Jenkins: Service ping | HTTP agent | jenkins.ping Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jenkins: Disk space is too low | Jenkins disk space monitors are reporting the disk space as less than the configured threshold. The message will reference the first node which fails this check. |
last(/Jenkins by HTTP/jenkins.disk_space)=0 and length(last(/Jenkins by HTTP/jenkins.disk_space.message))>0 |Warning |
||
Jenkins: One or more Jenkins plugins failed to start | A failure is typically indicative of a potential issue within the Jenkins installation that will either be solved by explicitly disabling the failing plugin(s) or by resolving the corresponding plugin dependency issues. |
last(/Jenkins by HTTP/jenkins.plugins)=0 and length(last(/Jenkins by HTTP/jenkins.plugins.message))>0 |Info |
Manual close: Yes | |
Jenkins: Temporary space is too low | Jenkins temporary space monitors are reporting the temporary space as less than the configured threshold. The message will reference the first node which fails this check. |
last(/Jenkins by HTTP/jenkins.temporary_space)=0 and length(last(/Jenkins by HTTP/jenkins.temporary_space.message))>0 |Warning |
||
Jenkins: There are deadlocked threads in Jenkins master JVM | There are deadlocked threads in the Jenkins master JVM. |
last(/Jenkins by HTTP/jenkins.thread_deadlock)=0 and length(last(/Jenkins by HTTP/jenkins.thread_deadlock.message))>0 |Warning |
||
Jenkins: Service has no online nodes | last(/Jenkins by HTTP/jenkins.node.online)=0 |Average |
|||
Jenkins: Version has changed | The Jenkins version has changed. Acknowledge to close the problem manually. |
last(/Jenkins by HTTP/jenkins.version,#1)<>last(/Jenkins by HTTP/jenkins.version,#2) and length(last(/Jenkins by HTTP/jenkins.version))>0 |Info |
Manual close: Yes | |
Jenkins: Host has been restarted | Uptime is less than 10 minutes. |
last(/Jenkins by HTTP/jenkins.system.uptime)<10m |Info |
Manual close: Yes | |
Jenkins: Current number of used files is too high | min(/Jenkins by HTTP/jenkins.descriptor.ratio,5m)>{$JENKINS.FILE_DESCRIPTORS.MAX.WARN} |Warning |
|||
Jenkins: Service is down | last(/Jenkins by HTTP/jenkins.ping)=0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jobs discovery | HTTP agent | jenkins.jobs Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Jenkins job [{#NAME}]: Get job | Raw data for a job. |
Dependent item | jenkins.job.get[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Health score | Represents the health of the project as a number between 0 and 100. Job description: {#DESCRIPTION} Job URL: {#URL} |
Dependent item | jenkins.build.health[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Last Build number | Details: {#URL}/lastBuild/ |
Dependent item | jenkins.last_build.number[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Last Build duration | Build duration (in seconds). |
Dependent item | jenkins.last_build.duration[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Last Build timestamp | Dependent item | jenkins.last_build.timestamp[{#NAME}] Preprocessing
|
|
Jenkins job [{#NAME}]: Last Build result | Dependent item | jenkins.last_build.result[{#NAME}] Preprocessing
|
|
Jenkins job [{#NAME}]: Last Failed Build number | Details: {#URL}/lastFailedBuild/ |
Dependent item | jenkins.last_failed_build.number[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Last Failed Build duration | Build duration (in seconds). |
Dependent item | jenkins.last_failed_build.duration[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Last Failed Build timestamp | Dependent item | jenkins.last_failed_build.timestamp[{#NAME}] Preprocessing
|
|
Jenkins job [{#NAME}]: Last Successful Build number | Details: {#URL}/lastSuccessfulBuild/ |
Dependent item | jenkins.last_successful_build.number[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Last Successful Build duration | Build duration (in seconds). |
Dependent item | jenkins.last_successful_build.duration[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Last Successful Build timestamp | Dependent item | jenkins.last_successful_build.timestamp[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jenkins job [{#NAME}]: Job is unhealthy | last(/Jenkins by HTTP/jenkins.build.health[{#NAME}])<{$JENKINS.JOB.HEALTH.SCORE.MIN.WARN} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Computers discovery | HTTP agent | jenkins.computers Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Jenkins: Computer [{#DISPLAY_NAME}]: Get computer | Raw data for a computer. |
Dependent item | jenkins.computer.get[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Executors | The maximum number of concurrent builds that Jenkins may perform on this node. |
Dependent item | jenkins.computer.numExecutors[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: State | Represents the actual online/offline state. Node description: {#DESCRIPTION} |
Dependent item | jenkins.computer.state[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Offline cause reason | If the computer is offline (either temporarily or not), returns the cause as a string (without user info). Empty string if the system was put offline without a given cause. |
Dependent item | jenkins.computer.offline.reason[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Idle | Returns true if all the executors of this computer are idle. |
Dependent item | jenkins.computer.idle[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Temporarily offline | Returns true if this node is marked temporarily offline. |
Dependent item | jenkins.computer.temp_offline[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Available disk space | The available disk space of $JENKINS_HOME on agent. |
Dependent item | jenkins.computer.disk_space[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Available temp space | The available disk space of the temporary directory. Java tools and tests/builds often create files in the temporary directory, and may not function properly if there's no available space. |
Dependent item | jenkins.computer.temp_space[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Response time average | The round trip network response time from the master to the agent. |
Dependent item | jenkins.computer.response_time[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Available physical memory | The total physical memory of the system, available bytes. |
Dependent item | jenkins.computer.available_physical_memory[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Available swap space | Available swap space in bytes. |
Dependent item | jenkins.computer.available_swap_space[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Total physical memory | Total physical memory of the system, in bytes. |
Dependent item | jenkins.computer.total_physical_memory[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Total swap space | Total swap space in bytes. |
Dependent item | jenkins.computer.total_swap_space[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Clock difference | The clock difference between the master and nodes. |
Dependent item | jenkins.computer.clock_difference[{#DISPLAY_NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jenkins: Computer [{#DISPLAY_NAME}]: Node is down | Node down with reason: {{ITEM.LASTVALUE2}.regsub("(.*)",\1)} |
last(/Jenkins by HTTP/jenkins.computer.state[{#DISPLAY_NAME}])=1 and length(last(/Jenkins by HTTP/jenkins.computer.offline.reason[{#DISPLAY_NAME}]))>0 |Average |
Depends on:
|
|
Jenkins: Computer [{#DISPLAY_NAME}]: Node is temporarily offline | Node is temporarily Offline with reason: {{ITEM.LASTVALUE2}.regsub("(.*)",\1)} |
last(/Jenkins by HTTP/jenkins.computer.temp_offline[{#DISPLAY_NAME}])=1 and length(last(/Jenkins by HTTP/jenkins.computer.offline.reason[{#DISPLAY_NAME}]))>0 |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor IIS (Internet Information Services) by Zabbix that works without any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You have to enable the following Windows Features (Control Panel > Programs and Features > Turn Windows features on or off) on your server:
Web Server (IIS)
Web Server (IIS)\Management Tools\IIS Management Scripts and Tools
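Besides the performance counters collected by the Zabbix agent, the template also runs a simple check, net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}], against the listening port. If you want to confirm the port is reachable from the Zabbix server or proxy first, the same kind of TCP probe can be sketched in a few lines; the host below is a placeholder and the port stands in for {$IIS.PORT}:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class IisPortProbe {
    public static void main(String[] args) {
        // Placeholders: the IIS host and the port from {$IIS.PORT} (80 by default).
        String host = "iis.example.com";
        int port = 80;

        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 3000); // 3-second timeout
            System.out.println("Port " + port + " on " + host + " accepts TCP connections.");
        } catch (IOException e) {
            System.out.println("Port check failed: " + e.getMessage());
        }
    }
}
```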
Optionally, it is possible to customize the template:
Name | Description | Default |
---|---|---|
{$IIS.PORT} | Listening port. |
80 |
{$IIS.SERVICE} | The service (http/https/etc) for port check. See "net.tcp.service" documentation page for more information: https://www.zabbix.com/documentation/6.4/manual/config/items/itemtypes/simple_checks |
http |
{$IIS.QUEUE.MAX.WARN} | Maximum application pool's request queue length for trigger expression. |
|
{$IIS.QUEUE.MAX.TIME} | The time during which the queue length may exceed the threshold. |
5m |
{$IIS.APPPOOL.NOT_MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
<CHANGE_IF_NEEDED> |
{$IIS.APPPOOL.MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
.+ |
{$IIS.APPPOOL.MONITORED} | Monitoring status for discovered application pools. Use context to avoid trigger firing for specific application pools. "1" - enabled, "0" - disabled. |
1 |
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
IIS: World Wide Web Publishing Service (W3SVC) state | The World Wide Web Publishing Service (W3SVC) provides web connectivity and administration of websites through the IIS snap-in. If the World Wide Web Publishing Service stops, the operating system cannot serve any form of web request. This service is dependent on "Windows Process Activation Service". |
Zabbix agent (active) | service.info[W3SVC] Preprocessing
|
IIS: Windows Process Activation Service (WAS) state | Windows Process Activation Service (WAS) is a tool for managing worker processes that contain applications that host Windows Communication Foundation (WCF) services. Worker processes handle requests that are sent to a Web Server for specific application pools. Each application pool sets boundaries for the applications it contains. |
Zabbix agent (active) | service.info[WAS] Preprocessing
|
IIS: {$IIS.PORT} port ping | Simple check | net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}] Preprocessing
|
|
IIS: Uptime | The service uptime expressed in seconds. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Service Uptime"] |
IIS: Bytes Received per second | The average rate per minute at which data bytes are received by the service at the Application Layer. Does not include protocol headers or control bytes. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Bytes Received/sec", 60] |
IIS: Bytes Sent per second | The average rate per minute at which data bytes are sent by the service. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Bytes Sent/sec", 60] |
IIS: Bytes Total per second | The average rate per minute of total bytes/sec transferred by the Web service (sum of bytes sent/sec and bytes received/sec). |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Bytes Total/Sec", 60] |
IIS: Current connections | The number of active connections. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Current Connections"] |
IIS: Total connection attempts | The total number of connections to the Web or FTP service that have been attempted since service startup. The count is the total for all Web sites or FTP sites combined. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Total Connection Attempts (all instances)"] Preprocessing
|
IIS: Connection attempts per second | The average rate per minute that connections using the Web service are being attempted. The count is the average for all Web sites combined. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Connection Attempts/Sec", 60] Preprocessing
|
IIS: Anonymous users per second | The number of requests from users over an anonymous connection per second. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Anonymous Users/sec", 60] |
IIS: NonAnonymous users per second | The number of requests from users over a non-anonymous connection per second. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\NonAnonymous Users/sec", 60] |
IIS: Method GET requests per second | The rate of HTTP requests made using the GET method. GET requests are generally used for basic file retrievals or image maps, though they can be used with forms. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Get Requests/Sec", 60] Preprocessing
|
IIS: Method COPY requests per second | The rate of HTTP requests made using the COPY method. Copy requests are used for copying files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Copy Requests/Sec", 60] Preprocessing
|
IIS: Method CGI requests per second | The rate of CGI requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\CGI Requests/Sec", 60] Preprocessing
|
IIS: Method DELETE requests per second | The rate of HTTP requests using the DELETE method made. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Delete Requests/Sec", 60] Preprocessing
|
IIS: Method HEAD requests per second | The rate of HTTP requests using the HEAD method made. HEAD requests generally indicate a client is querying the state of a document they already have to see if it needs to be refreshed. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Head Requests/Sec", 60] Preprocessing
|
IIS: Method ISAPI requests per second | The rate of ISAPI Extension requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\ISAPI Extension Requests/Sec", 60] Preprocessing
|
IIS: Method LOCK requests per second | The rate of HTTP requests made using the LOCK method. Lock requests are used to lock a file for one user so that only that user can modify the file. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Lock Requests/Sec", 60] Preprocessing
|
IIS: Method MKCOL requests per second | The rate of HTTP requests using the MKCOL method made. Mkcol requests are used to create directories on the server. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Mkcol Requests/Sec", 60] Preprocessing
|
IIS: Method MOVE requests per second | The rate of HTTP requests using the MOVE method made. Move requests are used for moving files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Move Requests/Sec", 60] Preprocessing
|
IIS: Method OPTIONS requests per second | The rate of HTTP requests using the OPTIONS method made. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Options Requests/Sec", 60] Preprocessing
|
IIS: Method POST requests per second | Rate of HTTP requests using POST method. Generally used for forms or gateway requests. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Post Requests/Sec", 60] Preprocessing
|
IIS: Method PROPFIND requests per second | The rate of HTTP requests using the PROPFIND method made. Propfind requests retrieve property values on files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Propfind Requests/Sec", 60] Preprocessing
|
IIS: Method PROPPATCH requests per second | The rate of HTTP requests using the PROPPATCH method made. Proppatch requests set property values on files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Proppatch Requests/Sec", 60] Preprocessing
|
IIS: Method PUT requests per second | The rate of HTTP requests using the PUT method made. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Put Requests/Sec", 60] Preprocessing
|
IIS: Method MS-SEARCH requests per second | The rate of HTTP requests using the MS-SEARCH method made. Search requests are used to query the server to find resources that match a set of conditions provided by the client. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Search Requests/Sec", 60] Preprocessing
|
IIS: Method TRACE requests per second | The rate of HTTP requests using the TRACE method made. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Trace Requests/Sec", 60] Preprocessing
|
IIS: Method UNLOCK requests per second | The rate of HTTP requests using the UNLOCK method made. Unlock requests are used to remove locks from files. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Unlock Requests/Sec", 60] Preprocessing
|
IIS: Method Total requests per second | The rate of all HTTP requests received. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Total Method Requests/Sec", 60] Preprocessing
|
IIS: Method Total Other requests per second | Total Other Request Methods is the number of HTTP requests that are not OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, MOVE, COPY, MKCOL, PROPFIND, PROPPATCH, SEARCH, LOCK or UNLOCK methods (since service startup). Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Other Request Methods/Sec", 60] Preprocessing
|
IIS: Locked errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document was locked. These are generally reported as an HTTP 423 error code to the client. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Locked Errors/Sec", 60] Preprocessing
|
IIS: Not Found errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document could not be found. These are generally reported to the client with HTTP error code 404. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Not Found Errors/Sec", 60] Preprocessing
|
IIS: Files cache hits percentage | The ratio of user-mode file cache hits to total cache requests (since service startup). Note: This value might be low if the Kernel URI cache hits percentage is high. |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\File Cache Hits %"] Preprocessing
|
IIS: URIs cache hits percentage | The ratio of user-mode URI Cache Hits to total cache requests (since service startup) |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\URI Cache Hits %"] Preprocessing
|
IIS: File cache misses | The total number of unsuccessful lookups in the user-mode file cache since service startup. |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\File Cache Misses"] Preprocessing
|
IIS: URI cache misses | The total number of unsuccessful lookups in the user-mode URI cache since service startup. |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\URI Cache Misses"] Preprocessing
|
IIS: Active agent availability | Availability of active checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - unknown, 1 - available, 2 - not available. |
Zabbix internal | zabbix[host,active_agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: The World Wide Web Publishing Service (W3SVC) is not running | The World Wide Web Publishing Service (W3SVC) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent active/service.info[W3SVC])<>0 |High |
Depends on:
|
|
IIS: Windows process Activation Service (WAS) is not running | Windows Process Activation Service (WAS) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent active/service.info[WAS])<>0 |High |
||
IIS: Port {$IIS.PORT} is down | last(/IIS by Zabbix agent active/net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}])=0 |Average |
Manual close: Yes Depends on:
|
||
IIS: has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent active/perf_counter_en["\Web Service(_Total)\Service Uptime"])<10m |Info |
Manual close: Yes | |
IIS: Active checks are not available | Active checks are considered unavailable. Agent is not sending heartbeat for prolonged time. |
min(/IIS by Zabbix agent active/zabbix[host,active_agent,available],{$AGENT.TIMEOUT})=2 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Application pools discovery | Zabbix agent (active) | wmi.getall[root\webAdministration, select Name from ApplicationPool] |
Name | Description | Type | Key and additional info |
---|---|---|---|
IIS: {#APPPOOL} Uptime | The web application uptime period since the last restart. |
Zabbix agent (active) | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"] |
IIS: AppPool {#APPPOOL} state | The state of the application pool. |
Zabbix agent (active) | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"] Preprocessing
|
IIS: AppPool {#APPPOOL} recycles | The number of times the application pool has been recycled since Windows Process Activation Service (WAS) started. |
Zabbix agent (active) | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"] Preprocessing
|
IIS: AppPool {#APPPOOL} current queue size | The number of requests in the queue. |
Zabbix agent (active) | perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: {#APPPOOL} has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"])<10m |Info |
Manual close: Yes | |
IIS: Application pool {#APPPOOL} is not in Running state | last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"])<>3 and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |High |
Depends on:
|
||
IIS: Application pool {#APPPOOL} has been recycled | last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#1)<>last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#2) and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |Info |
|||
IIS: Request queue of {#APPPOOL} is too large | min(/IIS by Zabbix agent active/perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"],{$IIS.QUEUE.MAX.TIME})>{$IIS.QUEUE.MAX.WARN} |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor IIS (Internet Information Services) by Zabbix that works without any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You have to enable the following Windows Features (Control Panel > Programs and Features > Turn Windows features on or off) on your server
Web Server (IIS)
Web Server (IIS)\Management Tools\IIS Management Scripts and Tools
Optionally, it is possible to customize the template:
Name | Description | Default |
---|---|---|
{$IIS.PORT} | Listening port. |
80 |
{$IIS.SERVICE} | The service (http/https/etc) for port check. See "net.tcp.service" documentation page for more information: https://www.zabbix.com/documentation/6.4/manual/config/items/itemtypes/simple_checks |
http |
{$IIS.QUEUE.MAX.WARN} | Maximum application pool's request queue length for trigger expression. |
|
{$IIS.QUEUE.MAX.TIME} | The time during which the queue length may exceed the threshold. |
5m |
{$IIS.APPPOOL.NOT_MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
<CHANGE_IF_NEEDED> |
{$IIS.APPPOOL.MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
.+ |
{$IIS.APPPOOL.MONITORED} | Monitoring status for discovered application pools. Use context to avoid trigger firing for specific application pools. "1" - enabled, "0" - disabled. |
1 |
Name | Description | Type | Key and additional info |
---|---|---|---|
IIS: World Wide Web Publishing Service (W3SVC) state | The World Wide Web Publishing Service (W3SVC) provides web connectivity and administration of websites through the IIS snap-in. If the World Wide Web Publishing Service stops, the operating system cannot serve any form of web request. This service is dependent on "Windows Process Activation Service". |
Zabbix agent | service.info[W3SVC] Preprocessing
|
IIS: Windows Process Activation Service (WAS) state | Windows Process Activation Service (WAS) is a tool for managing worker processes that contain applications that host Windows Communication Foundation (WCF) services. Worker processes handle requests that are sent to a Web Server for specific application pools. Each application pool sets boundaries for the applications it contains. |
Zabbix agent | service.info[WAS] Preprocessing
|
IIS: {$IIS.PORT} port ping | Simple check | net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}] Preprocessing
|
|
IIS: Uptime | The service uptime expressed in seconds. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Service Uptime"] |
IIS: Bytes Received per second | The average rate per minute at which data bytes are received by the service at the Application Layer. Does not include protocol headers or control bytes. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Bytes Received/sec", 60] |
IIS: Bytes Sent per second | The average rate per minute at which data bytes are sent by the service. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Bytes Sent/sec", 60] |
IIS: Bytes Total per second | The average rate per minute of total bytes/sec transferred by the Web service (sum of bytes sent/sec and bytes received/sec). |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Bytes Total/Sec", 60] |
IIS: Current connections | The number of active connections. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Current Connections"] |
IIS: Total connection attempts | The total number of connections to the Web or FTP service that have been attempted since service startup. The count is the total for all Web sites or FTP sites combined. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Total Connection Attempts (all instances)"] Preprocessing
|
IIS: Connection attempts per second | The average rate per minute that connections using the Web service are being attempted. The count is the average for all Web sites combined. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Connection Attempts/Sec", 60] Preprocessing
|
IIS: Anonymous users per second | The number of requests from users over an anonymous connection per second. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Anonymous Users/sec", 60] |
IIS: NonAnonymous users per second | The number of requests from users over a non-anonymous connection per second. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\NonAnonymous Users/sec", 60] |
IIS: Method GET requests per second | The rate of HTTP requests made using the GET method. GET requests are generally used for basic file retrievals or image maps, though they can be used with forms. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Get Requests/Sec", 60] Preprocessing
|
IIS: Method COPY requests per second | The rate of HTTP requests made using the COPY method. Copy requests are used for copying files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Copy Requests/Sec", 60] Preprocessing
|
IIS: Method CGI requests per second | The rate of CGI requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\CGI Requests/Sec", 60] Preprocessing
|
IIS: Method DELETE requests per second | The rate of HTTP requests using the DELETE method made. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Delete Requests/Sec", 60] Preprocessing
|
IIS: Method HEAD requests per second | The rate of HTTP requests using the HEAD method made. HEAD requests generally indicate a client is querying the state of a document they already have to see if it needs to be refreshed. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Head Requests/Sec", 60] Preprocessing
|
IIS: Method ISAPI requests per second | The rate of ISAPI Extension requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\ISAPI Extension Requests/Sec", 60] Preprocessing
|
IIS: Method LOCK requests per second | The rate of HTTP requests made using the LOCK method. Lock requests are used to lock a file for one user so that only that user can modify the file. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Lock Requests/Sec", 60] Preprocessing
|
IIS: Method MKCOL requests per second | The rate of HTTP requests using the MKCOL method made. Mkcol requests are used to create directories on the server. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Mkcol Requests/Sec", 60] Preprocessing
|
IIS: Method MOVE requests per second | The rate of HTTP requests using the MOVE method made. Move requests are used for moving files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Move Requests/Sec", 60] Preprocessing
|
IIS: Method OPTIONS requests per second | The rate of HTTP requests using the OPTIONS method made. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Options Requests/Sec", 60] Preprocessing
|
IIS: Method POST requests per second | Rate of HTTP requests using POST method. Generally used for forms or gateway requests. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Post Requests/Sec", 60] Preprocessing
|
IIS: Method PROPFIND requests per second | The rate of HTTP requests using the PROPFIND method made. Propfind requests retrieve property values on files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Propfind Requests/Sec", 60] Preprocessing
|
IIS: Method PROPPATCH requests per second | The rate of HTTP requests using the PROPPATCH method made. Proppatch requests set property values on files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Proppatch Requests/Sec", 60] Preprocessing
|
IIS: Method PUT requests per second | The rate of HTTP requests using the PUT method made. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Put Requests/Sec", 60] Preprocessing
|
IIS: Method MS-SEARCH requests per second | The rate of HTTP requests using the MS-SEARCH method made. Search requests are used to query the server to find resources that match a set of conditions provided by the client. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Search Requests/Sec", 60] Preprocessing
|
IIS: Method TRACE requests per second | The rate of HTTP requests using the TRACE method made. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Trace Requests/Sec", 60] Preprocessing
|
IIS: Method UNLOCK requests per second | The rate of HTTP requests using the UNLOCK method made. Unlock requests are used to remove locks from files. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Unlock Requests/Sec", 60] Preprocessing
|
IIS: Method Total requests per second | The rate of all HTTP requests received. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Total Method Requests/Sec", 60] Preprocessing
|
IIS: Method Total Other requests per second | Total Other Request Methods is the number of HTTP requests that are not OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, MOVE, COPY, MKCOL, PROPFIND, PROPPATCH, SEARCH, LOCK or UNLOCK methods (since service startup). Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Other Request Methods/Sec", 60] Preprocessing
|
IIS: Locked errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document was locked. These are generally reported as an HTTP 423 error code to the client. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Locked Errors/Sec", 60] Preprocessing
|
IIS: Not Found errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document could not be found. These are generally reported to the client with HTTP error code 404. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Not Found Errors/Sec", 60] Preprocessing
|
IIS: Files cache hits percentage | The ratio of user-mode file cache hits to total cache requests (since service startup). Note: This value might be low if the Kernel URI cache hits percentage is high. |
Zabbix agent | perf_counter_en["\Web Service Cache\File Cache Hits %"] Preprocessing
|
IIS: URIs cache hits percentage | The ratio of user-mode URI Cache Hits to total cache requests (since service startup) |
Zabbix agent | perf_counter_en["\Web Service Cache\URI Cache Hits %"] Preprocessing
|
IIS: File cache misses | The total number of unsuccessful lookups in the user-mode file cache since service startup. |
Zabbix agent | perf_counter_en["\Web Service Cache\File Cache Misses"] Preprocessing
|
IIS: URI cache misses | The total number of unsuccessful lookups in the user-mode URI cache since service startup. |
Zabbix agent | perf_counter_en["\Web Service Cache\URI Cache Misses"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: The World Wide Web Publishing Service (W3SVC) is not running | The World Wide Web Publishing Service (W3SVC) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent/service.info[W3SVC])<>0 |High |
Depends on:
|
|
IIS: Windows process Activation Service (WAS) is not running | Windows Process Activation Service (WAS) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent/service.info[WAS])<>0 |High |
||
IIS: Port {$IIS.PORT} is down | last(/IIS by Zabbix agent/net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}])=0 |Average |
Manual close: Yes Depends on:
|
||
IIS: has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent/perf_counter_en["\Web Service(_Total)\Service Uptime"])<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Application pools discovery | Zabbix agent | wmi.getall[root\webAdministration, select Name from ApplicationPool] |
Name | Description | Type | Key and additional info |
---|---|---|---|
IIS: {#APPPOOL} Uptime | The web application uptime period since the last restart. |
Zabbix agent | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"] |
IIS: AppPool {#APPPOOL} state | The state of the application pool. |
Zabbix agent | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"] Preprocessing
|
IIS: AppPool {#APPPOOL} recycles | The number of times the application pool has been recycled since Windows Process Activation Service (WAS) started. |
Zabbix agent | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"] Preprocessing
|
IIS: AppPool {#APPPOOL} current queue size | The number of requests in the queue. |
Zabbix agent | perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: {#APPPOOL} has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"])<10m |Info |
Manual close: Yes | |
IIS: Application pool {#APPPOOL} is not in Running state | last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"])<>3 and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |High |
Depends on:
|
||
IIS: Application pool {#APPPOOL} has been recycled | last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#1)<>last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#2) and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |Info |
|||
IIS: Request queue of {#APPPOOL} is too large | min(/IIS by Zabbix agent/perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"],{$IIS.QUEUE.MAX.TIME})>{$IIS.QUEUE.MAX.WARN} |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HAProxy by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by polling the HAProxy stats page with HTTP agent.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up the HAProxy stats page. If you want to use authentication, set the username and password in the stats auth option of the configuration file.
The example configuration of HAProxy:
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
#stats auth Username:Password # Authentication credentials
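For reference, the haproxy.get item simply requests this page with the ";csv" suffix. Below is a minimal Python sketch of the same request; the host, port, path and credentials are placeholders standing in for the {$HAPROXY.STATS.*}, {$HAPROXY.USERNAME} and {$HAPROXY.PASSWORD} macros, not values defined by this template.
import base64
import csv
import io
import urllib.request

# Placeholder values; substitute your {$HAPROXY.STATS.*} and auth macro values.
SCHEME, HOST, PORT, PATH = "http", "haproxy.example.com", 8404, "stats"
USERNAME, PASSWORD = "", ""  # leave empty if "stats auth" is not configured

url = f"{SCHEME}://{HOST}:{PORT}/{PATH};csv"  # the ";csv" suffix requests CSV output
request = urllib.request.Request(url)
if USERNAME:
    token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")

with urllib.request.urlopen(request, timeout=5) as response:
    body = response.read().decode()

# The header line starts with "# pxname,svname,..."; strip the "# " so csv sees clean field names.
rows = list(csv.DictReader(io.StringIO(body.lstrip("# "))))
for row in rows:
    print(row["pxname"], row["svname"], row.get("status", ""))
If the request returns 401, re-check the stats auth line; if it times out, re-check the bind address and port in the frontend above.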
Set the hostname or IP address of the HAProxy stats host or container in the {$HAPROXY.STATS.HOST}
macro. You can also change the status page port in the {$HAPROXY.STATS.PORT}
macro, the status page scheme in the {$HAPROXY.STATS.SCHEME}
macro and the status page path in the {$HAPROXY.STATS.PATH}
macro if necessary.
If you have enabled authentication in the HAProxy configuration file in step 1, set the username and password in the {$HAPROXY.USERNAME}
and {$HAPROXY.PASSWORD}
macros.
Name | Description | Default |
---|---|---|
{$HAPROXY.STATS.SCHEME} | The scheme of HAProxy stats page (http/https). |
http |
{$HAPROXY.STATS.HOST} | The hostname or IP address of the HAProxy stats host or container. |
<SET HAPROXY HOST> |
{$HAPROXY.STATS.PORT} | The port of the HAProxy stats host or container. |
8404 |
{$HAPROXY.STATS.PATH} | The path of the HAProxy stats page. |
stats |
{$HAPROXY.USERNAME} | The username of the HAProxy stats page. |
|
{$HAPROXY.PASSWORD} | The password of the HAProxy stats page. |
|
{$HAPROXY.RESPONSE_TIME.MAX.WARN} | The HAProxy stats page maximum response time in seconds for trigger expression. |
10s |
{$HAPROXY.FRONT_DREQ.MAX.WARN} | The HAProxy maximum denied requests for trigger expression. |
10 |
{$HAPROXY.FRONT_EREQ.MAX.WARN} | The HAProxy maximum number of request errors for trigger expression. |
10 |
{$HAPROXY.BACK_QCUR.MAX.WARN} | Maximum number of requests on Backend unassigned in queue for trigger expression. |
10 |
{$HAPROXY.BACK_RTIME.MAX.WARN} | Maximum of average Backend response time for trigger expression. |
10s |
{$HAPROXY.BACK_QTIME.MAX.WARN} | Maximum of average time spent in queue on Backend for trigger expression. |
10s |
{$HAPROXY.BACK_ERESP.MAX.WARN} | Maximum of responses with error on Backend for trigger expression. |
10 |
{$HAPROXY.SERVER_QCUR.MAX.WARN} | Maximum number of requests on server unassigned in queue for trigger expression. |
10 |
{$HAPROXY.SERVER_RTIME.MAX.WARN} | Maximum of average server response time for trigger expression. |
10s |
{$HAPROXY.SERVER_QTIME.MAX.WARN} | Maximum of average time spent in queue on server for trigger expression. |
10s |
{$HAPROXY.SERVER_ERESP.MAX.WARN} | Maximum of responses with error on server for trigger expression. |
10 |
{$HAPROXY.FRONT_SUTIL.MAX.WARN} | Maximum of session usage percentage on frontend for trigger expression. |
80 |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy: Get stats | HAProxy Statistics Report in CSV format |
HTTP agent | haproxy.get Preprocessing
|
HAProxy: Get nodes | Array for LLD rules. |
Dependent item | haproxy.get.nodes Preprocessing
|
HAProxy: Get stats page | HAProxy Statistics Report HTML |
HTTP agent | haproxy.get_html |
HAProxy: Version | Dependent item | haproxy.version Preprocessing
|
|
HAProxy: Uptime | Dependent item | haproxy.uptime Preprocessing
|
|
HAProxy: Service status | Simple check | net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] Preprocessing
|
|
HAProxy: Service response time | Simple check | net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: Version has changed | HAProxy version has changed. Acknowledge to close the problem manually. |
last(/HAProxy by HTTP/haproxy.version,#1)<>last(/HAProxy by HTTP/haproxy.version,#2) and length(last(/HAProxy by HTTP/haproxy.version))>0 |Info |
Manual close: Yes | |
HAProxy: has been restarted | Uptime is less than 10 minutes. |
last(/HAProxy by HTTP/haproxy.uptime)<10m |Info |
Manual close: Yes | |
HAProxy: Service is down | last(/HAProxy by HTTP/net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"])=0 |Average |
Manual close: Yes | ||
HAProxy: Service response time is too high | min(/HAProxy by HTTP/net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"],5m)>{$HAPROXY.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Backend discovery | Discovery backends |
Dependent item | haproxy.backend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy Backend {#PXNAME}: Raw data | The raw data of the Backend with the name |
Dependent item | haproxy.backend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Status | Possible values: UP - The server is reporting as healthy. DOWN - The server is reporting as unhealthy and unable to receive requests. NOLB - You've added http-check disable-on-404 to the backend and the health checked URL has returned an HTTP 404 response. MAINT - The server has been disabled or put into maintenance mode. DRAIN - The server has been put into drain mode. no check - Health checks are not enabled for this server. |
Dependent item | haproxy.backend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Responses time | Average backend response time (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.backend.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.backend.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Response errors per second | Number of requests whose responses yielded an error |
Dependent item | haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.backend.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.backend.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.backend.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.backend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.backend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.backend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.backend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.backend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of active servers | Number of active servers. |
Dependent item | haproxy.backend.act[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of backup servers | Number of backup servers. |
Dependent item | haproxy.backend.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.backend.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Weight | Total effective weight. |
Dependent item | haproxy.backend.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy backend {#PXNAME}: Server is DOWN | Backend is not available. |
count(/HAProxy by HTTP/haproxy.backend.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Average |
||
HAProxy backend {#PXNAME}: Average response time is high | Average backend response time (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_RTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_RTIME.MAX.WARN} |Warning |
||
HAProxy backend {#PXNAME}: Number of responses with error is high | Number of requests on backend, whose responses yielded an error, is more than {$HAPROXY.BACK_ERESP.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_ERESP.MAX.WARN} |Warning |
||
HAProxy backend {#PXNAME}: Current number of requests unassigned in queue is high | Current number of requests on backend unassigned in queue is more than {$HAPROXY.BACK_QCUR.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QCUR.MAX.WARN} |Warning |
||
HAProxy backend {#PXNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_QTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QTIME.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Frontend discovery | Discovery frontends |
Dependent item | haproxy.frontend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy Frontend {#PXNAME}: Raw data | The raw data of the Frontend with the name |
Dependent item | haproxy.frontend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Status | Possible values: OPEN, STOP. When Status is OPEN, the frontend is operating normally and ready to receive traffic. |
Dependent item | haproxy.frontend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Requests rate | HTTP requests per second |
Dependent item | haproxy.frontend.req_rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Sessions rate | Number of sessions created per second |
Dependent item | haproxy.frontend.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Established sessions | The current number of established sessions. |
Dependent item | haproxy.frontend.scur[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Session limits | The most simultaneous sessions that are allowed, as defined by the maxconn setting in the frontend. |
Dependent item | haproxy.frontend.slim[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Session utilization | Percentage of sessions used (scur / slim * 100). |
Calculated | haproxy.frontend.sutil[{#PXNAME},{#SVNAME}] |
HAProxy Frontend {#PXNAME}: Request errors per second | Number of request errors per second. |
Dependent item | haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Denied requests per second | Requests denied due to security concerns (ACL-restricted) per second. |
Dependent item | haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.frontend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.frontend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.frontend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Incoming traffic | Number of bits received by the frontend |
Dependent item | haproxy.frontend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Outgoing traffic | Number of bits sent by the frontend |
Dependent item | haproxy.frontend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy frontend {#PXNAME}: Session utilization is high | Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. Furthermore, if you find your session usage percentage to be hovering above 80%, it could be time to either modify HAProxy's configuration to allow more sessions, or migrate your HAProxy server to a bigger box. |
min(/HAProxy by HTTP/haproxy.frontend.sutil[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_SUTIL.MAX.WARN} |Warning |
||
HAProxy frontend {#PXNAME}: Number of request errors is high | Number of request errors is more than {$HAPROXY.FRONT_EREQ.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_EREQ.MAX.WARN} |Warning |
||
HAProxy frontend {#PXNAME}: Number of requests denied is high | Number of requests denied due to security concerns (ACL-restricted) is more than {$HAPROXY.FRONT_DREQ.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_DREQ.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server discovery | Discovery servers |
Dependent item | haproxy.server.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy Server {#PXNAME} {#SVNAME}: Raw data | The raw data of the Server named |
Dependent item | haproxy.server.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Status | Dependent item | haproxy.server.status[{#PXNAME},{#SVNAME}] Preprocessing
|
|
HAProxy {#PXNAME} {#SVNAME}: Responses time | Average server response time (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.server.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.server.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Response errors per second | Number of requests whose responses yielded an error. |
Dependent item | haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.server.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.server.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.server.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.server.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.server.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.server.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.server.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.server.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.server.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.server.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Server is active | Shows whether the server is active (marked with a Y) or a backup (marked with a -). |
Dependent item | haproxy.server.act[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Server is backup | Shows whether the server is a backup (marked with a Y) or active (marked with a -). |
Dependent item | haproxy.server.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.server.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Weight | Effective weight. |
Dependent item | haproxy.server.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Configured maxqueue | Configured maxqueue for the server, or nothing if the value is 0 (default, meaning no limit). |
Dependent item | haproxy.server.qlimit[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Server was selected per second | Number of times that server was selected. |
Dependent item | haproxy.server.lbtot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Status of last health check | Status of last health check, one of: UNK -> unknown INI -> initializing SOCKERR -> socket error L4OK -> check passed on layer 4, no upper layers testing enabled L4TOUT -> layer 1-4 timeout L4CON -> layer 1-4 connection problem, for example "Connection refused" (tcp rst) or "No route to host" (icmp) L6OK -> check passed on layer 6 L6TOUT -> layer 6 (SSL) timeout L6RSP -> layer 6 invalid response - protocol error L7OK -> check passed on layer 7 L7OKC -> check conditionally passed on layer 7, for example 404 with disable-on-404 L7TOUT -> layer 7 (HTTP/SMTP) timeout L7RSP -> layer 7 invalid response - protocol error L7STS -> layer 7 response error, for example HTTP 5xx Notice: If a check is currently running, the last known status will be reported, prefixed with "* ". e. g. "* L7OK". |
Dependent item | haproxy.server.check_status[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy {#PXNAME} {#SVNAME}: Server is DOWN | Server is not available. |
count(/HAProxy by HTTP/haproxy.server.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Average response time is high | Average server response time (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_RTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_RTIME.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Number of responses with error is high | Number of requests on server, whose responses yielded an error, is more than {$HAPROXY.SERVER_ERESP.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_ERESP.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Current number of requests unassigned in queue is high | Current number of requests unassigned in queue is more than {$HAPROXY.SERVER_QCUR.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QCUR.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_QTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QTIME.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Health check error | Please check the server for faults. |
find(/HAProxy by HTTP/haproxy.server.check_status[{#PXNAME},{#SVNAME}],#3,"regexp","(?:L[4-7]OK|^$)")=0 |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HAProxy by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by polling the HAProxy stats page with Zabbix agent.
Note that this template doesn't support authentication or redirects (limitations of web.page.get).
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up the HAProxy stats page. The example configuration of HAProxy:
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
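Roughly speaking, the discovery rules in this template group the returned CSV rows by the svname column: FRONTEND rows feed frontend discovery, BACKEND rows feed backend discovery, and the remaining rows are treated as servers. The following is a minimal Python sketch of that grouping, assuming the CSV body from the stats page (with the ";csv" suffix) is already available in the hypothetical variable stats_csv:
import csv
import io

# stats_csv is assumed to already hold the body returned by the stats page with the ";csv" suffix.
stats_csv = "# pxname,svname,status\nstats,FRONTEND,OPEN\napp,BACKEND,UP\napp,web1,UP\n"

nodes = list(csv.DictReader(io.StringIO(stats_csv.lstrip("# "))))

frontends = [n for n in nodes if n["svname"] == "FRONTEND"]
backends = [n for n in nodes if n["svname"] == "BACKEND"]
servers = [n for n in nodes if n["svname"] not in ("FRONTEND", "BACKEND")]

print(len(frontends), "frontends,", len(backends), "backends,", len(servers), "servers")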
Set the hostname or IP address of the HAProxy stats host or container in the {$HAPROXY.STATS.HOST}
macro. You can also change the status page port in the {$HAPROXY.STATS.PORT}
macro, the status page scheme in the {$HAPROXY.STATS.SCHEME}
macro and the status page path in the {$HAPROXY.STATS.PATH}
macro if necessary.
Name | Description | Default |
---|---|---|
{$HAPROXY.STATS.SCHEME} | The scheme of HAProxy stats page (http/https). |
http |
{$HAPROXY.STATS.HOST} | The hostname or IP address of the HAProxy stats host or container. |
localhost |
{$HAPROXY.STATS.PORT} | The port of the HAProxy stats host or container. |
8404 |
{$HAPROXY.STATS.PATH} | The path of HAProxy stats page. |
stats |
{$HAPROXY.RESPONSE_TIME.MAX.WARN} | The HAProxy stats page maximum response time in seconds for trigger expression. |
10s |
{$HAPROXY.FRONT_DREQ.MAX.WARN} | The HAProxy maximum denied requests for trigger expression. |
10 |
{$HAPROXY.FRONT_EREQ.MAX.WARN} | The HAProxy maximum number of request errors for trigger expression. |
10 |
{$HAPROXY.BACK_QCUR.MAX.WARN} | Maximum number of requests on BACKEND unassigned in queue for trigger expression. |
10 |
{$HAPROXY.BACK_RTIME.MAX.WARN} | Maximum of average BACKEND response time for trigger expression. |
10s |
{$HAPROXY.BACK_QTIME.MAX.WARN} | Maximum of average time spent in queue on BACKEND for trigger expression. |
10s |
{$HAPROXY.BACK_ERESP.MAX.WARN} | Maximum of responses with error on BACKEND for trigger expression. |
10 |
{$HAPROXY.SERVER_QCUR.MAX.WARN} | Maximum number of requests on server unassigned in queue for trigger expression. |
10 |
{$HAPROXY.SERVER_RTIME.MAX.WARN} | Maximum of average server response time for trigger expression. |
10s |
{$HAPROXY.SERVER_QTIME.MAX.WARN} | Maximum of average time spent in queue on server for trigger expression. |
10s |
{$HAPROXY.SERVER_ERESP.MAX.WARN} | Maximum of responses with error on server for trigger expression. |
10 |
{$HAPROXY.FRONT_SUTIL.MAX.WARN} | Maximum of session usage percentage on frontend for trigger expression. |
80 |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy: Get stats | HAProxy Statistics Report in CSV format |
Zabbix agent | web.page.get["{$HAPROXY.STATS.SCHEME}://{$HAPROXY.STATS.HOST}:{$HAPROXY.STATS.PORT}/{$HAPROXY.STATS.PATH};csv"] Preprocessing
|
HAProxy: Get nodes | Array for LLD rules. |
Dependent item | haproxy.get.nodes Preprocessing
|
HAProxy: Get stats page | HAProxy Statistics Report HTML |
Zabbix agent | web.page.get["{$HAPROXY.STATS.SCHEME}://{$HAPROXY.STATS.HOST}:{$HAPROXY.STATS.PORT}/{$HAPROXY.STATS.PATH}"] |
HAProxy: Version | Dependent item | haproxy.version Preprocessing
|
|
HAProxy: Uptime | Dependent item | haproxy.uptime Preprocessing
|
|
HAProxy: Service status | Zabbix agent | net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] Preprocessing
|
|
HAProxy: Service response time | Zabbix agent | net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: Version has changed | HAProxy version has changed. Acknowledge to close the problem manually. |
last(/HAProxy by Zabbix agent/haproxy.version,#1)<>last(/HAProxy by Zabbix agent/haproxy.version,#2) and length(last(/HAProxy by Zabbix agent/haproxy.version))>0 |Info |
Manual close: Yes | |
HAProxy: has been restarted | Uptime is less than 10 minutes. |
last(/HAProxy by Zabbix agent/haproxy.uptime)<10m |Info |
Manual close: Yes | |
HAProxy: Service is down | last(/HAProxy by Zabbix agent/net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"])=0 |Average |
Manual close: Yes | ||
HAProxy: Service response time is too high | min(/HAProxy by Zabbix agent/net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"],5m)>{$HAPROXY.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Backend discovery | Discovery backends |
Dependent item | haproxy.backend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy Backend {#PXNAME}: Raw data | The raw data of the Backend with the name |
Dependent item | haproxy.backend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Status | Possible values: UP - The server is reporting as healthy. DOWN - The server is reporting as unhealthy and unable to receive requests. NOLB - You've added http-check disable-on-404 to the backend and the health checked URL has returned an HTTP 404 response. MAINT - The server has been disabled or put into maintenance mode. DRAIN - The server has been put into drain mode. no check - Health checks are not enabled for this server. |
Dependent item | haproxy.backend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Responses time | Average backend response time (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.backend.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.backend.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Response errors per second | Number of requests whose responses yielded an error |
Dependent item | haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.backend.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.backend.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.backend.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.backend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.backend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.backend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.backend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.backend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of active servers | Number of active servers. |
Dependent item | haproxy.backend.act[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of backup servers | Number of backup servers. |
Dependent item | haproxy.backend.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.backend.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Weight | Total effective weight. |
Dependent item | haproxy.backend.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy backend {#PXNAME}: Server is DOWN | Backend is not available. |
count(/HAProxy by Zabbix agent/haproxy.backend.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Average |
||
HAProxy backend {#PXNAME}: Average response time is high | Average backend response time (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_RTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_RTIME.MAX.WARN} |Warning |
||
HAProxy backend {#PXNAME}: Number of responses with error is high | Number of requests on backend, whose responses yielded an error, is more than {$HAPROXY.BACK_ERESP.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_ERESP.MAX.WARN} |Warning |
||
HAProxy backend {#PXNAME}: Current number of requests unassigned in queue is high | Current number of requests on backend unassigned in queue is more than {$HAPROXY.BACK_QCUR.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QCUR.MAX.WARN} |Warning |
||
HAProxy backend {#PXNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_QTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QTIME.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Frontend discovery | Discovery of frontends |
Dependent item | haproxy.frontend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy Frontend {#PXNAME}: Raw data | The raw data of the Frontend with the name {#PXNAME}. |
Dependent item | haproxy.frontend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Status | Possible values: OPEN, STOP. When Status is OPEN, the frontend is operating normally and ready to receive traffic. |
Dependent item | haproxy.frontend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Requests rate | HTTP requests per second |
Dependent item | haproxy.frontend.req_rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Sessions rate | Number of sessions created per second |
Dependent item | haproxy.frontend.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Established sessions | The current number of established sessions. |
Dependent item | haproxy.frontend.scur[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Session limits | The most simultaneous sessions that are allowed, as defined by the maxconn setting in the frontend. |
Dependent item | haproxy.frontend.slim[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Session utilization | Percentage of sessions used (scur / slim * 100). |
Calculated | haproxy.frontend.sutil[{#PXNAME},{#SVNAME}] |
HAProxy Frontend {#PXNAME}: Request errors per second | Number of request errors per second. |
Dependent item | haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Denied requests per second | Requests denied due to security concerns (ACL-restricted) per second. |
Dependent item | haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.frontend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.frontend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.frontend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Incoming traffic | Number of bits received by the frontend |
Dependent item | haproxy.frontend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Outgoing traffic | Number of bits sent by the frontend |
Dependent item | haproxy.frontend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy frontend {#PXNAME}: Session utilization is high | Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. Furthermore, if you find your session usage percentage to be hovering above 80%, it could be time to either modify HAProxy's configuration to allow more sessions, or migrate your HAProxy server to a bigger box. |
min(/HAProxy by Zabbix agent/haproxy.frontend.sutil[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_SUTIL.MAX.WARN} |Warning |
||
HAProxy frontend {#PXNAME}: Number of request errors is high | Number of request errors is more than {$HAPROXY.FRONT_EREQ.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_EREQ.MAX.WARN} |Warning |
||
HAProxy frontend {#PXNAME}: Number of requests denied is high | Number of requests denied due to security concerns (ACL-restricted) is more than {$HAPROXY.FRONT_DREQ.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_DREQ.MAX.WARN} |Warning |
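The `Session utilization` calculated item and the utilization trigger above reduce to simple arithmetic on two statistics fields (scur and slim). A small Python sketch with illustrative values only:

```python
# Sketch of the "Session utilization" calculated item: current sessions (scur)
# as a percentage of the configured session limit (slim). Values are made up.
scur = 120   # current established sessions on the frontend
slim = 2000  # session limit (maxconn) reported for the frontend

sutil = scur / slim * 100 if slim else 0.0  # avoid division by zero when no limit is reported
print(f"Session utilization: {sutil:.1f}%")  # compare against {$HAPROXY.FRONT_SUTIL.MAX.WARN}
```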
Name | Description | Type | Key and additional info |
---|---|---|---|
Server discovery | Discovery of servers |
Dependent item | haproxy.server.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy Server {#PXNAME} {#SVNAME}: Raw data | The raw data of the Server named {#SVNAME}. |
Dependent item | haproxy.server.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Status | Dependent item | haproxy.server.status[{#PXNAME},{#SVNAME}] Preprocessing
|
|
HAProxy {#PXNAME} {#SVNAME}: Responses time | Average server response time (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.server.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.server.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Response errors per second | Number of requests whose responses yielded an error. |
Dependent item | haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.server.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.server.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.server.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.server.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.server.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.server.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.server.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.server.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.server.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.server.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Server is active | Shows whether the server is active (marked with a Y) or a backup (marked with a -). |
Dependent item | haproxy.server.act[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Server is backup | Shows whether the server is a backup (marked with a Y) or active (marked with a -). |
Dependent item | haproxy.server.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.server.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Weight | Effective weight. |
Dependent item | haproxy.server.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Configured maxqueue | Configured maxqueue for the server, or nothing if the value is 0 (default, meaning no limit). |
Dependent item | haproxy.server.qlimit[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Server was selected per second | Number of times that server was selected. |
Dependent item | haproxy.server.lbtot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Status of last health check | Status of last health check, one of: UNK -> unknown; INI -> initializing; SOCKERR -> socket error; L4OK -> check passed on layer 4, no upper layers testing enabled; L4TOUT -> layer 1-4 timeout; L4CON -> layer 1-4 connection problem, for example "Connection refused" (tcp rst) or "No route to host" (icmp); L6OK -> check passed on layer 6; L6TOUT -> layer 6 (SSL) timeout; L6RSP -> layer 6 invalid response - protocol error; L7OK -> check passed on layer 7; L7OKC -> check conditionally passed on layer 7, for example 404 with disable-on-404; L7TOUT -> layer 7 (HTTP/SMTP) timeout; L7RSP -> layer 7 invalid response - protocol error; L7STS -> layer 7 response error, for example HTTP 5xx. Notice: If a check is currently running, the last known status will be reported, prefixed with "* ", e.g. "* L7OK". |
Dependent item | haproxy.server.check_status[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy {#PXNAME} {#SVNAME}: Server is DOWN | Server is not available. |
count(/HAProxy by Zabbix agent/haproxy.server.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Average response time is high | Average server response time (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_RTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_RTIME.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Number of responses with error is high | Number of requests on server, whose responses yielded an error, is more than {$HAPROXY.SERVER_ERESP.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_ERESP.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Current number of requests unassigned in queue is high | Current number of requests unassigned in queue is more than {$HAPROXY.SERVER_QCUR.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QCUR.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_QTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QTIME.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Health check error | Please check the server for faults. |
find(/HAProxy by Zabbix agent/haproxy.server.check_status[{#PXNAME},{#SVNAME}],#3,"regexp","(?:L[4-7]OK|^$)")=0 |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template for monitoring Hadoop over HTTP works without any external scripts. It collects metrics by polling the Hadoop API remotely using an HTTP agent and JSONPath preprocessing. The Zabbix server (or proxy) executes direct requests to the ResourceManager, NodeManager, NameNode, and DataNode APIs. All metrics are collected at once, thanks to Zabbix bulk data collection.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You should define the IP address (or FQDN) and Web-UI port for the ResourceManager in {$HADOOP.RESOURCEMANAGER.HOST} and {$HADOOP.RESOURCEMANAGER.PORT} macros and for the NameNode in {$HADOOP.NAMENODE.HOST} and {$HADOOP.NAMENODE.PORT} macros respectively. Macros can be set in the template or overridden at the host level.
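For reference, below is a minimal Python sketch of the kind of polling the HTTP agent items perform. Hadoop daemons publish their metrics as JSON on the /jmx endpoint; the host names and ports are placeholders matching the macros above, the bean names come from Hadoop's standard JMX output, and the exact query string used by the template's items may differ.

```python
# A sketch of polling the Hadoop JSON metrics endpoint (/jmx) that the HTTP
# agent items rely on. Hosts and ports are placeholders matching the
# {$HADOOP.RESOURCEMANAGER.*} and {$HADOOP.NAMENODE.*} macros.
import json
import urllib.request

RESOURCEMANAGER = "http://ResourceManager:8088"
NAMENODE = "http://NameNode:9870"

def get_beans(base_url):
    """Return the list of JMX beans published by a Hadoop daemon."""
    with urllib.request.urlopen(f"{base_url}/jmx") as resp:
        return json.load(resp)["beans"]

# Example: number of active NodeManagers from the ResourceManager ClusterMetrics
# bean (the template extracts such values with JSONPath preprocessing).
for bean in get_beans(RESOURCEMANAGER):
    if bean.get("name") == "Hadoop:service=ResourceManager,name=ClusterMetrics":
        print("Active NodeManagers:", bean.get("NumActiveNMs"))

# Example: remaining capacity from the NameNode FSNamesystemState bean.
for bean in get_beans(NAMENODE):
    if bean.get("name") == "Hadoop:service=NameNode,name=FSNamesystemState":
        print("Capacity remaining (bytes):", bean.get("CapacityRemaining"))
```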
Name | Description | Default |
---|---|---|
{$HADOOP.RESOURCEMANAGER.HOST} | The Hadoop ResourceManager host IP address or FQDN. |
ResourceManager |
{$HADOOP.RESOURCEMANAGER.PORT} | The Hadoop ResourceManager Web-UI port. |
8088 |
{$HADOOP.RESOURCEMANAGER.RESPONSE_TIME.MAX.WARN} | The Hadoop ResourceManager API page maximum response time in seconds for trigger expression. |
10s |
{$HADOOP.NAMENODE.HOST} | The Hadoop NameNode host IP address or FQDN. |
NameNode |
{$HADOOP.NAMENODE.PORT} | The Hadoop NameNode Web-UI port. |
9870 |
{$HADOOP.NAMENODE.RESPONSE_TIME.MAX.WARN} | The Hadoop NameNode API page maximum response time in seconds for trigger expression. |
10s |
{$HADOOP.CAPACITY_REMAINING.MIN.WARN} | The Hadoop cluster capacity remaining percent for trigger expression. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
ResourceManager: Service status | Hadoop ResourceManager API port availability. |
Simple check | net.tcp.service["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"] Preprocessing
|
ResourceManager: Service response time | Hadoop ResourceManager API performance. |
Simple check | net.tcp.service.perf["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"] |
Hadoop: Get ResourceManager stats | HTTP agent | hadoop.resourcemanager.get | |
ResourceManager: Uptime | Dependent item | hadoop.resourcemanager.uptime Preprocessing
|
|
ResourceManager: Get info | Dependent item | hadoop.resourcemanager.info Preprocessing
|
|
ResourceManager: RPC queue & processing time | Average time spent on processing RPC requests. |
Dependent item | hadoop.resourcemanager.rpcprocessingtime_avg Preprocessing
|
ResourceManager: Active NMs | Number of Active NodeManagers. |
Dependent item | hadoop.resourcemanager.num_active_nm Preprocessing
|
ResourceManager: Decommissioning NMs | Number of Decommissioning NodeManagers. |
Dependent item | hadoop.resourcemanager.num_decommissioning_nm Preprocessing
|
ResourceManager: Decommissioned NMs | Number of Decommissioned NodeManagers. |
Dependent item | hadoop.resourcemanager.num_decommissioned_nm Preprocessing
|
ResourceManager: Lost NMs | Number of Lost NodeManagers. |
Dependent item | hadoop.resourcemanager.num_lost_nm Preprocessing
|
ResourceManager: Unhealthy NMs | Number of Unhealthy NodeManagers. |
Dependent item | hadoop.resourcemanager.num_unhealthy_nm Preprocessing
|
ResourceManager: Rebooted NMs | Number of Rebooted NodeManagers. |
Dependent item | hadoop.resourcemanager.num_rebooted_nm Preprocessing
|
ResourceManager: Shutdown NMs | Number of Shutdown NodeManagers. |
Dependent item | hadoop.resourcemanager.num_shutdown_nm Preprocessing
|
NameNode: Service status | Hadoop NameNode API port availability. |
Simple check | net.tcp.service["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"] Preprocessing
|
NameNode: Service response time | Hadoop NameNode API performance. |
Simple check | net.tcp.service.perf["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"] |
Hadoop: Get NameNode stats | HTTP agent | hadoop.namenode.get | |
NameNode: Uptime | Dependent item | hadoop.namenode.uptime Preprocessing
|
|
NameNode: Get info | Dependent item | hadoop.namenode.info Preprocessing
|
|
NameNode: RPC queue & processing time | Average time spent on processing RPC requests. |
Dependent item | hadoop.namenode.rpcprocessingtime_avg Preprocessing
|
NameNode: Block Pool Renaming | Dependent item | hadoop.namenode.percentblockpool_used Preprocessing
|
|
NameNode: Transactions since last checkpoint | Total number of transactions since last checkpoint. |
Dependent item | hadoop.namenode.transactionssincelast_checkpoint Preprocessing
|
NameNode: Percent capacity remaining | Available capacity in percent. |
Dependent item | hadoop.namenode.percent_remaining Preprocessing
|
NameNode: Capacity remaining | Available capacity. |
Dependent item | hadoop.namenode.capacity_remaining Preprocessing
|
NameNode: Corrupt blocks | Number of corrupt blocks. |
Dependent item | hadoop.namenode.corrupt_blocks Preprocessing
|
NameNode: Missing blocks | Number of missing blocks. |
Dependent item | hadoop.namenode.missing_blocks Preprocessing
|
NameNode: Failed volumes | Number of failed volumes. |
Dependent item | hadoop.namenode.volume_failures_total Preprocessing
|
NameNode: Alive DataNodes | Count of alive DataNodes. |
Dependent item | hadoop.namenode.num_live_data_nodes Preprocessing
|
NameNode: Dead DataNodes | Count of dead DataNodes. |
Dependent item | hadoop.namenode.num_dead_data_nodes Preprocessing
|
NameNode: Stale DataNodes | DataNodes that do not send a heartbeat within 30 seconds are marked as "stale". |
Dependent item | hadoop.namenode.num_stale_data_nodes Preprocessing
|
NameNode: Total files | Total count of files tracked by the NameNode. |
Dependent item | hadoop.namenode.files_total Preprocessing
|
NameNode: Total load | The current number of concurrent file accesses (read/write) across all DataNodes. |
Dependent item | hadoop.namenode.total_load Preprocessing
|
NameNode: Blocks allocable | Maximum number of blocks allocable. |
Dependent item | hadoop.namenode.block_capacity Preprocessing
|
NameNode: Total blocks | Count of blocks tracked by NameNode. |
Dependent item | hadoop.namenode.blocks_total Preprocessing
|
NameNode: Under-replicated blocks | The number of blocks with insufficient replication. |
Dependent item | hadoop.namenode.underreplicatedblocks Preprocessing
|
Hadoop: Get NodeManagers states | HTTP agent | hadoop.nodemanagers.get Preprocessing
|
|
Hadoop: Get DataNodes states | HTTP agent | hadoop.datanodes.get Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ResourceManager: Service is unavailable | last(/Hadoop by HTTP/net.tcp.service["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"])=0 |Average |
Manual close: Yes | ||
ResourceManager: Service response time is too high | min(/Hadoop by HTTP/net.tcp.service.perf["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"],5m)>{$HADOOP.RESOURCEMANAGER.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
ResourceManager: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.resourcemanager.uptime)<10m |Info |
Manual close: Yes | |
ResourceManager: Failed to fetch ResourceManager API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.resourcemanager.uptime,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
ResourceManager: Cluster has no active NodeManagers | Cluster is unable to execute any jobs without at least one NodeManager. |
max(/Hadoop by HTTP/hadoop.resourcemanager.num_active_nm,5m)=0 |High |
||
ResourceManager: Cluster has unhealthy NodeManagers | YARN considers any node with disk utilization exceeding the value specified under the property yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage (in yarn-site.xml) to be unhealthy. Ample disk space is critical to ensure uninterrupted operation of a Hadoop cluster, and large numbers of unhealthy nodes (the number to alert on depends on the size of your cluster) should be quickly investigated and resolved. |
min(/Hadoop by HTTP/hadoop.resourcemanager.num_unhealthy_nm,15m)>0 |Average |
||
NameNode: Service is unavailable | last(/Hadoop by HTTP/net.tcp.service["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"])=0 |Average |
Manual close: Yes | ||
NameNode: Service response time is too high | min(/Hadoop by HTTP/net.tcp.service.perf["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"],5m)>{$HADOOP.NAMENODE.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
NameNode: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.namenode.uptime)<10m |Info |
Manual close: Yes | |
NameNode: Failed to fetch NameNode API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.namenode.uptime,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
NameNode: Cluster capacity remaining is low | A good practice is to ensure that disk use never exceeds 80 percent capacity. |
max(/Hadoop by HTTP/hadoop.namenode.percent_remaining,15m)<{$HADOOP.CAPACITY_REMAINING.MIN.WARN} |Warning |
||
NameNode: Cluster has missing blocks | A missing block is far worse than a corrupt block, because a missing block cannot be recovered by copying a replica. |
min(/Hadoop by HTTP/hadoop.namenode.missing_blocks,15m)>0 |Average |
||
NameNode: Cluster has volume failures | HDFS now allows for disks to fail in place, without affecting DataNode operations, until a threshold value is reached. This is set on each DataNode via the dfs.datanode.failed.volumes.tolerated property; it defaults to 0, meaning that any volume failure will shut down the DataNode; on a production cluster where DataNodes typically have 6, 8, or 12 disks, setting this parameter to 1 or 2 is typically the best practice. |
min(/Hadoop by HTTP/hadoop.namenode.volume_failures_total,15m)>0 |Average |
||
NameNode: Cluster has DataNodes in Dead state | The death of a DataNode causes a flurry of network activity, as the NameNode initiates replication of blocks lost on the dead nodes. |
min(/Hadoop by HTTP/hadoop.namenode.num_dead_data_nodes,5m)>0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node manager discovery | HTTP agent | hadoop.nodemanager.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Hadoop NodeManager {#HOSTNAME}: Get stats | HTTP agent | hadoop.nodemanager.get[{#HOSTNAME}] | |
{#HOSTNAME}: RPC queue & processing time | Average time spent on processing RPC requests. |
Dependent item | hadoop.nodemanager.rpcprocessingtime_avg[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Container launch avg duration | Dependent item | hadoop.nodemanager.containerlaunchduration_avg[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: JVM Threads | The number of JVM threads. |
Dependent item | hadoop.nodemanager.jvm.threads[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Garbage collection time | The JVM garbage collection time in milliseconds. |
Dependent item | hadoop.nodemanager.jvm.gc_time[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Heap usage | The JVM heap usage in MBytes. |
Dependent item | hadoop.nodemanager.jvm.memheapused[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Uptime | Dependent item | hadoop.nodemanager.uptime[{#HOSTNAME}] Preprocessing
|
|
Hadoop NodeManager {#HOSTNAME}: Get raw info | Dependent item | hadoop.nodemanager.raw_info[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: State | State of the node - valid values are: NEW, RUNNING, UNHEALTHY, DECOMMISSIONING, DECOMMISSIONED, LOST, REBOOTED, SHUTDOWN. |
Dependent item | hadoop.nodemanager.state[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Version | Dependent item | hadoop.nodemanager.version[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Number of containers | Dependent item | hadoop.nodemanager.numcontainers[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Used memory | Dependent item | hadoop.nodemanager.usedmemory[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Available memory | Dependent item | hadoop.nodemanager.availablememory[{#HOSTNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#HOSTNAME}: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.nodemanager.uptime[{#HOSTNAME}])<10m |Info |
Manual close: Yes | |
{#HOSTNAME}: Failed to fetch NodeManager API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.nodemanager.uptime[{#HOSTNAME}],30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
{#HOSTNAME}: NodeManager has state {ITEM.VALUE}. | The state is different from normal. |
last(/Hadoop by HTTP/hadoop.nodemanager.state[{#HOSTNAME}])<>"RUNNING" |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Data node discovery | HTTP agent | hadoop.datanode.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Hadoop DataNode {#HOSTNAME}: Get stats | HTTP agent | hadoop.datanode.get[{#HOSTNAME}] | |
{#HOSTNAME}: Remaining | Remaining disk space. |
Dependent item | hadoop.datanode.remaining[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Used | Used disk space. |
Dependent item | hadoop.datanode.dfs_used[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Number of failed volumes | Number of failed storage volumes. |
Dependent item | hadoop.datanode.numfailedvolumes[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Threads | The number of JVM threads. |
Dependent item | hadoop.datanode.jvm.threads[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Garbage collection time | The JVM garbage collection time in milliseconds. |
Dependent item | hadoop.datanode.jvm.gc_time[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Heap usage | The JVM heap usage in MBytes. |
Dependent item | hadoop.datanode.jvm.memheapused[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Uptime | Dependent item | hadoop.datanode.uptime[{#HOSTNAME}] Preprocessing
|
|
Hadoop DataNode {#HOSTNAME}: Get raw info | Dependent item | hadoop.datanode.raw_info[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Version | DataNode software version. |
Dependent item | hadoop.datanode.version[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Admin state | Administrative state. |
Dependent item | hadoop.datanode.admin_state[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Oper state | Operational state. |
Dependent item | hadoop.datanode.oper_state[{#HOSTNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#HOSTNAME}: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.datanode.uptime[{#HOSTNAME}])<10m |Info |
Manual close: Yes | |
{#HOSTNAME}: Failed to fetch DataNode API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.datanode.uptime[{#HOSTNAME}],30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
{#HOSTNAME}: DataNode has state {ITEM.VALUE}. | The state is different from normal. |
last(/Hadoop by HTTP/hadoop.datanode.oper_state[{#HOSTNAME}])<>"Live" |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor GitLab by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template GitLab by HTTP collects metrics via an HTTP agent from the GitLab /-/metrics endpoint.
See https://docs.gitlab.com/ee/administration/monitoring/prometheus/gitlab_metrics.html.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template works with self-hosted GitLab instances. Internal service metrics are collected from the GitLab /-/metrics
endpoint.
To access the metrics, the following two methods are available:
1. Allow the IP address of your Zabbix server (or proxy) to access GitLab's monitoring endpoints via the monitoring IP allowlist.
2. Get a token from the Admin -> Monitoring -> Health check page (http://your.gitlab.address/admin/health_check) and use it in the {$GITLAB.HEALTH.TOKEN} macro as a variable path, like: ?token=your_token.
Remember to change the {$GITLAB.URL} macro.
Also, see the Macros section for a list of macros used to set trigger values.
NOTE: Some metrics may not be collected depending on your GitLab instance version and configuration. See GitLab's documentation for further information about its metric collection.
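For reference, below is a minimal Python sketch of the kind of requests the template's HTTP agent items issue, assuming {$GITLAB.URL} is http://localhost and the health-check token (if used) is passed as a query string; the probe paths are GitLab's documented monitoring endpoints.

```python
# A sketch of the requests behind the HTTP agent items: the health probes and
# the Prometheus-format metrics endpoint. The base URL and token are
# placeholders corresponding to {$GITLAB.URL} and {$GITLAB.HEALTH.TOKEN}.
import urllib.request

GITLAB_URL = "http://localhost"
HEALTH_TOKEN = "?token=your_token"  # leave empty if the Zabbix IP is allowlisted instead

def fetch(path):
    with urllib.request.urlopen(f"{GITLAB_URL}{path}{HEALTH_TOKEN}") as resp:
        return resp.read().decode("utf-8")

print(fetch("/-/readiness"))  # JSON readiness probe
print(fetch("/-/liveness"))   # JSON liveness probe

# /-/metrics returns Prometheus text format: "<name>{<labels>} <value>" per line.
# The template extracts individual values from these lines with Zabbix preprocessing.
for line in fetch("/-/metrics").splitlines():
    if line.startswith("puma_workers"):
        print(line)
```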
Name | Description | Default |
---|---|---|
{$GITLAB.URL} | URL of a GitLab instance. |
http://localhost |
{$GITLAB.HEALTH.TOKEN} | The token path for the GitLab health check. Example: ?token=your_token |
|
{$GITLAB.UNICORN.UTILIZATION.MAX.WARN} | The maximum percentage of Unicorn workers utilization for a trigger expression. |
90 |
{$GITLAB.PUMA.UTILIZATION.MAX.WARN} | The maximum percentage of Puma thread utilization for a trigger expression. |
90 |
{$GITLAB.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures for a trigger expression. |
2 |
{$GITLAB.REDIS.FAIL.MAX.WARN} | The maximum number of Redis client exceptions for a trigger expression. |
2 |
{$GITLAB.UNICORN.QUEUE.MAX.WARN} | The maximum number of Unicorn queued requests for a trigger expression. |
1 |
{$GITLAB.PUMA.QUEUE.MAX.WARN} | The maximum number of Puma queued requests for a trigger expression. |
1 |
{$GITLAB.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors for a trigger expression. |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
GitLab: Get instance metrics | HTTP agent | gitlab.get_metrics Preprocessing
|
|
GitLab: Instance readiness check | The readiness probe checks whether the GitLab instance is ready to accept traffic via Rails Controllers. |
HTTP agent | gitlab.readiness Preprocessing
|
GitLab: Application server status | Checks whether the application server is running. This probe is used to check that Rails Controllers are not deadlocked due to multi-threading issues. |
HTTP agent | gitlab.liveness Preprocessing
|
GitLab: Version | Version of the GitLab instance. |
Dependent item | gitlab.deployments.version Preprocessing
|
GitLab: Ruby: First process start time | Minimum UNIX timestamp of ruby processes start time. |
Dependent item | gitlab.ruby.processstarttime_seconds.first Preprocessing
|
GitLab: Ruby: Last process start time | Maximum UNIX timestamp of ruby processes start time. |
Dependent item | gitlab.ruby.processstarttime_seconds.last Preprocessing
|
GitLab: User logins, total | Counter of how many users have logged in since GitLab was started or restarted. |
Dependent item | gitlab.usersessionlogins_total Preprocessing
|
GitLab: User CAPTCHA logins failed, total | Counter of failed CAPTCHA attempts during login. |
Dependent item | gitlab.failedlogincaptcha_total Preprocessing
|
GitLab: User CAPTCHA logins, total | Counter of successful CAPTCHA attempts during login. |
Dependent item | gitlab.successfullogincaptcha_total Preprocessing
|
GitLab: Upload file does not exist | Number of times an upload record could not find its file. |
Dependent item | gitlab.uploadfiledoesnotexist Preprocessing
|
GitLab: Pipelines: Processing events, total | Total amount of pipeline processing events. |
Dependent item | gitlab.pipeline.processingeventstotal Preprocessing
|
GitLab: Pipelines: Created, total | Counter of pipelines created. |
Dependent item | gitlab.pipeline.created_total Preprocessing
|
GitLab: Pipelines: Auto DevOps pipelines, total | Counter of completed Auto DevOps pipelines. |
Dependent item | gitlab.pipeline.autodevopscompleted.total Preprocessing
|
GitLab: Pipelines: Auto DevOps pipelines, failed | Counter of completed Auto DevOps pipelines with status "failed". |
Dependent item | gitlab.pipeline.autodevopscompleted_total.failed Preprocessing
|
GitLab: Pipelines: CI/CD creation duration | The sum of the time in seconds it takes to create a CI/CD pipeline. |
Dependent item | gitlab.pipeline.pipeline_creation Preprocessing
|
GitLab: Pipelines: CI/CD creation count | The count of the time it takes to create a CI/CD pipeline. |
Dependent item | gitlab.pipeline.pipeline_creation.count Preprocessing
|
GitLab: Database: Connection pool, busy | Connections to the main database in use where the owner is still alive. |
Dependent item | gitlab.database.connectionpoolbusy Preprocessing
|
GitLab: Database: Connection pool, current | Current connections to the main database in the pool. |
Dependent item | gitlab.database.connectionpoolconnections Preprocessing
|
GitLab: Database: Connection pool, dead | Connections to the main database in use where the owner is not alive. |
Dependent item | gitlab.database.connectionpooldead Preprocessing
|
GitLab: Database: Connection pool, idle | Connections to the main database not in use. |
Dependent item | gitlab.database.connectionpoolidle Preprocessing
|
GitLab: Database: Connection pool, size | Total connection pool capacity of the main database. |
Dependent item | gitlab.database.connectionpoolsize Preprocessing
|
GitLab: Database: Connection pool, waiting | Threads currently waiting on this queue. |
Dependent item | gitlab.database.connectionpoolwaiting Preprocessing
|
GitLab: Redis: Client requests rate, queues | Number of Redis client requests per second. (Instance: queues) |
Dependent item | gitlab.redis.client_requests.queues.rate Preprocessing
|
GitLab: Redis: Client requests rate, cache | Number of Redis client requests per second. (Instance: cache) |
Dependent item | gitlab.redis.client_requests.cache.rate Preprocessing
|
GitLab: Redis: Client requests rate, shared_state | Number of Redis client requests per second. (Instance: shared_state) |
Dependent item | gitlab.redis.client_requests.shared_state.rate Preprocessing
|
GitLab: Redis: Client exceptions rate, queues | Number of Redis client exceptions per second. (Instance: queues) |
Dependent item | gitlab.redis.client_exceptions.queues.rate Preprocessing
|
GitLab: Redis: Client exceptions rate, cache | Number of Redis client exceptions per second. (Instance: cache) |
Dependent item | gitlab.redis.client_exceptions.cache.rate Preprocessing
|
GitLab: Redis: client exceptions rate, shared_state | Number of Redis client exceptions per second. (Instance: shared_state) |
Dependent item | gitlab.redis.client_exceptions.shared_state.rate Preprocessing
|
GitLab: Cache: Misses rate, total | The cache read miss count. |
Dependent item | gitlab.cache.misses_total.rate Preprocessing
|
GitLab: Cache: Operations rate, total | The count of cache operations. |
Dependent item | gitlab.cache.operations_total.rate Preprocessing
|
GitLab: Ruby: CPU usage per second | Average CPU time utilization in seconds. |
Dependent item | gitlab.ruby.processcpuseconds.rate Preprocessing
|
GitLab: Ruby: Running_threads | Number of running Ruby threads. |
Dependent item | gitlab.ruby.threads_running Preprocessing
|
GitLab: Ruby: File descriptors opened, avg | Average number of opened file descriptors. |
Dependent item | gitlab.ruby.file_descriptors.avg Preprocessing
|
GitLab: Ruby: File descriptors opened, max | Maximum number of opened file descriptors. |
Dependent item | gitlab.ruby.file_descriptors.max Preprocessing
|
GitLab: Ruby: File descriptors opened, min | Minimum number of opened file descriptors. |
Dependent item | gitlab.ruby.file_descriptors.min Preprocessing
|
GitLab: Ruby: File descriptors, max | Maximum number of open file descriptors per process. |
Dependent item | gitlab.ruby.process_max_fds Preprocessing
|
GitLab: Ruby: RSS memory, avg | Average RSS Memory usage in bytes. |
Dependent item | gitlab.ruby.processresidentmemory_bytes.avg Preprocessing
|
GitLab: Ruby: RSS memory, min | Minimum RSS Memory usage in bytes. |
Dependent item | gitlab.ruby.processresidentmemory_bytes.min Preprocessing
|
GitLab: Ruby: RSS memory, max | Maximum RSS Memory usage in bytes. |
Dependent item | gitlab.ruby.processresidentmemory_bytes.max Preprocessing
|
GitLab: HTTP requests rate, total | Number of requests received into the system. |
Dependent item | gitlab.http.requests.rate Preprocessing
|
GitLab: HTTP requests rate, 5xx | Number of handle failures of requests with HTTP-code 5xx. |
Dependent item | gitlab.http.requests.5xx.rate Preprocessing
|
GitLab: HTTP requests rate, 4xx | Number of handle failures of requests with code 4XX. |
Dependent item | gitlab.http.requests.4xx.rate Preprocessing
|
GitLab: Transactions per second | Transactions per second (gitlab_transaction_* metrics). |
Dependent item | gitlab.transactions.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitLab: Gitlab instance is not able to accept traffic | last(/GitLab by HTTP/gitlab.readiness)=0 |High |
Depends on:
|
||
GitLab: Liveness check was failed | The application server is not running or Rails Controllers are deadlocked. |
last(/GitLab by HTTP/gitlab.liveness)=0 |High |
||
GitLab: Version has changed | The GitLab version has changed. Acknowledge to close the problem manually. |
last(/GitLab by HTTP/gitlab.deployments.version,#1)<>last(/GitLab by HTTP/gitlab.deployments.version,#2) and length(last(/GitLab by HTTP/gitlab.deployments.version))>0 |Info |
Manual close: Yes | |
GitLab: Too many Redis queues client exceptions | "Too many Redis client exceptions during the requests to Redis instance queues." |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.queues.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |Warning |
||
GitLab: Too many Redis cache client exceptions | "Too many Redis client exceptions during the requests to Redis instance cache." |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.cache.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |Warning |
||
GitLab: Too many Redis shared_state client exceptions | "Too many Redis client exceptions during the requests to Redis instance shared_state." |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.shared_state.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |Warning |
||
GitLab: Failed to fetch info data | Zabbix has not received any metrics data for the last 30 minutes. |
nodata(/GitLab by HTTP/gitlab.ruby.threads_running,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
GitLab: Current number of open files is too high | min(/GitLab by HTTP/gitlab.ruby.file_descriptors.max,5m)/last(/GitLab by HTTP/gitlab.ruby.process_max_fds)*100>{$GITLAB.OPEN.FDS.MAX.WARN} |Warning |
|||
GitLab: Too many HTTP requests failures | "Too many requests failed on GitLab instance with 5xx HTTP code" |
min(/GitLab by HTTP/gitlab.http.requests.5xx.rate,5m)>{$GITLAB.HTTP.FAIL.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Unicorn metrics discovery | Discovery of Unicorn-specific metrics when Unicorn is used. |
HTTP agent | gitlab.unicorn.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GitLab: Unicorn: Workers | The number of Unicorn workers |
Dependent item | gitlab.unicorn.unicorn_workers[{#SINGLETON}] Preprocessing
|
GitLab: Unicorn: Active connections | The number of active Unicorn connections. |
Dependent item | gitlab.unicorn.active_connections[{#SINGLETON}] Preprocessing
|
GitLab: Unicorn: Queued connections | The number of queued Unicorn connections. |
Dependent item | gitlab.unicorn.queued_connections[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitLab: Unicorn worker utilization is too high | min(/GitLab by HTTP/gitlab.unicorn.active_connections[{#SINGLETON}],5m)/last(/GitLab by HTTP/gitlab.unicorn.unicorn_workers[{#SINGLETON}])*100>{$GITLAB.UNICORN.UTILIZATION.MAX.WARN} |Warning |
|||
GitLab: Unicorn is queueing requests | min(/GitLab by HTTP/gitlab.unicorn.queued_connections[{#SINGLETON}],5m)>{$GITLAB.UNICORN.QUEUE.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Puma metrics discovery | Discovery of Puma specific metrics when Puma is used. |
HTTP agent | gitlab.puma.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GitLab: Active connections | Number of puma threads processing a request. |
Dependent item | gitlab.puma.active_connections[{#SINGLETON}] Preprocessing
|
GitLab: Workers | Total number of puma workers. |
Dependent item | gitlab.puma.workers[{#SINGLETON}] Preprocessing
|
GitLab: Running workers | The number of booted puma workers. |
Dependent item | gitlab.puma.running_workers[{#SINGLETON}] Preprocessing
|
GitLab: Stale workers | The number of old puma workers. |
Dependent item | gitlab.puma.stale_workers[{#SINGLETON}] Preprocessing
|
GitLab: Running threads | The number of running puma threads. |
Dependent item | gitlab.puma.running[{#SINGLETON}] Preprocessing
|
GitLab: Queued connections | The number of connections in that puma worker's "todo" set waiting for a worker thread. |
Dependent item | gitlab.puma.queued_connections[{#SINGLETON}] Preprocessing
|
GitLab: Pool capacity | The number of requests the puma worker is capable of taking right now. |
Dependent item | gitlab.puma.pool_capacity[{#SINGLETON}] Preprocessing
|
GitLab: Max threads | The maximum number of puma worker threads. |
Dependent item | gitlab.puma.max_threads[{#SINGLETON}] Preprocessing
|
GitLab: Idle threads | The number of spawned puma threads which are not processing a request. |
Dependent item | gitlab.puma.idle_threads[{#SINGLETON}] Preprocessing
|
GitLab: Killer terminations, total | The number of workers terminated by PumaWorkerKiller. |
Dependent item | gitlab.puma.killerterminationstotal[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitLab: Puma instance thread utilization is too high | min(/GitLab by HTTP/gitlab.puma.active_connections[{#SINGLETON}],5m)/last(/GitLab by HTTP/gitlab.puma.max_threads[{#SINGLETON}])*100>{$GITLAB.PUMA.UTILIZATION.MAX.WARN} |Warning |
|||
GitLab: Puma is queueing requests | min(/GitLab by HTTP/gitlab.puma.queued_connections[{#SINGLETON}],15m)>{$GITLAB.PUMA.QUEUE.MAX.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official JMX template from the Zabbix distribution. It can be useful for many Java applications (JMX).
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Refer to the vendor documentation.
Name | Description | Default |
---|---|---|
{$JMX.NONHEAP.MEM.USAGE.MAX} | A threshold in percent for Non-heap memory utilization trigger. |
85 |
{$JMX.NONHEAP.MEM.USAGE.TIME} | The time during which the Non-heap memory utilization may exceed the threshold. |
10m |
{$JMX.HEAP.MEM.USAGE.MAX} | A threshold in percent for Heap memory utilization trigger. |
85 |
{$JMX.HEAP.MEM.USAGE.TIME} | The time during which the Heap memory utilization may exceed the threshold. |
10m |
{$JMX.MP.USAGE.MAX} | A threshold in percent for memory pools utilization trigger. Use a context to change the threshold for a specific pool. |
85 |
{$JMX.MP.USAGE.TIME} | The time during which the memory pools utilization may exceed the threshold. |
10m |
{$JMX.FILE.DESCRIPTORS.MAX} | A threshold in percent for file descriptors count trigger. |
85 |
{$JMX.FILE.DESCRIPTORS.TIME} | The time during which the file descriptors count may exceed the threshold. |
3m |
{$JMX.CPU.LOAD.MAX} | A threshold in percent for CPU utilization trigger. |
85 |
{$JMX.CPU.LOAD.TIME} | The time during which the CPU utilization may exceed the threshold. |
5m |
{$JMX.MEM.POOL.NAME.MATCHES} | This macro is used as a filter in memory pool discovery. |
Old Gen|G1|Perm Gen|Code Cache|Tenured Gen |
{$JMX.USER} | JMX username. |
|
{$JMX.PASSWORD} | JMX password. |
Name | Description | Type | Key and additional info |
---|---|---|---|
ClassLoading: Loaded class count | Displays number of classes that are currently loaded in the Java virtual machine. |
JMX agent | jmx["java.lang:type=ClassLoading","LoadedClassCount"] Preprocessing
|
ClassLoading: Total loaded class count | Displays the total number of classes that have been loaded since the Java virtual machine has started execution. |
JMX agent | jmx["java.lang:type=ClassLoading","TotalLoadedClassCount"] Preprocessing
|
ClassLoading: Unloaded class count | Displays the total number of classes unloaded since the Java virtual machine has started execution. |
JMX agent | jmx["java.lang:type=ClassLoading","UnloadedClassCount"] Preprocessing
|
Compilation: Name of the current JIT compiler | Displays the name of the Just-In-Time (JIT) compiler. |
JMX agent | jmx["java.lang:type=Compilation","Name"] Preprocessing
|
Compilation: Accumulated time spent | Displays the approximate accumulated elapsed time spent in compilation, in seconds. |
JMX agent | jmx["java.lang:type=Compilation","TotalCompilationTime"] Preprocessing
|
Memory: Heap memory committed | Current heap memory allocated. This amount of memory is guaranteed for the Java virtual machine to use. |
JMX agent | jmx["java.lang:type=Memory","HeapMemoryUsage.committed"] |
Memory: Heap memory maximum size | Maximum amount of heap that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX agent | jmx["java.lang:type=Memory","HeapMemoryUsage.max"] Preprocessing
|
Memory: Heap memory used | Current heap memory usage. |
JMX agent | jmx["java.lang:type=Memory","HeapMemoryUsage.used"] Preprocessing
|
Memory: Non-Heap memory committed | Current memory allocated outside the heap. This amount of memory is guaranteed for the Java virtual machine to use. |
JMX agent | jmx["java.lang:type=Memory","NonHeapMemoryUsage.committed"] Preprocessing
|
Memory: Non-Heap memory maximum size | Maximum amount of non-heap memory that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX agent | jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"] Preprocessing
|
Memory: Non-Heap memory used | Current memory usage outside the heap |
JMX agent | jmx["java.lang:type=Memory","NonHeapMemoryUsage.used"] Preprocessing
|
Memory: Object pending finalization count | The approximate number of objects for which finalization is pending. |
JMX agent | jmx["java.lang:type=Memory","ObjectPendingFinalizationCount"] Preprocessing
|
OperatingSystem: File descriptors maximum count | This is the number of file descriptors we can have opened in the same process, as determined by the operating system. You can never have more file descriptors than this number. |
JMX agent | jmx["java.lang:type=OperatingSystem","MaxFileDescriptorCount"] Preprocessing
|
OperatingSystem: File descriptors opened | This is the number of opened file descriptors at the moment; if this reaches the MaxFileDescriptorCount, the application will throw an IOException: Too many open files. This could mean you are opening file descriptors and never closing them. |
JMX agent | jmx["java.lang:type=OperatingSystem","OpenFileDescriptorCount"] |
OperatingSystem: Process CPU Load | ProcessCpuLoad represents the CPU load in this process. |
JMX agent | jmx["java.lang:type=OperatingSystem","ProcessCpuLoad"] Preprocessing
|
Runtime: JVM uptime | JMX agent | jmx["java.lang:type=Runtime","Uptime"] Preprocessing
|
|
Runtime: JVM name | JMX agent | jmx["java.lang:type=Runtime","VmName"] Preprocessing
|
|
Runtime: JVM version | JMX agent | jmx["java.lang:type=Runtime","VmVersion"] Preprocessing
|
|
Threading: Daemon thread count | Number of daemon threads running. |
JMX agent | jmx["java.lang:type=Threading","DaemonThreadCount"] Preprocessing
|
Threading: Peak thread count | Maximum number of threads being executed at the same time since the JVM was started or the peak was reset. |
JMX agent | jmx["java.lang:type=Threading","PeakThreadCount"] |
Threading: Thread count | The number of threads running at the current moment. |
JMX agent | jmx["java.lang:type=Threading","ThreadCount"] |
Threading: Total started thread count | The number of threads started since the JVM was launched. |
JMX agent | jmx["java.lang:type=Threading","TotalStartedThreadCount"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Compilation: {HOST.NAME} uses suboptimal JIT compiler | find(/Generic Java JMX/jmx["java.lang:type=Compilation","Name"],,"like","Client")=1 |Info |
Manual close: Yes | ||
Memory: Heap memory usage is high | min(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.used"],{$JMX.HEAP.MEM.USAGE.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.max"])*{$JMX.HEAP.MEM.USAGE.MAX}/100) and last(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.max"])>0 |Warning |
|||
Memory: Non-Heap memory usage is high | min(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.used"],{$JMX.NONHEAP.MEM.USAGE.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"])*{$JMX.NONHEAP.MEM.USAGE.MAX}/100) and last(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"])>0 |Warning |
|||
OperatingSystem: Opened file descriptor count is high | min(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","OpenFileDescriptorCount"],{$JMX.FILE.DESCRIPTORS.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","MaxFileDescriptorCount"])*{$JMX.FILE.DESCRIPTORS.MAX}/100) |Warning |
|||
OperatingSystem: Process CPU Load is high | min(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","ProcessCpuLoad"],{$JMX.CPU.LOAD.TIME})>{$JMX.CPU.LOAD.MAX} |Average |
|||
Runtime: JVM is not reachable | nodata(/Generic Java JMX/jmx["java.lang:type=Runtime","Uptime"],5m)=1 |Average |
Manual close: Yes | ||
Runtime: {HOST.NAME} runs suboptimal VM type | find(/Generic Java JMX/jmx["java.lang:type=Runtime","VmName"],,"like","Server")<>1 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Garbage collector discovery | Garbage collectors metrics discovery. |
JMX agent | jmx.discovery["beans","java.lang:name=*,type=GarbageCollector"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
GarbageCollector: {#JMXNAME} number of collections per second | Displays the total number of collections that have occurred per second. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=GarbageCollector","CollectionCount"] Preprocessing
|
GarbageCollector: {#JMXNAME} accumulated time spent in collection | Displays the approximate accumulated collection elapsed time, in seconds. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=GarbageCollector","CollectionTime"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Memory pool discovery | Memory pools metrics discovery. |
JMX agent | jmx.discovery["beans","java.lang:name=*,type=MemoryPool"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Memory pool: {#JMXNAME} committed | Current memory allocated. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.committed"] Preprocessing
|
Memory pool: {#JMXNAME} maximum size | Maximum amount of memory that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"] Preprocessing
|
Memory pool: {#JMXNAME} used | Current memory usage. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.used"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Memory pool: {#JMXNAME} memory usage is high | min(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.used"],{$JMX.MP.USAGE.TIME:"{#JMXNAME}"})>(last(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"])*{$JMX.MP.USAGE.MAX:"{#JMXNAME}"}/100) and last(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"])>0 |Warning |
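The per-pool threshold in the expression above comes from the context macro {$JMX.MP.USAGE.MAX:"{#JMXNAME}"}, so it can be overridden for a single pool without touching the others. A minimal sketch, assuming a discovered pool named "Old Gen" and an illustrative threshold of 90 (neither is a template default):
{$JMX.MP.USAGE.MAX:"Old Gen"} = 90
Define such a macro on the host (or on a linked template) and it will take precedence over the plain {$JMX.MP.USAGE.MAX} value for that pool only.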
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official Template for Microsoft Exchange Server 2016.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by Zabbix agent active.
1. Import the template into Zabbix.
2. Link the imported template to a host with MS Exchange.
Note that the template doesn't provide information about the state of Windows services. It is recommended to use it together with the "OS Windows by Zabbix agent active" template.
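Before linking the template, you can verify on the Exchange host itself that the performance counters used below exist. A hedged sketch using the built-in Windows typeperf utility (the counter path is taken from the items in this template; note that typeperf shows localized counter names, while the template's perf_counter_en items always use the English names):
typeperf "\MSExchange Active Manager(_total)\Database Mounted" -sc 1
If the command returns a numeric sample, the corresponding template item should be collectable by the active agent.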
Name | Description | Default |
---|---|---|
{$MS.EXCHANGE.PERF.INTERVAL} | Update interval for perf_counter_en items. |
60 |
{$MS.EXCHANGE.DB.FAULTS.TIME} | The time during which the database page faults may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.FAULTS.WARN} | Threshold for database page faults trigger. |
0 |
{$MS.EXCHANGE.LOG.STALLS.TIME} | The time during which the log records stalled may exceed the threshold. |
10m |
{$MS.EXCHANGE.LOG.STALLS.WARN} | Threshold for log records stalled trigger. |
100 |
{$MS.EXCHANGE.DB.ACTIVE.READ.TIME} | The time during which the active database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} | Threshold for active database read operations latency trigger. |
0.02 |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | The time during which the active database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} | Threshold for active database write operations latency trigger. |
0.05 |
{$MS.EXCHANGE.DB.PASSIVE.READ.TIME} | The time during which the passive database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} | Threshold for passive database read operations latency trigger. |
0.2 |
{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | The time during which the passive database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.TIME} | The time during which the RPC requests latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.WARN} | Threshold for RPC requests latency trigger. |
0.05 |
{$MS.EXCHANGE.RPC.COUNT.TIME} | The time during which the RPC total requests may exceed the threshold. |
5m |
{$MS.EXCHANGE.RPC.COUNT.WARN} | Threshold for the total RPC requests trigger. |
70 |
{$MS.EXCHANGE.LDAP.TIME} | The time during which the LDAP metrics may exceed the threshold. |
5m |
{$MS.EXCHANGE.LDAP.WARN} | Threshold for LDAP triggers. |
0.05 |
{$AGENT.TIMEOUT} | Timeout after which the agent is considered unavailable. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
MS Exchange: Databases total mounted | Shows the number of active database copies on the server. |
Zabbix agent (active) | perfcounteren["\MSExchange Active Manager(_total)\Database Mounted"] Preprocessing
|
MS Exchange [Client Access Server]: ActiveSync: ping command pending | Shows the number of ping commands currently pending in the queue. |
Zabbix agent (active) | perfcounteren["\MSExchange ActiveSync\Ping Commands Pending", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: ActiveSync: requests per second | Shows the number of HTTP requests received from the client via ASP.NET per second. Determines the current Exchange ActiveSync request rate. Used only to determine current user load. |
Zabbix agent (active) | perfcounteren["\MSExchange ActiveSync\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: ActiveSync: sync commands per second | Shows the number of sync commands processed per second. Clients use this command to synchronize items within a folder. |
Zabbix agent (active) | perfcounteren["\MSExchange ActiveSync\Sync Commands/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Autodiscover: requests per second | Shows the number of Autodiscover service requests processed each second. Determines current user load. |
Zabbix agent (active) | perfcounteren["\MSExchangeAutodiscover\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Availability Service: availability requests per second | Shows the number of requests serviced per second. The request can be only for free/busy information or include suggestions. One request may contain multiple mailboxes. Determines the rate at which Availability service requests are occurring. |
Zabbix agent (active) | perfcounteren["\MSExchange Availability Service\Availability Requests (sec)", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Outlook Web App: current unique users | Shows the number of unique users currently logged on to Outlook Web App. This value monitors the number of unique active user sessions, so that users are only removed from this counter after they log off or their session times out. Determines current user load. |
Zabbix agent (active) | perfcounteren["\MSExchange OWA\Current Unique Users", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Outlook Web App: requests per second | Shows the number of requests handled by Outlook Web App per second. Determines current user load. |
Zabbix agent (active) | perfcounteren["\MSExchange OWA\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: MSExchangeWS: requests per second | Shows the number of requests processed each second. Determines current user load. |
Zabbix agent (active) | perfcounteren["\MSExchangeWS\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange: Active agent availability | Availability of active checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - unknown; 1 - available; 2 - not available. |
Zabbix internal | zabbix[host,active_agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS Exchange: Active checks are not available | Active checks are considered unavailable: the agent has not sent a heartbeat for a prolonged time. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/zabbix[host,active_agent,available],{$AGENT.TIMEOUT})=2 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases discovery | Discovery of Exchange databases. |
Zabbix agent (active) | perf_instance.discovery["MSExchange Active Manager"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Active Manager [{#INSTANCE}]: Database copy role | Database copy active or passive role. |
Zabbix agent (active) | perfcounteren["\MSExchange Active Manager({#INSTANCE})\Database Copy Role Active"] Preprocessing
|
Information Store [{#INSTANCE}]: Database state | Database state. Possible values: 0: Database without any copy and dismounted. 1: Database is a primary database and mounted. 2: Database is a passive copy and the state is healthy. |
Zabbix agent (active) | perfcounteren["\MSExchangeIS Store({#INSTANCE})\Database State"] Preprocessing
|
Information Store [{#INSTANCE}]: Active mailboxes count | Number of active mailboxes in this database. |
Zabbix agent (active) | perfcounteren["\MSExchangeIS Store({#INSTANCE})\Active mailboxes"] |
Information Store [{#INSTANCE}]: Page faults per second | Indicates the rate of page faults that can't be serviced because there are no pages available for allocation from the database cache. If this counter is above 0, it's an indication that the MSExchange Database\I/O Database Writes (Attached) Average Latency is too high. |
Zabbix agent (active) | perfcounteren["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log records stalled | Indicates the number of log records that can't be added to the log buffers per second because the log buffers are full. The average value should be below 10 per second. Spikes (maximum values) shouldn't be higher than 100 per second. |
Zabbix agent (active) | perfcounteren["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log threads waiting | Indicates the number of threads waiting to complete an update of the database by writing their data to the log. |
Zabbix agent (active) | perfcounteren["\MSExchange Database({#INF.STORE})\Log Threads Waiting", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests per second | Shows the number of RPC operations per second for each database instance. |
Zabbix agent (active) | perfcounteren["\MSExchangeIS Store({#INSTANCE})\RPC Operations/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests latency | RPC Latency average is the average latency of RPC requests per database. Average is calculated over all RPCs since exrpc32 was loaded. Should be less than 50ms at all times, with spikes less than 100ms. |
Zabbix agent (active) | perfcounteren["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Information Store [{#INSTANCE}]: RPC requests total | Indicates the overall RPC requests currently executing within the information store process. Should be below 70 at all times. |
Zabbix agent (active) | perfcounteren["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations per second | Shows the number of database read operations. |
Zabbix agent (active) | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations latency | Shows the average length of time per database read operation. Should be less than 20 ms on average. |
Zabbix agent (active) | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database read operations latency | Shows the average length of time per passive database read operation. Should be less than 200ms on average. |
Zabbix agent (active) | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Active database write operations per second | Shows the number of database write operations per second for each attached database instance. |
Zabbix agent (active) | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database write operations latency | Shows the average length of time per database write operation. Should be less than 50ms on average. |
Zabbix agent (active) | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database write operations latency | Shows the average length of time, in ms, per passive database write operation. Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
Zabbix agent (active) | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Information Store [{#INSTANCE}]: Page faults is too high | Too many page fault stalls for database "{#INSTANCE}". This counter should be 0 on production servers. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.FAULTS.TIME})>{$MS.EXCHANGE.DB.FAULTS.WARN} |Average |
||
Information Store [{#INSTANCE}]: Log records stalls is too high | The rate of stalled log records is too high. The average value should be below 10 per second; spikes should not exceed 100 per second. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LOG.STALLS.TIME})>{$MS.EXCHANGE.LOG.STALLS.WARN} |Average |
||
Information Store [{#INSTANCE}]: RPC Requests latency is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.TIME})>{$MS.EXCHANGE.RPC.WARN} |Warning |
||
Information Store [{#INSTANCE}]: RPC Requests total count is too high | Should be below 70 at all times. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.COUNT.TIME})>{$MS.EXCHANGE.RPC.COUNT.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 20ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.READ.TIME})>{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 200ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.READ.TIME})>{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average write time latency is too high for {$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | Should be less than 50ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME})>{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average write time latency is higher than read time latency for {$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME})>avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME}) |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web services discovery | Discovery of Exchange web services. |
Zabbix agent (active) | perfinstanceen.discovery["Web Service"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web Service [{#INSTANCE}]: Current connections | Shows the current number of connections established to each Web Service. |
Zabbix agent (active) | perfcounteren["\Web Service({#INSTANCE})\Current Connections", {$MS.EXCHANGE.PERF.INTERVAL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
LDAP discovery | Discovery of domain controller. |
Zabbix agent (active) | perfinstanceen.discovery["MSExchange ADAccess Domain Controllers"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Domain Controller [{#INSTANCE}]: Read time | Time that it takes to send an LDAP read request to the domain controller in question and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent (active) | perfcounteren["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Domain Controller [{#INSTANCE}]: Search time | Time that it takes to send an LDAP search request and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent (active) | perfcounteren["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Domain Controller [{#INSTANCE}]: LDAP read time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
||
Domain Controller [{#INSTANCE}]: LDAP search time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official Template for Microsoft Exchange Server 2016.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by Zabbix agent.
1. Import the template into Zabbix.
2. Link the imported template to a host with MS Exchange.
Note that the template doesn't provide information about the state of Windows services. It is recommended to use it together with the "OS Windows by Zabbix agent" template.
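Since this variant of the template relies on passive checks, item collection can be verified from the Zabbix server or proxy with zabbix_get. A minimal sketch (replace <exchange-host> with the address of the monitored host):
zabbix_get -s <exchange-host> -k 'perf_counter_en["\MSExchange Active Manager(_total)\Database Mounted"]'
A numeric response confirms that the agent can read the Exchange performance counters used by this template.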
Name | Description | Default |
---|---|---|
{$MS.EXCHANGE.PERF.INTERVAL} | Update interval for perf_counter_en items. |
60 |
{$MS.EXCHANGE.DB.FAULTS.TIME} | The time during which the database page faults may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.FAULTS.WARN} | Threshold for database page faults trigger. |
0 |
{$MS.EXCHANGE.LOG.STALLS.TIME} | The time during which the log records stalled may exceed the threshold. |
10m |
{$MS.EXCHANGE.LOG.STALLS.WARN} | Threshold for log records stalled trigger. |
100 |
{$MS.EXCHANGE.DB.ACTIVE.READ.TIME} | The time during which the active database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} | Threshold for active database read operations latency trigger. |
0.02 |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | The time during which the active database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} | Threshold for active database write operations latency trigger. |
0.05 |
{$MS.EXCHANGE.DB.PASSIVE.READ.TIME} | The time during which the passive database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} | Threshold for passive database read operations latency trigger. |
0.2 |
{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | The time during which the passive database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.TIME} | The time during which the RPC requests latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.WARN} | Threshold for RPC requests latency trigger. |
0.05 |
{$MS.EXCHANGE.RPC.COUNT.TIME} | The time during which the RPC total requests may exceed the threshold. |
5m |
{$MS.EXCHANGE.RPC.COUNT.WARN} | Threshold for the total RPC requests trigger. |
70 |
{$MS.EXCHANGE.LDAP.TIME} | The time during which the LDAP metrics may exceed the threshold. |
5m |
{$MS.EXCHANGE.LDAP.WARN} | Threshold for LDAP triggers. |
0.05 |
Name | Description | Type | Key and additional info |
---|---|---|---|
MS Exchange: Databases total mounted | Shows the number of active database copies on the server. |
Zabbix agent | perfcounteren["\MSExchange Active Manager(_total)\Database Mounted"] Preprocessing
|
MS Exchange [Client Access Server]: ActiveSync: ping command pending | Shows the number of ping commands currently pending in the queue. |
Zabbix agent | perfcounteren["\MSExchange ActiveSync\Ping Commands Pending", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: ActiveSync: requests per second | Shows the number of HTTP requests received from the client via ASP.NET per second. Determines the current Exchange ActiveSync request rate. Used only to determine current user load. |
Zabbix agent | perfcounteren["\MSExchange ActiveSync\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: ActiveSync: sync commands per second | Shows the number of sync commands processed per second. Clients use this command to synchronize items within a folder. |
Zabbix agent | perfcounteren["\MSExchange ActiveSync\Sync Commands/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Autodiscover: requests per second | Shows the number of Autodiscover service requests processed each second. Determines current user load. |
Zabbix agent | perfcounteren["\MSExchangeAutodiscover\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Availability Service: availability requests per second | Shows the number of requests serviced per second. The request can be only for free/busy information or include suggestions. One request may contain multiple mailboxes. Determines the rate at which Availability service requests are occurring. |
Zabbix agent | perfcounteren["\MSExchange Availability Service\Availability Requests (sec)", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Outlook Web App: current unique users | Shows the number of unique users currently logged on to Outlook Web App. This value monitors the number of unique active user sessions, so that users are only removed from this counter after they log off or their session times out. Determines current user load. |
Zabbix agent | perfcounteren["\MSExchange OWA\Current Unique Users", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Outlook Web App: requests per second | Shows the number of requests handled by Outlook Web App per second. Determines current user load. |
Zabbix agent | perfcounteren["\MSExchange OWA\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: MSExchangeWS: requests per second | Shows the number of requests processed each second. Determines current user load. |
Zabbix agent | perfcounteren["\MSExchangeWS\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases discovery | Discovery of Exchange databases. |
Zabbix agent | perf_instance.discovery["MSExchange Active Manager"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Active Manager [{#INSTANCE}]: Database copy role | Database copy active or passive role. |
Zabbix agent | perfcounteren["\MSExchange Active Manager({#INSTANCE})\Database Copy Role Active"] Preprocessing
|
Information Store [{#INSTANCE}]: Database state | Database state. Possible values: 0: Database without any copy and dismounted. 1: Database is a primary database and mounted. 2: Database is a passive copy and the state is healthy. |
Zabbix agent | perfcounteren["\MSExchangeIS Store({#INSTANCE})\Database State"] Preprocessing
|
Information Store [{#INSTANCE}]: Active mailboxes count | Number of active mailboxes in this database. |
Zabbix agent | perfcounteren["\MSExchangeIS Store({#INSTANCE})\Active mailboxes"] |
Information Store [{#INSTANCE}]: Page faults per second | Indicates the rate of page faults that can't be serviced because there are no pages available for allocation from the database cache. If this counter is above 0, it's an indication that the MSExchange Database\I/O Database Writes (Attached) Average Latency is too high. |
Zabbix agent | perfcounteren["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log records stalled | Indicates the number of log records that can't be added to the log buffers per second because the log buffers are full. The average value should be below 10 per second. Spikes (maximum values) shouldn't be higher than 100 per second. |
Zabbix agent | perfcounteren["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log threads waiting | Indicates the number of threads waiting to complete an update of the database by writing their data to the log. |
Zabbix agent | perfcounteren["\MSExchange Database({#INF.STORE})\Log Threads Waiting", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests per second | Shows the number of RPC operations per second for each database instance. |
Zabbix agent | perfcounteren["\MSExchangeIS Store({#INSTANCE})\RPC Operations/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests latency | RPC Latency average is the average latency of RPC requests per database. Average is calculated over all RPCs since exrpc32 was loaded. Should be less than 50ms at all times, with spikes less than 100ms. |
Zabbix agent | perfcounteren["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Information Store [{#INSTANCE}]: RPC requests total | Indicates the overall RPC requests currently executing within the information store process. Should be below 70 at all times. |
Zabbix agent | perfcounteren["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations per second | Shows the number of database read operations. |
Zabbix agent | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations latency | Shows the average length of time per database read operation. Should be less than 20 ms on average. |
Zabbix agent | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database read operations latency | Shows the average length of time per passive database read operation. Should be less than 200ms on average. |
Zabbix agent | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Active database write operations per second | Shows the number of database write operations per second for each attached database instance. |
Zabbix agent | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database write operations latency | Shows the average length of time per database write operation. Should be less than 50ms on average. |
Zabbix agent | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database write operations latency | Shows the average length of time, in ms, per passive database write operation. Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
Zabbix agent | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Information Store [{#INSTANCE}]: Page faults is too high | Too many page fault stalls for database "{#INSTANCE}". This counter should be 0 on production servers. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.FAULTS.TIME})>{$MS.EXCHANGE.DB.FAULTS.WARN} |Average |
||
Information Store [{#INSTANCE}]: Log records stalls is too high | The rate of stalled log records is too high. The average value should be below 10 per second; spikes should not exceed 100 per second. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LOG.STALLS.TIME})>{$MS.EXCHANGE.LOG.STALLS.WARN} |Average |
||
Information Store [{#INSTANCE}]: RPC Requests latency is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.TIME})>{$MS.EXCHANGE.RPC.WARN} |Warning |
||
Information Store [{#INSTANCE}]: RPC Requests total count is too high | Should be below 70 at all times. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.COUNT.TIME})>{$MS.EXCHANGE.RPC.COUNT.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 20ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.READ.TIME})>{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 200ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.READ.TIME})>{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average write time latency is too high for {$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | Should be less than 50ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME})>{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average write time latency is higher than read time latency for {$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME})>avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME}) |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web services discovery | Discovery of Exchange web services. |
Zabbix agent | perfinstanceen.discovery["Web Service"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web Service [{#INSTANCE}]: Current connections | Shows the current number of connections established to each Web Service. |
Zabbix agent | perfcounteren["\Web Service({#INSTANCE})\Current Connections", {$MS.EXCHANGE.PERF.INTERVAL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
LDAP discovery | Discovery of domain controller. |
Zabbix agent | perfinstanceen.discovery["MSExchange ADAccess Domain Controllers"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Domain Controller [{#INSTANCE}]: Read time | Time that it takes to send an LDAP read request to the domain controller in question and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent | perfcounteren["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Domain Controller [{#INSTANCE}]: Search time | Time that it takes to send an LDAP search request and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent | perfcounteren["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Domain Controller [{#INSTANCE}]: LDAP read time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
||
Domain Controller [{#INSTANCE}]: LDAP search time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor etcd by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Etcd by HTTP collects metrics with the help of the HTTP agent from the /metrics endpoint.
Refer to the vendor documentation.
For users of etcd version <= 3.4: in etcd v3.5 some metrics have been deprecated. See more details in Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance, or use an older version of the Etcd by HTTP template.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Make sure that etcd allows the collection of metrics. You can test it by running: curl -L http://localhost:2379/metrics.
Check if etcd is accessible from the Zabbix proxy or the Zabbix server, depending on where you are planning to do the monitoring. To verify it, run: curl -L http://<etcd_node_address>:2379/metrics.
Add the template to the etcd node. Set the hostname or IP address of the etcd host in the {$ETCD.HOST} macro. By default, the template uses the client port.
You can configure the metrics endpoint location by adding the --listen-metrics-urls flag. For more details, see the etcd documentation.
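For example, to expose metrics on a dedicated address in addition to the client URL, etcd could be started with an extra metrics URL; the address and port below are assumptions for illustration, not defaults:
etcd --listen-metrics-urls=http://0.0.0.0:2381 ...
If you do this, adjust {$ETCD.PORT} (and {$ETCD.SCHEME}, if needed) on the host accordingly.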
Additional points to consider:
- If you use a non-default port or scheme for etcd, don't forget to change the {$ETCD.SCHEME} and {$ETCD.PORT} macros.
- You can set the {$ETCD.USERNAME} and {$ETCD.PASSWORD} macros in the template to use on a host level if necessary.
- To test availability, run: zabbix_get -s etcd-host -k etcd.health.
Name | Description | Default |
---|---|---|
{$ETCD.HOST} | The hostname or IP address of the etcd host. |
<SET ETCD HOST> |
{$ETCD.PORT} | The port of the etcd API endpoint. |
2379 |
{$ETCD.SCHEME} | The request scheme which may be http or https. |
http |
{$ETCD.USER} | ||
{$ETCD.PASSWORD} | ||
{$ETCD.LEADER.CHANGES.MAX.WARN} | The maximum number of leader changes. |
5 |
{$ETCD.PROPOSAL.FAIL.MAX.WARN} | The maximum number of proposal failures. |
2 |
{$ETCD.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures. |
2 |
{$ETCD.PROPOSAL.PENDING.MAX.WARN} | The maximum number of proposals in queue. |
5 |
{$ETCD.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors. |
90 |
{$ETCD.GRPC_CODE.MATCHES} | The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. |
.* |
{$ETCD.GRPC_CODE.NOT_MATCHES} | The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. |
CHANGE_IF_NEEDED |
{$ETCD.GRPC.ERRORS.MAX.WARN} | The maximum number of gRPC request failures. |
1 |
{$ETCD.GRPC_CODE.TRIGGER.MATCHES} | The filter of discoverable gRPC codes, which will create triggers. |
Aborted|Unavailable |
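The gRPC code macros above are treated as regular expressions. As an illustrative (non-default) example, to also raise failure triggers for the DeadlineExceeded code, {$ETCD.GRPC_CODE.TRIGGER.MATCHES} could be set on the host to:
Aborted|Unavailable|DeadlineExceeded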
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd: Service's TCP port state | Simple check | net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"] Preprocessing
|
|
Etcd: Get node metrics | HTTP agent | etcd.get_metrics | |
Etcd: Node health | HTTP agent | etcd.health Preprocessing
|
|
Etcd: Server is a leader | Defines whether or not this member is a leader: 1 - it is; 0 - otherwise. |
Dependent item | etcd.is.leader Preprocessing
|
Etcd: Server has a leader | Defines whether or not a leader exists: 1 - it exists; 0 - it does not. |
Dependent item | etcd.has.leader Preprocessing
|
Etcd: Leader changes | The number of leader changes the member has seen since its start. |
Dependent item | etcd.leader.changes Preprocessing
|
Etcd: Proposals committed per second | The number of consensus proposals committed. |
Dependent item | etcd.proposals.committed.rate Preprocessing
|
Etcd: Proposals applied per second | The number of consensus proposals applied. |
Dependent item | etcd.proposals.applied.rate Preprocessing
|
Etcd: Proposals failed per second | The number of failed proposals seen. |
Dependent item | etcd.proposals.failed.rate Preprocessing
|
Etcd: Proposals pending | The current number of pending proposals to commit. |
Dependent item | etcd.proposals.pending Preprocessing
|
Etcd: Reads per second | The number of read actions by |
Dependent item | etcd.reads.rate Preprocessing
|
Etcd: Writes per second | The number of writes (e.g., |
Dependent item | etcd.writes.rate Preprocessing
|
Etcd: Client gRPC received bytes per second | The number of bytes received from gRPC clients per second. |
Dependent item | etcd.network.grpc.received.rate Preprocessing
|
Etcd: Client gRPC sent bytes per second | The number of bytes sent from gRPC clients per second. |
Dependent item | etcd.network.grpc.sent.rate Preprocessing
|
Etcd: HTTP requests received | The number of requests received into the system (successfully parsed and |
Dependent item | etcd.http.requests.rate Preprocessing
|
Etcd: HTTP 5XX | The number of handled failures of requests (non-watches), by the method ( |
Dependent item | etcd.http.requests.5xx.rate Preprocessing
|
Etcd: HTTP 4XX | The number of handled failures of requests (non-watches), by the method ( |
Dependent item | etcd.http.requests.4xx.rate Preprocessing
|
Etcd: RPCs received per second | The number of RPC stream messages received on the server. |
Dependent item | etcd.grpc.received.rate Preprocessing
|
Etcd: RPCs sent per second | The number of gRPC stream messages sent by the server. |
Dependent item | etcd.grpc.sent.rate Preprocessing
|
Etcd: RPCs started per second | The number of RPCs started on the server. |
Dependent item | etcd.grpc.started.rate Preprocessing
|
Etcd: Get version | HTTP agent | etcd.get_version | |
Etcd: Server version | The version of the etcd server. |
Dependent item | etcd.server.version Preprocessing
|
Etcd: Cluster version | The version of the etcd cluster. |
Dependent item | etcd.cluster.version Preprocessing
|
Etcd: DB size | The total size of the underlying database. |
Dependent item | etcd.db.size Preprocessing
|
Etcd: Keys compacted per second | The number of DB keys compacted per second. |
Dependent item | etcd.keys.compacted.rate Preprocessing
|
Etcd: Keys expired per second | The number of expired keys per second. |
Dependent item | etcd.keys.expired.rate Preprocessing
|
Etcd: Keys total | The total number of keys. |
Dependent item | etcd.keys.total Preprocessing
|
Etcd: Uptime |
|
Dependent item | etcd.uptime Preprocessing
|
Etcd: Virtual memory | The size of virtual memory expressed in bytes. |
Dependent item | etcd.virtual.bytes Preprocessing
|
Etcd: Resident memory | The size of resident memory expressed in bytes. |
Dependent item | etcd.res.bytes Preprocessing
|
Etcd: CPU | The total user and system CPU time spent in seconds. |
Dependent item | etcd.cpu.util Preprocessing
|
Etcd: Open file descriptors | The number of open file descriptors. |
Dependent item | etcd.open.fds Preprocessing
|
Etcd: Maximum open file descriptors | The maximum number of open file descriptors. |
Dependent item | etcd.max.fds Preprocessing
|
Etcd: Deletes per second | The number of deletes seen by this member per second. |
Dependent item | etcd.delete.rate Preprocessing
|
Etcd: PUT per second | The number of puts seen by this member per second. |
Dependent item | etcd.put.rate Preprocessing
|
Etcd: Range per second | The number of ranges seen by this member per second. |
Dependent item | etcd.range.rate Preprocessing
|
Etcd: Transaction per second | The number of transactions seen by this member per second. |
Dependent item | etcd.txn.rate Preprocessing
|
Etcd: Pending events | The total number of pending events to be sent. |
Dependent item | etcd.events.sent.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Service is unavailable | last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"])=0 |Average |
Manual close: Yes | ||
Etcd: Node healthcheck failed | See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check. |
last(/Etcd by HTTP/etcd.health)=0 |Average |
Depends on:
|
|
Etcd: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Etcd by HTTP/etcd.is.leader,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Etcd: Member has no leader | If a member does not have a leader, it is totally unavailable. |
last(/Etcd by HTTP/etcd.has.leader)=0 |Average |
||
Etcd: Instance has seen too many leader changes | Rapid leadership changes impact the performance of etcd significantly. |
(max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN} |Warning |
||
Etcd: Too many proposal failures | Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster. |
min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN} |Warning |
||
Etcd: Too many proposals are queued to commit | Rising pending proposals suggests there is a high client load, or the member cannot commit proposals. |
min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN} |Warning |
||
Etcd: Too many HTTP requests failures | Too many requests failed on the etcd instance. |
min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN} |Warning |
||
Etcd: Server version has changed | Etcd version has changed. Acknowledge to close the problem manually. |
last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0 |Info |
Manual close: Yes | |
Etcd: Cluster version has changed | Etcd version has changed. Acknowledge to close the problem manually. |
last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0 |Info |
Manual close: Yes | |
Etcd: Host has been restarted | Uptime is less than 10 minutes. |
last(/Etcd by HTTP/etcd.uptime)<10m |Info |
Manual close: Yes | |
Etcd: Current number of open files is too high | Heavy file descriptor usage (i.e., close to the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. |
min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC codes discovery | Dependent item | etcd.grpc_code.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd: RPCs completed with code {#GRPC.CODE} | The number of RPCs completed on the server with grpc_code {#GRPC.CODE}. |
Dependent item | etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} | min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Peers discovery | Dependent item | etcd.peer.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd: Etcd peer {#ETCD.PEER}: Bytes sent | The number of bytes sent to a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing
|
Etcd: Etcd peer {#ETCD.PEER}: Bytes received | The number of bytes received from a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing
|
Etcd: Etcd peer {#ETCD.PEER}: Send failures | The number of sent failures from a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing
|
Etcd: Etcd peer {#ETCD.PEER}: Receive failures | The number of received failures from a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor Envoy Proxy by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Envoy Proxy by HTTP collects metrics with the HTTP agent from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus).
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus). https://www.envoyproxy.io/docs/envoy/v1.20.0/operations/stats_overview
Don't forget to change macros {$ENVOY.URL}, {$ENVOY.METRICS.PATH}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Envoy Proxy instance version and configuration.
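To confirm the endpoint is reachable before linking the template, you can query the Envoy admin interface directly; this sketch assumes the default macro values, so adjust the address and path if you changed {$ENVOY.URL} or {$ENVOY.METRICS.PATH}:
curl -s http://localhost:9901/stats/prometheus | head
The output should contain Prometheus-formatted envoy_* metrics.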
Name | Description | Default |
---|---|---|
{$ENVOY.URL} | Instance URL. |
http://localhost:9901 |
{$ENVOY.METRICS.PATH} | The path from which Zabbix will scrape metrics in Prometheus format. |
/stats/prometheus |
{$ENVOY.CERT.MIN} | Minimum number of days before certificate expiration used for trigger expression. |
7 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Envoy Proxy: Get node metrics | Get server metrics. |
HTTP agent | envoy.get_metrics Preprocessing
|
Envoy Proxy: Server state | State of the server. Live - (default) Server is live and serving traffic. Draining - Server is draining listeners in response to external health checks failing. Pre initializing - Server has not yet completed cluster manager initialization. Initializing - Server is running the cluster manager initialization callbacks (e.g., RDS). |
Dependent item | envoy.server.state Preprocessing
|
Envoy Proxy: Server live | 1 if the server is not currently draining, 0 otherwise. |
Dependent item | envoy.server.live Preprocessing
|
Envoy Proxy: Uptime | Current server uptime in seconds. |
Dependent item | envoy.server.uptime Preprocessing
|
Envoy Proxy: Certificate expiration, day before | Number of days until the next certificate being managed will expire. |
Dependent item | envoy.server.days_until_first_cert_expiring Preprocessing
|
Envoy Proxy: Server concurrency | Number of worker threads. |
Dependent item | envoy.server.concurrency Preprocessing
|
Envoy Proxy: Memory allocated | Current amount of allocated memory in bytes. Total of both new and old Envoy processes on hot restart. |
Dependent item | envoy.server.memory_allocated Preprocessing
|
Envoy Proxy: Memory heap size | Current reserved heap size in bytes. New Envoy process heap size on hot restart. |
Dependent item | envoy.server.memoryheapsize Preprocessing
|
Envoy Proxy: Memory physical size | Current estimate of total bytes of the physical memory. New Envoy process physical memory size on hot restart. |
Dependent item | envoy.server.memoryphysicalsize Preprocessing
|
Envoy Proxy: Filesystem, flushed by timer rate | Total number of times internal flush buffers are written to a file due to flush timeout per second. |
Dependent item | envoy.filesystem.flushedbytimer.rate Preprocessing
|
Envoy Proxy: Filesystem, write completed rate | Total number of times a file was written per second. |
Dependent item | envoy.filesystem.write_completed.rate Preprocessing
|
Envoy Proxy: Filesystem, write failed rate | Total number of times an error occurred during a file write operation per second. |
Dependent item | envoy.filesystem.write_failed.rate Preprocessing
|
Envoy Proxy: Filesystem, reopen failed rate | Total number of times a file was failed to be opened per second. |
Dependent item | envoy.filesystem.reopen_failed.rate Preprocessing
|
Envoy Proxy: Connections, total | Total connections of both new and old Envoy processes. |
Dependent item | envoy.server.total_connections Preprocessing
|
Envoy Proxy: Connections, parent | Total connections of the old Envoy process on hot restart. |
Dependent item | envoy.server.parent_connections Preprocessing
|
Envoy Proxy: Clusters, warming | Number of currently warming (not active) clusters. |
Dependent item | envoy.clustermanager.warmingclusters Preprocessing
|
Envoy Proxy: Clusters, active | Number of currently active (warmed) clusters. |
Dependent item | envoy.clustermanager.activeclusters Preprocessing
|
Envoy Proxy: Clusters, added rate | Total clusters added (either via static config or CDS) per second. |
Dependent item | envoy.clustermanager.clusteradded.rate Preprocessing
|
Envoy Proxy: Clusters, modified rate | Total clusters modified (via CDS) per second. |
Dependent item | envoy.clustermanager.clustermodified.rate Preprocessing
|
Envoy Proxy: Clusters, removed rate | Total clusters removed (via CDS) per second. |
Dependent item | envoy.clustermanager.clusterremoved.rate Preprocessing
|
Envoy Proxy: Clusters, updates rate | Total cluster updates per second. |
Dependent item | envoy.clustermanager.clusterupdated.rate Preprocessing
|
Envoy Proxy: Listeners, active | Number of currently active listeners. |
Dependent item | envoy.listenermanager.totallisteners_active Preprocessing
|
Envoy Proxy: Listeners, draining | Number of currently draining listeners. |
Dependent item | envoy.listenermanager.totallisteners_draining Preprocessing
|
Envoy Proxy: Listener, warming | Number of currently warming listeners. |
Dependent item | envoy.listenermanager.totallisteners_warming Preprocessing
|
Envoy Proxy: Listener manager, initialized | A boolean (1 if started and 0 otherwise) that indicates whether listeners have been initialized on workers. |
Dependent item | envoy.listenermanager.workersstarted Preprocessing
|
Envoy Proxy: Listeners, create failure | Total failed listener object additions to workers per second. |
Dependent item | envoy.listenermanager.listenercreate_failure.rate Preprocessing
|
Envoy Proxy: Listeners, create success | Total listener objects successfully added to workers per second. |
Dependent item | envoy.listenermanager.listenercreate_success.rate Preprocessing
|
Envoy Proxy: Listeners, added | Total listeners added (either via static config or LDS) per second. |
Dependent item | envoy.listenermanager.listeneradded.rate Preprocessing
|
Envoy Proxy: Listeners, stopped | Total listeners stopped per second. |
Dependent item | envoy.listenermanager.listenerstopped.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Envoy Proxy: Server state is not live | last(/Envoy Proxy by HTTP/envoy.server.state) > 0 |Average |
|||
Envoy Proxy: Service has been restarted | Uptime is less than 10 minutes. |
last(/Envoy Proxy by HTTP/envoy.server.uptime)<10m |Info |
Manual close: Yes | |
Envoy Proxy: Failed to fetch metrics data | Zabbix has not received data for items for the last 10 minutes. |
nodata(/Envoy Proxy by HTTP/envoy.server.uptime,10m)=1 |Warning |
Manual close: Yes | |
Envoy Proxy: SSL certificate expires soon | Please check certificate. Less than {$ENVOY.CERT.MIN} days left until the next certificate being managed will expire. |
last(/Envoy Proxy by HTTP/envoy.server.days_until_first_cert_expiring)<{$ENVOY.CERT.MIN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster metrics discovery | Dependent item | envoy.lld.cluster Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, total | Current cluster membership total. |
Dependent item | envoy.cluster.membershiptotal["{#CLUSTERNAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, healthy | Current cluster healthy total (inclusive of both health checking and outlier detection). |
Dependent item | envoy.cluster.membershiphealthy["{#CLUSTERNAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, unhealthy | Current cluster unhealthy. |
Calculated | envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"] |
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, degraded | Current cluster degraded total. |
Dependent item | envoy.cluster.membershipdegraded["{#CLUSTERNAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Connections, total | Current cluster total connections. |
Dependent item | envoy.cluster.upstream_cx_total["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Connections, active | Current cluster total active connections. |
Dependent item | envoy.cluster.upstream_cx_active["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests total, rate | Current cluster request total per second. |
Dependent item | envoy.cluster.upstream_rq_total.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests timeout, rate | Current cluster requests that timed out waiting for a response per second. |
Dependent item | envoy.cluster.upstream_rq_timeout.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests completed, rate | Total upstream requests completed per second. |
Dependent item | envoy.cluster.upstream_rq_completed.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 2xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstream_rq_2x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 3xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstream_rq_3x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 4xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstream_rq_4x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 5xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstream_rq_5x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests pending | Total active requests pending a connection pool connection. |
Dependent item | envoy.cluster.upstream_rq_pending_active["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests active | Total active requests. |
Dependent item | envoy.cluster.upstream_rq_active["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Upstream bytes out, rate | Total sent connection bytes per second. |
Dependent item | envoy.cluster.upstream_cx_tx_bytes_total.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Upstream bytes in, rate | Total received connection bytes per second. |
Dependent item | envoy.cluster.upstream_cx_rx_bytes_total.rate["{#CLUSTER_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Envoy Proxy: There are unhealthy clusters | last(/Envoy Proxy by HTTP/envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]) > 0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Listeners metrics discovery | Dependent item | envoy.lld.listeners Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Connections, active | Total active connections. |
Dependent item | envoy.listener.downstream_cx_active["{#LISTENER_ADDRESS}"] Preprocessing
|
Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Connections, rate | Total connections per second. |
Dependent item | envoy.listener.downstream_cx_total.rate["{#LISTENER_ADDRESS}"] Preprocessing
|
Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Sockets, undergoing | Sockets currently undergoing listener filter processing. |
Dependent item | envoy.listener.downstream_pre_cx_active["{#LISTENER_ADDRESS}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP metrics discovery | Dependent item | envoy.lld.http Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests, rate | Total active connections per second. |
Dependent item | envoy.http.downstreamrqtotal.rate["{#CONN_MANAGER}"] Preprocessing
|
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests, active | Total active requests. |
Dependent item | envoy.http.downstreamrqactive["{#CONN_MANAGER}"] Preprocessing
|
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests timeout, rate | Total requests closed due to a timeout on the request path per second. |
Dependent item | envoy.http.downstreamrqtimeout["{#CONN_MANAGER}"] Preprocessing
|
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Connections, rate | Total connections per second. |
Dependent item | envoy.http.downstreamcxtotal["{#CONN_MANAGER}"] Preprocessing
|
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Connections, active | Total active connections. |
Dependent item | envoy.http.downstreamcxactive["{#CONN_MANAGER}"] Preprocessing
|
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Bytes in, rate | Total bytes received per second. |
Dependent item | envoy.http.downstreamcxrxbytestotal.rate["{#CONN_MANAGER}"] Preprocessing
|
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Bytes out, rate | Total bytes sent per second. |
Dependent item | envoy.http.downstreamcxtxbytestota.rate["{#CONN_MANAGER}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Elasticsearch by Zabbix that works without any external scripts.
It works with both standalone and cluster instances.
The metrics are collected in one pass remotely using an HTTP agent.
The values are obtained from the _cluster/health, _cluster/stats, and _nodes/stats REST API requests.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the hostname or IP address of the Elasticsearch host in the {$ELASTICSEARCH.HOST}
macro.
Set the login and password in the {$ELASTICSEARCH.USERNAME}
and {$ELASTICSEARCH.PASSWORD}
macros.
If you use a non-default location of the ES API, don't forget to change the {$ELASTICSEARCH.SCHEME} and {$ELASTICSEARCH.PORT} macros.
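Before linking the template, it can help to verify that the macro values reach the same REST endpoints the template polls. A minimal sketch, assuming placeholder host, port, scheme, and credentials (substitute your own macro values):

```python
# Minimal sketch: verify the Elasticsearch endpoints this template polls,
# using the same values you would put into the macros. Host, port, scheme
# and credentials below are placeholders (assumptions), not template defaults.
import requests

SCHEME, HOST, PORT = "http", "es.example.com", 9200   # {$ELASTICSEARCH.*} values
AUTH = ("zabbix", "secret")                            # username/password macros

BASE = f"{SCHEME}://{HOST}:{PORT}"

for path in ("_cluster/health", "_cluster/stats", "_nodes/stats"):
    resp = requests.get(f"{BASE}/{path}", auth=AUTH, timeout=10)
    resp.raise_for_status()
    print(path, "->", resp.status_code)

# The cluster health status used by the "ES: Cluster health status" item:
print(requests.get(f"{BASE}/_cluster/health", auth=AUTH, timeout=10).json()["status"])
```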
Name | Description | Default |
---|---|---|
{$ELASTICSEARCH.USERNAME} | The username for Elasticsearch. |
|
{$ELASTICSEARCH.PASSWORD} | The password for Elasticsearch. |
|
{$ELASTICSEARCH.HOST} | The hostname or IP address of the Elasticsearch host. |
<SET ELASTICSEARCH HOST> |
{$ELASTICSEARCH.PORT} | The port of the Elasticsearch host. |
9200 |
{$ELASTICSEARCH.SCHEME} | The scheme of the Elasticsearch (http/https). |
http |
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} | The ES cluster maximum response time in seconds for trigger expression. |
10s |
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} | Maximum of query latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} | Maximum of fetch latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} | Maximum of indexing latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} | Maximum of flush latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.HEAP_USED.MAX.WARN} | The maximum percentage of JVM heap in use for the warning trigger expression. |
85 |
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} | The maximum percentage of JVM heap in use for the critical trigger expression. |
95 |
Name | Description | Type | Key and additional info |
---|---|---|---|
ES: Service status | Checks if the service is running and accepting TCP connections. |
Simple check | net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] Preprocessing
|
ES: Service response time | Checks performance of the TCP service. |
Simple check | net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] |
ES: Get cluster health | Returns the health status of a cluster. |
HTTP agent | es.cluster.get_health |
ES: Cluster health status | Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green - all shards are assigned; yellow - all primary shards are assigned, but one or more replica shards are unassigned (if a node in the cluster fails, some data could be unavailable until that node is repaired); red - one or more primary shards are unassigned, so some data is unavailable (this can occur briefly during cluster startup as primary shards are assigned). |
Dependent item | es.cluster.status Preprocessing
|
ES: Number of nodes | The number of nodes within the cluster. |
Dependent item | es.cluster.number_of_nodes Preprocessing
|
ES: Number of data nodes | The number of nodes that are dedicated data nodes. |
Dependent item | es.cluster.number_of_data_nodes Preprocessing
|
ES: Number of relocating shards | The number of shards that are under relocation. |
Dependent item | es.cluster.relocating_shards Preprocessing
|
ES: Number of initializing shards | The number of shards that are under initialization. |
Dependent item | es.cluster.initializing_shards Preprocessing
|
ES: Number of unassigned shards | The number of shards that are not allocated. |
Dependent item | es.cluster.unassigned_shards Preprocessing
|
ES: Delayed unassigned shards | The number of shards whose allocation has been delayed by the timeout settings. |
Dependent item | es.cluster.delayed_unassigned_shards Preprocessing
|
ES: Number of pending tasks | The number of cluster-level changes that have not yet been executed. |
Dependent item | es.cluster.number_of_pending_tasks Preprocessing
|
ES: Task max waiting in queue | The time in seconds that the earliest-initiated task has been waiting to be performed. |
Dependent item | es.cluster.task_max_waiting_in_queue Preprocessing
|
ES: Inactive shards percentage | The ratio of inactive shards in the cluster expressed as a percentage. |
Dependent item | es.cluster.inactive_shards_percent_as_number Preprocessing
|
ES: Get cluster stats | Returns cluster statistics. |
HTTP agent | es.cluster.get_stats |
ES: Cluster uptime | Uptime duration in seconds since JVM has last started. |
Dependent item | es.nodes.jvm.max_uptime Preprocessing
|
ES: Number of non-deleted documents | The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields. |
Dependent item | es.indices.docs.count Preprocessing
|
ES: Indices with shards assigned to nodes | The total number of indices with shards assigned to the selected nodes. |
Dependent item | es.indices.count Preprocessing
|
ES: Total size of all file stores | The total size in bytes of all file stores across all selected nodes. |
Dependent item | es.nodes.fs.total_in_bytes Preprocessing
|
ES: Total available size to JVM in all file stores | The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_bytes. This is the actual amount of free disk space the selected Elasticsearch nodes can use. |
Dependent item | es.nodes.fs.available_in_bytes Preprocessing
|
ES: Nodes with the data role | The number of selected nodes with the data role. |
Dependent item | es.nodes.count.data Preprocessing
|
ES: Nodes with the ingest role | The number of selected nodes with the ingest role. |
Dependent item | es.nodes.count.ingest Preprocessing
|
ES: Nodes with the master role | The number of selected nodes with the master role. |
Dependent item | es.nodes.count.master Preprocessing
|
ES: Get nodes stats | Returns cluster nodes statistics. |
HTTP agent | es.nodes.get_stats |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ES: Service is down | The service is unavailable or does not accept TCP connections. |
last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"])=0 |Average |
Manual close: Yes | |
ES: Service response time is too high | The performance of the TCP service is very low. |
min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
|
ES: Health is YELLOW | All primary shards are assigned, but one or more replica shards are unassigned. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1 |Average |
||
ES: Health is RED | One or more primary shards are unassigned, so some data is unavailable. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2 |High |
||
ES: Health is UNKNOWN | The health status of the cluster is unknown or cannot be obtained. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255 |High |
||
ES: The number of nodes within the cluster has decreased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0 |Info |
Manual close: Yes | ||
ES: The number of nodes within the cluster has increased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0 |Info |
Manual close: Yes | ||
ES: Cluster has the initializing shards | The cluster has had initializing shards for longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0 |Average |
||
ES: Cluster has the unassigned shards | The cluster has had unassigned shards for longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0 |Average |
||
ES: Cluster has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m |Info |
Manual close: Yes | |
ES: Cluster does not have enough space for resharding | There is not enough disk space for index resharding. |
(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes) |High |
||
ES: Cluster has only two master nodes | The cluster has only two nodes with a master role and will be unavailable if one of them breaks. |
last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2 |Disaster |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster nodes discovery | Discovers ES cluster nodes. |
HTTP agent | es.nodes.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
ES {#ES.NODE}: Get data | Returns cluster nodes statistics. |
Dependent item | es.node.get.data[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total size | Total size (in bytes) of all file stores. |
Dependent item | es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total available size | The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process-level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize. |
Dependent item | es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Node uptime | JVM uptime in seconds. |
Dependent item | es.node.jvm.uptime[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Maximum JVM memory available for use | The maximum amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap currently in use | The memory, in bytes, currently in use by the heap. |
Dependent item | es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Percent of JVM heap currently in use | The percentage of memory currently in use by the heap. |
Dependent item | es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap committed | The amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Number of open HTTP connections | The number of currently open HTTP connections for the node. |
Dependent item | es.node.http.current_open[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of HTTP connections opened | The number of HTTP connections opened for the node per second. |
Dependent item | es.node.http.opened.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling operations | Time in seconds spent throttling operations for the last measuring span. |
Dependent item | es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling recovery operations | Time in seconds spent throttling recovery operations for the last measuring span. |
Dependent item | es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling merge operations | Time in seconds spent throttling merge operations for the last measuring span. |
Dependent item | es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of queries | The number of query operations per second. |
Dependent item | es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of query | The total number of query operations. |
Dependent item | es.node.indices.search.query_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing query | Time in seconds spent performing query operations for the last measuring span. |
Dependent item | es.node.indices.search.query_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing query | Time in milliseconds spent performing query operations. |
Dependent item | es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Query latency | The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals (see the latency sketch after this table). |
Calculated | es.node.indices.search.query_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current query operations | The number of query operations currently running. |
Dependent item | es.node.indices.search.query_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of fetch | The number of fetch operations per second. |
Dependent item | es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of fetch | The total number of fetch operations. |
Dependent item | es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing fetch | Time in seconds spent performing fetch operations for the last measuring span. |
Dependent item | es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing fetch | Time in milliseconds spent performing fetch operations. |
Dependent item | es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Fetch latency | The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals. |
Calculated | es.node.indices.search.fetch_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current fetch operations | The number of fetch operations currently running. |
Dependent item | es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks completed | The number of tasks completed by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool active threads | The number of active threads in the write thread pool. |
Dependent item | es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool tasks in queue | The number of tasks in queue for the write thread pool. |
Dependent item | es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks rejected | The number of tasks rejected by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks completed | The number of tasks completed by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool active threads | The number of active threads in the search thread pool. |
Dependent item | es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool tasks in queue | The number of tasks in queue for the search thread pool. |
Dependent item | es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks rejected | The number of tasks rejected by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks completed | The number of tasks completed by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool active threads | The number of active threads in the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool tasks in queue | The number of tasks in queue for the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks rejected | The number of tasks rejected by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of indexing | The total number of indexing operations. |
Dependent item | es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing indexing | Total time in milliseconds spent performing indexing operations. |
Dependent item | es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Indexing latency | The average indexing latency calculated from the available index_total and index_time_in_millis metrics. |
Calculated | es.node.indices.indexing.index_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current indexing operations | The number of indexing operations currently running. |
Dependent item | es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of index flushes to disk | The total number of flush operations. |
Dependent item | es.node.indices.flush.total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent on flushing indices to disk | Total time in milliseconds spent performing flush operations. |
Dependent item | es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Flush latency | The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics. |
Calculated | es.node.indices.flush.latency[{#ES.NODE}] |
ES {#ES.NODE}: Rate of index refreshes | The number of refresh operations per second. |
Dependent item | es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing refresh | Time in seconds spent performing refresh operations for the last measuring span. |
Dependent item | es.node.indices.refresh.time[{#ES.NODE}] Preprocessing
|
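The Query latency, Fetch latency, Indexing latency, and Flush latency calculated items above all follow the same pattern: the average per-operation latency over one sampling interval is the change in the total elapsed time counter divided by the change in the operation counter. A minimal sketch of that arithmetic, with made-up sample values:

```python
# Minimal sketch of the latency arithmetic behind the calculated items:
# average latency = delta(total elapsed ms) / delta(operation count) between
# two consecutive samples. The sample values below are made up for illustration.
def average_latency(time_ms_prev, time_ms_now, count_prev, count_now):
    """Average per-operation latency (ms) over one sampling interval."""
    ops = count_now - count_prev
    if ops <= 0:
        return 0.0
    return (time_ms_now - time_ms_prev) / ops

# e.g. query counters sampled one minute apart
print(average_latency(120_000, 126_000, 4_000, 4_100))  # -> 60.0 ms per query
```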
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ES {#ES.NODE}: has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m |Info |
Manual close: Yes | |
ES {#ES.NODE}: Percent of JVM heap in use is high | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN} |Warning |
Depends on:
|
|
ES {#ES.NODE}: Percent of JVM heap in use is critical | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} |High |
||
ES {#ES.NODE}: Query latency is too high | If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} |Warning |
||
ES {#ES.NODE}: Fetch latency is too high | The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} |Warning |
||
ES {#ES.NODE}: Write thread pool executor has the rejected tasks | The number of tasks rejected by the write thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0 |Warning |
||
ES {#ES.NODE}: Search thread pool executor has the rejected tasks | The number of tasks rejected by the search thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0 |Warning |
||
ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks | The number of tasks rejected by the refresh thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0 |Warning |
||
ES {#ES.NODE}: Indexing latency is too high | If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation |
min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} |Warning |
||
ES {#ES.NODE}: Flush latency is too high | If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate |
min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Docker engine by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Docker by Zabbix agent 2
— collects metrics by polling zabbix-agent2.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up and configure Zabbix agent 2 compiled with the Docker monitoring plugin. The user by which the Zabbix agent 2 is running should have access permissions to the Docker socket.
Test availability: zabbix_get -s docker-host -k docker.info
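As an optional cross-check (not part of the template), you can confirm that the account running Zabbix agent 2 can reach the Docker socket with the Docker SDK for Python; run it as the same user the agent runs under:

```python
# Optional sanity check (not part of the template): confirm that the user
# running Zabbix agent 2 can reach the Docker socket, using the Docker SDK
# for Python. Run it as the same user the agent runs under.
import docker

client = docker.from_env()          # honours DOCKER_HOST / default unix socket
info = client.info()                # same data as the docker.info agent key
print(info["ServerVersion"], info["Containers"], info["ContainersRunning"])
```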
Name | Description | Default |
---|---|---|
{$DOCKER.LLD.FILTER.CONTAINER.MATCHES} | Filter of discoverable containers. |
.* |
{$DOCKER.LLD.FILTER.CONTAINER.NOT_MATCHES} | Filter to exclude discovered containers. |
CHANGE_IF_NEEDED |
{$DOCKER.LLD.FILTER.IMAGE.MATCHES} | Filter of discoverable images. |
.* |
{$DOCKER.LLD.FILTER.IMAGE.NOT_MATCHES} | Filter to exclude discovered images. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Docker: Ping | Zabbix agent | docker.ping Preprocessing
|
|
Docker: Get info | Zabbix agent | docker.info | |
Docker: Get containers | Zabbix agent | docker.containers | |
Docker: Get images | Zabbix agent | docker.images | |
Docker: Get data_usage | Zabbix agent | docker.data_usage | |
Docker: Containers total | Total number of containers on this host. |
Dependent item | docker.containers.total Preprocessing
|
Docker: Containers running | Total number of containers running on this host. |
Dependent item | docker.containers.running Preprocessing
|
Docker: Containers stopped | Total number of containers stopped on this host. |
Dependent item | docker.containers.stopped Preprocessing
|
Docker: Containers paused | Total number of containers paused on this host. |
Dependent item | docker.containers.paused Preprocessing
|
Docker: Images total | Number of images with intermediate image layers. |
Dependent item | docker.images.total Preprocessing
|
Docker: Storage driver | Docker storage driver. https://docs.docker.com/storage/storagedriver/ |
Dependent item | docker.driver Preprocessing
|
Docker: Memory limit enabled | Dependent item | docker.mem_limit.enabled Preprocessing
|
|
Docker: Swap limit enabled | Dependent item | docker.swap_limit.enabled Preprocessing
|
|
Docker: Kernel memory enabled | Dependent item | docker.kernel_mem.enabled Preprocessing
|
|
Docker: Kernel memory TCP enabled | Dependent item | docker.kernel_mem_tcp.enabled Preprocessing
|
|
Docker: CPU CFS Period enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpu_cfs_period.enabled Preprocessing
|
Docker: CPU CFS Quota enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpu_cfs_quota.enabled Preprocessing
|
Docker: CPU Shares enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpu_shares.enabled Preprocessing
|
Docker: CPU Set enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpu_set.enabled Preprocessing
|
Docker: Pids limit enabled | Dependent item | docker.pids_limit.enabled Preprocessing
|
|
Docker: IPv4 Forwarding enabled | Dependent item | docker.ipv4_forwarding.enabled Preprocessing
|
|
Docker: Debug enabled | Dependent item | docker.debug.enabled Preprocessing
|
|
Docker: Nfd | Number of used File Descriptors. |
Dependent item | docker.nfd Preprocessing
|
Docker: OomKill disabled | Dependent item | docker.oomkill.disabled Preprocessing
|
|
Docker: Goroutines | Number of goroutines. |
Dependent item | docker.goroutines Preprocessing
|
Docker: Logging driver | Dependent item | docker.logging_driver Preprocessing
|
|
Docker: Cgroup driver | Dependent item | docker.cgroup_driver Preprocessing
|
|
Docker: NEvents listener | Dependent item | docker.nevents_listener Preprocessing
|
|
Docker: Kernel version | Dependent item | docker.kernel_version Preprocessing
|
|
Docker: Operating system | Dependent item | docker.operating_system Preprocessing
|
|
Docker: OS type | Dependent item | docker.os_type Preprocessing
|
|
Docker: Architecture | Dependent item | docker.architecture Preprocessing
|
|
Docker: NCPU | Dependent item | docker.ncpu Preprocessing
|
|
Docker: Memory total | Dependent item | docker.mem.total Preprocessing
|
|
Docker: Docker root dir | Dependent item | docker.root_dir Preprocessing
|
|
Docker: Name | Dependent item | docker.name Preprocessing
|
|
Docker: Server version | Dependent item | docker.server_version Preprocessing
|
|
Docker: Default runtime | Dependent item | docker.default_runtime Preprocessing
|
|
Docker: Live restore enabled | Dependent item | docker.live_restore.enabled Preprocessing
|
|
Docker: Layers size | Dependent item | docker.layers_size Preprocessing
|
|
Docker: Images size | Dependent item | docker.images_size Preprocessing
|
|
Docker: Containers size | Dependent item | docker.containers_size Preprocessing
|
|
Docker: Volumes size | Dependent item | docker.volumes_size Preprocessing
|
|
Docker: Images available | Number of top-level images. |
Dependent item | docker.images.top_level Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Docker: Service is down | last(/Docker by Zabbix agent 2/docker.ping)=0 |Average |
Manual close: Yes | ||
Docker: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes. |
nodata(/Docker by Zabbix agent 2/docker.name,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Docker: Version has changed | Docker version has changed. Acknowledge to close the problem manually. |
last(/Docker by Zabbix agent 2/docker.server_version,#1)<>last(/Docker by Zabbix agent 2/docker.server_version,#2) and length(last(/Docker by Zabbix agent 2/docker.server_version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Images discovery | Discovery of images metrics. |
Zabbix agent | docker.images.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Image {#NAME}: Created | Dependent item | docker.image.created["{#ID}"] Preprocessing
|
|
Image {#NAME}: Size | Dependent item | docker.image.size["{#ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Containers discovery | Discovery of containers metrics. Parameter: true - returns all containers; false - returns only running containers (see the sketch after this table). |
Zabbix agent | docker.containers.discovery[false] |
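A minimal sketch of what the discovery parameter toggles, using the Docker SDK for Python purely as an illustration (the agent key itself queries the Docker API directly):

```python
# Minimal sketch of the semantics of the discovery parameter, shown with the
# Docker SDK for Python (an assumption used for illustration only):
# all=True lists every container, all=False lists only running containers.
import docker

client = docker.from_env()
all_containers = client.containers.list(all=True)    # like docker.containers.discovery[true]
running_only = client.containers.list(all=False)     # like docker.containers.discovery[false]
print(len(all_containers), len(running_only))
```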
Name | Description | Type | Key and additional info |
---|---|---|---|
Container {#NAME}: Get stats | Get container stats based on resource usage. |
Zabbix agent | docker.container_stats["{#NAME}"] |
Container {#NAME}: CPU total usage per second | Dependent item | docker.container_stats.cpu_usage.total.rate["{#NAME}"] Preprocessing
|
|
Container {#NAME}: CPU percent usage | Dependent item | docker.container_stats.cpu_pct_usage["{#NAME}"] Preprocessing
|
|
Container {#NAME}: CPU kernelmode usage per second | Dependent item | docker.container_stats.cpu_usage.kernel.rate["{#NAME}"] Preprocessing
|
|
Container {#NAME}: CPU usermode usage per second | Dependent item | docker.container_stats.cpu_usage.user.rate["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Online CPUs | Dependent item | docker.container_stats.online_cpus["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Throttling periods | Number of periods with throttling active. |
Dependent item | docker.container_stats.cpu_usage.throttling_periods["{#NAME}"] Preprocessing
|
Container {#NAME}: Throttled periods | Number of periods when the container hits its throttling limit. |
Dependent item | docker.container_stats.cpu_usage.throttled_periods["{#NAME}"] Preprocessing
|
Container {#NAME}: Throttled time | Aggregate time the container was throttled for in nanoseconds. |
Dependent item | docker.container_stats.cpu_usage.throttled_time["{#NAME}"] Preprocessing
|
Container {#NAME}: Memory usage | Dependent item | docker.container_stats.memory.usage["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory maximum usage | Dependent item | docker.container_stats.memory.max_usage["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory commit bytes | Dependent item | docker.container_stats.memory.commit_bytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory commit peak bytes | Dependent item | docker.container_stats.memory.commit_peak_bytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory private working set | Dependent item | docker.container_stats.memory.private_working_set["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Current PIDs count | Current number of PIDs the container has created. |
Dependent item | docker.container_stats.pids_stats.current["{#NAME}"] Preprocessing
|
Container {#NAME}: Networks bytes received per second | Dependent item | docker.networks.rx_bytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks packets received per second | Dependent item | docker.networks.rx_packets["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks errors received per second | Dependent item | docker.networks.rx_errors["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks incoming packets dropped per second | Dependent item | docker.networks.rx_dropped["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks bytes sent per second | Dependent item | docker.networks.tx_bytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks packets sent per second | Dependent item | docker.networks.tx_packets["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks errors sent per second | Dependent item | docker.networks.tx_errors["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks outgoing packets dropped per second | Dependent item | docker.networks.tx_dropped["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Get info | Return low-level information about a container. |
Zabbix agent | docker.container_info["{#NAME}",full] |
Container {#NAME}: Created | Dependent item | docker.container_info.created["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Image | Dependent item | docker.container_info.image["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Restart count | Dependent item | docker.container_info.restart_count["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Status | Dependent item | docker.container_info.state.status["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Health status | Container's |
Dependent item | docker.container_info.state.health["{#NAME}"] Preprocessing
|
Container {#NAME}: Health failing streak | Dependent item | docker.container_info.state.health.failing["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Running | Dependent item | docker.container_info.state.running["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Paused | Dependent item | docker.container_info.state.paused["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Restarting | Dependent item | docker.container_info.state.restarting["{#NAME}"] Preprocessing
|
|
Container {#NAME}: OOMKilled | Dependent item | docker.container_info.state.oomkilled["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Dead | Dependent item | docker.container_info.state.dead["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Pid | Dependent item | docker.container_info.state.pid["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Exit code | Dependent item | docker.container_info.state.exitcode["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Error | Dependent item | docker.container_info.state.error["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Started at | Dependent item | docker.container_info.started["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Finished at | Time at which the container last terminated. |
Dependent item | docker.container_info.finished["{#NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Container {#NAME}: Health state container is unhealthy | Container health state is unhealthy. |
count(/Docker by Zabbix agent 2/docker.container_info.state.health["{#NAME}"],2m,,2)>=2 |High |
||
Container {#NAME}: Container has been stopped with error code | last(/Docker by Zabbix agent 2/docker.container_info.state.exitcode["{#NAME}"])>0 and last(/Docker by Zabbix agent 2/docker.container_info.state.running["{#NAME}"])=0 |Average |
Manual close: Yes | ||
Container {#NAME}: An error has occurred in the container | Container {#NAME} has an error. Acknowledge to close the problem manually. |
last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"],#1)<>last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"],#2) and length(last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"]))>0 |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Control-M by Zabbix that works without any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is intended to be used on Control-M Enterprise Manager instances.
It monitors:
Control-M server by HTTP
template. To use this template, you must set the {$API.TOKEN} and {$API.URI.ENDPOINT} macros.
To access the API token, use one of the following Control-M interfaces:
{$API.URI.ENDPOINT}
- is the Control-M Automation API endpoint for the API requests, including your server IP or DNS address, the Automation API port, and the path.
For example, https://monitored.controlm.instance:8443/automation-api
.
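A minimal sketch of an Automation API request using the two macros; note that both the header name (x-api-key) and the /config/servers path are illustrative assumptions and are not taken from this template definition:

```python
# Minimal sketch (assumptions flagged): query the Control-M Automation API
# with the values you would place in {$API.URI.ENDPOINT} and {$API.TOKEN}.
# Both the header name ("x-api-key") and the "/config/servers" path are
# illustrative assumptions, not taken from this template definition.
import requests

API_ENDPOINT = "https://monitored.controlm.instance:8443/automation-api"
API_TOKEN = "<set the token here>"

resp = requests.get(
    f"{API_ENDPOINT}/config/servers",
    headers={"x-api-key": API_TOKEN},
    timeout=10,
    verify=True,  # point this to a CA bundle if the instance uses a private CA
)
resp.raise_for_status()
print(resp.json())
```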
Name | Description | Default |
---|---|---|
{$API.URI.ENDPOINT} | The API endpoint is a URI - for example, |
<set the api uri endpoint here> |
{$API.TOKEN} | A token to use for API connections. |
<set the token here> |
Name | Description | Type | Key and additional info |
---|---|---|---|
Control-M: Get Control-M servers | Gets a list of servers. |
HTTP agent | controlm.servers |
Control-M: Get SLA services | Gets all the SLA active services. |
HTTP agent | controlm.services |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server discovery | Discovers the Control-M servers. |
Dependent item | controlm.server.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
SLA services discovery | Discovers the SLA services in the Control-M environment. |
Dependent item | controlm.services.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: stats | Gets the service statistics. |
Dependent item | service.stats['{#SERVICE.NAME}','{#SERVICE.JOB}'] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status | Gets the service status. |
Dependent item | service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'executed' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',executed] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitCondition' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitCondition] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitResource' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitResource] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitHost' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitHost] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitWorkload' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitWorkload] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'completed' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',completed] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'error' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',error] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status [{ITEM.VALUE}] | The service has encountered an issue. |
last(/Control-M enterprise manager by HTTP/service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'],#1)=0 or last(/Control-M enterprise manager by HTTP/service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'],#1)=10 |Average |
Manual close: Yes | |
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status [{ITEM.VALUE}] | The service has finished its job late. |
last(/Control-M enterprise manager by HTTP/service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'],#1)=3 |Warning |
Manual close: Yes | |
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs in 'error' state | There are services present which are in the state - |
last(/Control-M enterprise manager by HTTP/service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',error],#1)>0 |Average |
This template is designed to get metrics from the Control-M server using the Control-M Automation API with HTTP agent.
This template monitors server statistics, discovers jobs and agents using Low Level Discovery.
To use this template, macros {$API.TOKEN}, {$API.URI.ENDPOINT}, and {$SERVER.NAME} need to be set.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is primarily intended for use in conjunction with the Control-M enterprise manager by HTTP
template in order to create host prototypes.
It monitors:
However, if you wish to monitor the Control-M server separately with this template, you must set the following macros: {$API.TOKEN}, {$API.URI.ENDPOINT}, and {$SERVER.NAME}.
To access the {$API.TOKEN}
macro, use one of the following interfaces:
{$API.URI.ENDPOINT}
- is the Control-M Automation API endpoint for the API requests, including your server IP or DNS address, the Automation API port, and the path.
For example, https://monitored.controlm.instance:8443/automation-api
.
{$SERVER.NAME}
- is the name of the Control-M server to be monitored.
Name | Description | Default |
---|---|---|
{$SERVER.NAME} | The name of the Control-M server. |
<set the server name here> |
{$API.URI.ENDPOINT} | The API endpoint is a URI - for example, |
<set the api uri endpoint here> |
{$API.TOKEN} | A token to use for API connections. |
<set the token here> |
Name | Description | Type | Key and additional info |
---|---|---|---|
Control-M: Get Control-M server stats | Gets the statistics of the server. |
HTTP agent | controlm.server.stats Preprocessing
|
Control-M: Get jobs | Gets the status of jobs. |
HTTP agent | controlm.jobs |
Control-M: Get agents | Gets agents for the server. |
HTTP agent | controlm.agents |
Control-M: Jobs statistics | Gets the statistics of jobs. |
Dependent item | controlm.jobs.statistics Preprocessing
|
Control-M: Jobs returned | Gets the count of returned jobs. |
Dependent item | controlm.jobs.statistics.returned Preprocessing
|
Control-M: Jobs total | Gets the count of total jobs. |
Dependent item | controlm.jobs.statistics.total Preprocessing
|
Control-M: Server state | Gets the metric of the server state. |
Dependent item | server.state Preprocessing
|
Control-M: Server message | Gets the metric of the server message. |
Dependent item | server.message Preprocessing
|
Control-M: Server version | Gets the metric of the server version. |
Dependent item | server.version Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Control-M: Server is down | The server is down. |
last(/Control-M server by HTTP/server.state)=0 or last(/Control-M server by HTTP/server.state)=10 |High |
||
Control-M: Server disconnected | The server is disconnected. |
last(/Control-M server by HTTP/server.message,#1)="Disconnected" |High |
||
Control-M: Server error | The server has encountered an error. |
last(/Control-M server by HTTP/server.message,#1)<>"Connected" and last(/Control-M server by HTTP/server.message,#1)<>"Disconnected" and last(/Control-M server by HTTP/server.message,#1)<>"" |High |
||
Control-M: Server version has changed | The server version has changed. Acknowledge to close the problem manually. |
last(/Control-M server by HTTP/server.version,#1)<>last(/Control-M server by HTTP/server.version,#2) and length(last(/Control-M server by HTTP/server.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jobs discovery | Discovers jobs on the server. |
Dependent item | controlm.jobs.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job [{#JOB.ID}]: stats | Gets the statistics of a job. |
Dependent item | job.stats['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: status | Gets the status of a job. |
Dependent item | job.status['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: number of runs | Gets the number of runs for a job. |
Dependent item | job.numberOfRuns['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: type | Gets the job type. |
Dependent item | job.type['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: held status | Gets the held status of a job. |
Dependent item | job.held['{#JOB.ID}'] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Job [{#JOB.ID}]: status [{ITEM.VALUE}] | The job has encountered an issue. |
last(/Control-M server by HTTP/job.status['{#JOB.ID}'],#1)=1 or last(/Control-M server by HTTP/job.status['{#JOB.ID}'],#1)=10 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Agent discovery | Discovers agents on the server. |
Dependent item | controlm.agent.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Agent [{#AGENT.NAME}]: stats | Gets the statistics of an agent. |
Dependent item | agent.stats['{#AGENT.NAME}'] Preprocessing
|
Agent [{#AGENT.NAME}]: status | Gets the status of an agent. |
Dependent item | agent.status['{#AGENT.NAME}'] Preprocessing
|
Agent [{#AGENT.NAME}]: version | Gets the version number of an agent. |
Dependent item | agent.version['{#AGENT.NAME}'] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Agent [{#AGENT.NAME}]: status [{ITEM.VALUE}] | The agent has encountered an issue. |
last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=1 or last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=10 |Average |
Manual close: Yes | |
Agent [{#AGENT.NAME}]: status disabled | The agent is disabled. |
last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=2 or last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=3 |Info |
Manual close: Yes | |
Agent [{#AGENT.NAME}]: version has changed | The agent version has changed. Acknowledge to close the problem manually. |
last(/Control-M server by HTTP/agent.version['{#AGENT.NAME}'],#1)<>last(/Control-M server by HTTP/agent.version['{#AGENT.NAME}'],#2) |Info |
Manual close: Yes | |
Agent [{#AGENT.NAME}]: unknown version | The agent version is unknown. |
last(/Control-M server by HTTP/agent.version['{#AGENT.NAME}'],#1)="Unknown" |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HashiCorp Consul by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template HashiCorp Consul Cluster by HTTP
— collects metrics by HTTP agent from API endpoints.
You can find more information about the metrics in the official documentation.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
The template needs to use authorization via an API token.
Don't forget to change macros {$CONSUL.CLUSTER.URL}, {$CONSUL.TOKEN}.
Also, see the Macros section for a list of macros used to set trigger values.
This template supports Consul namespaces. You can set the {$CONSUL.NAMESPACE} macro if you are interested in only one service namespace. Do not specify this macro to get all services.
If you use the Open Source version, leave this macro empty.
NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may also be interested in the Envoy Proxy by HTTP template.
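To verify the macro values, you can call the same Consul HTTP API status and catalog endpoints the cluster items rely on. A minimal sketch, assuming the defaults listed below:

```python
# Minimal sketch: the same Consul HTTP API calls the cluster items rely on,
# using the values you would put in {$CONSUL.CLUSTER.URL} and {$CONSUL.TOKEN}.
import requests

CLUSTER_URL = "http://localhost:8500"        # {$CONSUL.CLUSTER.URL}
HEADERS = {"X-Consul-Token": "<PUT YOUR AUTH TOKEN>"}

leader = requests.get(f"{CLUSTER_URL}/v1/status/leader", headers=HEADERS, timeout=5).json()
peers = requests.get(f"{CLUSTER_URL}/v1/status/peers", headers=HEADERS, timeout=5).json()
nodes = requests.get(f"{CLUSTER_URL}/v1/catalog/nodes", headers=HEADERS, timeout=5).json()

print("leader:", leader)
print("raft peers:", len(peers))
print("catalog nodes:", len(nodes))
```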
Name | Description | Default |
---|---|---|
{$CONSUL.CLUSTER.URL} | Consul cluster URL. |
http://localhost:8500 |
{$CONSUL.TOKEN} | Consul auth token. |
<PUT YOUR AUTH TOKEN> |
{$CONSUL.NAMESPACE} | Consul service namespace. Enterprise only, in case of Open Source version leave this macro empty. Do not specify this macro to get all of services. |
|
{$CONSUL.API.SCHEME} | Consul API scheme. Used in the node LLD. |
http |
{$CONSUL.API.PORT} | Consul API port. Used in the node LLD. |
8500 |
{$CONSUL.LLD.FILTER.NODE_NAME.MATCHES} | Filter of discoverable nodes. |
.* |
{$CONSUL.LLD.FILTER.NODE_NAME.NOT_MATCHES} | Filter to exclude discovered nodes. |
CHANGE IF NEEDED |
{$CONSUL.LLD.FILTER.SERVICE_NAME.MATCHES} | Filter of discoverable services. |
.* |
{$CONSUL.LLD.FILTER.SERVICE_NAME.NOT_MATCHES} | Filter to exclude discovered services. |
CHANGE IF NEEDED |
{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG} | Maximum number of service nodes in status 'critical' for trigger expression. Can be used with context. |
0 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul cluster: Cluster leader | Current leader address. |
HTTP agent | consul.get_leader Preprocessing
|
Consul cluster: Nodes: peers | The number of Raft peers for the datacenter in which the agent is running. |
HTTP agent | consul.get_peers Preprocessing
|
Consul cluster: Get nodes | Catalog of nodes registered in a given datacenter. |
HTTP agent | consul.get_nodes Preprocessing
|
Consul cluster: Get nodes Serf health status | Get Serf Health Status for all agents in cluster. |
HTTP agent | consul.get_cluster_serf Preprocessing
|
Consul: Nodes: total | Number of nodes on current dc. |
Dependent item | consul.nodes_total Preprocessing
|
Consul: Nodes: passing | Number of agents on current dc with serf health status 'passing'. |
Dependent item | consul.nodes_passing Preprocessing
|
Consul: Nodes: critical | Number of agents on current dc with serf health status 'critical'. |
Dependent item | consul.nodes_critical Preprocessing
|
Consul: Nodes: warning | Number of agents on current dc with serf health status 'warning'. |
Dependent item | consul.nodes_warning Preprocessing
|
Consul cluster: Get services | Catalog of services registered in a given datacenter. |
HTTP agent | consul.get_catalog_services Preprocessing
|
Consul: Services: total | Number of services on current dc. |
Dependent item | consul.services_total Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Consul cluster: Leader has been changed | Consul cluster leader has been changed. Acknowledge to close the problem manually. |
last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#1)<>last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#2) and length(last(/HashiCorp Consul Cluster by HTTP/consul.get_leader))>0 |Info |
Manual close: Yes | |
Consul: One or more nodes in cluster in 'critical' state | One or more agents on current dc with serf health status 'critical'. |
last(/HashiCorp Consul Cluster by HTTP/consul.nodes_critical)>0 |Average |
||
Consul: One or more nodes in cluster in 'warning' state | One or more agents on current dc with serf health status 'warning'. |
last(/HashiCorp Consul Cluster by HTTP/consul.nodes_warning)>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul cluster nodes discovery | Dependent item | consul.lld_nodes Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul: Node ["{#NODE_NAME}"]: Serf Health | Node Serf Health Status. |
Dependent item | consul.serf.health["{#NODE_NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul cluster services discovery | Dependent item | consul.lld_services Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul: Service ["{#SERVICE_NAME}"]: Nodes passing | The number of nodes with service status |
Dependent item | consul.service.nodes_passing["{#SERVICE_NAME}"] Preprocessing
|
Consul: Service ["{#SERVICE_NAME}"]: Nodes warning | The number of nodes with service status |
Dependent item | consul.service.nodes_warning["{#SERVICE_NAME}"] Preprocessing
|
Consul: Service ["{#SERVICE_NAME}"]: Nodes critical | The number of nodes with service status |
Dependent item | consul.service.nodes_critical["{#SERVICE_NAME}"] Preprocessing
|
Consul cluster: ["{#SERVICE_NAME}"]: Get raw service state | Retrieve service instances providing the service indicated on the path. |
HTTP agent | consul.get_service_stats["{#SERVICE_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Consul: Service ["{#SERVICE_NAME}"]: Too many nodes with service status 'critical' | One or more nodes with service status 'critical'. |
last(/HashiCorp Consul Cluster by HTTP/consul.service.nodes_critical["{#SERVICE_NAME}"])>{$CONSUL.CLUSTER.SERVICE_NODES.CRITICAL.MAX.AVG:"{#SERVICE_NAME}"} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HashiCorp Consul by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Do not forget to enable the Prometheus format for exported metrics.
See documentation.
You can find more information about the metrics in the official documentation.
Template HashiCorp Consul Node by HTTP
— collects metrics by HTTP agent from /v1/agent/metrics endpoint.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /v1/agent/metrics endpoint. Do not forget to enable the Prometheus format for exported metrics. See the documentation. The template needs to use authorization via an API token.
Don't forget to change macros {$CONSUL.NODE.API.URL}, {$CONSUL.TOKEN}.
Also, see the Macros section for a list of macros used to set trigger values.
You can find more information about the metrics in the official documentation.
This template supports Consul namespaces. You can set the {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES} and {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES} macros if you want to filter discovered services by namespace.
In the case of the Open Source version, the service namespace will be set to 'None'.
NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You maybe are interested in Envoy Proxy by HTTP template.
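Before linking the template, you can check the API URL and token with a request similar to the one issued by the HTTP agent item. This is a minimal sketch in Python; the URL and token values are placeholders standing in for the {$CONSUL.NODE.API.URL} and {$CONSUL.TOKEN} macros.

import requests

CONSUL_URL = "http://localhost:8500"      # assumption: value you plan to put in {$CONSUL.NODE.API.URL}
CONSUL_TOKEN = "<PUT YOUR AUTH TOKEN>"    # ACL token with read access to agent metrics

resp = requests.get(
    f"{CONSUL_URL}/v1/agent/metrics",
    params={"format": "prometheus"},      # requires the Prometheus format to be enabled in telemetry
    headers={"X-Consul-Token": CONSUL_TOKEN},
    timeout=5,
)
resp.raise_for_status()

# Print the first few lines to confirm Prometheus-formatted metrics are returned.
for line in resp.text.splitlines()[:10]:
    print(line)

If the request returns 403, check the token's ACL permissions; if it returns plain JSON instead of Prometheus text, the telemetry configuration does not allow the Prometheus format.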
Name | Description | Default |
---|---|---|
{$CONSUL.NODE.API.URL} | Consul instance URL. |
http://localhost:8500 |
{$CONSUL.TOKEN} | Consul auth token. |
<PUT YOUR AUTH TOKEN> |
{$CONSUL.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. |
90 |
{$CONSUL.LLD.FILTER.LOCALSERVICENAME.MATCHES} | Filter of discoverable services on the local node. |
.* |
{$CONSUL.LLD.FILTER.LOCALSERVICENAME.NOT_MATCHES} | Filter to exclude discovered services on the local node. |
CHANGE IF NEEDED |
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES} | Filter of discoverable services by namespace on the local node. Enterprise only; in case of the Open Source version, the namespace will be set to 'None'. |
.* |
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered services by namespace on the local node. Enterprise only; in case of the Open Source version, the namespace will be set to 'None'. |
CHANGE IF NEEDED |
{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN} | Maximum acceptable value of node's health score for WARNING trigger expression. |
2 |
{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH} | Maximum acceptable value of node's health score for AVERAGE trigger expression. |
4 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul: Get instance metrics | Get raw metrics from Consul instance /metrics endpoint. |
HTTP agent | consul.get_metrics Preprocessing
|
Consul: Get node info | Get configuration and member information of the local agent. |
HTTP agent | consul.getnodeinfo Preprocessing
|
Consul: Role | Role of current Consul agent. |
Dependent item | consul.role Preprocessing
|
Consul: Version | Version of Consul agent. |
Dependent item | consul.version Preprocessing
|
Consul: Number of services | Number of services on current node. |
Dependent item | consul.services_number Preprocessing
|
Consul: Number of checks | Number of checks on current node. |
Dependent item | consul.checks_number Preprocessing
|
Consul: Number of check monitors | Number of check monitors on current node. |
Dependent item | consul.checkmonitorsnumber Preprocessing
|
Consul: Process CPU seconds, total | Total user and system CPU time spent in seconds. |
Dependent item | consul.cpusecondstotal.rate Preprocessing
|
Consul: Virtual memory size | Virtual memory size in bytes. |
Dependent item | consul.virtualmemorybytes Preprocessing
|
Consul: RSS memory usage | Resident memory size in bytes. |
Dependent item | consul.residentmemorybytes Preprocessing
|
Consul: Goroutine count | The number of Goroutines on Consul instance. |
Dependent item | consul.goroutines Preprocessing
|
Consul: Open file descriptors | Number of open file descriptors. |
Dependent item | consul.processopenfds Preprocessing
|
Consul: Open file descriptors, max | Maximum number of open file descriptors. |
Dependent item | consul.processmaxfds Preprocessing
|
Consul: Client RPC, per second | The number of times per second that a Consul agent in client mode makes an RPC request to a Consul server. This gives a measure of how much a given agent is loading the Consul servers. This is only generated by agents in client mode, not Consul servers. |
Dependent item | consul.client_rpc Preprocessing
|
Consul: Client RPC failed, per second | The number of times per second that a Consul agent in client mode makes an RPC request to a Consul server and fails. |
Dependent item | consul.clientrpcfailed Preprocessing
|
Consul: TCP connections, accepted per second | This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection per second. |
Dependent item | consul.memberlist.tcp_accept Preprocessing
|
Consul: TCP connections, per second | This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent per second. |
Dependent item | consul.memberlist.tcp_connect Preprocessing
|
Consul: TCP send bytes, per second | This metric measures the total number of bytes sent by a Consul agent through the TCP protocol per second. |
Dependent item | consul.memberlist.tcp_sent Preprocessing
|
Consul: UDP received bytes, per second | This metric measures the total number of bytes received by a Consul agent through the UDP protocol per second. |
Dependent item | consul.memberlist.udp_received Preprocessing
|
Consul: UDP sent bytes, per second | This metric measures the total number of bytes sent by a Consul agent through the UDP protocol per second. |
Dependent item | consul.memberlist.udp_sent Preprocessing
|
Consul: GC pause, p90 | The 90 percentile of the time consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds. |
Dependent item | consul.gc_pause.p90 Preprocessing
|
Consul: GC pause, p50 | The 50 percentile (median) of the time consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds. |
Dependent item | consul.gc_pause.p50 Preprocessing
|
Consul: Memberlist: degraded | This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate. The agent uses its own health metric as an indicator to perform this action. If its health score is low, it means that the node is healthy, and vice versa. |
Dependent item | consul.memberlist.degraded Preprocessing
|
Consul: Memberlist: health score | This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy". |
Dependent item | consul.memberlist.health_score Preprocessing
|
Consul: Memberlist: gossip, p90 | The 90 percentile for the number of gossips (messages) broadcasted to a set of randomly selected nodes. |
Dependent item | consul.memberlist.dispatch_log.p90 Preprocessing
|
Consul: Memberlist: gossip, p50 | The 50 percentile (median) for the number of gossips (messages) broadcasted to a set of randomly selected nodes. |
Dependent item | consul.memberlist.gossip.p50 Preprocessing
|
Consul: Memberlist: msg alive | This metric counts the number of alive Consul agents, that the agent has mapped out so far, based on the message information given by the network layer. |
Dependent item | consul.memberlist.msg.alive Preprocessing
|
Consul: Memberlist: msg dead | This metric counts the number of times a Consul agent has marked another agent to be a dead node. |
Dependent item | consul.memberlist.msg.dead Preprocessing
|
Consul: Memberlist: msg suspect | The number of times a Consul agent suspects another as failed while probing during gossip protocol. |
Dependent item | consul.memberlist.msg.suspect Preprocessing
|
Consul: Memberlist: probe node, p90 | The 90 percentile for the time taken to perform a single round of failure detection on a select Consul agent. |
Dependent item | consul.memberlist.probe_node.p90 Preprocessing
|
Consul: Memberlist: probe node, p50 | The 50 percentile (median) for the time taken to perform a single round of failure detection on a select Consul agent. |
Dependent item | consul.memberlist.probe_node.p50 Preprocessing
|
Consul: Memberlist: push pull node, p90 | The 90 percentile for the number of Consul agents that have exchanged state with this agent. |
Dependent item | consul.memberlist.pushpullnode.p90 Preprocessing
|
Consul: Memberlist: push pull node, p50 | The 50 percentile (median) for the number of Consul agents that have exchanged state with this agent. |
Dependent item | consul.memberlist.pushpullnode.p50 Preprocessing
|
Consul: KV store: apply, p90 | The 90 percentile for the time it takes to complete an update to the KV store. |
Dependent item | consul.kvs.apply.p90 Preprocessing
|
Consul: KV store: apply, p50 | The 50 percentile (median) for the time it takes to complete an update to the KV store. |
Dependent item | consul.kvs.apply.p50 Preprocessing
|
Consul: KV store: apply, rate | The number of updates to the KV store per second. |
Dependent item | consul.kvs.apply.rate Preprocessing
|
Consul: Serf member: flap, rate | Increments when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second. |
Dependent item | consul.serf.member.flap.rate Preprocessing
|
Consul: Serf member: failed, rate | Increments when an agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second. |
Dependent item | consul.serf.member.failed.rate Preprocessing
|
Consul: Serf member: join, rate | Increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins. Shown as events per second. |
Dependent item | consul.serf.member.join.rate Preprocessing
|
Consul: Serf member: left, rate | Increments when an agent leaves the cluster. Shown as events per second. |
Dependent item | consul.serf.member.left.rate Preprocessing
|
Consul: Serf member: update, rate | Increments when a Consul agent updates. Shown as events per second. |
Dependent item | consul.serf.member.update.rate Preprocessing
|
Consul: ACL: resolves, rate | The number of ACL resolves per second. |
Dependent item | consul.acl.resolves.rate Preprocessing
|
Consul: Catalog: register, rate | The number of catalog register operations per second. |
Dependent item | consul.catalog.register.rate Preprocessing
|
Consul: Catalog: deregister, rate | The number of catalog deregister operations per second. |
Dependent item | consul.catalog.deregister.rate Preprocessing
|
Consul: Snapshot: append line, p90 | The 90 percentile for the time taken by the Consul agent to append an entry into the existing log. |
Dependent item | consul.snapshot.append_line.p90 Preprocessing
|
Consul: Snapshot: append line, p50 | The 50 percentile (median) for the time taken by the Consul agent to append an entry into the existing log. |
Dependent item | consul.snapshot.append_line.p50 Preprocessing
|
Consul: Snapshot: append line, rate | The number of snapshot appendLine operations per second. |
Dependent item | consul.snapshot.append_line.rate Preprocessing
|
Consul: Snapshot: compact, p90 | The 90 percentile for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction. |
Dependent item | consul.snapshot.compact.p90 Preprocessing
|
Consul: Snapshot: compact, p50 | The 50 percentile (median) for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction. |
Dependent item | consul.snapshot.compact.p50 Preprocessing
|
Consul: Snapshot: compact, rate | The number of snapshot compact operations per second. |
Dependent item | consul.snapshot.compact.rate Preprocessing
|
Consul: Get local services | Get all the services that are registered with the local agent and their status. |
Script | consul.getlocalservices |
Consul: Get local services check | Data collection check. |
Dependent item | consul.getlocalservices.check Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Consul: Version has been changed | Consul version has changed. Acknowledge to close the problem manually. |
last(/HashiCorp Consul Node by HTTP/consul.version,#1)<>last(/HashiCorp Consul Node by HTTP/consul.version,#2) and length(last(/HashiCorp Consul Node by HTTP/consul.version))>0 |Info |
Manual close: Yes | |
Consul: Current number of open files is too high | Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. |
min(/HashiCorp Consul Node by HTTP/consul.process_open_fds,5m)/last(/HashiCorp Consul Node by HTTP/consul.process_max_fds)*100>{$CONSUL.OPEN.FDS.MAX.WARN} |Warning |
||
Consul: Node's health score is warning | This metric ranges from 0 to 8, where 0 indicates "totally healthy". |
max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN} |Warning |
Depends on:
|
|
Consul: Node's health score is critical | This metric ranges from 0 to 8, where 0 indicates "totally healthy". |
max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH} |Average |
||
Consul: Failed to get local services | Failed to get local services. Check debug log for more information. |
length(last(/HashiCorp Consul Node by HTTP/consul.get_local_services.check))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Local node services discovery | Discover metrics for services that are registered with the local agent. |
Dependent item | consul.nodeserviceslld Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul: ["{#SERVICE_NAME}"]: Aggregated status | Aggregated values of all health checks for the service instance. |
Dependent item | consul.service.aggregatedstate["{#SERVICEID}"] Preprocessing
|
Consul: ["{#SERVICENAME}"]: Check ["{#SERVICECHECK_NAME}"]: Status | Current state of health check for the service. |
Dependent item | consul.service.check.state["{#SERVICEID}/{#SERVICECHECK_ID}"] Preprocessing
|
Consul: ["{#SERVICENAME}"]: Check ["{#SERVICECHECK_NAME}"]: Output | Current output of health check for the service. |
Dependent item | consul.service.check.output["{#SERVICEID}/{#SERVICECHECK_ID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Consul: Aggregated status is 'warning' | Aggregated state of service on the local agent is 'warning'. |
last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 1 |Warning |
||
Consul: Aggregated status is 'critical' | Aggregated state of service on the local agent is 'critical'. |
last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 2 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP API methods discovery | Discovery of HTTP API method-specific metrics. |
Dependent item | consul.httpapidiscovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul: HTTP request: ["{#HTTP_METHOD}"], p90 | The 90 percentile of how long it takes to service the given HTTP request for the given verb. |
Dependent item | consul.http.api.p90["{#HTTP_METHOD}"] Preprocessing
|
Consul: HTTP request: ["{#HTTP_METHOD}"], p50 | The 50 percentile (median) of how long it takes to service the given HTTP request for the given verb. |
Dependent item | consul.http.api.p50["{#HTTP_METHOD}"] Preprocessing
|
Consul: HTTP request: ["{#HTTP_METHOD}"], rate | The number of HTTP request for the given verb per second. |
Dependent item | consul.http.api.rate["{#HTTP_METHOD}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Raft server metrics discovery | Discover raft metrics for server nodes. |
Dependent item | consul.raft.server.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul: Raft state | Current state of Consul agent. |
Dependent item | consul.raft.state[{#SINGLETON}] Preprocessing
|
Consul: Raft state: leader | Increments when a server becomes a leader. |
Dependent item | consul.raft.state_leader[{#SINGLETON}] Preprocessing
|
Consul: Raft state: candidate | The number of initiated leader elections. |
Dependent item | consul.raft.state_candidate[{#SINGLETON}] Preprocessing
|
Consul: Raft: apply, rate | Incremented whenever a leader first passes a message into the Raft commit process (called an Apply operation). This metric describes the arrival rate of new logs into Raft per second. |
Dependent item | consul.raft.apply.rate[{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Raft leader metrics discovery | Discover raft metrics for leader nodes. |
Dependent item | consul.raft.leader.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul: Raft state: leader last contact, p90 | The 90 percentile of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds. |
Dependent item | consul.raft.leaderlastcontact.p90[{#SINGLETON}] Preprocessing
|
Consul: Raft state: leader last contact, p50 | The 50 percentile (median) of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds. |
Dependent item | consul.raft.leaderlastcontact.p50[{#SINGLETON}] Preprocessing
|
Consul: Raft state: commit time, p90 | The 90 percentile time it takes to commit a new entry to the raft log on the leader, in milliseconds. |
Dependent item | consul.raft.commit_time.p90[{#SINGLETON}] Preprocessing
|
Consul: Raft state: commit time, p50 | The 50 percentile (median) time it takes to commit a new entry to the raft log on the leader, in milliseconds. |
Dependent item | consul.raft.commit_time.p50[{#SINGLETON}] Preprocessing
|
Consul: Raft state: dispatch log, p90 | The 90 percentile time it takes for the leader to write log entries to disk, in milliseconds. |
Dependent item | consul.raft.dispatch_log.p90[{#SINGLETON}] Preprocessing
|
Consul: Raft state: dispatch log, p50 | The 50 percentile (median) time it takes for the leader to write log entries to disk, in milliseconds. |
Dependent item | consul.raft.dispatch_log.p50[{#SINGLETON}] Preprocessing
|
Consul: Raft state: dispatch log, rate | The number of times a Raft leader writes a log to disk per second. |
Dependent item | consul.raft.dispatch_log.rate[{#SINGLETON}] Preprocessing
|
Consul: Raft state: commit, rate | The number of new entries committed to the Raft log on the leader per second. |
Dependent item | consul.raft.commit_time.rate[{#SINGLETON}] Preprocessing
|
Consul: Autopilot healthy | Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy. |
Dependent item | consul.autopilot.healthy[{#SINGLETON}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Cloudflare monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Create a host, for example mywebsite.com, for a site in your Cloudflare account.
2. Link the template to the host.
3. Customize the values of the {$CLOUDFLARE.API.TOKEN} and {$CLOUDFLARE.ZONE_ID} macros.
Cloudflare API Tokens are available in your Cloudflare account under My Profile > API Tokens.
Zone ID is available in your Cloudflare account under Account Home > Site.
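Before setting the macros, you can verify the API token itself against Cloudflare's token verification endpoint. This is a minimal sketch in Python; the base URL matches the {$CLOUDFLARE.API.URL} default, and the token value is a placeholder for {$CLOUDFLARE.API.TOKEN}.

import requests

API_URL = "https://api.cloudflare.com/client/v4"   # default of {$CLOUDFLARE.API.URL}
API_TOKEN = "<change>"                              # the value you plan to use for {$CLOUDFLARE.API.TOKEN}

# Cloudflare API tokens are sent as a Bearer token.
resp = requests.get(
    f"{API_URL}/user/tokens/verify",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=3,
)
data = resp.json()
print(data.get("success"), data.get("messages"))    # expect success == True for a valid, active token

If the call reports success, the same token can be placed into the macro; the Zone ID is checked separately by the template's own requests.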
Name | Description | Default |
---|---|---|
{$CLOUDFLARE.API.URL} | The URL of Cloudflare API endpoint. |
https://api.cloudflare.com/client/v4 |
{$CLOUDFLARE.API.TOKEN} | Your Cloudflare API Token. |
<change> |
{$CLOUDFLARE.ZONE_ID} | Your Cloudflare Site Zone ID. |
<change> |
{$CLOUDFLARE.GET_DATA.TIMEOUT} | Response timeout for Cloudflare API. |
3s |
{$CLOUDFLARE.ERRORS.MAX.WARN} | Maximum responses with errors in %. |
30 |
{$CLOUDFLARE.CACHED_BANDWIDTH.MIN.WARN} | Minimum of cached bandwidth in %. |
50 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cloudflare: Total bandwidth | The volume of all data. |
Dependent item | cloudflare.bandwidth.all Preprocessing
|
Cloudflare: Cached bandwidth | The volume of cached data. |
Dependent item | cloudflare.bandwidth.cached Preprocessing
|
Cloudflare: Uncached bandwidth | The volume of uncached data. |
Dependent item | cloudflare.bandwidth.uncached Preprocessing
|
Cloudflare: Cache hit ratio of bandwidth | The ratio of cached bandwidth to total bandwidth, in percent. |
Dependent item | cloudflare.bandwidth.cachehitratio Preprocessing
|
Cloudflare: SSL encrypted bandwidth | The volume of encrypted data. |
Dependent item | cloudflare.bandwidth.ssl.encrypted Preprocessing
|
Cloudflare: Unencrypted bandwidth | The volume of unencrypted data. |
Dependent item | cloudflare.bandwidth.ssl.unencrypted Preprocessing
|
Cloudflare: DNS queries | The amount of all DNS queries. |
Dependent item | cloudflare.dns.query.all Preprocessing
|
Cloudflare: Stale DNS queries | The number of stale DNS queries. |
Dependent item | cloudflare.dns.query.stale Preprocessing
|
Cloudflare: Uncached DNS queries | The number of uncached DNS queries. |
Dependent item | cloudflare.dns.query.uncached Preprocessing
|
Cloudflare: Get data | The JSON with result of Cloudflare API request. |
Script | cloudflare.get |
Cloudflare: Total page views | The amount of all pageviews. |
Dependent item | cloudflare.pageviews.all Preprocessing
|
Cloudflare: Total requests | The amount of all requests. |
Dependent item | cloudflare.requests.all Preprocessing
|
Cloudflare: Cached requests | The number of cached requests. |
Dependent item | cloudflare.requests.cached Preprocessing
|
Cloudflare: Uncached requests | The number of uncached requests. |
Dependent item | cloudflare.requests.uncached Preprocessing
|
Cloudflare: Cache hit ratio % over time | The ratio of cached requests to all requests, in percent. |
Dependent item | cloudflare.requests.cachehitratio Preprocessing
|
Cloudflare: Response codes 1xx | The number of requests with 1xx response codes. |
Dependent item | cloudflare.requests.response_100 Preprocessing
|
Cloudflare: Response codes 2xx | The number of requests with 2xx response codes. |
Dependent item | cloudflare.requests.response_200 Preprocessing
|
Cloudflare: Response codes 3xx | The number of requests with 3xx response codes. |
Dependent item | cloudflare.requests.response_300 Preprocessing
|
Cloudflare: Response codes 4xx | The number of requests with 4xx response codes. |
Dependent item | cloudflare.requests.response_400 Preprocessing
|
Cloudflare: Response codes 5xx | The number of requests with 5xx response codes. |
Dependent item | cloudflare.requests.response_500 Preprocessing
|
Cloudflare: Non-2xx responses ratio | The ratio of requests with non-2xx response codes to all requests, in percent. |
Dependent item | cloudflare.requests.others_ratio Preprocessing
|
Cloudflare: 2xx responses ratio | The ratio of requests with 2xx response codes to all requests, in percent. |
Dependent item | cloudflare.requests.success_ratio Preprocessing
|
Cloudflare: SSL encrypted requests | The number of encrypted requests. |
Dependent item | cloudflare.requests.ssl.encrypted Preprocessing
|
Cloudflare: Unencrypted requests | The number of unencrypted requests. |
Dependent item | cloudflare.requests.ssl.unencrypted Preprocessing
|
Cloudflare: Total threats | The number of all threats. |
Dependent item | cloudflare.threats.all Preprocessing
|
Cloudflare: Unique visitors | The number of unique visitor IPs. |
Dependent item | cloudflare.uniques.all Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cloudflare: Cached bandwidth is too low | max(/Cloudflare by HTTP/cloudflare.bandwidth.cache_hit_ratio,#3) < {$CLOUDFLARE.CACHED_BANDWIDTH.MIN.WARN} |Warning |
|||
Cloudflare: Ratio of non-2xx responses is too high | A large number of errors can indicate a malfunction of the site. |
min(/Cloudflare by HTTP/cloudflare.requests.others_ratio,#3) > {$CLOUDFLARE.ERRORS.MAX.WARN} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor a TLS/SSL certificate on a website by Zabbix agent 2; it works without any external scripts. Zabbix agent 2 with the WebCertificate plugin requests the certificate using the web.certificate.get key and returns JSON with the certificate attributes.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Set up and configure zabbix-agent2 with the WebCertificate plugin.
2. Test availability: zabbix_get -s <zabbix_agent_addr> -k web.certificate.get[<website_DNS_name>]
3. Create a host for the TLS/SSL certificate with Zabbix agent interface.
4. Link the template to the host.
5. Customize the value of {$CERT.WEBSITE.HOSTNAME} macro.
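As an independent cross-check of the expiry arithmetic used by the "Cert: SSL certificate expires soon" trigger (days left = (not_after - now) / 86400), the sketch below reads the certificate directly with Python's ssl module. The host and port stand in for the {$CERT.WEBSITE.HOSTNAME} and {$CERT.WEBSITE.PORT} macros; it assumes the server presents a certificate chain that validates.

import socket
import ssl
from datetime import datetime, timezone

HOST = "<Put DNS name>"   # {$CERT.WEBSITE.HOSTNAME}
PORT = 443                # {$CERT.WEBSITE.PORT}

ctx = ssl.create_default_context()
with socket.create_connection((HOST, PORT), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        cert = tls.getpeercert()

# 'notAfter' looks like 'Jun  1 12:00:00 2025 GMT'.
not_after = datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z").replace(tzinfo=timezone.utc)
days_left = (not_after - datetime.now(timezone.utc)).days
print(f"{HOST}: certificate expires in {days_left} days")

If the number of days printed here and the value derived from cert.not_after in Zabbix diverge, check that the agent connects to the same endpoint (host, port, and optional {$CERT.WEBSITE.IP}).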
Name | Description | Default |
---|---|---|
{$CERT.EXPIRY.WARN} | Number of days until the certificate expires. |
7 |
{$CERT.WEBSITE.HOSTNAME} | The website DNS name for the connection. |
<Put DNS name> |
{$CERT.WEBSITE.PORT} | The TLS/SSL port number of the website. |
443 |
{$CERT.WEBSITE.IP} | The website IP address for the connection. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cert: Get | Returns the JSON with attributes of a certificate of the requested site. |
Zabbix agent | web.certificate.get[{$CERT.WEBSITE.HOSTNAME},{$CERT.WEBSITE.PORT},{$CERT.WEBSITE.IP}] Preprocessing
|
Cert: Validation result | The certificate validation result. Possible values: valid/invalid/valid-but-self-signed |
Dependent item | cert.validation Preprocessing
|
Cert: Last validation status | Last check result message. |
Dependent item | cert.message Preprocessing
|
Cert: Version | The version of the encoded certificate. |
Dependent item | cert.version Preprocessing
|
Cert: Serial number | The serial number is a positive integer assigned by the CA to each certificate. It is unique for each certificate issued by a given CA. Non-conforming CAs may issue certificates with serial numbers that are negative or zero. |
Dependent item | cert.serial_number Preprocessing
|
Cert: Signature algorithm | The algorithm identifier for the algorithm used by the CA to sign the certificate. |
Dependent item | cert.signature_algorithm Preprocessing
|
Cert: Issuer | The field identifies the entity that has signed and issued the certificate. |
Dependent item | cert.issuer Preprocessing
|
Cert: Valid from | The date on which the certificate validity period begins. |
Dependent item | cert.not_before Preprocessing
|
Cert: Expires on | The date on which the certificate validity period ends. |
Dependent item | cert.not_after Preprocessing
|
Cert: Subject | The field identifies the entity associated with the public key stored in the subject public key field. |
Dependent item | cert.subject Preprocessing
|
Cert: Subject alternative name | The subject alternative name extension allows identities to be bound to the subject of the certificate. These identities may be included in addition to or in place of the identity in the subject field of the certificate. Defined options include an Internet electronic mail address, a DNS name, an IP address, and a Uniform Resource Identifier (URI). |
Dependent item | cert.alternative_names Preprocessing
|
Cert: Public key algorithm | The digital signature algorithm used to verify the signature of a certificate. |
Dependent item | cert.publickeyalgorithm Preprocessing
|
Cert: Fingerprint | The Certificate Signature (SHA1 Fingerprint or Thumbprint) is the hash of the entire certificate in DER form. |
Dependent item | cert.sha1_fingerprint Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cert: SSL certificate is invalid | SSL certificate has expired or it is issued for another domain. |
find(/Website certificate by Zabbix agent 2/cert.validation,,"like","invalid")=1 |High |
||
Cert: SSL certificate expires soon | The SSL certificate should be updated or it will become untrusted. |
(last(/Website certificate by Zabbix agent 2/cert.not_after) - now()) / 86400 < {$CERT.EXPIRY.WARN} |Warning |
Depends on:
|
|
Cert: Fingerprint has changed | The SSL certificate fingerprint has changed. If you did not update the certificate, it may mean your certificate has been hacked. Acknowledge to close the problem manually. |
last(/Website certificate by Zabbix agent 2/cert.sha1_fingerprint) <> last(/Website certificate by Zabbix agent 2/cert.sha1_fingerprint,#2) |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor a Ceph cluster by Zabbix; it works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Ceph by Zabbix agent 2
— collects metrics by polling zabbix-agent2.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Test availability: zabbix_get -s ceph-host -k ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
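To check that the Ceph RESTful module is reachable with the same credentials outside of Zabbix, you can send an authenticated request to the API directly. This is a rough connectivity probe only, sketched in Python; the /server path and HTTP Basic authentication are assumptions based on the Ceph RESTful module documentation, not something the template itself calls, and the values mirror the macro defaults below.

import requests

CONN_STRING = "https://localhost:8003"   # {$CEPH.CONNSTRING}
USER = "zabbix"                          # {$CEPH.USER}
API_KEY = "zabbix_pass"                  # {$CEPH.API.KEY}

# The RESTful module usually runs with a self-signed certificate, hence verify=False.
resp = requests.get(
    f"{CONN_STRING}/server",
    auth=(USER, API_KEY),
    verify=False,
    timeout=10,
)
print(resp.status_code)   # 200 means the module answered and the credentials were accepted

A 401 response points at a wrong user/key pair; a connection error points at the module not being enabled or listening on a different address than {$CEPH.CONNSTRING}.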
Name | Description | Default |
---|---|---|
{$CEPH.USER} | zabbix |
|
{$CEPH.API.KEY} | zabbix_pass |
|
{$CEPH.CONNSTRING} | https://localhost:8003 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Ceph: Get overall cluster status | Zabbix agent | ceph.status["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Ceph: Get OSD stats | Zabbix agent | ceph.osd.stats["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Ceph: Get OSD dump | Zabbix agent | ceph.osd.dump["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Ceph: Get df | Zabbix agent | ceph.df.details["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Ceph: Ping | Zabbix agent | ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] Preprocessing
|
|
Ceph: Number of Monitors | The number of Monitors configured in a Ceph cluster. |
Dependent item | ceph.num_mon Preprocessing
|
Ceph: Overall cluster status | The overall Ceph cluster status, e.g. 0 - HEALTH_OK, 1 - HEALTH_WARN or 2 - HEALTH_ERR. |
Dependent item | ceph.overall_status Preprocessing
|
Ceph: Minimum Mon release version | The minimum monitor release version (min_mon_release_name). |
Dependent item | ceph.minmonrelease_name Preprocessing
|
Ceph: Ceph Read bandwidth | The global read bytes per second. |
Dependent item | ceph.rd_bytes.rate Preprocessing
|
Ceph: Ceph Write bandwidth | The global write bytes per second. |
Dependent item | ceph.wr_bytes.rate Preprocessing
|
Ceph: Ceph Read operations per sec | The global read operations per second. |
Dependent item | ceph.rd_ops.rate Preprocessing
|
Ceph: Ceph Write operations per sec | The global write operations per second. |
Dependent item | ceph.wr_ops.rate Preprocessing
|
Ceph: Total bytes available | The total bytes available in a Ceph cluster. |
Dependent item | ceph.totalavailbytes Preprocessing
|
Ceph: Total bytes | The total (RAW) capacity of a Ceph cluster in bytes. |
Dependent item | ceph.total_bytes Preprocessing
|
Ceph: Total bytes used | The total bytes used in a Ceph cluster. |
Dependent item | ceph.totalusedbytes Preprocessing
|
Ceph: Total number of objects | The total number of objects in a Ceph cluster. |
Dependent item | ceph.total_objects Preprocessing
|
Ceph: Number of Placement Groups | The total number of Placement Groups in a Ceph cluster. |
Dependent item | ceph.num_pg Preprocessing
|
Ceph: Number of Placement Groups in Temporary state | The total number of Placement Groups in a pg_temp state |
Dependent item | ceph.numpgtemp Preprocessing
|
Ceph: Number of Placement Groups in Active state | The total number of Placement Groups in an active state. |
Dependent item | ceph.pg_states.active Preprocessing
|
Ceph: Number of Placement Groups in Clean state | The total number of Placement Groups in a clean state. |
Dependent item | ceph.pg_states.clean Preprocessing
|
Ceph: Number of Placement Groups in Peering state | The total number of Placement Groups in a peering state. |
Dependent item | ceph.pg_states.peering Preprocessing
|
Ceph: Number of Placement Groups in Scrubbing state | The total number of Placement Groups in a scrubbing state. |
Dependent item | ceph.pg_states.scrubbing Preprocessing
|
Ceph: Number of Placement Groups in Undersized state | The total number of Placement Groups in an undersized state. |
Dependent item | ceph.pg_states.undersized Preprocessing
|
Ceph: Number of Placement Groups in Backfilling state | The total number of Placement Groups in a backfill state. |
Dependent item | ceph.pg_states.backfilling Preprocessing
|
Ceph: Number of Placement Groups in degraded state | The total number of Placement Groups in a degraded state. |
Dependent item | ceph.pg_states.degraded Preprocessing
|
Ceph: Number of Placement Groups in inconsistent state | The total number of Placement Groups in an inconsistent state. |
Dependent item | ceph.pg_states.inconsistent Preprocessing
|
Ceph: Number of Placement Groups in Unknown state | The total number of Placement Groups in an unknown state. |
Dependent item | ceph.pg_states.unknown Preprocessing
|
Ceph: Number of Placement Groups in remapped state | The total number of Placement Groups in a remapped state. |
Dependent item | ceph.pg_states.remapped Preprocessing
|
Ceph: Number of Placement Groups in recovering state | The total number of Placement Groups in a recovering state. |
Dependent item | ceph.pg_states.recovering Preprocessing
|
Ceph: Number of Placement Groups in backfill_toofull state | The total number of Placement Groups in a backfill_toofull state. |
Dependent item | ceph.pgstates.backfilltoofull Preprocessing
|
Ceph: Number of Placement Groups in backfill_wait state | The total number of Placement Groups in a backfill_wait state. |
Dependent item | ceph.pgstates.backfillwait Preprocessing
|
Ceph: Number of Placement Groups in recovery_wait state | The total number of Placement Groups in a recovery_wait state. |
Dependent item | ceph.pgstates.recoverywait Preprocessing
|
Ceph: Number of Pools | The total number of pools in a Ceph cluster. |
Dependent item | ceph.num_pools Preprocessing
|
Ceph: Number of OSDs | The number of the known storage daemons in a Ceph cluster. |
Dependent item | ceph.num_osd Preprocessing
|
Ceph: Number of OSDs in state: UP | The total number of the online storage daemons in a Ceph cluster. |
Dependent item | ceph.numosdup Preprocessing
|
Ceph: Number of OSDs in state: IN | The total number of the participating storage daemons in a Ceph cluster. |
Dependent item | ceph.numosdin Preprocessing
|
Ceph: Ceph OSD avg fill | The average fill of OSDs. |
Dependent item | ceph.osd_fill.avg Preprocessing
|
Ceph: Ceph OSD max fill | The percentage of the most filled OSD. |
Dependent item | ceph.osd_fill.max Preprocessing
|
Ceph: Ceph OSD min fill | The percentage fill of the minimum filled OSD. |
Dependent item | ceph.osd_fill.min Preprocessing
|
Ceph: Ceph OSD max PGs | The maximum amount of Placement Groups on OSDs. |
Dependent item | ceph.osd_pgs.max Preprocessing
|
Ceph: Ceph OSD min PGs | The minimum amount of Placement Groups on OSDs. |
Dependent item | ceph.osd_pgs.min Preprocessing
|
Ceph: Ceph OSD avg PGs | The average amount of Placement Groups on OSDs. |
Dependent item | ceph.osd_pgs.avg Preprocessing
|
Ceph: Ceph OSD Apply latency Avg | The average apply latency of OSDs. |
Dependent item | ceph.osdlatencyapply.avg Preprocessing
|
Ceph: Ceph OSD Apply latency Max | The maximum apply latency of OSDs. |
Dependent item | ceph.osdlatencyapply.max Preprocessing
|
Ceph: Ceph OSD Apply latency Min | The minimum apply latency of OSDs. |
Dependent item | ceph.osdlatencyapply.min Preprocessing
|
Ceph: Ceph OSD Commit latency Avg | The average commit latency of OSDs. |
Dependent item | ceph.osdlatencycommit.avg Preprocessing
|
Ceph: Ceph OSD Commit latency Max | The maximum commit latency of OSDs. |
Dependent item | ceph.osdlatencycommit.max Preprocessing
|
Ceph: Ceph OSD Commit latency Min | The minimum commit latency of OSDs. |
Dependent item | ceph.osdlatencycommit.min Preprocessing
|
Ceph: Ceph backfill full ratio | The backfill full ratio setting of the Ceph cluster as configured on OSDMap. |
Dependent item | ceph.osdbackfillfullratio Preprocessing
|
Ceph: Ceph full ratio | The full ratio setting of the Ceph cluster as configured on OSDMap. |
Dependent item | ceph.osdfullratio Preprocessing
|
Ceph: Ceph nearfull ratio | The near full ratio setting of the Ceph cluster as configured on OSDMap. |
Dependent item | ceph.osdnearfullratio Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ceph: Can not connect to cluster | The connection to the Ceph RESTful module is broken (any error is included, such as AUTH or configuration issues). |
last(/Ceph by Zabbix agent 2/ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"])=0 |Average |
||
Ceph: Cluster in ERROR state | last(/Ceph by Zabbix agent 2/ceph.overall_status)=2 |Average |
Manual close: Yes | ||
Ceph: Cluster in WARNING state | last(/Ceph by Zabbix agent 2/ceph.overall_status)=1 |Warning |
Manual close: Yes Depends on:
|
||
Ceph: Minimum monitor release version has changed | A Ceph version has changed. Acknowledge to close the problem manually. |
last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#1)<>last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#2) and length(last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
OSD | Zabbix agent | ceph.osd.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Ceph: [osd.{#OSDNAME}] OSD in | Dependent item | ceph.osd[{#OSDNAME},in] Preprocessing
|
|
Ceph: [osd.{#OSDNAME}] OSD up | Dependent item | ceph.osd[{#OSDNAME},up] Preprocessing
|
|
Ceph: [osd.{#OSDNAME}] OSD PGs | Dependent item | ceph.osd[{#OSDNAME},num_pgs] Preprocessing
|
|
Ceph: [osd.{#OSDNAME}] OSD fill | Dependent item | ceph.osd[{#OSDNAME},fill] Preprocessing
|
|
Ceph: [osd.{#OSDNAME}] OSD latency apply | The time taken to flush an update to disks. |
Dependent item | ceph.osd[{#OSDNAME},latency_apply] Preprocessing
|
Ceph: [osd.{#OSDNAME}] OSD latency commit | The time taken to commit an operation to the journal. |
Dependent item | ceph.osd[{#OSDNAME},latency_commit] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ceph: OSD osd.{#OSDNAME} is down | OSD osd.{#OSDNAME} is marked "down" in the osdmap. |
last(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},up]) = 0 |Average |
||
Ceph: OSD osd.{#OSDNAME} is full | min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_full_ratio)*100 |Average |
|||
Ceph: Ceph OSD osd.{#OSDNAME} is near full | min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_nearfull_ratio)*100 |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pool | Zabbix agent | ceph.pool.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Ceph: [{#POOLNAME}] Pool Used | The total bytes used in a pool. |
Dependent item | ceph.pool["{#POOLNAME}",bytes_used] Preprocessing
|
Ceph: [{#POOLNAME}] Max available | The maximum available space in the given pool. |
Dependent item | ceph.pool["{#POOLNAME}",max_avail] Preprocessing
|
Ceph: [{#POOLNAME}] Pool RAW Used | Bytes used in pool including the copies made. |
Dependent item | ceph.pool["{#POOLNAME}",stored_raw] Preprocessing
|
Ceph: [{#POOLNAME}] Pool Percent Used | The percentage of the storage used per pool. |
Dependent item | ceph.pool["{#POOLNAME}",percent_used] Preprocessing
|
Ceph: [{#POOLNAME}] Pool objects | The number of objects in the pool. |
Dependent item | ceph.pool["{#POOLNAME}",objects] Preprocessing
|
Ceph: [{#POOLNAME}] Pool Read bandwidth | The read rate per pool (bytes per second). |
Dependent item | ceph.pool["{#POOLNAME}",rd_bytes.rate] Preprocessing
|
Ceph: [{#POOLNAME}] Pool Write bandwidth | The write rate per pool (bytes per second). |
Dependent item | ceph.pool["{#POOLNAME}",wr_bytes.rate] Preprocessing
|
Ceph: [{#POOLNAME}] Pool Read operations | The read rate per pool (operations per second). |
Dependent item | ceph.pool["{#POOLNAME}",rd_ops.rate] Preprocessing
|
Ceph: [{#POOLNAME}] Pool Write operations | The write rate per pool (operations per second). |
Dependent item | ceph.pool["{#POOLNAME}",wr_ops.rate] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Refer to the vendor documentation.
Name | Description | Default |
---|---|---|
{$ARANET.API.ENDPOINT} | Aranet Cloud API endpoint. |
https://aranet.cloud/api |
{$ARANET.API.USERNAME} | Aranet Cloud username. |
<PUT YOUR USERNAME> |
{$ARANET.API.PASSWORD} | Aranet Cloud password. |
<PUT YOUR PASSWORD> |
{$ARANET.API.SPACE_NAME} | Aranet Cloud organization name. |
<PUT YOUR SPACE NAME> |
{$ARANET.LLD.FILTER.SENSOR_NAME.MATCHES} | Filter of discoverable sensors by name. |
.+ |
{$ARANET.LLD.FILTER.SENSOR_NAME.NOT_MATCHES} | Filter to exclude discoverable sensors by name. |
CHANGE_IF_NEEDED |
{$ARANET.LLD.FILTER.SENSOR_ID.MATCHES} | Filter of discoverable sensors by id. |
.+ |
{$ARANET.LLD.FILTER.GATEWAY_NAME.MATCHES} | Filter of discoverable sensors by gateway name. |
.+ |
{$ARANET.LLD.FILTER.GATEWAY_NAME.NOT_MATCHES} | Filter to exclude discoverable sensors by gateway name. |
CHANGE_IF_NEEDED |
{$ARANET.LLD.FILTER.GATEWAY_ID.MATCHES} | Filter of discoverable sensors by gateway id. |
.+ |
{$ARANET.BATT.VOLTAGE.MIN.WARN} | Battery voltage warning threshold. |
1 |
{$ARANET.BATT.VOLTAGE.MIN.CRIT} | Battery voltage critical threshold. |
2 |
{$ARANET.HUMIDITY.MIN.WARN} | Minimum humidity threshold. |
20 |
{$ARANET.HUMIDITY.MAX.WARN} | Maximum humidity threshold. |
70 |
{$ARANET.CO2.MAX.WARN} | CO2 warning threshold. |
600 |
{$ARANET.CO2.MAX.CRIT} | CO2 critical threshold. |
1000 |
{$ARANET.LAST_UPDATE.MAX.WARN} | Data update delay threshold. |
1h |
Name | Description | Type | Key and additional info |
---|---|---|---|
Aranet: Sensors discovery | Discovery for Aranet Cloud sensors |
Dependent item | aranet.sensor.discovery Preprocessing
|
Aranet: Get data | Script | aranet.get_data |
Name | Description | Type | Key and additional info |
---|---|---|---|
Temperature discovery | Discovery for Aranet Cloud temperature sensors |
Dependent item | aranet.temp.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.temp["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Humidity discovery | Discovery for Aranet Cloud humidity sensors |
Dependent item | aranet.humidity.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.humidity["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#METRIC}: Low humidity on "[{#GATEWAYNAME}] {#SENSORNAME}" | max(/Aranet Cloud/aranet.humidity["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.HUMIDITY.MIN.WARN:"{#SENSOR_NAME}"} |Warning |
Depends on:
|
||
{#METRIC}: High humidity on "[{#GATEWAYNAME}] {#SENSORNAME}" | min(/Aranet Cloud/aranet.humidity["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.HUMIDITY.MAX.WARN:"{#SENSOR_NAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
RSSI discovery | Discovery for Aranet Cloud RSSI sensors |
Dependent item | aranet.rssi.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.rssi["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Battery voltage discovery | Discovery for Aranet Cloud Battery voltage sensors |
Dependent item | aranet.battery.voltage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.battery.voltage["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#METRIC}: Low battery voltage on "[{#GATEWAYNAME}] {#SENSORNAME}" | max(/Aranet Cloud/aranet.battery.voltage["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.BATT.VOLTAGE.MIN.WARN:"{#SENSOR_NAME}"} |Warning |
Depends on:
|
||
{#METRIC}: Critically low battery voltage on "[{#GATEWAYNAME}] {#SENSORNAME}" | max(/Aranet Cloud/aranet.battery.voltage["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.BATT.VOLTAGE.MIN.CRIT:"{#SENSOR_NAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
CO2 discovery | Discovery for Aranet Cloud CO2 sensors |
Dependent item | aranet.co2.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.co2["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#METRIC}: High CO2 level on "[{#GATEWAYNAME}] {#SENSORNAME}" | min(/Aranet Cloud/aranet.co2["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.CO2.MAX.WARN:"{#SENSOR_NAME}"} |Warning |
Depends on:
|
||
{#METRIC}: Critically high CO2 level on "[{#GATEWAYNAME}] {#SENSORNAME}" | min(/Aranet Cloud/aranet.co2["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.CO2.MAX.CRIT:"{#SENSOR_NAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Atmospheric pressure discovery | Discovery for Aranet Cloud atmospheric pressure sensors |
Dependent item | aranet.pressure.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.pressure["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Voltage discovery | Discovery for Aranet Cloud Voltage sensors |
Dependent item | aranet.voltage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.voltage["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Weight discovery | Discovery for Aranet Cloud Weight sensors |
Dependent item | aranet.weight.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.weight["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Volumetric Water Content discovery | Discovery for Aranet Cloud Volumetric Water Content sensors |
Dependent item | aranet.volumwatercontent.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.volumetric.water.content["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
PPFD discovery | Discovery for Aranet Cloud PPFD sensors |
Dependent item | aranet.ppfd.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.ppfd["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Distance discovery | Discovery for Aranet Cloud Distance sensors |
Dependent item | aranet.distance.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.distance["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Illuminance discovery | Discovery for Aranet Cloud Illuminance sensors |
Dependent item | aranet.illuminance.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.illuminance["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
pH discovery | Discovery for Aranet Cloud pH sensors |
Dependent item | aranet.ph.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.ph["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Current discovery | Discovery for Aranet Cloud Current sensors |
Dependent item | aranet.current.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.current["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Soil Dielectric Permittivity discovery | Discovery for Aranet Cloud Soil Dielectric Permittivity sensors |
Dependent item | aranet.soildielectricperm.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.soildielectricperm["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Soil Electrical Conductivity discovery | Discovery for Aranet Cloud Soil Electrical Conductivity sensors |
Dependent item | aranet.soilelectriccond.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.soilelectriccond["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pore Electrical Conductivity discovery | Discovery for Aranet Cloud Pore Electrical Conductivity sensors |
Dependent item | aranet.poreelectriccond.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.poreelectriccond["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pulses discovery | Discovery for Aranet Cloud Pulses sensors |
Dependent item | aranet.pulses.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.pulses["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pulses Cumulative discovery | Discovery for Aranet Cloud Pulses Cumulative sensors |
Dependent item | aranet.pulses_cumulative.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.pulsescumulative["{#GATEWAYID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Differential Pressure discovery | Discovery for Aranet Cloud Differential Pressure sensors |
Dependent item | aranet.diff_pressure.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.diffpressure["{#GATEWAYID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Last update discovery | Discovery for Aranet Cloud Last update metric |
Dependent item | aranet.last_update.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.lastupdate["{#GATEWAYID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#METRIC}: Sensor data "[{#GATEWAYNAME}] {#SENSORNAME}" is not updated | last(/Aranet Cloud/aranet.last_update["{#GATEWAY_ID}", "{#SENSOR_ID}"]) > {$ARANET.LAST_UPDATE.MAX.WARN:"{#SENSOR_NAME}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache monitoring by Zabbix via HTTP and doesn't require any external scripts.
The template collects metrics by polling mod_status
with HTTP agent remotely:
127.0.0.1
ServerVersion: Apache/2.4.41 (Unix)
ServerMPM: event
Server Built: Aug 14 2019 00:35:10
CurrentTime: Friday, 16-Aug-2019 12:38:40 UTC
RestartTime: Wednesday, 14-Aug-2019 07:58:26 UTC
ParentServerConfigGeneration: 1
ParentServerMPMGeneration: 0
ServerUptimeSeconds: 189613
ServerUptime: 2 days 4 hours 40 minutes 13 seconds
Load1: 4.60
Load5: 1.20
Load15: 0.47
Total Accesses: 27860
Total kBytes: 33011
Total Duration: 54118
CPUUser: 18.02
CPUSystem: 31.76
CPUChildrenUser: 0
CPUChildrenSystem: 0
CPULoad: .0262535
Uptime: 189613
ReqPerSec: .146931
BytesPerSec: 178.275
BytesPerReq: 1213.33
DurationPerReq: 1.9425
BusyWorkers: 7
IdleWorkers: 93
Processes: 4
Stopping: 0
BusyWorkers: 7
IdleWorkers: 93
ConnsTotal: 13
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 5
ConnsAsyncClosing: 0
Scoreboard: ...
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable and configure the mod_status module. Check the availability of the module with this command line:
httpd -M 2>/dev/null | grep status_module
This is an example configuration of the Apache web server:
<Location "/server-status">
SetHandler server-status
Require host example.com
</Location>
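Once mod_status is configured, you can confirm that the machine-readable status page responds before linking the template. This is a minimal sketch in Python; the URL is assembled from the macro defaults listed below and should be adjusted to your host.

import requests

SCHEME = "http"                 # {$APACHE.STATUS.SCHEME}
HOST = "127.0.0.1"              # {$APACHE.STATUS.HOST}
PORT = 80                       # {$APACHE.STATUS.PORT}
PATH = "server-status?auto"     # {$APACHE.STATUS.PATH}

resp = requests.get(f"{SCHEME}://{HOST}:{PORT}/{PATH}", timeout=10)
resp.raise_for_status()

# The ?auto output is a list of "Key: value" lines; pick out a couple of fields
# that the template also parses (see the example output above).
status = {}
for line in resp.text.splitlines():
    if ": " in line:
        key, value = line.split(": ", 1)
        status[key] = value

print("Total Accesses:", status.get("Total Accesses"))
print("BusyWorkers:", status.get("BusyWorkers"))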
Set the hostname or IP address of the Apache status page host in the {$APACHE.STATUS.HOST} macro. You can also change the status page port in the {$APACHE.STATUS.PORT} macro and the status page path in the {$APACHE.STATUS.PATH} macro if necessary.
Name | Description | Default |
---|---|---|
{$APACHE.STATUS.HOST} | The hostname or IP address of the Apache status page host. |
<SET APACHE HOST> |
{$APACHE.STATUS.PORT} | The port of the Apache status page. |
80 |
{$APACHE.STATUS.PATH} | The URL path. |
server-status?auto |
{$APACHE.STATUS.SCHEME} | The request scheme, which may be either HTTP or HTTPS. |
http |
{$APACHE.RESPONSE_TIME.MAX.WARN} | The maximum Apache response time expressed in seconds for a trigger expression. |
10 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache: Get status | Getting data from a machine-readable version of the Apache status page. For more information see Apache Module mod_status. |
HTTP agent | apache.get_status Preprocessing
|
Apache: Service ping | Simple check | net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] Preprocessing
|
|
Apache: Service response time | Simple check | net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] | |
Apache: Total bytes | The total bytes served. |
Dependent item | apache.bytes Preprocessing
|
Apache: Bytes per second | It is calculated as a rate of change for the "Total bytes" statistics.
|
Dependent item | apache.bytes.rate Preprocessing
|
Apache: Requests per second | It is calculated as a rate of change for the "Total requests" statistics.
|
Dependent item | apache.requests.rate Preprocessing
|
Apache: Total requests | The total number of the Apache server accesses. |
Dependent item | apache.requests Preprocessing
|
Apache: Uptime | The service uptime expressed in seconds. |
Dependent item | apache.uptime Preprocessing
|
Apache: Version | The Apache service version. |
Dependent item | apache.version Preprocessing
|
Apache: Total workers busy | The total number of busy worker threads/processes. |
Dependent item | apache.workers_total.busy Preprocessing
|
Apache: Total workers idle | The total number of idle worker threads/processes. |
Dependent item | apache.workers_total.idle Preprocessing
|
Apache: Workers closing connection | The number of workers in closing state. |
Dependent item | apache.workers.closing Preprocessing
|
Apache: Workers DNS lookup | The number of workers in dnslookup state. |
Dependent item | apache.workers.dnslookup Preprocessing
|
Apache: Workers finishing | The number of workers in finishing state. |
Dependent item | apache.workers.finishing Preprocessing
|
Apache: Workers idle cleanup | The number of workers in cleanup state. |
Dependent item | apache.workers.cleanup Preprocessing
|
Apache: Workers keepalive (read) | The number of workers in keepalive state. |
Dependent item | apache.workers.keepalive Preprocessing
|
Apache: Workers logging | The number of workers in logging state. |
Dependent item | apache.workers.logging Preprocessing
|
Apache: Workers reading request | The number of workers in reading state. |
Dependent item | apache.workers.reading Preprocessing
|
Apache: Workers sending reply | The number of workers in sending state. |
Dependent item | apache.workers.sending Preprocessing
|
Apache: Workers slot with no current process | The number of slots with no current process. |
Dependent item | apache.workers.slot Preprocessing
|
Apache: Workers starting up | The number of workers in starting state. |
Dependent item | apache.workers.starting Preprocessing
|
Apache: Workers waiting for connection | The number of workers in waiting state. |
Dependent item | apache.workers.waiting Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Failed to fetch status page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Apache by HTTP/apache.get_status,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Apache: Service is down | last(/Apache by HTTP/net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"])=0 |Average |
Manual close: Yes | ||
Apache: Service response time is too high | min(/Apache by HTTP/net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"],5m)>{$APACHE.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
Apache: Host has been restarted | Uptime is less than 10 minutes. |
last(/Apache by HTTP/apache.uptime)<10m |Info |
Manual close: Yes | |
Apache: Version has changed | Apache version has changed. Acknowledge to close the problem manually. |
last(/Apache by HTTP/apache.version,#1)<>last(/Apache by HTTP/apache.version,#2) and length(last(/Apache by HTTP/apache.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Event MPM discovery | The discovery of additional metrics if the event Multi-Processing Module (MPM) is used. For more details see Apache MPM event. |
Dependent item | apache.mpm.event.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache: Connections async closing | The number of asynchronous connections in closing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_closing{#SINGLETON}] Preprocessing
|
Apache: Connections async keepalive | The number of asynchronous connections in keepalive state (applicable only to the event MPM). |
Dependent item | apache.connections[asynckeepalive{#SINGLETON}] Preprocessing
|
Apache: Connections async writing | The number of asynchronous connections in writing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_writing{#SINGLETON}] Preprocessing
|
Apache: Connections total | The number of total connections. |
Dependent item | apache.connections[total{#SINGLETON}] Preprocessing
|
Apache: Bytes per request | The average number of bytes served per request. |
Dependent item | apache.bytes[per_request{#SINGLETON}] Preprocessing
|
Apache: Number of async processes | The number of asynchronous processes. |
Dependent item | apache.process[num{#SINGLETON}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache monitoring by Zabbix via Zabbix agent and doesn't require any external scripts.
The template Apache by Zabbix agent collects metrics by polling mod_status locally with Zabbix agent:
127.0.0.1
ServerVersion: Apache/2.4.41 (Unix)
ServerMPM: event
Server Built: Aug 14 2019 00:35:10
CurrentTime: Friday, 16-Aug-2019 12:38:40 UTC
RestartTime: Wednesday, 14-Aug-2019 07:58:26 UTC
ParentServerConfigGeneration: 1
ParentServerMPMGeneration: 0
ServerUptimeSeconds: 189613
ServerUptime: 2 days 4 hours 40 minutes 13 seconds
Load1: 4.60
Load5: 1.20
Load15: 0.47
Total Accesses: 27860
Total kBytes: 33011
Total Duration: 54118
CPUUser: 18.02
CPUSystem: 31.76
CPUChildrenUser: 0
CPUChildrenSystem: 0
CPULoad: .0262535
Uptime: 189613
ReqPerSec: .146931
BytesPerSec: 178.275
BytesPerReq: 1213.33
DurationPerReq: 1.9425
BusyWorkers: 7
IdleWorkers: 93
Processes: 4
Stopping: 0
BusyWorkers: 7
IdleWorkers: 93
ConnsTotal: 13
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 5
ConnsAsyncClosing: 0
Scoreboard: ...
It also uses Zabbix agent to collect Apache
Linux process statistics such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See the setup instructions for mod_status.
Check the availability of the module with this command line: httpd -M 2>/dev/null | grep status_module
This is an example configuration of the Apache web server:
<Location "/server-status">
SetHandler server-status
Require host example.com
</Location>
If you use another path, then do not forget to change the {$APACHE.STATUS.PATH}
macro.
Install and setup Zabbix agent.
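Once the agent is installed and running, you can optionally verify that it can reach the status page by testing the same item key the template uses. This is only an illustrative sketch assuming the default macro values (local host, port 80, path server-status?auto); adjust it to your configuration:
# Query the Apache status page through the Zabbix agent on the monitored host
zabbix_get -s 127.0.0.1 -k 'web.page.get["http://127.0.0.1:80/server-status?auto"]'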
Name | Description | Default |
---|---|---|
{$APACHE.STATUS.HOST} | The hostname or IP address of the Apache status page. |
127.0.0.1 |
{$APACHE.STATUS.PORT} | The port of the Apache status page. |
80 |
{$APACHE.STATUS.PATH} | The URL path. |
server-status?auto |
{$APACHE.STATUS.SCHEME} | The request scheme, which may be either HTTP or HTTPS. |
http |
{$APACHE.RESPONSE_TIME.MAX.WARN} | The maximum Apache response time expressed in seconds for a trigger expression. |
10 |
{$APACHE.PROCESS_NAME} | The process name filter for the Apache process discovery. |
(httpd|apache2) |
{$APACHE.PROCESS.NAME.PARAMETER} | The process name of the Apache web server used in the proc.get item key. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache: Get status | Getting data from a machine-readable version of the Apache status page. For more information see Apache Module mod_status. |
Zabbix agent | web.page.get["{$APACHE.STATUS.SCHEME}://{$APACHE.STATUS.HOST}:{$APACHE.STATUS.PORT}/{$APACHE.STATUS.PATH}"] Preprocessing
|
Apache: Service ping | Zabbix agent | net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] Preprocessing
|
|
Apache: Service response time | Zabbix agent | net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] | |
Apache: Total bytes | The total bytes served. |
Dependent item | apache.bytes Preprocessing
|
Apache: Bytes per second | It is calculated as a rate of change for total bytes statistics.
|
Dependent item | apache.bytes.rate Preprocessing
|
Apache: Requests per second | It is calculated as a rate of change for the "Total requests" statistics.
|
Dependent item | apache.requests.rate Preprocessing
|
Apache: Total requests | The total number of the Apache server accesses. |
Dependent item | apache.requests Preprocessing
|
Apache: Uptime | The service uptime expressed in seconds. |
Dependent item | apache.uptime Preprocessing
|
Apache: Version | The Apache service version. |
Dependent item | apache.version Preprocessing
|
Apache: Total workers busy | The total number of busy worker threads/processes. |
Dependent item | apache.workers_total.busy Preprocessing
|
Apache: Total workers idle | The total number of idle worker threads/processes. |
Dependent item | apache.workers_total.idle Preprocessing
|
Apache: Workers closing connection | The number of workers in closing state. |
Dependent item | apache.workers.closing Preprocessing
|
Apache: Workers DNS lookup | The number of workers in dnslookup state. |
Dependent item | apache.workers.dnslookup Preprocessing
|
Apache: Workers finishing | The number of workers in finishing state. |
Dependent item | apache.workers.finishing Preprocessing
|
Apache: Workers idle cleanup | The number of workers in cleanup state. |
Dependent item | apache.workers.cleanup Preprocessing
|
Apache: Workers keepalive (read) | The number of workers in keepalive state. |
Dependent item | apache.workers.keepalive Preprocessing
|
Apache: Workers logging | The number of workers in logging state. |
Dependent item | apache.workers.logging Preprocessing
|
Apache: Workers reading request | The number of workers in reading state. |
Dependent item | apache.workers.reading Preprocessing
|
Apache: Workers sending reply | The number of workers in sending state. |
Dependent item | apache.workers.sending Preprocessing
|
Apache: Workers slot with no current process | The number of slots with no current process. |
Dependent item | apache.workers.slot Preprocessing
|
Apache: Workers starting up | The number of workers in starting state. |
Dependent item | apache.workers.starting Preprocessing
|
Apache: Workers waiting for connection | The number of workers in waiting state. |
Dependent item | apache.workers.waiting Preprocessing
|
Apache: Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$APACHE.PROCESS.NAME.PARAMETER},,,summary] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Host has been restarted | Uptime is less than 10 minutes. |
last(/Apache by Zabbix agent/apache.uptime)<10m |Info |
Manual close: Yes | |
Apache: Version has changed | Apache version has changed. Acknowledge to close the problem manually. |
last(/Apache by Zabbix agent/apache.version,#1)<>last(/Apache by Zabbix agent/apache.version,#2) and length(last(/Apache by Zabbix agent/apache.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Event MPM discovery | The discovery of additional metrics if the event Multi-Processing Module (MPM) is used. For more details see Apache MPM event. |
Dependent item | apache.mpm.event.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache: Connections async closing | The number of asynchronous connections in closing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_closing{#SINGLETON}] Preprocessing
|
Apache: Connections async keepalive | The number of asynchronous connections in keepalive state (applicable only to the event MPM). |
Dependent item | apache.connections[asynckeepalive{#SINGLETON}] Preprocessing
|
Apache: Connections async writing | The number of asynchronous connections in writing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_writing{#SINGLETON}] Preprocessing
|
Apache: Connections total | The number of total connections. |
Dependent item | apache.connections[total{#SINGLETON}] Preprocessing
|
Apache: Bytes per request | The average number of bytes served per request. |
Dependent item | apache.bytes[per_request{#SINGLETON}] Preprocessing
|
Apache: Number of async processes | The number of asynchronous processes. |
Dependent item | apache.process[num{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache process discovery | The discovery of the Apache process summary. |
Dependent item | apache.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache: CPU utilization | The percentage of the CPU utilization by a process {#APACHE.NAME}. |
Zabbix agent | proc.cpu.util[{#APACHE.NAME}] |
Apache: Get process data | The summary metrics aggregated by a process {#APACHE.NAME}. |
Dependent item | apache.proc.get[{#APACHE.NAME}] Preprocessing
|
Apache: Memory usage (rss) | The summary of resident set size memory used by a process {#APACHE.NAME} expressed in bytes. |
Dependent item | apache.proc.rss[{#APACHE.NAME}] Preprocessing
|
Apache: Memory usage (vsize) | The summary of virtual memory used by a process {#APACHE.NAME} expressed in bytes. |
Dependent item | apache.proc.vmem[{#APACHE.NAME}] Preprocessing
|
Apache: Memory usage, % | The percentage of real memory used by a process {#APACHE.NAME}. |
Dependent item | apache.proc.pmem[{#APACHE.NAME}] Preprocessing
|
Apache: Number of running processes | The number of running processes {#APACHE.NAME}. |
Dependent item | apache.proc.num[{#APACHE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Process is not running | last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])=0 |High |
|||
Apache: Service is down | last(/Apache by Zabbix agent/net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"])=0 and last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])>0 |Average |
Manual close: Yes | ||
Apache: Failed to fetch status page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Apache by Zabbix agent/web.page.get["{$APACHE.STATUS.SCHEME}://{$APACHE.STATUS.HOST}:{$APACHE.STATUS.PORT}/{$APACHE.STATUS.PATH}"],30m)=1 and last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
|
Apache: Service response time is too high | min(/Apache by Zabbix agent/net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"],5m)>{$APACHE.RESPONSE_TIME.MAX.WARN} and last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache ActiveMQ monitoring by Zabbix via JMX and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
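As an illustrative sketch (not part of the template itself), remote JMX access on the default port referenced by the {$ACTIVEMQ.PORT} macro can be enabled in the broker's conf/activemq.xml; the credentials must then match the {$ACTIVEMQ.USER} and {$ACTIVEMQ.PASSWORD} macros, and the Zabbix Java gateway must be able to reach the broker host:
<!-- inside the <broker> element of conf/activemq.xml -->
<managementContext>
  <managementContext createConnector="true" connectorPort="1099"/>
</managementContext>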
Name | Description | Default |
---|---|---|
{$ACTIVEMQ.USER} | User for JMX |
admin |
{$ACTIVEMQ.PASSWORD} | Password for JMX |
activemq |
{$ACTIVEMQ.PORT} | Port for JMX |
1099 |
{$ACTIVEMQ.LLD.FILTER.BROKER.MATCHES} | Filter to include discovered brokers |
.* |
{$ACTIVEMQ.LLD.FILTER.BROKER.NOT_MATCHES} | Filter to exclude discovered brokers |
CHANGE IF NEEDED |
{$ACTIVEMQ.LLD.FILTER.DESTINATION.MATCHES} | Filter to include discovered destinations |
.* |
{$ACTIVEMQ.LLD.FILTER.DESTINATION.NOT_MATCHES} | Filter to exclude discovered destinations |
CHANGE IF NEEDED |
{$ACTIVEMQ.MSG.RATE.WARN.TIME} | The time for message enqueue/dequeue rate. Can be used with destination or broker name as context. |
15m |
{$ACTIVEMQ.MEM.MAX.WARN} | Memory threshold for AVERAGE trigger. Can be used with destination or broker name as context. |
75 |
{$ACTIVEMQ.MEM.MAX.HIGH} | Memory threshold for HIGH trigger. Can be used with destination or broker name as context. |
90 |
{$ACTIVEMQ.MEM.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.STORE.MAX.WARN} | Storage threshold for AVERAGE trigger. Can be used with broker name as context. |
75 |
{$ACTIVEMQ.STORE.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.STORE.MAX.HIGH} | Storage threshold for HIGH trigger. Can be used with broker name as context. |
90 |
{$ACTIVEMQ.TEMP.MAX.WARN} | Temp threshold for AVERAGE trigger. Can be used with broker name as context. |
75 |
{$ACTIVEMQ.TEMP.MAX.HIGH} | Temp threshold for HIGH trigger. Can be used with broker name as context. |
90 |
{$ACTIVEMQ.TEMP.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.TIME} | Time during which there may be no consumers in destination. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.HIGH} | Minimum amount of consumers for destination. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.TIME} | Time during which there may be no producers on destination. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.HIGH} | Minimum amount of producers for destination. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.BROKER.CONSUMERS.MIN.TIME} | Time during which there may be no consumers on the broker. Can be used with broker name as context. |
5m |
{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH} | Minimum amount of consumers for broker. Can be used with broker name as context. |
1 |
{$ACTIVEMQ.BROKER.PRODUCERS.MIN.TIME} | Time during which there may be no producers on broker. Can be used with broker name as context. |
5m |
{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH} | Minimum amount of producers for broker. Can be used with broker name as context. |
1 |
{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT} | Attribute for TotalConsumerCount per destination. Used to suppress destination's triggers when the count of consumers on the broker is lower than threshold. |
TotalConsumerCount |
{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT} | Attribute for TotalProducerCount per destination. Used to suppress destination's triggers when the count of producers on the broker is lower than threshold. |
TotalProducerCount |
{$ACTIVEMQ.QUEUE.TIME} | Time during which the QueueSize can be higher than threshold. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.QUEUE.WARN} | Threshold for QueueSize. Can be used with destination name as context. |
100 |
{$ACTIVEMQ.QUEUE.ENABLED} | Use this to disable alerting for specific destination. 1 = enabled, 0 = disabled. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.EXPIRED.WARN} | Threshold for expired messages count. Can be used with destination name as context. |
0 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Brokers discovery | Discovery of brokers |
JMX agent | jmx.discovery[beans,"org.apache.activemq:type=Broker,brokerName=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Broker {#JMXBROKERNAME}: Version | The version of the broker. |
JMX agent | jmx[{#JMXOBJ},BrokerVersion] Preprocessing
|
Broker {#JMXBROKERNAME}: Uptime | The uptime of the broker. |
JMX agent | jmx[{#JMXOBJ},UptimeMillis] Preprocessing
|
Broker {#JMXBROKERNAME}: Memory limit | Memory limit, in bytes, used for holding undelivered messages before paging to temporary storage. |
JMX agent | jmx[{#JMXOBJ},MemoryLimit] Preprocessing
|
Broker {#JMXBROKERNAME}: Memory usage in percents | Percent of memory limit used. |
JMX agent | jmx[{#JMXOBJ}, MemoryPercentUsage] |
Broker {#JMXBROKERNAME}: Storage limit | Disk limit, in bytes, used for persistent messages before producers are blocked. |
JMX agent | jmx[{#JMXOBJ},StoreLimit] Preprocessing
|
Broker {#JMXBROKERNAME}: Storage usage in percents | Percent of store limit used. |
JMX agent | jmx[{#JMXOBJ},StorePercentUsage] |
Broker {#JMXBROKERNAME}: Temp limit | Disk limit, in bytes, used for non-persistent messages and temporary data before producers are blocked. |
JMX agent | jmx[{#JMXOBJ},TempLimit] Preprocessing
|
Broker {#JMXBROKERNAME}: Temp usage in percents | Percent of temp limit used. |
JMX agent | jmx[{#JMXOBJ},TempPercentUsage] |
Broker {#JMXBROKERNAME}: Messages enqueue rate | Rate of messages that have been sent to the broker. |
JMX agent | jmx[{#JMXOBJ},TotalEnqueueCount] Preprocessing
|
Broker {#JMXBROKERNAME}: Messages dequeue rate | Rate of messages that have been delivered by the broker and acknowledged by consumers. |
JMX agent | jmx[{#JMXOBJ},TotalDequeueCount] Preprocessing
|
Broker {#JMXBROKERNAME}: Consumers count total | Number of consumers attached to this broker. |
JMX agent | jmx[{#JMXOBJ},TotalConsumerCount] |
Broker {#JMXBROKERNAME}: Producers count total | Number of producers attached to this broker. |
JMX agent | jmx[{#JMXOBJ},TotalProducerCount] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Broker {#JMXBROKERNAME}: Version has been changed | The Broker {#JMXBROKERNAME} version has changed. Acknowledge to close the problem manually. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion],#1)<>last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion],#2) and length(last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion]))>0 |Info |
Manual close: Yes | |
Broker {#JMXBROKERNAME}: Broker has been restarted | Uptime is less than 10 minutes. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},UptimeMillis])<10m |Info |
Manual close: Yes | |
Broker {#JMXBROKERNAME}: Memory usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ}, MemoryPercentUsage],{$ACTIVEMQ.MEM.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.MEM.MAX.WARN:"{#JMXBROKERNAME}"} |Average |
Depends on:
|
||
Broker {#JMXBROKERNAME}: Memory usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ}, MemoryPercentUsage],{$ACTIVEMQ.MEM.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.MEM.MAX.HIGH:"{#JMXBROKERNAME}"} |High |
|||
Broker {#JMXBROKERNAME}: Storage usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},StorePercentUsage],{$ACTIVEMQ.STORE.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.STORE.MAX.WARN:"{#JMXBROKERNAME}"} |Average |
Depends on:
|
||
Broker {#JMXBROKERNAME}: Storage usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},StorePercentUsage],{$ACTIVEMQ.STORE.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.STORE.MAX.HIGH:"{#JMXBROKERNAME}"} |High |
|||
Broker {#JMXBROKERNAME}: Temp usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TempPercentUsage],{$ACTIVEMQ.TEMP.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.TEMP.MAX.WARN} |Average |
Depends on:
|
||
Broker {#JMXBROKERNAME}: Temp usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TempPercentUsage],{$ACTIVEMQ.TEMP.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.TEMP.MAX.HIGH} |High |
|||
Broker {#JMXBROKERNAME}: Message enqueue rate is higher than dequeue rate | Enqueue rate is higher than dequeue rate. It may indicate performance problems. |
avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalEnqueueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXBROKERNAME}"})>avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalDequeueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXBROKERNAME}"}) |Average |
||
Broker {#JMXBROKERNAME}: Consumers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalConsumerCount],{$ACTIVEMQ.BROKER.CONSUMERS.MIN.TIME:"{#JMXBROKERNAME}"})<{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH:"{#JMXBROKERNAME}"} |High |
|||
Broker {#JMXBROKERNAME}: Producers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalProducerCount],{$ACTIVEMQ.BROKER.PRODUCERS.MIN.TIME:"{#JMXBROKERNAME}"})<{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH:"{#JMXBROKERNAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Destinations discovery | Discovery of destinations |
JMX agent | jmx.discovery[beans,"org.apache.activemq:type=Broker,brokerName=*,destinationType=*,destinationName=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count | Number of consumers attached to this destination. |
JMX agent | jmx[{#JMXOBJ},ConsumerCount] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count total on {#JMXBROKERNAME} | Number of consumers attached to the broker of this destination. Used to suppress destination's triggers when the count of consumers on the broker is lower than threshold. |
JMX agent | jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT: "{#JMXDESTINATIONNAME}"}] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count | Number of producers attached to this destination. |
JMX agent | jmx[{#JMXOBJ},ProducerCount] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count total on {#JMXBROKERNAME} | Number of producers attached to the broker of this destination. Used to suppress destination's triggers when the count of producers on the broker is lower than threshold. |
JMX agent | jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT: "{#JMXDESTINATIONNAME}"}] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage in percents | The percentage of the memory limit used. |
JMX agent | jmx[{#JMXOBJ},MemoryPercentUsage] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Messages enqueue rate | Rate of messages that have been sent to the destination. |
JMX agent | jmx[{#JMXOBJ},EnqueueCount] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Messages dequeue rate | Rate of messages that have been acknowledged (and removed) from the destination. |
JMX agent | jmx[{#JMXOBJ},DequeueCount] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Queue size | Number of messages on this destination, including any that have been dispatched but not acknowledged. |
JMX agent | jmx[{#JMXOBJ},QueueSize] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Expired messages count | Number of messages that have been expired. |
JMX agent | jmx[{#JMXOBJ},ExpiredCount] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ConsumerCount],{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.TIME:"{#JMXDESTINATIONNAME}"})<{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.HIGH:"{#JMXDESTINATIONNAME}"} and last(/Apache ActiveMQ by JMX/jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT: "{#JMXDESTINATIONNAME}"}])>{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH:"{#JMXBROKERNAME}"} |Average |
Manual close: Yes | ||
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ProducerCount],{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.TIME:"{#JMXDESTINATIONNAME}"})<{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.HIGH:"{#JMXDESTINATIONNAME}"} and last(/Apache ActiveMQ by JMX/jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT: "{#JMXDESTINATIONNAME}"}])>{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH:"{#JMXBROKERNAME}"} |Average |
Manual close: Yes | ||
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage is too high | last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},MemoryPercentUsage])>{$ACTIVEMQ.MEM.MAX.WARN:"{#JMXDESTINATIONNAME}"} |Average |
|||
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage is too high | last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},MemoryPercentUsage])>{$ACTIVEMQ.MEM.MAX.HIGH:"{#JMXDESTINATIONNAME}"} |High |
|||
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Message enqueue rate is higher than dequeue rate | Enqueue rate is higher than dequeue rate. It may indicate performance problems. |
avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},EnqueueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXDESTINATIONNAME}"})>avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},DequeueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXDESTINATIONNAME}"}) |Average |
||
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Queue size is high | Queue size is higher than threshold. It may indicate performance problems. |
min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},QueueSize],{$ACTIVEMQ.QUEUE.TIME:"{#JMXDESTINATIONNAME}"})>{$ACTIVEMQ.QUEUE.WARN:"{#JMXDESTINATIONNAME}"} and {$ACTIVEMQ.QUEUE.ENABLED:"{#JMXDESTINATIONNAME}"}=1 |Average |
||
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Expired messages count is high | This metric represents the number of messages that expired before they could be delivered. If you expect all messages to be delivered and acknowledged within a certain amount of time, you can set an expiration for each message, and investigate if your ExpiredCount metric rises above zero. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ExpiredCount])>{$ACTIVEMQ.EXPIRED.WARN:"{#JMXDESTINATIONNAME}"} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Acronis Cyber Protect Cloud monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This is a master template that needs to be assigned to a host; it will automatically create an MSP host prototype, which will monitor Acronis Cyber Protect Cloud metrics.
Before using this template, you need to create a new MSP-level API client for Zabbix to use. To do that, sign in to your Acronis Cyber Protect Cloud web interface, navigate to Settings
-> API clients
and create a new API client.
You will be shown credentials for this API client. These credentials need to be entered in the following user macros of this template:
{$ACRONIS.CPC.AUTH.CLIENT.ID} - enter the Client ID here;
{$ACRONIS.CPC.AUTH.SECRET} - enter the Secret here;
{$ACRONIS.CPC.DATACENTER.URL} - enter the Data center URL here.
This is all the configuration needed for this integration.
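To check the credentials before assigning the template, you can optionally request a token manually. This is only a hypothetical sketch: it assumes the standard Acronis OAuth2 client-credentials token endpoint under the Account Management API sub-path ({$ACRONIS.CPC.PATH.ACCOUNT.MANAGEMENT}, /api/2 by default); consult the Acronis API documentation for the exact endpoint of your data center, and replace the URL, client ID, and secret with your own values:
# Hypothetical example of requesting an access token with the API client credentials
curl -u "<CLIENT_ID>:<SECRET>" -d "grant_type=client_credentials" "https://eu2-cloud.acronis.com/api/2/idp/token"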
Name | Description | Default |
---|---|---|
{$ACRONIS.CPC.DATACENTER.URL} | Acronis Cyber Protect Cloud datacenter URL, e.g., https://eu2-cloud.acronis.com. |
|
{$ACRONIS.CPC.AUTH.INTERVAL} | API token regeneration interval, in minutes. By default, Acronis Cyber Protect Cloud tokens expire after 2 hours. |
110m |
{$ACRONIS.CPC.HTTP.PROXY} | Sets the HTTP proxy for the authorization item. Host prototypes will also use this value for HTTP proxy. If this parameter is empty, then no proxy is used. |
|
{$ACRONIS.CPC.AUTH.CLIENT.ID} | Client ID for API user access. |
|
{$ACRONIS.CPC.AUTH.SECRET} | Secret for API user access. |
|
{$ACRONIS.CPC.PATH.ACCOUNT.MANAGEMENT} | Sub-path for the Account Management API. |
/api/2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Get access token | Authorizes API user and receives access token. |
HTTP agent | acronis.cpc.accountmanager.gettoken Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: MSP Discovery | Discovers MSP and creates host prototype based on that. |
Dependent item | acronis.cpc.lld.msp_discovery |
This template is designed for the effortless deployment of Acronis Cyber Protect Cloud MSP monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is not meant to be used independently. A host with the Acronis Cyber Protect Cloud by HTTP
template will request API token and automatically create a host prototype with this template assigned to it.
If needed, you can specify an HTTP proxy for the template to use by changing the value of {$ACRONIS.CPC.HTTP.PROXY}
user macro.
Device discovery trigger prototypes that check for scheduled services which have failed to run have the following trigger time offset user macros:
{$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE}
{$ACRONIS.CPC.OFFSET.SCHEDULED.BACKUP}
{$ACRONIS.CPC.OFFSET.SCHEDULED.VULNERABILITY}
{$ACRONIS.CPC.OFFSET.SCHEDULED.PATCH}
Using these macros, their respective triggers can be offset in both directions. For example, if you wish to make
sure that the trigger fires only when the current time is at least 3 minutes past the next scheduled antimalware
scan, then set the value of the {$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE}
user macro to -180.
This is the default behaviour.
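For illustration, the scheduled-run trigger prototypes further below have the form
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.scan.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE})
so with the default offset of -180 the problem is raised only once the scheduled start time is more than 180 seconds (3 minutes) in the past, while a positive offset would allow the problem to be raised up to that many seconds before the scheduled time.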
Name | Description | Default |
---|---|---|
{$ACRONIS.CPC.DATACENTER.URL} | Acronis Cyber Protect Cloud datacenter URL, e.g., https://eu2-cloud.acronis.com. |
|
{$ACRONIS.CPC.HTTP.PROXY} | Sets the HTTP proxy for the authorization item. Host prototypes will also use this value for HTTP proxy. If this parameter is empty, then no proxy is used. |
|
{$ACRONIS.CPC.CYBERFIT.WARN} | CyberFit score threshold for "warning" severity trigger. |
669 |
{$ACRONIS.CPC.CYBERFIT.HIGH} | CyberFit score threshold for "high" severity trigger. |
579 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE} | Offset time in seconds for scheduled antimalware scan trigger check. |
-180 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.BACKUP} | Offset time in seconds for scheduled backup run trigger check. |
-180 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.VULNERABILITY} | Offset time in seconds for scheduled vulnerability assessment run trigger check. |
-180 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.PATCH} | Offset time in seconds for scheduled patch management run trigger check. |
-180 |
{$ACRONIS.CPC.DEVICE.RESOURCE.TYPE} | Comma separated list of resource types for devices retrieval. |
resource.machine |
{$ACRONIS.CPC.ALERT.DISCOVERY.CATEGORY.MATCHES} | Sets the alert category regex filter to use in alert discovery for including. |
.* |
{$ACRONIS.CPC.ALERT.DISCOVERY.CATEGORY.NOT_MATCHES} | Sets the alert category regex filter to use in alert discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.ALERT.DISCOVERY.SEVERITY.MATCHES} | Sets the alert severity regex filter to use in alert discovery for including. |
.* |
{$ACRONIS.CPC.ALERT.DISCOVERY.SEVERITY.NOT_MATCHES} | Sets the alert severity regex filter to use in alert discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.ALERT.DISCOVERY.RESOURCE.MATCHES} | Sets the alert resource name regex filter to use in alert discovery for including. |
.* |
{$ACRONIS.CPC.ALERT.DISCOVERY.RESOURCE.NOT_MATCHES} | Sets the alert resource name regex filter to use in alert discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.CUSTOMER.DISCOVERY.KIND.MATCHES} | Sets the customer kind regex filter to use in customer discovery for including. |
customer |
{$ACRONIS.CPC.CUSTOMER.DISCOVERY.NAME.MATCHES} | Sets the customer name regex filter to use in customer discovery for including. |
.* |
{$ACRONIS.CPC.CUSTOMER.DISCOVERY.NAME.NOT_MATCHES} | Sets the customer name regex filter to use in customer discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.DEVICE.DISCOVERY.TENANT.MATCHES} | Sets the tenant name regex filter to use in device discovery for including. |
.* |
{$ACRONIS.CPC.DEVICE.DISCOVERY.TENANT.NOT_MATCHES} | Sets the tenant name regex filter to use in device discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.ACCESS_TOKEN} | API access token. |
|
{$ACRONIS.CPC.PATH.ACCOUNT.MANAGEMENT} | Sub-path for the Account Management API. |
/api/2 |
{$ACRONIS.CPC.PATH.RESOURCE.MANAGEMENT} | Sub-path for the Resource Management API. |
/api/resource_management/v4 |
{$ACRONIS.CPC.PATH.ALERTS} | Sub-path for the Alerts API. |
/api/alert_manager/v1 |
{$ACRONIS.CPC.PATH.AGENTS} | Sub-path for the Agents API. |
/api/agent_manager/v2 |
{$ACRONIS.CPC.MSP.TENANT.UUID} | UUID for MSP. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Register integration | Registers integration on Acronis services. |
Script | acronis.cpc.register.integration |
Acronis CPC: Get alerts | Fetches all alerts. |
HTTP agent | acronis.cpc.alerts.get Preprocessing
|
Acronis CPC: Get customers | Fetches all customers. |
HTTP agent | acronis.cpc.customers.get Preprocessing
|
Acronis CPC: Get devices | Fetches all devices. |
HTTP agent | acronis.cpc.devices.get Preprocessing
|
Acronis CPC: Alerts with "ok" severity | Gets count of alerts with "ok" severity. |
Dependent item | acronis.cpc.alerts.severity.ok Preprocessing
|
Acronis CPC: Alerts with "warning" severity | Gets count of alerts with "warning" severity. |
Dependent item | acronis.cpc.alerts.severity.warn Preprocessing
|
Acronis CPC: Alerts with "error" severity | Gets count of alerts with "error" severity. |
Dependent item | acronis.cpc.alerts.severity.err Preprocessing
|
Acronis CPC: Alerts with "critical" severity | Gets count of alerts with "critical" severity. |
Dependent item | acronis.cpc.alerts.severity.crit Preprocessing
|
Acronis CPC: Alerts with "information" severity | Gets count of alerts with "information" severity. |
Dependent item | acronis.cpc.alerts.severity.info Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Alerts discovery | Discovers alerts. |
Dependent item | acronis.cpc.alerts.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Alert [{#TYPE}]:[{#ALERT_ID}]: Alert severity | Severity for the alert. |
Dependent item | acronis.cpc.alert.severity[{#ALERT_ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Alert [{#TYPE}]:[{#ALERT_ID}]: Alert has "critical" severity | Alert has "critical" severity. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.alert.severity[{#ALERT_ID}])=3 |High |
Manual close: Yes | |
Alert [{#TYPE}]:[{#ALERT_ID}]: Alert has "error" severity | Alert has "error" severity. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.alert.severity[{#ALERT_ID}])=2 |Average |
Manual close: Yes Depends on:
|
|
Alert [{#TYPE}]:[{#ALERT_ID}]: Alert has "warning" severity | Alert has "warning" severity. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.alert.severity[{#ALERT_ID}])=1 |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Customer discovery | Discovers customers. |
Dependent item | acronis.cpc.customer.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Customer [{#NAME}]: Enabled status | Enabled status for customer (true or false). |
Dependent item | acronis.cpc.customer.status[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Device discovery | Discovers devices. |
Dependent item | acronis.cpc.device.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Device [{#NAME}]:[{#ID}]: Raw data resources status | Gets statuses for device resources. |
HTTP agent | acronis.cpc.device.res.status.raw[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: CyberFit score | Acronis "CyberFit" score for the device. Value of "-1" is assigned if "CyberFit" could not be found for device. |
Dependent item | acronis.cpc.device.cyberfit[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Agent version | Agent version for the device. |
Dependent item | acronis.cpc.device.agent.version[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Agent enabled | Agent status (enabled or disabled) for the device. |
Dependent item | acronis.cpc.device.agent.enabled[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Agent online | Agent reachability for the device. |
Dependent item | acronis.cpc.device.agent.online[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Protection status | Protection status for device. |
Dependent item | acronis.cpc.device.protection.status[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Protection plan name | Protection plan name for device. |
Dependent item | acronis.cpc.device.protection.name[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful antimalware protection scan | Previous successful antimalware protection scan for device. |
Dependent item | acronis.cpc.device.protection.scan.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous antimalware protection scan | Previous antimalware protection scan for device. |
Dependent item | acronis.cpc.device.protection.scan.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next antimalware protection scan | Next scheduled antimalware protection scan for device. |
Dependent item | acronis.cpc.device.protection.scan.next[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful machine backup run | Previous successful machine backup run for device. |
Dependent item | acronis.cpc.device.backup.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous machine backup run | Previous machine backup run for device. |
Dependent item | acronis.cpc.device.backup.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next machine backup run | Next scheduled machine backup run for device. |
Dependent item | acronis.cpc.device.backup.next[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful vulnerability assessment | Previous successful vulnerability assessment for device. |
Dependent item | acronis.cpc.device.vuln.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous vulnerability assessment | Previous vulnerability assessment for device. |
Dependent item | acronis.cpc.device.vuln.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next vulnerability assessment | Next scheduled vulnerability assessment for device. |
Dependent item | acronis.cpc.device.vuln.next[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful patch management run | Previous successful patch management run for device. |
Dependent item | acronis.cpc.device.patch.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous patch management run | Previous patch management run for device. |
Dependent item | acronis.cpc.device.patch.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next patch management run | Next scheduled patch management run for device. |
Dependent item | acronis.cpc.device.patch.next[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Device [{#NAME}]:[{#ID}]: CyberFit score critical | CyberFit score for this device is critical for at least 3 minutes. |
min(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) < {$ACRONIS.CPC.CYBERFIT.HIGH} and max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) <> -1 |High |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: CyberFit score low | CyberFit score for this device is low for at least 3 minutes. |
min(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) < {$ACRONIS.CPC.CYBERFIT.WARN} and max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) <> -1 |Warning |
Manual close: Yes Depends on:
|
|
Device [{#NAME}]:[{#ID}]: Agent disabled | Agent for this device is disabled for at least 3 minutes. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.agent.enabled[{#NAME}],3m) < 1 |Info |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Protection status "error" | Device has "error" protection status. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.status[{#NAME}])="error" |Average |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Protection status "warning" | Device has "warning" protection status. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.status[{#NAME}])="warning" |Warning |
Manual close: Yes Depends on:
|
|
Device [{#NAME}]:[{#ID}]: Previous protection scan not successful | The previous antimalware protection scan did not run successfully. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.scan.prev.ok[{#NAME}])<>last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.scan.prev[{#NAME}]) |Average |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Scheduled antimalware scan failed to run | Scheduled antimalware scan failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.scan.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE}) |Warning |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Previous machine backup run not successful | Previous machine backup did not run successfully. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.backup.prev.ok[{#NAME}],1m)<>max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.backup.prev[{#NAME}],1m) |Average |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Scheduled machine backup failed to run | Scheduled machine backup failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.backup.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.BACKUP}) |Warning |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Previous vulnerability assessment not successful | Previous vulnerability assessment did not run successfully. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.vuln.prev.ok[{#NAME}],1m)<>max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.vuln.prev[{#NAME}],1m) |Average |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Scheduled vulnerability assessment failed to run | Scheduled vulnerability assessment failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.vuln.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.VULNERABILITY}) |Warning |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Previous patch management run not successful | Previous patch management run did not run successfully. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.patch.prev.ok[{#NAME}],1m)<>max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.patch.prev[{#NAME}],1m) |Average |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Scheduled patch management failed to run | Scheduled patch management failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.patch.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.PATCH}) |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums