cloud

cloud_oracle_http

Oracle Cloud by HTTP

Overview

This template is designed as a master template that discovers various Oracle Cloud Infrastructure (OCI) services and resources, such as:

OCI Compute;
OCI Autonomous Database (serverless);
OCI Object Storage;
OCI Virtual Cloud Networks (VCNs);
OCI Block Volumes;
OCI Boot Volumes.

For communication with OCI, this template utilizes script items which execute HTTP GET and POST requests. POST requests are required for OCI Monitoring API as it utilizes Monitoring Query Language (MQL) which uses an HTTP request body for queries.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Oracle Cloud Infrastructure

Configuration

Setup

Required setup

For this template to work, it needs authentication details to use in requests. To acquire this information, see the following steps:

Log into your administrator account in Oracle Cloud Console.
Create a new user that will be used by Zabbix for monitoring.
Create a new security policy and assign a previously created user to it.

This policy will contain a set of rules that will give monitoring user access to specific resources in your OCI. Make sure to add the following rules to the policy:

Allow group 'zabbix_api' to read metrics in tenancy
Allow group 'zabbix_api' to read instances in tenancy
Allow group 'zabbix_api' to read subnets in tenancy
Allow group 'zabbix_api' to read vcns in tenancy
Allow group 'zabbix_api' to read vnic-attachments in tenancy
Allow group 'zabbix_api' to read volumes in tenancy
Allow group 'zabbix_api' to read objectstorage-namespaces in tenancy
Allow group 'zabbix_api' to read buckets in tenancy
Allow group 'zabbix_api' to read autonomous-databases in tenancy

In this example, zabbix_api is the name of the previously created monitoring user. Rename it to your monitoring user's name.

Generate an API key pair for your monitoring user - open your monitoring user profile and on the left side, press API keys and then, Add API key (if generating a new key pair, do not forget to save the private key).
After this, Oracle Cloud Console will provide additional information that is required for access, such as:
- Tenancy OCID;
- User OCID;
- Fingerprint;
- Region.
Save this information somewhere or keep this window open. This information will be required in later steps.
In Zabbix, create a new host and assign this template to it (Oracle Cloud by HTTP).
Open the Macros section of the host you created and set the following user macro values according to the OCI configuration file (from step #6):
- {$OCI.API.TENANCY} - set the tenancy OCID value;
- {$OCI.API.USER} - set the user OCID value;
- {$OCI.API.FINGERPRINT} - set the fingerprint value;
- {$OCI.API.PRIVATE.KEY} - copy and paste the contents of private key file here.
After the authentication credentials are entered, you need to identify the OCI API endpoints that match your region (as provided by Oracle Cloud Console in step #6). To do so, you can use the OCI API Reference and Endpoints list, where each API service has a dedicated page with the respective API endpoints.

The required API service endpoints are:
When the API endpoints are identified, you need to set them in Zabbix as user macros to the host that the template is attached to (similarly to step #8):
- {$OCI.API.CORE.HOST} - Core Services API endpoint, for example, iaas.eu-stockholm-1.oraclecloud.com;
- {$OCI.API.AUTONOMOUS.DB.HOST} - Database Service API endpoint, for example, database.eu-stockholm-1.oraclecloud.com;
- {$OCI.API.OBJECT.STORAGE.HOST} - Object Storage Service API endpoint, for example, objectstorage.eu-stockholm-1.oraclecloud.com;
- {$OCI.API.TELEMETRY.HOST} - Monitoring API endpoint, for example, telemetry.eu-stockholm-1.oraclecloud.com;
IMPORTANT! API Endpoint URLs need to be entered without the HTTP scheme (https://).
Once you've completed adding the host to Zabbix, and it will automatically discover services and monitor them.

Optional setup

LLD resource filtering by free-form tags of OCI resources

Every LLD rule has pre-added filtering options to avoid discovering unwanted resources, such as terminated OCI compute instances. Most of these filters use specific service item names and states, and values of these filters are defined by the user macros {$....MATCHES} and {$....NOT_MATCHES}.

To add additional filtering options, every discovery script (except VCN discovery), gathers free-form tag data about a specific resource. Since free-form tags are completely custom and format or usage will vary between users, free-from tag filters are not included under LLD filters by default, but can be easily added as they are already being collected by scripts.

Example

In Oracle Cloud Console, add a free-form tag to a resource, for example, a compute instance. The tag key will be location_group and the tag value will be eu-north-1.
Open the Oracle Cloud by HTTP template in Zabbix and go to "Discovery rules". Find "Compute instances discovery" and open it.
Under "LLD macros", add a new macro that will represent this location group tag, for example: {#LOCATION_GROUP} $.tags.location_group.
Under the "Filters" tab, there will already be filters regarding the compute instance name and state. Click "Add" to add a new filter and define the previously created LLD macro and add a matching pattern and value, for example, {#LOCATION_GROUP} matches eu-north-*.
The next time Compute instances discovery is executed, it will only discover OCI compute instances that have the free-form tag location_group that matches the regex of eu-north-*. You can also experiment with the LLD filter pattern matching value to receive different matching results for a specified value.

HTTP proxy usage

If needed, you can specify an HTTP proxy for the template by changing the value of the {$OCI.HTTP.PROXY} user macro.

Custom OK HTTP response

If using a proxy, the returned OK HTTP response could change from "200" to a different value. In that case, please adjust the user macro {$OCI.HTTP.RETURN.CODE.OK}.

LLD filter value changing

LLD filter values and trigger threshold values can be changed with the respective user macros.

Macros used

Name	Description	Default
{$OCI.API.CORE.HOST}	Host for OCI Core Services API endpoint.
{$OCI.API.TELEMETRY.HOST}	Host for OCI Monitoring API endpoint.
{$OCI.API.OBJECT.STORAGE.HOST}	Host for OCI Object Storage API endpoint.
{$OCI.API.AUTONOMOUS.DB.HOST}	Host for OCI Autonomous Database API endpoint.
{$OCI.API.COMPARTMENT.COMPUTE}	Compartment OCIDs for compute instances. Can be a single value or a comma separated list of values.
{$OCI.API.COMPARTMENT.VCN}	Compartment OCIDs for virtual cloud networks. Can be a single value or a comma separated list of values.
{$OCI.API.COMPARTMENT.VOLUME.BLOCK}	Compartment OCIDs for block volumes. Can be a single value or a comma separated list of values.
{$OCI.API.COMPARTMENT.VOLUME.BOOT}	Compartment OCIDs for boot volumes. Can be a single value or a comma separated list of values.
{$OCI.API.COMPARTMENT.OBJECT.STORAGE}	Compartment OCIDs for object storage buckets. Can be a single value or a comma separated list of values.
{$OCI.API.COMPARTMENT.AUTONOMOUS.DB}	Compartment OCIDs for autonomous databases. Can be a single value or a comma separated list of values.
{$OCI.API.TENANCY}	OCID of tenancy.
{$OCI.API.USER}	OCID of user.
{$OCI.API.PRIVATE.KEY}	Entire private key for API access.
{$OCI.API.FINGERPRINT}	Fingerprint of private key.
{$OCI.COMPUTE.DISCOVERY.STATE.MATCHES}	Sets the regex string of compute instance states to allow in discovery.	`.*`
{$OCI.COMPUTE.DISCOVERY.STATE.NOT_MATCHES}	Sets the regex string of compute instance states to ignore in discovery.	`TERMINATED`
{$OCI.COMPUTE.DISCOVERY.NAME.MATCHES}	Sets the regex string of compute instance names to allow in discovery.	`.*`
{$OCI.COMPUTE.DISCOVERY.NAME.NOT_MATCHES}	Sets the regex string of compute instance names to ignore in discovery.	`CHANGE_IF_NEEDED`
{$OCI.VCN.DISCOVERY.STATE.MATCHES}	Sets the regex string of virtual cloud network states to allow in discovery.	`.*`
{$OCI.VCN.DISCOVERY.STATE.NOT_MATCHES}	Sets the regex string of virtual cloud network states to ignore in discovery.	`CHANGE_IF_NEEDED`
{$OCI.VCN.DISCOVERY.NAME.MATCHES}	Sets the regex string of virtual cloud network names to allow in discovery.	`.*`
{$OCI.VCN.DISCOVERY.NAME.NOT_MATCHES}	Sets the regex string of virtual cloud network names to ignore in discovery.	`CHANGE_IF_NEEDED`
{$OCI.VOLUME.BLOCK.DISCOVERY.STATE.MATCHES}	Sets the regex string of block volume states to allow in discovery.	`.*`
{$OCI.VOLUME.BLOCK.DISCOVERY.STATE.NOT_MATCHES}	Sets the regex string of block volume states to ignore in discovery.	`CHANGE_IF_NEEDED`
{$OCI.VOLUME.BLOCK.DISCOVERY.NAME.MATCHES}	Sets the regex string of block volume names to allow in discovery.	`.*`
{$OCI.VOLUME.BLOCK.DISCOVERY.NAME.NOT_MATCHES}	Sets the regex string of block volume names to ignore in discovery.	`CHANGE_IF_NEEDED`
{$OCI.VOLUME.BOOT.DISCOVERY.STATE.MATCHES}	Sets the regex string of boot volume states to allow in discovery.	`.*`
{$OCI.VOLUME.BOOT.DISCOVERY.STATE.NOT_MATCHES}	Sets the regex string of boot volume states to ignore in discovery.	`CHANGE_IF_NEEDED`
{$OCI.VOLUME.BOOT.DISCOVERY.NAME.MATCHES}	Sets the regex string of boot volume names to allow in discovery.	`.*`
{$OCI.VOLUME.BOOT.DISCOVERY.NAME.NOT_MATCHES}	Sets the regex string of boot volume names to ignore in discovery.	`CHANGE_IF_NEEDED`
{$OCI.OBJECT.STORAGE.DISCOVERY.NAME.MATCHES}	Set an HTTP proxy for OCI API requests if needed.	`.*`
{$OCI.OBJECT.STORAGE.DISCOVERY.NAME.NOT_MATCHES}	Set an HTTP proxy for OCI API requests if needed.	`CHANGE_IF_NEEDED`
{$OCI.AUTONOMOUS.DB.DISCOVERY.STATE.MATCHES}	Sets the regex string of autonomous database states to allow in discovery.	`.*`
{$OCI.AUTONOMOUS.DB.DISCOVERY.STATE.NOT_MATCHES}	Sets the regex string of autonomous database states to ignore in discovery.	`CHANGE_IF_NEEDED`
{$OCI.AUTONOMOUS.DB.DISCOVERY.NAME.MATCHES}	Sets the regex string of autonomous database names to allow in discovery.	`.*`
{$OCI.AUTONOMOUS.DB.DISCOVERY.NAME.NOT_MATCHES}	Sets the regex string of autonomous database names to ignore in discovery.	`CHANGE_IF_NEEDED`
{$OCI.HTTP.PROXY}	Set an HTTP proxy for OCI API requests if needed.
{$OCI.HTTP.RETURN.CODE.OK}	Set the HTTP return code that represents an OK response from the API. The default is "200", but can vary, for example, if a proxy is used.	`200`

LLD rule Compute instances discovery

Name	Description	Type	Key and additional info
Compute instances discovery	Discover compute instances.	Script	oci.compute.discovery

LLD rule Virtual cloud networks discovery

Name	Description	Type	Key and additional info
Virtual cloud networks discovery	Discover virtual cloud networks (VCNs).	Script	oci.vcn.discovery

LLD rule Block volumes discovery

Name	Description	Type	Key and additional info
Block volumes discovery	Discover block volumes.	Script	oci.block.volumes.discovery

LLD rule Boot volumes discovery

Name	Description	Type	Key and additional info
Boot volumes discovery	Discover boot volumes.	Script	oci.boot.volumes.discovery

LLD rule Object storage discovery

Name	Description	Type	Key and additional info
Object storage discovery	Discover object storage.	Script	oci.object.storage.discovery

LLD rule Autonomous database discovery

Name	Description	Type	Key and additional info
Autonomous database discovery	Discover autonomous databases.	Script	oci.object.autonomous.db.discovery

Oracle Cloud Compute by HTTP

Overview

This template monitors Oracle Cloud Infrastructure (OCI) single compute instance resources and discovers attached virtual network interface cards (VNICs) and monitors their resources.

This template is not meant to be used independently, but together with Oracle Cloud by HTTP as a template for LLD host prototypes.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Oracle Cloud Infrastructure

Configuration

Setup

This template is not meant to be used independently. A host with the Oracle Cloud by HTTP template will discover OCI compute instances automatically, create host prototypes for each discovered instance, and apply it to this template.

If needed, you can specify an HTTP proxy for the template to use by changing the value of the {$OCI.HTTP.PROXY} user macro.

If using a proxy, the returned OK HTTP response could change from "200" to a different value. In that case, please adjust the user macro {$OCI.HTTP.RETURN.CODE.OK}.

LLD filter values and trigger threshold values can be changed with the respective user macros.

Macros used

Name	Description	Default
{$OCI.HTTP.PROXY}	Set an HTTP proxy for OCI API requests if needed.
{$OCI.HTTP.RETURN.CODE.OK}	Set the HTTP return code that represents an OK response from the API. The default is "200", but can vary, for example, if a proxy is used.	`200`
{$OCI.COMPUTE.VNIC.DISCOVERY.STATE.MATCHES}	Sets the regex string of VNIC states to allow in discovery.	`.*`
{$OCI.COMPUTE.VNIC.DISCOVERY.STATE.NOT_MATCHES}	Sets the regex string of VNIC states to ignore in discovery.	`CHANGE_IF_NEEDED`
{$OCI.COMPUTE.VNIC.DISCOVERY.NAME.MATCHES}	Sets the regex string of VNIC names to allow in discovery.	`.*`
{$OCI.COMPUTE.VNIC.DISCOVERY.NAME.NOT_MATCHES}	Sets the regex string of VNIC names to ignore in discovery.	`CHANGE_IF_NEEDED`
{$OCI.COMPUTE.CPU.UTIL.WARN}	Sets the percentage threshold for creating a "warning" severity event about CPU resource utilization.	`75`
{$OCI.COMPUTE.CPU.UTIL.HIGH}	Sets the percentage threshold for creating a "high" severity event about CPU resource utilization.	`90`
{$OCI.COMPUTE.MEM.UTIL.WARN}	Sets the percentage threshold for creating a "warning" severity event about memory resource utilization.	`75`
{$OCI.COMPUTE.MEM.UTIL.HIGH}	Sets the percentage threshold for creating a "high" severity event about memory resource utilization.	`90`
{$OCI.COMPUTE.VNIC.CONNTRACK.UTIL.WARN}	Sets the percentage threshold for creating a "warning" severity event about VNIC connection tracking table utilization.	`75`
{$OCI.COMPUTE.VNIC.CONNTRACK.UTIL.HIGH}	Sets the percentage threshold for creating a "high" severity event about VNIC connection tracking table utilization.	`90`

Items

Name	Description	Type	Key and additional info
Get instance availability	The accessibility status of a virtual machine instance. A value of "1" indicates that the instance is unresponsive due to an issue with the infrastructure or the instance itself. A value of "0" indicates that an accessibility issue has not been detected. If the instance is stopped, then the metric does not have a value.	Script	oci.compute.availability.get
State	The current state of the instance.	Script	oci.compute.state.get Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
Get VNICs	Gets information about all virtual network interface cards attached to the instance.	Script	oci.compute.vnic.get
Get compute metrics	Gets compute instance metrics.	Script	oci.compute.metrics.get
CPU utilization, in %	Activity level from the CPU. Expressed as a percentage of the total time.	Dependent item	oci.compute.cpu.util Preprocessing JSON Path: `$.CpuUtilization` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
Memory utilization, in %	Space currently in use, measured in pages. Expressed as a percentage of used pages.	Dependent item	oci.compute.mem.util Preprocessing JSON Path: `$.MemoryUtilization` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
Memory allocation stalls	Number of times page reclaim was called directly.	Dependent item	oci.compute.mem.stalls Preprocessing JSON Path: `$.MemoryAllocationStalls` Discard unchanged with heartbeat: `1h`
Load average	Average system load calculated over a 1-minute period. Expressed as a number of processes.	Dependent item	oci.compute.load.avg Preprocessing JSON Path: `$.LoadAverage` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
Disk bytes read	Read throughput. Expressed as bytes read per interval.	Dependent item	oci.compute.disk.read Preprocessing JSON Path: `$.DiskBytesRead` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
Disk bytes written	Write throughput. Expressed as bytes written per interval.	Dependent item	oci.compute.disk.written Preprocessing JSON Path: `$.DiskBytesWritten` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
Disk read I/O	Activity level from I/O reads. Expressed as reads per interval.	Dependent item	oci.compute.disk.io.read Preprocessing JSON Path: `$.DiskIopsRead` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
Disk write I/O	Activity level from I/O writes. Expressed as writes per interval.	Dependent item	oci.compute.disk.io.write Preprocessing JSON Path: `$.DiskIopsWritten` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
Network bytes in	Network bytes in for the compute instance.	Dependent item	oci.compute.network.in Preprocessing JSON Path: `$.NetworksBytesIn` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
Network bytes out	Network bytes out for the compute instance.	Dependent item	oci.compute.network.out Preprocessing JSON Path: `$.NetworksBytesOut` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`

Triggers

Name	Description	Expression	Severity
OCI Compute: Compute instance is not available	Current instance availability.	`last(/Oracle Cloud Compute by HTTP/oci.compute.availability.get) = 1`\|High
OCI Compute: State has changed	Compute instance state has changed.	`last(/Oracle Cloud Compute by HTTP/oci.compute.state.get,#1)<>last(/Oracle Cloud Compute by HTTP/oci.compute.state.get,#2)`\|Info	Manual close: Yes
OCI Compute: Current CPU utilization is too high	Current CPU utilization has exceeded `{$OCI.COMPUTE.CPU.UTIL.HIGH}`% of the max available value.	`min(/Oracle Cloud Compute by HTTP/oci.compute.cpu.util,5m) >= {$OCI.COMPUTE.CPU.UTIL.HIGH}`\|High
OCI Compute: Current CPU utilization is high	Current CPU utilization has exceeded `{$OCI.COMPUTE.CPU.UTIL.WARN}`% of the max available value.	`min(/Oracle Cloud Compute by HTTP/oci.compute.cpu.util,5m) >= {$OCI.COMPUTE.CPU.UTIL.WARN}`\|Warning	Depends on: OCI Compute: Current CPU utilization is too high
OCI Compute: Current memory utilization is too high	Current memory utilization has exceeded `{$OCI.COMPUTE.MEM.UTIL.HIGH}`% of the max available value.	`min(/Oracle Cloud Compute by HTTP/oci.compute.mem.util,5m) >= {$OCI.COMPUTE.MEM.UTIL.HIGH}`\|High
OCI Compute: Current memory utilization is high	Current memory utilization has exceeded `{$OCI.COMPUTE.MEM.UTIL.WARN}`% of the max available value.	`min(/Oracle Cloud Compute by HTTP/oci.compute.mem.util,5m) >= {$OCI.COMPUTE.MEM.UTIL.WARN}`\|Warning	Depends on: OCI Compute: Current memory utilization is too high

LLD rule VNIC discovery

Name Description Type Key and additional info

VNIC discovery

Name	Description	Type	Key and additional info
VNIC discovery	Discover compute instance VNICs.	Dependent item	oci.compute.vnic.discovery Preprocessing Discard unchanged with heartbeat: `1h`

Discover compute instance VNICs.

Dependent item

oci.compute.vnic.discovery

Preprocessing

Discard unchanged with heartbeat: 1h

Item prototypes for VNIC discovery

Name	Description	Type	Key and additional info
VNIC [{#NAME}]: Attachment state	Current attachment state of the VNIC.	Dependent item	oci.compute.vnic.attachment[{#ID}] Preprocessing JSON Path: `$[?(@.id == '{#ID}')].state.first()` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
VNIC [{#NAME}]: Get metrics	Gets virtual network interface card metrics.	Script	oci.compute.vnic.metrics.get[{#ID}]
VNIC [{#NAME}]: Egress packets dropped by security list	Packets sent by the VNIC, destined for the network, dropped due to security rule violations.	Dependent item	oci.compute.vnic.egress.packets.dropped[{#ID}] Preprocessing JSON Path: `$.VnicEgressDropsSecurityList` Discard unchanged with heartbeat: `1h`
VNIC [{#NAME}]: Ingress packets dropped by security list	Packets received from the network, destined for the VNIC, dropped due to security rule violations.	Dependent item	oci.compute.vnic.ingress.packets.dropped[{#ID}] Preprocessing JSON Path: `$.VnicIngressDropsSecurityList` Discard unchanged with heartbeat: `1h`
VNIC [{#NAME}]: Bytes from network	Bytes received at the VNIC from the network, after drops.	Dependent item	oci.compute.vnic.net.bytes.ingr[{#ID}] Preprocessing JSON Path: `$.VnicFromNetworkBytes` Discard unchanged with heartbeat: `1h`
VNIC [{#NAME}]: Bytes to network	Bytes sent from the VNIC to the network, before drops.	Dependent item	oci.compute.vnic.net.bytes.egr[{#ID}] Preprocessing JSON Path: `$.VnicToNetworkBytes` Discard unchanged with heartbeat: `1h`
VNIC [{#NAME}]: Packets from network	Packets received at the VNIC from the network, after drops.	Dependent item	oci.compute.vnic.net.packets.ingr[{#ID}] Preprocessing JSON Path: `$.VnicFromNetworkPackets` Discard unchanged with heartbeat: `1h`
VNIC [{#NAME}]: Packets to network	Packets sent from the VNIC to the network, before drops.	Dependent item	oci.compute.vnic.net.packets.egr[{#ID}] Preprocessing JSON Path: `$.VnicToNetworkPackets` Discard unchanged with heartbeat: `1h`
VNIC [{#NAME}]: Throttled ingress packets	Packets received from the network, destined for the VNIC, dropped due to throttling.	Dependent item	oci.compute.vnic.net.packets.ingr.throttled[{#ID}] Preprocessing JSON Path: `$.VnicIngressDropsThrottle` Discard unchanged with heartbeat: `1h`
VNIC [{#NAME}]: Throttled egress packets	Packets sent from the VNIC, destined for the network, dropped due to throttling.	Dependent item	oci.compute.vnic.net.packets.egr.throttled[{#ID}] Preprocessing JSON Path: `$.VnicEgressDropsThrottle` Discard unchanged with heartbeat: `1h`
VNIC [{#NAME}]: Ingress packets dropped by full connection tracking table	Packets received from the network, destined for the VNIC, dropped due to the full connection tracking table.	Dependent item	oci.compute.vnic.net.packets.ingr.drop[{#ID}] Preprocessing JSON Path: `$.VnicIngressDropsConntrackFull` Discard unchanged with heartbeat: `1h`
VNIC [{#NAME}]: Egress packets dropped by full connection tracking table	Packets sent from the VNIC, destined for the network, dropped due to the full connection tracking table.	Dependent item	oci.compute.vnic.net.packets.egr.drop[{#ID}] Preprocessing JSON Path: `$.VnicEgressDropsConntrackFull` Discard unchanged with heartbeat: `1h`
VNIC [{#NAME}]: Connection tracking table utilization, in %	Total utilization percentage (0-100%) of the connection tracking table.	Dependent item	oci.compute.vnic.net.conntrack.util[{#ID}] Preprocessing JSON Path: `$.VnicConntrackUtilPercent` Discard unchanged with heartbeat: `1h`
VNIC [{#NAME}]: Connection tracking table full	Boolean (0/false, 1/true) that indicates the connection tracking table is full.	Dependent item	oci.compute.vnic.net.conntrack.full[{#ID}] Preprocessing JSON Path: `$.VnicConntrackIsFull` Discard unchanged with heartbeat: `1h`
VNIC [{#NAME}]: Smartnic buffer drops from network	Number of packets dropped in SmartNIC from the network due to buffer exhaustion. This metric is available only for Bare Metal Instances. For virtual machines, these metric values are zero.	Dependent item	oci.compute.vnic.net.smartnic.drops[{#ID}] Preprocessing JSON Path: `$.SmartnicBufferDropsFromNetwork` Discard unchanged with heartbeat: `1h`
VNIC [{#NAME}]: Smartnic buffer drops from host	Number of packets dropped in SmartNIC from the host due to buffer exhaustion. This metric is available only for Bare Metal Instances. For virtual machines, these metric values are zero.	Dependent item	oci.compute.vnic.host.smartnic.drops[{#ID}] Preprocessing JSON Path: `$.SmartnicBufferDropsFromHost` Discard unchanged with heartbeat: `1h`

Trigger prototypes for VNIC discovery

Name	Description	Expression	Severity
OCI Compute: VNIC [{#NAME}]: VNIC is not attached	Virtual network interface card attachment status.	`min(/Oracle Cloud Compute by HTTP/oci.compute.vnic.attachment[{#ID}],5m) >= 3`\|High
OCI Compute: VNIC [{#NAME}]: Current conntrack table utilization is too high	Current conntrack table utilization has exceeded `{$OCI.COMPUTE.VNIC.CONNTRACK.UTIL.HIGH}`% of the max available value.	`min(/Oracle Cloud Compute by HTTP/oci.compute.vnic.net.conntrack.util[{#ID}],5m) >= {$OCI.COMPUTE.VNIC.CONNTRACK.UTIL.HIGH}`\|High
OCI Compute: VNIC [{#NAME}]: Current conntrack table utilization is high	Current conntrack table utilization has exceeded `{$OCI.COMPUTE.VNIC.CONNTRACK.UTIL.WARN}`% of the max available value.	`min(/Oracle Cloud Compute by HTTP/oci.compute.vnic.net.conntrack.util[{#ID}],5m) >= {$OCI.COMPUTE.VNIC.CONNTRACK.UTIL.WARN}`\|Warning	Depends on: OCI Compute: VNIC [{#NAME}]: Current conntrack table utilization is too high
OCI Compute: VNIC [{#NAME}]: Conntrack table full	Virtual network interface card connection tracking table is full.	`min(/Oracle Cloud Compute by HTTP/oci.compute.vnic.net.conntrack.full[{#ID}],5m) = 1`\|High

Oracle Cloud Object Storage by HTTP

Overview

This template monitors Oracle Cloud Infrastructure (OCI) object storage resources.

This template is not meant to be used independently, but together with Oracle Cloud by HTTP as a template for LLD host prototypes.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Oracle Cloud Infrastructure

Configuration

Setup

This template is not meant to be used independently. A host with the Oracle Cloud by HTTP template will discover OCI object storage buckets automatically, create host prototypes for each discovered bucket, and apply it this template.

If needed, you can specify an HTTP proxy for the template to use by changing the value of the {$OCI.HTTP.PROXY} user macro.

If using a proxy, the returned OK HTTP response could change from "200" to a different value. In that case, please adjust the user macro {$OCI.HTTP.RETURN.CODE.OK}.

LLD filter values and trigger threshold values can be changed with the respective user macros.

Macros used

Name	Description	Default
{$OCI.HTTP.PROXY}	Set an HTTP proxy for OCI API requests if needed.
{$OCI.HTTP.RETURN.CODE.OK}	Set the HTTP return code that represents an OK response from the API. The default is "200", but can vary, for example, if a proxy is used.	`200`

Items

Name	Description	Type	Key and additional info
Get frequent metrics	Gets all metrics related to a specific bucket that have frequent update time (100 milliseconds).	Script	oci.obj.storage.metrics.frequent.get
All requests count	The total number of all HTTP requests made in a bucket.	Dependent item	oci.obj.storage.requests Preprocessing JSON Path: `$.AllRequests` Discard unchanged with heartbeat: `1h`
Client-side error count	The total number of 4xx errors for requests made in a bucket.	Dependent item	oci.obj.storage.client.errors Preprocessing JSON Path: `$.ClientErrors` Discard unchanged with heartbeat: `1h`
First byte latency time	The per-request time measured from the time Object Storage receives the complete request to when Object Storage returns the first byte of the response.	Dependent item	oci.obj.storage.latency.byte Preprocessing JSON Path: `$.FirstByteLatency` Custom multiplier: `0.001` Discard unchanged with heartbeat: `1h`
Post object request count	The total number of HTTP `POST` requests made in a bucket.	Dependent item	oci.obj.storage.requests.post Preprocessing JSON Path: `$.PostRequests` Discard unchanged with heartbeat: `1h`
Put object request count	The total number of `PutObject` requests made in a bucket.	Dependent item	oci.obj.storage.requests.put Preprocessing JSON Path: `$.PutRequests` Discard unchanged with heartbeat: `1h`
Overall latency time	The per-request time from the first byte received by Object Storage to the last byte sent from Object Storage.	Dependent item	oci.obj.storage.latency.overall Preprocessing JSON Path: `$.TotalRequestLatency` Custom multiplier: `0.001` Discard unchanged with heartbeat: `1h`
Get hourly metrics	Gets all metrics related to specific bucket that have update time of 1 hour.	Script	oci.obj.storage.metrics.hourly.get
Number of objects	The count of objects in the bucket, excluding any multipart upload parts that have not been discarded (aborted) or committed.	Dependent item	oci.obj.storage.objects Preprocessing JSON Path: `$.ObjectCount` Discard unchanged with heartbeat: `12h`
Bucket size	The size of the bucket, excluding any multipart upload parts that have not been discarded (aborted) or committed.	Dependent item	oci.obj.storage.size Preprocessing JSON Path: `$.StoredBytes` Discard unchanged with heartbeat: `12h`
Incomplete multipart upload size	The size of any multipart upload parts that have not been discarded (aborted) or committed.	Dependent item	oci.obj.storage.size.incomplete Preprocessing JSON Path: `$.UncommittedParts` Discard unchanged with heartbeat: `12h`
Get enabled object lifecycle management	Indicates whether a bucket has any executable Object Lifecycle Management policies configured. `EnabledOLM` emits: 1 - if policies are configured 0 - if no policies are configured	Script	oci.obj.storage.metrics.olm.get Preprocessing Discard unchanged with heartbeat: `1d`

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
OCI Object Storage: Object lifecycle management policy has changed	The object lifecycle management policy configuration has changed.	`last(/Oracle Cloud Object Storage by HTTP/oci.obj.storage.metrics.olm.get,#1)<>last(/Oracle Cloud Object Storage by HTTP/oci.obj.storage.metrics.olm.get,#2) and length(last(/Oracle Cloud Object Storage by HTTP/oci.obj.storage.metrics.olm.get))>0`\|Info

Oracle Cloud Autonomous Database by HTTP

Overview

This template monitors Oracle Cloud Infrastructure (OCI) autonomous database (serverless) resources.

This template is not meant to be used independently, but together with Oracle Cloud by HTTP as a template for LLD host prototypes.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Oracle Cloud Infrastructure

Configuration

Setup

This template is not meant to be used independently. A host with the Oracle Cloud by HTTP template will discover OCI autonomous databases automatically, create host prototypes for each discovered database, and apply it to this template.

If needed, you can specify an HTTP proxy for the template to use by changing the value of the {$OCI.HTTP.PROXY} user macro.

If using a proxy, the returned OK HTTP response could change from "200" to a different value. In that case, please adjust the user macro {$OCI.HTTP.RETURN.CODE.OK}.

The LLD filter values and trigger threshold values can be changed with the respective user macros.

Macros used

Name	Description	Default
{$OCI.HTTP.PROXY}	Set an HTTP proxy for OCI API requests if needed.
{$OCI.HTTP.RETURN.CODE.OK}	Set the HTTP return code that represents an OK response from the API. The default is "200", but can vary, for example, if a proxy is used.	`200`
{$OCI.AUTONOMOUS.DB.CPU.UTIL.WARN}	Sets the percentage threshold for creating a "warning" severity event about CPU resource utilization.	`75`
{$OCI.AUTONOMOUS.DB.CPU.UTIL.HIGH}	Sets the percentage threshold for creating a "high" severity event about CPU resource utilization.	`90`
{$OCI.AUTONOMOUS.DB.STORAGE.UTIL.WARN}	Sets the percentage threshold for creating a "warning" severity event about storage resource utilization.	`75`
{$OCI.AUTONOMOUS.DB.STORAGE.UTIL.HIGH}	Sets the percentage threshold for creating a "high" severity event about storage resource utilization.	`90`

Items

Name	Description	Type	Key and additional info
State	Gets the autonomous database state.	Script	oci.aut.db.state Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
Get frequent metrics	Gets all metrics related to the database that have a collection frequency of 1 minute.	Script	oci.aut.db.metrics.frequent.get
CPU time	Average rate of accumulation of CPU time by foreground sessions in the database over the selected time interval.	Dependent item	oci.aut.db.cpu.time Preprocessing JSON Path: `$.CpuTime` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
CPU utilization, in %	The CPU usage expressed as a percentage, aggregated across all consumer groups. The utilization percentage is reported with respect to the number of CPUs the database is allowed to use.	Dependent item	oci.aut.db.cpu.util Preprocessing JSON Path: `$.CpuUtilization` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
Current logons	The number of successful logons during the selected time interval.	Dependent item	oci.aut.db.logons Preprocessing JSON Path: `$.CurrentLogons` Discard unchanged with heartbeat: `1h`
DB block changes	The number of changes that were part of an update or delete operation that were made to all blocks in the SGA. Such changes generate redo log entries and thus become permanent changes to the database if the transaction is committed. This statistic approximates total database work and indicates the rate at which buffers are being dirtied during the selected time interval.	Dependent item	oci.aut.db.block.changes Preprocessing JSON Path: `$.DBBlockChanges` Discard unchanged with heartbeat: `1h`
DB time	The amount of time database user sessions spend executing database code (CPU time + wait time). Database time is used to infer database call latency as it increases in direct proportion to both database call latency (response time) and call volume. It is calculated as the average rate of accumulation of database time by foreground sessions in the database over the selected time interval.	Dependent item	oci.aut.db.time Preprocessing JSON Path: `$.DBTime` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
Execute count	The number of user and recursive calls that executed SQL statements during the selected time interval.	Dependent item	oci.aut.db.exec.count Preprocessing JSON Path: `$.ExecuteCount` Discard unchanged with heartbeat: `1h`
Failed connections	The number of failed database connections.	Dependent item	oci.aut.db.conn.failed Preprocessing JSON Path: `$.FailedConnections` Discard unchanged with heartbeat: `1h`
Failed logons	The number of logons that failed because of an invalid user name and/or password during the selected time interval.	Dependent item	oci.aut.db.logons.failed Preprocessing JSON Path: `$.FailedLogons` Discard unchanged with heartbeat: `1h`
Parse count (hard)	The number of parse calls (real parses) during the selected time interval. A hard parse is an expensive operation in terms of memory use as it requires Oracle to allocate a workheap and other memory structures and then build a parse tree.	Dependent item	oci.aut.db.parse.count.hard Preprocessing JSON Path: `$.HardParseCount` Discard unchanged with heartbeat: `1h`
Session logical reads	The sum of `db block gets` and `consistent gets` during the selected time interval. This includes logical reads of database blocks from either the buffer cache or process private memory.	Dependent item	oci.aut.db.logical.reads.session Preprocessing JSON Path: `$.LogicalReads` Discard unchanged with heartbeat: `1h`
Parse count (total)	The number of hard and soft parses during the selected time interval.	Dependent item	oci.aut.db.parse.count.total Preprocessing JSON Path: `$.ParseCount` Discard unchanged with heartbeat: `1h`
Parse count (failures)	The number of parse failures during the selected time interval.	Dependent item	oci.aut.db.parse.count.failed Preprocessing JSON Path: `$.ParseFailureCount` Discard unchanged with heartbeat: `1h`
Physical reads	The number of data blocks read from disk during the selected time interval.	Dependent item	oci.aut.db.physical.reads Preprocessing JSON Path: `$.PhysicalReads` Discard unchanged with heartbeat: `1h`
Physical read total bytes	The size in bytes of disk reads by all database instance activity including application reads, backup and recovery, and other utilities during the selected time interval.	Dependent item	oci.aut.db.physical.read.bytes Preprocessing JSON Path: `$.PhysicalReadTotalBytes` Discard unchanged with heartbeat: `1h`
Physical writes	The number of data blocks written to disk during the selected time interval.	Dependent item	oci.aut.db.physical.writes Preprocessing JSON Path: `$.PhysicalWrites` Discard unchanged with heartbeat: `1h`
Physical write total bytes	The size in bytes of all disk writes for the database instance including application activity, backup and recovery, and other utilities during the selected time interval.	Dependent item	oci.aut.db.physical.write.bytes Preprocessing JSON Path: `$.PhysicalWriteTotalBytes` Discard unchanged with heartbeat: `1h`
Queued statements	The number of queued SQL statements aggregated across all consumer groups during the selected time interval.	Dependent item	oci.aut.db.queued.statements Preprocessing JSON Path: `$.QueuedStatements` Discard unchanged with heartbeat: `1h`
Redo generated	Amount of redo generated in bytes during the selected time interval.	Dependent item	oci.aut.db.redo.gen Preprocessing JSON Path: `$.RedoGenerated` Discard unchanged with heartbeat: `1h`
Running statements	The number of running SQL statements aggregated across all consumer groups during the selected time interval.	Dependent item	oci.aut.db.statements.running Preprocessing JSON Path: `$.RunningStatements` Discard unchanged with heartbeat: `1h`
Sessions	The number of sessions in the database.	Dependent item	oci.aut.db.sessions Preprocessing JSON Path: `$.Sessions` Discard unchanged with heartbeat: `1h`
Bytes received via SQL*Net from client	The number of bytes received from the client over Oracle Net Services during the selected time interval.	Dependent item	oci.aut.db.sqlnet.bytes.recv.client Preprocessing JSON Path: `$.SQLNetBytesFromClient` Discard unchanged with heartbeat: `1h`
Bytes received via SQL*Net from DBLink	The number of bytes received from a database link over Oracle Net Services during the selected time interval.	Dependent item	oci.aut.db.sqlnet.bytes.recv.dblink Preprocessing JSON Path: `$.SQLNetBytesFromDBLink` Discard unchanged with heartbeat: `1h`
Bytes sent via SQL*Net to client	The number of bytes sent to the client from the foreground processes during the selected time interval.	Dependent item	oci.aut.db.sqlnet.bytes.sent.client Preprocessing JSON Path: `$.SQLNetBytesToClient` Discard unchanged with heartbeat: `1h`
Bytes sent via SQL*Net to DBLink	The number of bytes sent over a database link during the selected time interval.	Dependent item	oci.aut.db.sqlnet.bytes.sent.dblink Preprocessing JSON Path: `$.SQLNetBytesToDBLink` Discard unchanged with heartbeat: `1h`
Transaction count	The combined number of user commits and user rollbacks during the selected time interval.	Dependent item	oci.aut.db.transaction.count Preprocessing JSON Path: `$.TransactionCount` Discard unchanged with heartbeat: `1h`
User calls	The combined number of logons, parses, and execute calls during the selected time interval.	Dependent item	oci.aut.db.user.calls Preprocessing JSON Path: `$.UserCalls` Discard unchanged with heartbeat: `1h`
User commits	The number of user commits during the selected time interval. When a user commits a transaction, the generated redo that reflects the changes made to database blocks must be written to disk. Commits often represent the closest thing to a user transaction rate.	Dependent item	oci.aut.db.user.commits Preprocessing JSON Path: `$.UserCommits` Discard unchanged with heartbeat: `1h`
User rollbacks	Number of times users manually issue the `ROLLBACK` statement or an error occurs during a user's transactions during the selected time interval.	Dependent item	oci.aut.db.user.rollbacks Preprocessing JSON Path: `$.UserRollbacks` Discard unchanged with heartbeat: `1h`
Wait time	Average rate of accumulation of non-idle wait time by foreground sessions in the database over the selected time interval.	Dependent item	oci.aut.db.wait.time Preprocessing JSON Path: `$.WaitTime` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
Get database stats	Gets all metrics related to specific database that have a collection frequency of 5 minutes.	Script	oci.aut.db.metrics.stats
Database availability	The database is available for connections in the given minute.	Dependent item	oci.aut.db.availability Preprocessing JSON Path: `$.DatabaseAvailability` Discard unchanged with heartbeat: `5h`
Connection latency	The time taken to connect to an Oracle Autonomous Database Serverless instance in each region from a Compute service virtual machine in the same region.	Dependent item	oci.aut.db.latency.conn Preprocessing JSON Path: `$.ConnectionLatency` Custom multiplier: `0.001` Discard unchanged with heartbeat: `5h`
Query latency	The time taken to display the results of a simple query on the user's screen.	Dependent item	oci.aut.db.latency.query Preprocessing JSON Path: `$.QueryLatency` Custom multiplier: `0.001` Discard unchanged with heartbeat: `5h`
Get storage stats	Gets all storage metrics related to a specific database that have a collection frequency of 60 minutes.	Script	oci.aut.db.metrics.storage.stats
Storage space allocated	Amount of space allocated to the database for all tablespaces during the selected time interval.	Dependent item	oci.aut.db.storage.space.alloc Preprocessing JSON Path: `$.StorageAllocated` Custom multiplier: `1073741824` Discard unchanged with heartbeat: `12h`
Maximum storage space	Maximum amount of storage reserved for the database during the selected time interval.	Dependent item	oci.aut.db.storage.space.max Preprocessing JSON Path: `$.StorageMax` Custom multiplier: `1073741824` Discard unchanged with heartbeat: `12h`
Storage space used	Maximum amount of space used during the selected time interval.	Dependent item	oci.aut.db.storage.space.used Preprocessing JSON Path: `$.StorageUsed` Custom multiplier: `1073741824` Discard unchanged with heartbeat: `12h`
Storage utilization, in %	The percentage of the reserved maximum storage currently allocated for all database tablespaces. Represents the total reserved space for all tablespaces.	Dependent item	oci.aut.db.storage.space.util Preprocessing JSON Path: `$.StorageUtilization` Discard unchanged with heartbeat: `12h`

Triggers

Name	Description	Expression	Severity
OCI Autonomous DB: Restore has failed	Autonomous database restore has failed.	`last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.state) = 9`\|Warning
OCI Autonomous DB: Database is not available or accessible	Autonomous database is not available or accessible.	`last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.state) = 19 or last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.state) = 20`\|High
OCI Autonomous DB: Available, needs attention	Autonomous database is available, but needs attention.	`last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.state) = 12`\|Warning
OCI Autonomous DB: State unknown	Autonomous database state is unknown.	`last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.state) = 0`\|Warning
OCI Autonomous DB: State has changed	Autonomous database state has changed.	`last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.state,#1)<>last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.state,#2)`\|Info	Manual close: Yes Depends on: OCI Autonomous DB: Restore has failed OCI Autonomous DB: Database is not available or accessible OCI Autonomous DB: Available, needs attention OCI Autonomous DB: State unknown
OCI Autonomous DB: Current CPU utilization is too high	Current CPU utilization has exceeded `{$OCI.AUTONOMOUS.DB.CPU.UTIL.HIGH}`% of the max available value.	`min(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.cpu.util,5m) >= {$OCI.AUTONOMOUS.DB.CPU.UTIL.HIGH}`\|High
OCI Autonomous DB: Current CPU utilization is high	Current CPU utilization has exceeded `{$OCI.AUTONOMOUS.DB.CPU.UTIL.WARN}`% of the max available value.	`min(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.cpu.util,5m) >= {$OCI.AUTONOMOUS.DB.CPU.UTIL.WARN}`\|Warning	Depends on: OCI Autonomous DB: Current CPU utilization is too high
OCI Autonomous DB: Database is not available	Autonomous database is not available.	`last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.availability) = 0`\|High	Depends on: OCI Autonomous DB: Database is not available or accessible
OCI Autonomous DB: Current storage utilization is too high	Current storage utilization has exceeded `{$OCI.AUTONOMOUS.DB.STORAGE.UTIL.HIGH}`% of the max available value.	`min(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.storage.space.util,5m) >= {$OCI.AUTONOMOUS.DB.STORAGE.UTIL.HIGH}`\|High
OCI Autonomous DB: Current storage utilization is high	Current storage utilization has exceeded `{$OCI.AUTONOMOUS.DB.STORAGE.UTIL.WARN}`% of the max available value.	`min(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.storage.space.util,5m) >= {$OCI.AUTONOMOUS.DB.STORAGE.UTIL.WARN}`\|Warning	Depends on: OCI Autonomous DB: Current storage utilization is too high

Oracle Cloud Block Volume by HTTP

Overview

This template monitors Oracle Cloud Infrastructure (OCI) block volume resources.

This template is not meant to be used independently, but together with Oracle Cloud by HTTP as a template for LLD host prototypes.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Oracle Cloud Infrastructure

Configuration

Setup

This template is not meant to be used independently. A host with the Oracle Cloud by HTTP template will discover OCI block volumes automatically, create host prototypes for each discovered block volume, and apply it this template.

If needed, you can specify an HTTP proxy for the template to use by changing the value of {$OCI.HTTP.PROXY} user macro.

If using a proxy, the returned OK HTTP response could change from "200" to a different value. In that case, please adjust the user macro {$OCI.HTTP.RETURN.CODE.OK}

LLD filter values and trigger threshold values can be changed with respective user macros.

Macros used

Name	Description	Default
{$OCI.HTTP.PROXY}	Set an HTTP proxy for OCI API requests if needed.
{$OCI.HTTP.RETURN.CODE.OK}	Set the HTTP return code that represents an OK response from the API. The default is "200", but can vary, for example, if a proxy is used.	`200`

Items

Name	Description	Type	Key and additional info
State	Gets the block volume state.	Script	oci.block.volume.state Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
Get metrics	Gets block volume metrics.	Script	oci.block.volume.metrics.get
Volume read throughput	Read throughput. Expressed as bytes read per interval.	Dependent item	oci.block.volume.read Preprocessing JSON Path: `$.VolumeReadThroughput` Discard unchanged with heartbeat: `1h`
Volume write throughput	Write throughput. Expressed as bytes read per interval.	Dependent item	oci.block.volume.write Preprocessing JSON Path: `$.VolumeWriteThroughput` Discard unchanged with heartbeat: `1h`
Volume read operations	Activity level from I/O reads. Expressed as reads per interval.	Dependent item	oci.block.volume.read.ops Preprocessing JSON Path: `$.VolumeReadOps` Discard unchanged with heartbeat: `1h`
Volume write operations	Activity level from I/O writes. Expressed as writes per interval.	Dependent item	oci.block.volume.write.ops Preprocessing JSON Path: `$.VolumeWriteOps` Discard unchanged with heartbeat: `1h`
Volume throttled operations	Total sum of all the I/O operations that were throttled during a given time interval.	Dependent item	oci.block.volume.throttled.ops Preprocessing JSON Path: `$.VolumeThrottledIOs` Discard unchanged with heartbeat: `1h`
Volume guaranteed VPUs/GB	Rate of change for currently active VPUs/GB. Expressed as the average of active VPUs/GB during a given time interval.	Dependent item	oci.block.volume.vpu Preprocessing JSON Path: `$.VolumeGuaranteedVPUsPerGB` Discard unchanged with heartbeat: `1h`
Volume guaranteed IOPS	Rate of change for guaranteed IOPS per SLA. Expressed as the average of guaranteed IOPS during a given time interval.	Dependent item	oci.block.volume.iops Preprocessing JSON Path: `$.VolumeGuaranteedIOPS` Discard unchanged with heartbeat: `1h`
Volume guaranteed throughput	Rate of change for guaranteed throughput per SLA. Expressed as megabytes per interval.	Dependent item	oci.block.volume.throughput Preprocessing JSON Path: `$.VolumeGuaranteedThroughput` Custom multiplier: `1048576` Discard unchanged with heartbeat: `1h`

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
OCI Block Volume: Block volume terminated or faulty	Block volume state is "terminated"/"terminating" or "faulty".	`min(/Oracle Cloud Block Volume by HTTP/oci.block.volume.state,5m) >= 4`\|High
OCI Block Volume: Block volume state unknown	Block volume state is unknown.	`min(/Oracle Cloud Block Volume by HTTP/oci.block.volume.state,5m) = 0`\|Warning	Depends on: OCI Block Volume: Block volume terminated or faulty

Oracle Cloud Boot Volume by HTTP

Overview

This template monitors Oracle Cloud Infrastructure (OCI) boot volume resources.

This template is not meant to be used independently, but together with Oracle Cloud by HTTP as a template for LLD host prototypes.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Oracle Cloud Infrastructure

Configuration

Setup

This template is not meant to be used independently. A host with the Oracle Cloud by HTTP template will discover OCI boot volumes automatically, create host prototypes for each discovered boot volume, and apply it this template.

If needed, you can specify an HTTP proxy for the template to use by changing the value of the {$OCI.HTTP.PROXY} user macro.

If using a proxy, the returned OK HTTP response could change from "200" to a different value. In that case, please adjust the user macro {$OCI.HTTP.RETURN.CODE.OK}

LLD filter values and trigger threshold values can be changed with respective user macros.

Macros used

Name	Description	Default
{$OCI.HTTP.PROXY}	Set an HTTP proxy for OCI API requests if needed.
{$OCI.HTTP.RETURN.CODE.OK}	Set the HTTP return code that represents an OK response from the API. The default is "200", but can vary, for example, if a proxy is used.	`200`

Items

Name	Description	Type	Key and additional info
State	Gets the boot volume state.	Script	oci.boot.volume.state Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
Get metrics	Gets boot volume metrics.	Script	oci.boot.volume.metrics.get
Volume read throughput	Read throughput. Expressed as bytes read per interval.	Dependent item	oci.boot.volume.read Preprocessing JSON Path: `$.VolumeReadThroughput` Discard unchanged with heartbeat: `1h`
Volume write throughput	Write throughput. Expressed as bytes read per interval.	Dependent item	oci.boot.volume.write Preprocessing JSON Path: `$.VolumeWriteThroughput` Discard unchanged with heartbeat: `1h`
Volume read operations	Activity level from I/O reads. Expressed as reads per interval.	Dependent item	oci.boot.volume.read.ops Preprocessing JSON Path: `$.VolumeReadOps` Discard unchanged with heartbeat: `1h`
Volume write operations	Activity level from I/O writes. Expressed as writes per interval.	Dependent item	oci.boot.volume.write.ops Preprocessing JSON Path: `$.VolumeWriteOps` Discard unchanged with heartbeat: `1h`
Volume throttled operations	Total sum of all the I/O operations that were throttled during a given time interval.	Dependent item	oci.boot.volume.throttled.ops Preprocessing JSON Path: `$.VolumeThrottledIOs` Discard unchanged with heartbeat: `1h`
Volume guaranteed VPUs/GB	Rate of change for currently active VPUs/GB. Expressed as the average of active VPUs/GB during a given time interval.	Dependent item	oci.boot.volume.vpu Preprocessing JSON Path: `$.VolumeGuaranteedVPUsPerGB` Discard unchanged with heartbeat: `1h`
Volume guaranteed IOPS	Rate of change for guaranteed IOPS per SLA. Expressed as the average of guaranteed IOPS during a given time interval.	Dependent item	oci.boot.volume.iops Preprocessing JSON Path: `$.VolumeGuaranteedIOPS` Discard unchanged with heartbeat: `1h`
Volume guaranteed throughput	Rate of change for guaranteed throughput per SLA. Expressed as megabytes per interval.	Dependent item	oci.boot.volume.throughput Preprocessing JSON Path: `$.VolumeGuaranteedThroughput` Custom multiplier: `1048576` Discard unchanged with heartbeat: `1h`

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
OCI Boot Volume: Boot volume terminated or faulty	Boot volume state is "terminated"/"terminating" or "faulty".	`min(/Oracle Cloud Boot Volume by HTTP/oci.boot.volume.state,5m) >= 4`\|High
OCI Boot Volume: Boot volume state unknown	Boot volume state is unknown.	`min(/Oracle Cloud Boot Volume by HTTP/oci.boot.volume.state,5m) = 0`\|Warning	Depends on: OCI Boot Volume: Boot volume terminated or faulty

Oracle Cloud Networking by HTTP

Overview

This template monitors Oracle Cloud Infrastructure (OCI) single virtual network card availability and discovers attached subnets and monitors their availability.

This template is not meant to be used independently, but together with Oracle Cloud by HTTP as a template for LLD host prototypes.

For communication with OCI, this template utilizes script items which execute HTTP GET requests.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Oracle Cloud Infrastructure

Configuration

Setup

This template is not meant to be used independently. A host with the Oracle Cloud by HTTP template will discover OCI virtual cloud networks (VCNs) automatically, create host prototypes for each discovered VCN, and apply it this template.

If needed, you can specify an HTTP proxy for the template to use by changing the value of {$OCI.HTTP.PROXY} user macro.

If using a proxy, the returned OK HTTP response could change from "200" to a different value. In that case, please adjust the user macro {$OCI.HTTP.RETURN.CODE.OK}

LLD filter values and trigger threshold values can be changed with respective user macros.

Macros used

Name	Description	Default
{$OCI.HTTP.PROXY}	Set an HTTP proxy for OCI API requests if needed.
{$OCI.HTTP.RETURN.CODE.OK}	Set the HTTP return code that represents an OK response from the API. The default is "200", but can vary, for example, if a proxy is used.	`200`
{$OCI.VCN.SUBNET.DISCOVERY.STATE.MATCHES}	Sets the regex string of VCN subnet states to allow in discovery.	`.*`
{$OCI.VCN.SUBNET.DISCOVERY.STATE.NOT_MATCHES}	Sets the regex string of VCN subnet states to ignore in discovery.	`CHANGE_IF_NEEDED`
{$OCI.VCN.SUBNET.DISCOVERY.NAME.MATCHES}	Sets the regex string of VCN subnet names to allow in discovery.	`.*`
{$OCI.VCN.SUBNET.DISCOVERY.NAME.NOT_MATCHES}	Sets the regex string of VCN subnet names to ignore in discovery.	`CHANGE_IF_NEEDED`

Items

Name Description Type Key and additional info

Get VCN state

Name	Description	Type	Key and additional info
Get VCN state	State of the virtual cloud network.	Script	oci.vcn.state.get Preprocessing Replace: `" ->` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
Get subnets	Get data about subnets linked to the particular VCN.	Script	oci.vcn.subnets.get

State of the virtual cloud network.

Script

oci.vcn.state.get

Preprocessing

Replace: " ->
JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Get subnets

Get data about subnets linked to the particular VCN.

Script

oci.vcn.subnets.get

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
OCI VCN: VCN state terminated	Virtual cloud network state is "terminated" or "terminating".	`min(/Oracle Cloud Networking by HTTP/oci.vcn.state.get,5m) = 3 or min(/Oracle Cloud Networking by HTTP/oci.vcn.state.get,5m) = 4`\|High
OCI VCN: VCN state unknown	Virtual cloud network state is unknown.	`min(/Oracle Cloud Networking by HTTP/oci.vcn.state.get,5m) = 0`\|Warning

LLD rule Subnet discovery

Name Description Type Key and additional info

Subnet discovery

Name	Description	Type	Key and additional info
Subnet discovery	Discover subnets linked to the particular VCN.	Dependent item	oci.vcn.subnet.discovery Preprocessing Discard unchanged with heartbeat: `1h`

Discover subnets linked to the particular VCN.

Dependent item

oci.vcn.subnet.discovery

Preprocessing

Discard unchanged with heartbeat: 1h

Item prototypes for Subnet discovery

Name Description Type Key and additional info

Subnet [{#NAME}]: Get subnet state

Name	Description	Type	Key and additional info
Subnet [{#NAME}]: Get subnet state	Current state of subnet.	Dependent item	oci.vcn.subnet.state[{#ID}] Preprocessing JSON Path: `$..[?(@.id == '{#ID}')].state.first()` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Current state of subnet.

Dependent item

oci.vcn.subnet.state[{#ID}]

Preprocessing

JSON Path: $..[?(@.id == '{#ID}')].state.first()
JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Trigger prototypes for Subnet discovery

Name	Description	Expression	Severity	Dependencies and additional info
OCI VCN: Subnet [{#NAME}]: Subnet state terminated	Virtual cloud network subnet state is "terminated" or "terminating".	`min(/Oracle Cloud Networking by HTTP/oci.vcn.subnet.state[{#ID}],5m) = 3 or min(/Oracle Cloud Networking by HTTP/oci.vcn.subnet.state[{#ID}],5m) = 4`\|High
OCI VCN: Subnet [{#NAME}]: Subnet state unknown	Virtual cloud network subnet state is unknown.	`min(/Oracle Cloud Networking by HTTP/oci.vcn.subnet.state[{#ID}],5m) = 0`\|Warning

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

cloud

cloud_openstack_http

View README Download JSON

OpenStack by HTTP

Overview

This template is designed for the effortless deployment of OpenStack monitoring by Zabbix via HTTP and doesn't require any external scripts.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

OpenStack Yoga release and OpenStack built from sources (27568ea3):

Identity API v3
Compute API v2.1 (for OpenStack Nova by HTTP template)

Configuration

Setup

This is a master template that needs to be assigned to a host, and it will discover all OpenStack services supported by Zabbix automatically.

Before using this template it is recommended to create a separate monitoring user on OpenStack that will have access to specific API resources. Zabbix uses OpenStack application credentials for authorization, as it is a more secure method than a username and password-based authentication.

Below are instructions and examples on how to set up a user on OpenStack that will be used by Zabbix. Examples use the OpenStack CLI (command-line interface) tool, but this can also be done from OpenStack Horizon (web interface).

The project that needs to be monitored is assumed to be already present in OpenStack. In the following examples, a project named zabbix is used:

# openstack project list
+----------------------------------+--------------------+
| ID                               | Name               |
+----------------------------------+--------------------+
| 28d6bb25d62b4e7e8c2d59ce056a0334 | service            |
| 4688a19e02324c42a34220e9b6a2407e | admin              |
| bc78db4bb2044148a0abf90be512fa12 | zabbix             |
+----------------------------------+--------------------+

After the project name is noted, a monitoring user needs to be created. This can be done by executing an openstack user create command:

# openstack user create --project zabbix --password-prompt zabbix-monitoring
User Password:
Repeat User Password:
+---------------------+----------------------------------+
| Field               | Value                            |
+---------------------+----------------------------------+
| default_project_id  | bc78db4bb2044148a0abf90be512fa12 |
| domain_id           | default                          |
| enabled             | True                             |
| id                  | abd3eda9a29244568b1801e4825b6d71 |
| name                | zabbix-monitoring                |
| options             | {}                               |
| password_expires_at | None                             |
+---------------------+----------------------------------+

When the monitoring user is created, it needs to be assigned a role. But first, a monitoring-specific role needs to be created:

# openstack role create --description "A role for Zabbix monitoring user" monitoring
+-------------+-----------------------------------+
| Field       | Value                             |
+-------------+-----------------------------------+
| description | A role for Zabbix monitoring user |
| domain_id   | None                              |
| id          | 93577a7f13184cf7af76f7bdecf7f6ee  |
| name        | monitoring                        |
| options     | {}                                |
+-------------+-----------------------------------+

Then assign this newly created role to the monitoring user created in Step 1:

# openstack role add --user zabbix-monitoring --project zabbix monitoring

Verify that the role has been assigned correctly. There should be one role only:

# openstack role assignment list --user zabbix-monitoring --project zabbix --names
+------------+---------------------------+-------+----------------+--------+--------+-----------+
| Role       | User                      | Group | Project        | Domain | System | Inherited |
+------------+---------------------------+-------+----------------+--------+--------+-----------+
| monitoring | zabbix-monitoring@Default |       | zabbix@Default |        |        | False     |
+------------+---------------------------+-------+----------------+--------+--------+-----------+

Get the OpenStack RC file for the monitoring user in this project, source it, and generate application credentials:

# openstack application credential create --description "Application credential for Zabbix monitoring" zabbix-app-cred
  +--------------+----------------------------------------------------------------------------------------+
  | Field        | Value                                                                                  |
  +--------------+----------------------------------------------------------------------------------------+
  | description  | Application credential for Zabbix monitoring                                           |
  | expires_at   | None                                                                                   |
  | id           | c8087b91354249f3b157a50fc5ecfb3c                                                       |
  | name         | zabbix-app-cred                                                                        |
  | project_id   | bc78db4bb2044148a0abf90be512fa12                                                       |
  | roles        | monitoring                                                                             |
  | secret       | E1kC-s8QTWUaIpmexF18GW-FL3TI9-HXoexdExvGsw7uOhb3SEFW1zDa1qTs80Vqn-2xgviIPRuYOCDp2NDVUg |
  | system       | None                                                                                   |
  | unrestricted | False                                                                                  |
  | user_id      | abd3eda9a29244568b1801e4825b6d71                                                       |
  +--------------+----------------------------------------------------------------------------------------+

While creating the application credential, it is also possible to define access rules using the --access-rules flag, which offers even more fine-grained access to various API endpoints. This is optional and up to the user to decide if such rules are needed.

Once the application credential is created, the values of id and secret need to be set as user macro values in Zabbix:

value of id in {$APP.CRED.ID} user macro;
value of secret in {$APP.CRED.SECRET} user macro.

At this point, the monitoring user will not be able to access any resources on OpenStack, therefore some access rights need to be defined. Access rights are set using policies. Each service has its own policy file, therefore further steps for setting up policies, are mentioned in the template documentation of each supported service, e.g., OpenStack Nova by HTTP.

Macros used

Name	Description	Default
{$OPENSTACK.KEYSTONE.API.ENDPOINT}	API endpoint for Identity Service, e.g., https://local.openstack:5000.
{$OPENSTACK.AUTH.INTERVAL}	API token regeneration interval, in minutes. By default, OpenStack API tokens expire after 60m.	`50m`
{$OPENSTACK.HTTP.PROXY}	Sets the HTTP proxy for the authorization item. Host prototypes will also use this value for HTTP proxy. If this parameter is empty, then no proxy is used.
{$OPENSTACK.APP.CRED.ID}	Application credential ID for monitoring user access.
{$OPENSTACK.APP.CRED.SECRET}	Application credential password for monitoring user access.

Items

Name	Description	Type	Key and additional info
Get access token and service catalog	Authorizes user on the OpenStack Identity service and gets the service catalog.	Script	openstack.identity.auth

LLD rule OpenStack: Nova discovery

Name	Description	Type	Key and additional info
OpenStack: Nova discovery	Discovers OpenStack services from the monitoring user's services catalog.	Dependent item	openstack.services.nova.discovery

OpenStack Nova by HTTP

Overview

This template is designed for the effortless deployment of OpenStack Nova monitoring by Zabbix via HTTP and doesn't require any external scripts.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

OpenStack Yoga release and OpenStack built from sources (27568ea3):

Compute API v2.1

Configuration

Setup

This template is not meant to be used independently. A host with the OpenStack by HTTP template will discover the Nova service automatically and create a host prototype with this template assigned to it.

If needed, you can specify an HTTP proxy for the template to use by changing the value of {$OPENSTACK.NOVA.HTTP.PROXY} user macro.

For tenant usage statistics, it is possible to choose a custom time period for which the data will be queried. This can be set with the {$OPENSTACK.NOVA.TENANT.PERIOD} macro value. The value can be one of the following:

y - current year until now;
m - current month until now (default value);
w - current week until now;
d - current day until now;

This template discovers servers (instances) present in the project and monitors their statuses, but, depending on different use cases, most likely it is not necessary to monitor all servers. To filter which servers to monitor, set the {$OPENSTACK.SERVER.DISCOVERY.NAME.MATCHES} and {$OPENSTACK.SERVER.DISCOVERY.NAME.NOT_MATCHES} macro values accordingly. This logic also applies to other low-level discovery rules.

OpenStack configuration

For the OpenStack monitoring user to be able to access the API resources used in this template, it is needed to configure the policy file for OpenStack Nova.

On the OpenStack server, open the /etc/nova/policy.json file in your favorite text editor.

In this file, assign the following target resources to the role that the monitoring user uses:

{
  "os_compute_api:servers:index": "role:monitoring",
  "os_compute_api:servers:show": "role:monitoring",
  "os_compute_api:os-services:list": "role:monitoring",
  "os_compute_api:os-hypervisors:list-detail": "role:monitoring",
  "os_compute_api:os-availability-zone:detail": "role:monitoring",
  "os_compute_api:os-simple-tenant-usage:list": "role:monitoring"
}

If some role is already assigned to the target, it is possible to add another role with or, for example, role:firstRole or role:monitoring.

Note that a restart of OpenStack Nova services might be needed for these new changes to be applied.

Macros used

Name	Description	Default
{$OPENSTACK.NOVA.SERVICE.URL}	API endpoint for Nova Service, e.g., https://local.openstack:8774/v2.1.
{$OPENSTACK.TOKEN}	API token for the monitoring user.
{$OPENSTACK.HTTP.PROXY}	Sets the HTTP proxy for script and HTTP agent items. If this parameter is empty, then no proxy is used.
{$OPENSTACK.NOVA.TENANT.PERIOD}	Period for which tenant usage statistics will be queried. Possible values are: 'y' - current year until now, 'm' - current month until now, 'w' - current week until now, 'd' - current day until now.	`m`
{$OPENSTACK.NOVA.INTERVAL.LIMITS}	Interval for absolute limit HTTP agent item query.	`3m`
{$OPENSTACK.NOVA.INTERVAL.SERVERS}	Interval for server HTTP agent item queries.	`3m`
{$OPENSTACK.NOVA.INTERVAL.SERVICES}	Interval for service HTTP agent item query.	`3m`
{$OPENSTACK.NOVA.INTERVAL.HYPERVISOR}	Interval for hypervisor HTTP agent item query.	`3m`
{$OPENSTACK.NOVA.INTERVAL.AVAILABILITY_ZONE}	Interval for availability zone HTTP agent item query.	`3m`
{$OPENSTACK.NOVA.INTERVAL.TENANTS}	Interval for tenant HTTP agent item query.	`3m`
{$OPENSTACK.NOVA.INSTANCES.UTIL.WARN}	Sets the percentage threshold for creating a warning severity event about instances resource count.	`75`
{$OPENSTACK.NOVA.INSTANCES.UTIL.HIGH}	Sets the percentage threshold for creating a high severity event about instances resource count.	`90`
{$OPENSTACK.NOVA.CPU.UTIL.WARN}	Sets the percentage threshold for creating a warning severity event about vCPU resource usage.	`75`
{$OPENSTACK.NOVA.CPU.UTIL.HIGH}	Sets the percentage threshold for creating a high severity event about vCPU resource usage.	`90`
{$OPENSTACK.NOVA.RAM.UTIL.WARN}	Sets the percentage threshold for creating a warning severity event about RAM resource usage.	`75`
{$OPENSTACK.NOVA.RAM.UTIL.HIGH}	Sets the percentage threshold for creating a high severity event about RAM resource usage.	`90`
{$OPENSTACK.SERVER.DISCOVERY.NAME.MATCHES}	Sets the server name regex filter to use in server discovery for including.	`.*`
{$OPENSTACK.SERVER.DISCOVERY.NAME.NOT_MATCHES}	Sets the server name regex filter to use in server discovery for excluding.	`CHANGE_IF_NEEDED`
{$OPENSTACK.SERVICES.DISCOVERY.HOST.MATCHES}	Sets the host name regex filter to use in compute services discovery for including.	`.*`
{$OPENSTACK.SERVICES.DISCOVERY.HOST.NOT_MATCHES}	Sets the host name regex filter to use in compute services discovery for excluding.	`CHANGE_IF_NEEDED`
{$OPENSTACK.SERVICES.DISCOVERY.BINARY.MATCHES}	Sets the binary name regex filter to use in compute services discovery for including.	`.*`
{$OPENSTACK.SERVICES.DISCOVERY.BINARY.NOT_MATCHES}	Sets the binary name regex filter to use in compute services discovery for excluding.	`CHANGE_IF_NEEDED`
{$OPENSTACK.HYPERVISOR.DISCOVERY.HOSTNAME.MATCHES}	Sets the hostname regex filter to use in hypervisor discovery for including.	`.*`
{$OPENSTACK.HYPERVISOR.DISCOVERY.HOSTNAME.NOT_MATCHES}	Sets the hostname regex filter to use in hypervisor discovery for excluding.	`CHANGE_IF_NEEDED`
{$OPENSTACK.HYPERVISOR.DISCOVERY.TYPE.MATCHES}	Sets the type regex filter to use in hypervisor discovery for including.	`.*`
{$OPENSTACK.HYPERVISOR.DISCOVERY.TYPE.NOT_MATCHES}	Sets the type regex filter to use in hypervisor discovery for excluding.	`CHANGE_IF_NEEDED`
{$OPENSTACK.HYPERVISOR.DISCOVERY.IP.MATCHES}	Sets the host IP address regex filter to use in hypervisor discovery for including.	`.*`
{$OPENSTACK.HYPERVISOR.DISCOVERY.IP.NOT_MATCHES}	Sets the host IP address regex filter to use in hypervisor discovery for excluding.	`CHANGE_IF_NEEDED`
{$OPENSTACK.AVAILABILITY_ZONE.DISCOVERY.NAME.MATCHES}	Sets the zone name regex filter to use in availability zone discovery for including.	`.*`
{$OPENSTACK.AVAILABILITYZONE.DISCOVERY.NAME.NOTMATCHES}	Sets the zone name regex filter to use in availability zone discovery for excluding.	`CHANGE_IF_NEEDED`

Items

Name	Description	Type	Key and additional info
Get absolute limits	Gets absolute limits for the project.	HTTP agent	openstack.nova.limits.get Preprocessing JSON Path: `$.limits.absolute` ⛔️Custom on fail: Set error to: `Could not get absolute project limits`
Get servers	Gets a list of servers.	HTTP agent	openstack.nova.servers.get Preprocessing JSON Path: `$.servers` ⛔️Custom on fail: Set error to: `Could not get servers list`
Get compute services	Gets a list of compute services and its data.	HTTP agent	openstack.nova.services.get Preprocessing JSON Path: `$.services` ⛔️Custom on fail: Set error to: `Could not get compute services list`
Get hypervisors	Gets a list of hypervisors and its data.	HTTP agent	openstack.nova.hypervisors.get Preprocessing JSON Path: `$.hypervisors` ⛔️Custom on fail: Set error to: `Could not get hypervisors list`
Get availability zones	Gets a list of availability zones and its data.	HTTP agent	openstack.nova.availability_zone.get Preprocessing JSON Path: `$.availabilityZoneInfo` ⛔️Custom on fail: Set error to: `Could not get availability zones list`
Get tenants	Gets a list of tenants and its data.	Script	openstack.nova.tenant.get Preprocessing JSON Path: `$.tenant_usages` ⛔️Custom on fail: Set error to: `Could not get tenant list`
Instances count, current	Number of servers in each tenant.	Dependent item	openstack.nova.limits.instances.current Preprocessing JSON Path: `$.totalInstancesUsed`
Instances count, max	Number of allowed servers for each tenant.	Dependent item	openstack.nova.limits.instances.max Preprocessing JSON Path: `$.maxTotalInstances`
Instances count, free	Number of available servers for each tenant.	Calculated	openstack.nova.limits.instances.free Preprocessing Discard unchanged with heartbeat: `1h`
vCPUs usage, current	Number of used server cores in each tenant.	Dependent item	openstack.nova.limits.vcpu.current Preprocessing JSON Path: `$.totalCoresUsed`
vCPUs usage, max	Number of allowed server cores for each tenant.	Dependent item	openstack.nova.limits.vcpu.max Preprocessing JSON Path: `$.maxTotalCores`
vCPUs usage, free	Number of available server cores for each tenant.	Calculated	openstack.nova.limits.vcpu.free Preprocessing Discard unchanged with heartbeat: `1h`
RAM usage, current	Amount of used server RAM.	Dependent item	openstack.nova.limits.ram.current Preprocessing JSON Path: `$.totalRAMUsed` Custom multiplier: `1048576`
RAM usage, max	Amount of allowed server RAM.	Dependent item	openstack.nova.limits.ram.max Preprocessing JSON Path: `$.maxTotalRAMSize` Custom multiplier: `1048576`
RAM usage, free	Amount of available server RAM.	Calculated	openstack.nova.limits.ram.free Preprocessing Discard unchanged with heartbeat: `1h`

Triggers

Name	Description	Expression	Severity
OpenStack Nova: Current instances count is too high	Current instances count has exceeded {$OPENSTACK.NOVA.INSTANCES.UTIL.HIGH}% of the max available instances count.	`last(/OpenStack Nova by HTTP/openstack.nova.limits.instances.current) >= ({$OPENSTACK.NOVA.INSTANCES.UTIL.HIGH} / 100 * last(/OpenStack Nova by HTTP/openstack.nova.limits.instances.max))`\|High
OpenStack Nova: Current instances count is high	Current instances count has exceeded {$OPENSTACK.NOVA.INSTANCES.UTIL.WARN}% of the max available instances count.	`last(/OpenStack Nova by HTTP/openstack.nova.limits.instances.current) >= ({$OPENSTACK.NOVA.INSTANCES.UTIL.WARN} / 100 * last(/OpenStack Nova by HTTP/openstack.nova.limits.instances.max))`\|Warning	Depends on: OpenStack Nova: Current instances count is too high
OpenStack Nova: Current vCPU usage is too high	Current vCPU usage has exceeded {$OPENSTACK.NOVA.CPU.UTIL.HIGH}% of the max available vCPU usage.	`last(/OpenStack Nova by HTTP/openstack.nova.limits.vcpu.current) >= ({$OPENSTACK.NOVA.CPU.UTIL.HIGH} / 100 * last(/OpenStack Nova by HTTP/openstack.nova.limits.vcpu.max))`\|High
OpenStack Nova: Current vCPU usage is high	Current vCPU usage has exceeded {$OPENSTACK.NOVA.CPU.UTIL.WARN}% of the max available vCPU usage.	`last(/OpenStack Nova by HTTP/openstack.nova.limits.vcpu.current) >= ({$OPENSTACK.NOVA.CPU.UTIL.WARN} / 100 * last(/OpenStack Nova by HTTP/openstack.nova.limits.vcpu.max))`\|Warning	Depends on: OpenStack Nova: Current vCPU usage is too high
OpenStack Nova: Current RAM usage is too high	Current RAM usage has exceeded {$OPENSTACK.NOVA.RAM.UTIL.HIGH}% of the max available RAM usage.	`last(/OpenStack Nova by HTTP/openstack.nova.limits.ram.current) >= ({$OPENSTACK.NOVA.RAM.UTIL.HIGH} / 100 * last(/OpenStack Nova by HTTP/openstack.nova.limits.ram.max))`\|High
OpenStack Nova: Current RAM usage is high	Current RAM usage has exceeded {$OPENSTACK.NOVA.RAM.UTIL.WARN}% of the max available RAM usage.	`last(/OpenStack Nova by HTTP/openstack.nova.limits.ram.current) >= ({$OPENSTACK.NOVA.RAM.UTIL.WARN} / 100 * last(/OpenStack Nova by HTTP/openstack.nova.limits.ram.max))`\|Warning	Depends on: OpenStack Nova: Current RAM usage is too high

LLD rule Nova: Servers discovery

Name Description Type Key and additional info

Nova: Servers discovery

Name	Description	Type	Key and additional info
Nova: Servers discovery	Discovers OpenStack Nova servers.	Dependent item	openstack.nova.server.discovery Preprocessing Discard unchanged with heartbeat: `1h`

Discovers OpenStack Nova servers.

Dependent item

openstack.nova.server.discovery

Preprocessing

Discard unchanged with heartbeat: 1h

Item prototypes for Nova: Servers discovery

Name Description Type Key and additional info

Name	Description	Type	Key and additional info
Server [{#SERVERID}]:[{#SERVERNAME}]: Status	Server status.	HTTP agent	openstack.nova.server.status.get[{#SERVER_ID}] Preprocessing JSON Path: `$.server.status` ⛔️Custom on fail: Set error to: `Could not parse the detailed server report` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Server [{#SERVERID}]:[{#SERVERNAME}]: Status

Server status.

HTTP agent

openstack.nova.server.status.get[{#SERVER_ID}]

Preprocessing

JSON Path: $.server.status
⛔️Custom on fail: Set error to: Could not parse the detailed server report
JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Trigger prototypes for Nova: Servers discovery

Name	Description	Expression	Severity	Dependencies and additional info
OpenStack Nova: Server [{#SERVERID}]:[{#SERVERNAME}]: Status is "ERROR"	Server is in "ERROR" status.	`last(/OpenStack Nova by HTTP/openstack.nova.server.status.get[{#SERVER_ID}])=5`\|High	Manual close: Yes
OpenStack Nova: Server [{#SERVERID}]:[{#SERVERNAME}]: Status has changed	Status of the server has changed. Acknowledge to close the problem manually.	`last(/OpenStack Nova by HTTP/openstack.nova.server.status.get[{#SERVER_ID}])<>last(/OpenStack Nova by HTTP/openstack.nova.server.status.get[{#SERVER_ID}],#2) and length(last(/OpenStack Nova by HTTP/openstack.nova.server.status.get[{#SERVER_ID}]))>0`\|Info	Manual close: Yes Depends on: OpenStack Nova: Server [{#SERVERID}]:[{#SERVERNAME}]: Status is "ERROR"

LLD rule Nova: Compute services discovery

Name Description Type Key and additional info

Nova: Compute services discovery

Name	Description	Type	Key and additional info
Nova: Compute services discovery	Discovers OpenStack compute services.	Dependent item	openstack.nova.services.discovery Preprocessing Discard unchanged with heartbeat: `1h`

Discovers OpenStack compute services.

Dependent item

openstack.nova.services.discovery

Preprocessing

Discard unchanged with heartbeat: 1h

Item prototypes for Nova: Compute services discovery

Name	Description	Type	Key and additional info
Compute service [{#HOST}]:[{#BINARY}]:[{#ID}]: Raw data	Raw data of the service.	Dependent item	openstack.nova.services.raw[{#ID}] Preprocessing JSON Path: `$[?(@.id == "{#ID}")].first()` ⛔️Custom on fail: Set error to: `Could not parse the detailed services report`
Compute service [{#HOST}]:[{#BINARY}]:[{#ID}]: State	State of the service.	Dependent item	openstack.nova.services.state[{#ID}] Preprocessing JSON Path: `$.state` ⛔️Custom on fail: Set error to: `Could not parse the detailed services report` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
Compute service [{#HOST}]:[{#BINARY}]:[{#ID}]: Status	Status of the service.	Dependent item	openstack.nova.services.status[{#ID}] Preprocessing JSON Path: `$.status` ⛔️Custom on fail: Set error to: `Could not parse the detailed services report` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
Compute service [{#HOST}]:[{#BINARY}]:[{#ID}]: Disabling reason	Reason for disabling a service.	Dependent item	openstack.nova.services.disabled.reason[{#ID}] Preprocessing JSON Path: `$.disabled_reason` ⛔️Custom on fail: Set error to: `Could not parse the detailed services report` Discard unchanged with heartbeat: `1h`

Trigger prototypes for Nova: Compute services discovery

Name	Description	Expression	Severity	Dependencies and additional info
OpenStack Nova: Compute service [{#HOST}]:[{#BINARY}]:[{#ID}]: State is "down"	State of the service is "down".	`last(/OpenStack Nova by HTTP/openstack.nova.services.state[{#ID}])=0`\|Warning	Manual close: Yes
OpenStack Nova: Compute service [{#HOST}]:[{#BINARY}]:[{#ID}]: Status is "disabled"	Status of the server is disabled. Acknowledge to close the problem manually.	`last(/OpenStack Nova by HTTP/openstack.nova.services.status[{#ID}])=0 and length(last(/OpenStack Nova by HTTP/openstack.nova.services.disabled.reason[{#ID}]))>=0`\|Info	Manual close: Yes

LLD rule Nova: Hypervisor discovery

Name Description Type Key and additional info

Nova: Hypervisor discovery

Name	Description	Type	Key and additional info
Nova: Hypervisor discovery	Discovers OpenStack Nova hypervisors.	Dependent item	openstack.nova.hypervisors.discovery Preprocessing Discard unchanged with heartbeat: `1h`

Discovers OpenStack Nova hypervisors.

Dependent item

openstack.nova.hypervisors.discovery

Preprocessing

Discard unchanged with heartbeat: 1h

Item prototypes for Nova: Hypervisor discovery

Name	Description	Type	Key and additional info
Hypervisor [{#ID}]:[{#HOSTNAME}]: Raw data	Raw data of the hypervisor.	Dependent item	openstack.nova.hypervisors.raw[{#ID}] Preprocessing JSON Path: `$[?(@.id == "{#ID}")].first()` ⛔️Custom on fail: Set error to: `Could not parse the detailed hypervisor report`
Hypervisor [{#ID}]:[{#HOSTNAME}]: State	State of the hypervisor.	Dependent item	openstack.nova.hypervisors.state[{#ID}] Preprocessing JSON Path: `$.state` ⛔️Custom on fail: Set error to: `Could not parse the detailed hypervisor report` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
Hypervisor [{#ID}]:[{#HOSTNAME}]: Status	Status of the hypervisor.	Dependent item	openstack.nova.hypervisors.status[{#ID}] Preprocessing JSON Path: `$.status` ⛔️Custom on fail: Set error to: `Could not parse the detailed hypervisor report` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
Hypervisor [{#ID}]:[{#HOSTNAME}]: Version	Hypervisor version.	Dependent item	openstack.nova.hypervisors.version[{#ID}] Preprocessing JSON Path: `$.hypervisor_version` ⛔️Custom on fail: Set error to: `Could not parse the detailed hypervisor report` Discard unchanged with heartbeat: `1h`

Trigger prototypes for Nova: Hypervisor discovery

Name	Description	Expression	Severity
OpenStack Nova: Hypervisor [{#ID}]:[{#HOSTNAME}]: State is "down"	State of the hypervisor is "down".	`last(/OpenStack Nova by HTTP/openstack.nova.hypervisors.state[{#ID}])=0`\|Warning	Manual close: Yes
OpenStack Nova: Hypervisor [{#ID}]:[{#HOSTNAME}]: Status is "disabled"	Status of the hypervisor is disabled.	`last(/OpenStack Nova by HTTP/openstack.nova.hypervisors.status[{#ID}])=0`\|Info	Manual close: Yes
OpenStack Nova: Hypervisor [{#ID}]:[{#HOSTNAME}]: Version has changed	Version of the hypervisor has changed. Acknowledge to close the problem manually.	`last(/OpenStack Nova by HTTP/openstack.nova.hypervisors.version[{#ID}])<>last(/OpenStack Nova by HTTP/openstack.nova.hypervisors.version[{#ID}],#2) and length(last(/OpenStack Nova by HTTP/openstack.nova.hypervisors.version[{#ID}]))>0`\|Info	Manual close: Yes

LLD rule Nova: Availability zones discovery

Name Description Type Key and additional info

Nova: Availability zones discovery

Name	Description	Type	Key and additional info
Nova: Availability zones discovery	Discovers OpenStack Nova availability zones.	Dependent item	openstack.nova.availability_zone.discovery Preprocessing Discard unchanged with heartbeat: `1h`

Discovers OpenStack Nova availability zones.

Dependent item

openstack.nova.availability_zone.discovery

Preprocessing

Discard unchanged with heartbeat: 1h

Item prototypes for Nova: Availability zones discovery

Name Description Type Key and additional info

Availability zone [{#ZONE_NAME}]: Raw data

Name	Description	Type	Key and additional info
Availability zone [{#ZONE_NAME}]: Raw data	Raw data of the availability zone.	Dependent item	openstack.nova.availabilityzone.raw[{#ZONENAME}] Preprocessing JSON Path: `$[?(@.zoneName == "{#ZONE_NAME}")].first()` ⛔️Custom on fail: Set error to: `Could not parse the detailed availability zone report`
Availability zone [{#ZONE_NAME}]: State	Current state of the availability zone.	Dependent item	openstack.nova.availabilityzone.state[{#ZONENAME}] Preprocessing JSON Path: `$.zoneState.available` ⛔️Custom on fail: Set error to: `Could not parse the detailed availability zone report` Boolean to decimal Discard unchanged with heartbeat: `1h`
Availability zone [{#ZONE_NAME}]: Host count	Count of hosts and service objects under single availability zone.	Dependent item	openstack.nova.availabilityzone.hostcount[{#ZONE_NAME}] Preprocessing JSON Path: `$['hosts'].[].[].length()` ⛔️Custom on fail: Set error to: `Could not parse the detailed availability zone report` Discard unchanged with heartbeat: `1h`

Raw data of the availability zone.

Dependent item

openstack.nova.availabilityzone.raw[{#ZONENAME}]

Preprocessing

JSON Path: $[?(@.zoneName == "{#ZONE_NAME}")].first()
⛔️Custom on fail: Set error to: Could not parse the detailed availability zone report

Availability zone [{#ZONE_NAME}]: State

Current state of the availability zone.

Dependent item

openstack.nova.availabilityzone.state[{#ZONENAME}]

Preprocessing

JSON Path: $.zoneState.available
⛔️Custom on fail: Set error to: Could not parse the detailed availability zone report
Boolean to decimal
Discard unchanged with heartbeat: 1h

Availability zone [{#ZONE_NAME}]: Host count

Count of hosts and service objects under single availability zone.

Dependent item

openstack.nova.availabilityzone.hostcount[{#ZONE_NAME}]

Preprocessing

JSON Path: $['hosts'].[*].[*].length()
⛔️Custom on fail: Set error to: Could not parse the detailed availability zone report
Discard unchanged with heartbeat: 1h

Trigger prototypes for Nova: Availability zones discovery

Name	Description	Expression	Severity	Dependencies and additional info
OpenStack Nova: Availability zone [{#ZONE_NAME}]: Zone is unavailable	Availability zone is not available.	`last(/OpenStack Nova by HTTP/openstack.nova.availability_zone.state[{#ZONE_NAME}])=0`\|Warning	Manual close: Yes

LLD rule Nova: Tenant discovery

Name Description Type Key and additional info

Nova: Tenant discovery

Name	Description	Type	Key and additional info
Nova: Tenant discovery	Discovers tenants and their usage data.	Dependent item	openstack.nova.tenant.discovery Preprocessing Discard unchanged with heartbeat: `1h`

Discovers tenants and their usage data.

Dependent item

openstack.nova.tenant.discovery

Preprocessing

Discard unchanged with heartbeat: 1h

Item prototypes for Nova: Tenant discovery

Name	Description	Type	Key and additional info
Tenant [{#TENANT_ID}]: Raw data	Raw data of the tenant.	Dependent item	openstack.nova.tenant.raw[{#TENANT_ID}] Preprocessing JSON Path: `$[?(@.tenant_id == "{#TENANT_ID}")].first()` ⛔️Custom on fail: Set error to: `Could not parse the tenant report`
Tenant [{#TENANT_ID}]: Total hours	Total duration that the servers exist (in hours).	Dependent item	openstack.nova.tenant.totalhours[{#TENANTID}] Preprocessing JSON Path: `$.total_hours` ⛔️Custom on fail: Set error to: `Could not parse the detailed tenant report` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
Tenant [{#TENANT_ID}]: Total vCPUs usage	Total vCPU usage hours for the current tenant (project). Multiplying the number of virtual CPUs of the server by hours the server exists, and then adding that all together for each server.	Dependent item	openstack.nova.tenant.totalvcpu[{#TENANTID}] Preprocessing JSON Path: `$.total_vcpus_usage` ⛔️Custom on fail: Set error to: `Could not parse the detailed tenant report` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
Tenant [{#TENANT_ID}]: Total disk usage	Total disk usage hours for the current tenant (project). Multiplying the server disk size (in GiB) by hours the server exists, and then adding that all together for each server.	Dependent item	openstack.nova.tenant.diskusage[{#TENANTID}] Preprocessing JSON Path: `$.total_local_gb_usage` ⛔️Custom on fail: Set error to: `Could not parse the detailed tenant report` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`
Tenant [{#TENANT_ID}]: Total memory usage	Total memory usage hours for the current tenant (project). Multiplying the server memory size (in MiB) by hours the server exists, and then adding that all together for each server.	Dependent item	openstack.nova.tenant.totalmemorymbusage[{#TENANTID}] Preprocessing JSON Path: `$.total_memory_mb_usage` ⛔️Custom on fail: Set error to: `Could not parse the detailed tenant report` JavaScript: `return Math.round(value * 100) / 100;` Discard unchanged with heartbeat: `1h`

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

cloud

cloud_gcp_http

View README Download JSON

GCP by HTTP

Overview

This template is designed to monitor Google Cloud Platform (hereinafter - GCP) by Zabbix. It works without any external scripts and uses the script item. The template currently supports the discovery of Compute Engine/Cloud SQL instances and Compute Engine project quota metrics.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Google Cloud Platform

Configuration

Setup

Enable the Stackdriver Monitoring API for the GCP project you wish to monitor. >Refer to the vendor documentation.
Create a service account in Google Cloud console for the project you have to monitor. >Refer to the vendor documentation.
Create and download the service account key in JSON format. >Refer to the vendor documentation.
If you want to monitor Cloud SQL services - don't forget to activate the Cloud SQL Admin API. >Refer to the vendor documentation for the details.
Copy the project_id, private_key_id, private_key, client_email from the JSON key file and add them to their corresponding macros {$GCP.PROJECT.ID}, {$GCP.PRIVATE.KEY.ID}, {$GCP.PRIVATE.KEY}, {$GCP.CLIENT.EMAIL} on the template/host.

Additional information:

Make sure that you're creating the service account using the credentials with the `Project Owner/Project IAM Admin/service account Admin` role.

The service account JSON key file can only be downloaded once: regenerate it if the previous key has been lost.

The service account should have `Project Viewer` permissions or granular permissions for the GCP Compute Engine API/GCP Cloud SQL.

You can copy and paste private_key string data from the Service Account JSON key file as is or replace the new line metasymbol (\n) with an actual new line.

IMPORTANT!!!

 Secret authorization token is defined as a plain text in host prototype settings by default due to Zabbix templates export/import limits: therefore, it is highly recommended to change the user macro `{$GCP.AUTH.TOKEN}` value type to `SECRET` for all host prototypes after the template `GCP by HTTP` import.

 All the instances/quotas/metrics discovered are related to a particular GCP project.
 To monitor several GCP projects - create their corresponding service accounts/Zabbix hosts.

 GCP Access Token is available for 1 hour (3600 seconds) after the generation request.

 To avoid a GCP token inconsistency between Zabbix database and Zabbix server configuration cache, don't set Zabbix server configuration parameter CacheUpdateFrequency to a value over 45 minutes and don't set the update interval for the GCP Authorization item to more than 1 hour (maximum CacheUpdateFrequency value).

Additional information about metrics and used API methods:

Compute Engine

Cloud SQL

Macros used

Name	Description	Default
{$GCP.PROJECT.ID}	GCP project ID.
{$GCP.CLIENT.EMAIL}	Service account client e-mail.
{$GCP.PRIVATE.KEY.ID}	Service account private key id.
{$GCP.PRIVATE.KEY}	Service account private key data.
{$GCP.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$GCP.DATA.TIMEOUT}	A response timeout for an API.	`15s`
{$GCP.AUTH.FREQUENCY}	The update interval for the GCP Authorization item, which also equals to the access token regeneration request frequency. Check the template documentation notes carefully for more details.	`45m`
{$GCP.GCE.INST.NAME.MATCHES}	The filter to include GCP Compute Engine instances by namespace.	`.*`
{$GCP.GCE.INST.NAME.NOT_MATCHES}	The filter to exclude GCP Compute Engine instances by namespace.	`CHANGE_IF_NEEDED`
{$GCP.GCE.ZONE.MATCHES}	The filter to include GCP Compute Engine instances by zone.	`.*`
{$GCP.GCE.ZONE.NOT_MATCHES}	The filter to exclude GCP Compute Engine instances by zone.	`CHANGE_IF_NEEDED`
{$GCP.MYSQL.INST.NAME.MATCHES}	The filter to include GCP Cloud SQL MySQL instances by namespace.	`.*`
{$GCP.MYSQL.INST.NAME.NOT_MATCHES}	The filter to exclude GCP Cloud SQL MySQL instances by namespace.	`CHANGE_IF_NEEDED`
{$GCP.MYSQL.ZONE.MATCHES}	The filter to include GCP Cloud SQL MySQL instances by zone.	`.*`
{$GCP.MYSQL.ZONE.NOT_MATCHES}	The filter to exclude GCP Cloud SQL MySQL instances by zone.	`CHANGE_IF_NEEDED`
{$GCP.MYSQL.INST.TYPE.MATCHES}	The filter to include GCP Cloud SQL MySQL instances by type (standalone/replica).	`.*`
{$GCP.MYSQL.INST.TYPE.NOT_MATCHES}	The filter to exclude GCP Cloud SQL MySQL instances by type (standalone/replica). Set a macro value 'CLOUDSQLINSTANCE' to exclude standalone Instances or 'READREPLICAINSTANCE' to exclude read-only Replicas.	`CHANGE_IF_NEEDED`
{$GCP.PGSQL.INST.NAME.MATCHES}	The filter to include GCP Cloud SQL PostgreSQL instances by namespace.	`.*`
{$GCP.PGSQL.INST.NAME.NOT_MATCHES}	The filter to exclude GCP Cloud SQL PostgreSQL instances by namespace.	`CHANGE_IF_NEEDED`
{$GCP.PGSQL.ZONE.MATCHES}	The filter to include GCP Cloud SQL PostgreSQL instances by zone.	`.*`
{$GCP.PGSQL.ZONE.NOT_MATCHES}	The filter to exclude GCP Cloud SQL PostgreSQL instances by zone.	`CHANGE_IF_NEEDED`
{$GCP.PGSQL.INST.TYPE.MATCHES}	The filter to include GCP Cloud SQL PostgreSQL instances by type (standalone/replica).	`.*`
{$GCP.PGSQL.INST.TYPE.NOT_MATCHES}	The filter to exclude GCP Cloud SQL PostgreSQL instances by type (standalone/replica). Set a macro value 'CLOUDSQLINSTANCE' to exclude standalone Instances or 'READREPLICAINSTANCE' to exclude read-only Replicas.	`CHANGE_IF_NEEDED`
{$GCP.MSSQL.INST.NAME.MATCHES}	The filter to include GCP Cloud SQL MSSQL instances by namespace.	`.*`
{$GCP.MSSQL.INST.NAME.NOT_MATCHES}	The filter to exclude GCP Cloud SQL MSSQL instances by namespace.	`CHANGE_IF_NEEDED`
{$GCP.MSSQL.ZONE.MATCHES}	The filter to include GCP Cloud SQL MSSQL instances by zone.	`.*`
{$GCP.MSSQL.ZONE.NOT_MATCHES}	The filter to exclude GCP Cloud SQL MSSQL instances by zone.	`CHANGE_IF_NEEDED`
{$GCP.MSSQL.INST.TYPE.MATCHES}	The filter to include GCP Cloud SQL MSSQL instances by type (standalone/replica).	`.*`
{$GCP.MSSQL.INST.TYPE.NOT_MATCHES}	The filter to exclude GCP Cloud SQL MSSQL instances by type (standalone/replica). Set a macro value 'CLOUDSQLINSTANCE' to exclude standalone Instances or 'READREPLICAINSTANCE' to exclude read-only Replicas.	`CHANGE_IF_NEEDED`
{$GCP.GCE.QUOTA.MATCHES}	The filter to include GCP Compute Engine project quotas by namespace.	`.*`
{$GCP.GCE.QUOTA.NOT_MATCHES}	The filter to exclude GCP Compute Engine project quotas by namespace.	`CHANGE_IF_NEEDED`
{$GCP.GCE.QUOTA.PUSED.MIN.WARN}	GCP Compute Engine project quota warning utilization threshold.	`80`
{$GCP.GCE.QUOTA.PUSED.MIN.CRIT}	GCP Compute Engine project quota critical quota utilization threshold.	`95`

Items

Name	Description	Type	Key and additional info
Authorization	Google Cloud Platform REST authorization with service account authentication parameters and temporary-generated RSA-based JWT-token usage. The necessary scopes are pre-defined. Returns a signed authorization token with 1 hour lifetime; it is required only once, and is used for all the dependent script items. Check the template documentation for the details.	Script	gcp.authorization
Instances get	Get GCP Compute Engine instances.	Dependent item	gcp.gce.instances.get Preprocessing JavaScript: `The text is too long. Please see the template.`
Authorization errors check	A list of errors from API requests.	Dependent item	gcp.auth.err.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to: ``
Instances get	GCP Cloud SQL: Instances get.	Dependent item	gcp.cloudsql.instances.get Preprocessing JavaScript: `The text is too long. Please see the template.`
Cloud SQL instances total	GCP Cloud SQL instances total count.	Dependent item	gcp.cloudsql.instances.total Preprocessing JSON Path: `$.[*].length()`
MSSQL instances count	GCP Cloud SQL MSSQL instances count.	Dependent item	gcp.cloudsql.instances.mssql_count Preprocessing JSON Path: `$[?(@.db_type =~ 'SQLSERVER')].length()` ⛔️Custom on fail: Discard value
MySQL instances count	GCP Cloud SQL MySQL instances count.	Dependent item	gcp.cloudsql.instances.mysql_count Preprocessing JSON Path: `$[?(@.db_type =~ 'MYSQL')].length()` ⛔️Custom on fail: Discard value
PostgreSQL instances count	GCP Cloud SQL PostgreSQL instances count.	Dependent item	gcp.cloudsql.instances.pgsql_count Preprocessing JSON Path: `$[?(@.db_type =~ 'POSTGRES')].length()` ⛔️Custom on fail: Discard value
GCE instances total	GCP Compute Engine instances total count.	Dependent item	gcp.gce.instances.total Preprocessing JSON Path: `$.[*].length()` ⛔️Custom on fail: Discard value
Regular GCE instances count	GCP Compute Engine: Regular instances count.	Dependent item	gcp.gce.instances.regular_count Preprocessing JSON Path: `$[?(@.i_type == 'regular')].length()` ⛔️Custom on fail: Discard value
Container-optimized GCE instances count	GCP Compute Engine: count of instances with Container-Optimized OS used.	Dependent item	gcp.gce.instances.cos_count Preprocessing JSON Path: `$[?(@.i_type == 'container-optimized')].length()` ⛔️Custom on fail: Discard value
Project quotas get	GCP Compute Engine resource quotas available for the particular project.	Dependent item	gcp.gce.quotas.get Preprocessing JavaScript: `The text is too long. Please see the template.`

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
GCP: Authorization has failed	GCP: Authorization has failed. Check the authorization parameters and GCP API availability from a network segment, where Zabbix-server/proxy is located.	`length(last(/GCP by HTTP/gcp.auth.err.check)) > 0`\|Average

LLD rule GCP Compute Engine: Instances discovery

Name Description Type Key and additional info

GCP Compute Engine: Instances discovery

GCP Compute Engine: Instances discovery.

Dependent item

gcp.gce.inst.discovery

Preprocessing

Discard unchanged with heartbeat: 3h

LLD rule GCP Cloud SQL: PostgreSQL instances discovery

Name Description Type Key and additional info

GCP Cloud SQL: PostgreSQL instances discovery

GCP Cloud SQL: PostgreSQL instances discovery.

Dependent item

gcp.cloudsql.pgsql.inst.discovery

Preprocessing

Discard unchanged with heartbeat: 3h

LLD rule GCP Cloud SQL: MSSQL instances discovery

Name Description Type Key and additional info

GCP Cloud SQL: MSSQL instances discovery

GCP Cloud SQL: MSSQL instances discovery.

Dependent item

gcp.cloudsql.mssql.inst.discovery

Preprocessing

Discard unchanged with heartbeat: 3h

LLD rule GCP Cloud SQL: MySQL instances discovery

Name Description Type Key and additional info

GCP Cloud SQL: MySQL instances discovery

GCP Cloud SQL: MySQL instances discovery.

Dependent item

gcp.cloudsql.mysql.inst.discovery

Preprocessing

Discard unchanged with heartbeat: 3h

LLD rule GCP Compute Engine: Project quotas discovery

Name Description Type Key and additional info

GCP Compute Engine: Project quotas discovery

GCP Compute Engine: Quotas discovery.

Dependent item

gcp.gce.quotas.discovery

Preprocessing

Discard unchanged with heartbeat: 3h

Item prototypes for GCP Compute Engine: Project quotas discovery

Name	Description	Type	Key and additional info
Quota [{#GCE.QUOTA.NAME}]: Raw data	GCP Compute Engine: Get metrics for [{#GCE.QUOTA.NAME}] quota.	Dependent item	gcp.gce.quota.single.raw[{#GCE.QUOTA.NAME}] Preprocessing JSON Path: `$[?(@.metric == "{#GCE.QUOTA.NAME}")].first()`
Quota [{#GCE.QUOTA.NAME}]: Usage	GCP Compute Engine: The current usage value for [{#GCE.QUOTA.NAME}] quota.	Dependent item	gcp.gce.quota.usage[{#GCE.QUOTA.NAME}] Preprocessing JSON Path: `$.usage`
Quota [{#GCE.QUOTA.NAME}]: Limit	GCP Compute Engine: The current limit value for [{#GCE.QUOTA.NAME}] quota.	Dependent item	gcp.gce.quota.limit[{#GCE.QUOTA.NAME}] Preprocessing JSON Path: `$.limit`
Quota [{#GCE.QUOTA.NAME}]: Percentage used	GCP Compute Engine: Percentage usage for [{#GCE.QUOTA.NAME}] quota.	Dependent item	gcp.gce.quota.pused[{#GCE.QUOTA.NAME}] Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value JSON Path: `$.p_used`

Trigger prototypes for GCP Compute Engine: Project quotas discovery

Name	Description	Expression	Severity
GCP: Quota [{#GCE.QUOTA.NAME}] limit has been changed	GCP Compute Engine: The limit for the `{#GCE.QUOTA.NAME}` quota has been changed.	`change(/GCP by HTTP/gcp.gce.quota.limit[{#GCE.QUOTA.NAME}]) <> 0`\|Info	Manual close: Yes
GCP: Quota [{#GCE.QUOTA.NAME}] usage is close to reaching the limit	GCP Compute Engine: The usage percentage for the `{#GCE.QUOTA.NAME}` quota is close to reaching the limit.	`last(/GCP by HTTP/gcp.gce.quota.pused[{#GCE.QUOTA.NAME}]) >= {$GCP.GCE.QUOTA.PUSED.MIN.WARN:"{#GCE.QUOTA.NAME}"}`\|Warning	Manual close: Yes Depends on: GCP: Quota [{#GCE.QUOTA.NAME}] usage is critically close to reaching the limit
GCP: Quota [{#GCE.QUOTA.NAME}] usage is critically close to reaching the limit	GCP Compute Engine: The usage percentage for the `{#GCE.QUOTA.NAME}` quota is critically close to reaching the limit.	`last(/GCP by HTTP/gcp.gce.quota.pused[{#GCE.QUOTA.NAME}]) >= {$GCP.GCE.QUOTA.PUSED.MIN.CRIT:"{#GCE.QUOTA.NAME}"}`\|Average	Manual close: Yes

GCP Compute Engine Instance by HTTP

Overview

This template is designed to monitor Google Cloud Platform Compute Engine instances by Zabbix.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

GCP Compute Engine

Configuration

Setup

This template will be automatically connected to discovered entities with all their required parameters pre-defined.

Macros used

Name	Description	Default
{$GCP.DATA.TIMEOUT}	A response timeout for an API.	`15s`
{$GCP.TIME.WINDOW}	Time interval for the data requests. Supported usage type: 1. The default update interval for most of the items. 2. The minimal time window for the data requested in the Monitoring Query Language REST API request.	`5m`
{$GCP.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$GCE.DISK.NAME.MATCHES}	The filter to include GCP Compute Engine disks by namespace.	`.*`
{$GCE.DISK.NAME.NOT_MATCHES}	The filter to exclude GCP Compute Engine disks by namespace.	`CHANGE_IF_NEEDED`
{$GCE.DISK.DEV_TYPE.MATCHES}	The filter to include GCP Compute Engine disks by device type.	`.*`
{$GCE.DISK.DEVTYPE.NOTMATCHES}	The filter to exclude GCP Compute Engine disks by device type.	`CHANGE_IF_NEEDED`
{$GCE.DISK.STOR_TYPE.MATCHES}	The filter to include GCP Compute Engine disks by storage type.	`.*`
{$GCE.DISK.STORTYPE.NOTMATCHES}	The filter to exclude GCP Compute Engine disks by storage type.	`CHANGE_IF_NEEDED`
{$GCE.CPU.UTIL.MAX}	GCP Compute Engine instance CPU utilization threshold.	`95`
{$GCE.RAM.UTIL.MAX}	GCP Compute Engine instance RAM utilization threshold.	`90`

Items

Name	Description	Type	Key and additional info
Metrics get	GCP Compute Engine metrics get in raw format.	Script	gcp.gce.metrics.get Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Firewall: Dropped packets	Count of incoming packets dropped by the firewall.	Dependent item	gcp.gce.firewall.droppedpacketscount Preprocessing JSON Path: `$.dropped_packets_count`
Firewall: Dropped bytes	Count of incoming bytes dropped by the firewall.	Dependent item	gcp.gce.firewall.droppedbytescount Preprocessing JSON Path: `$.dropped_bytes_count`
Guest visible vCPUs	Number of vCPUs visible inside the guest. For many GCE machine types, the number of vCPUs visible inside the guest is equal to the `compute.googleapis.com/instance/cpu/reserved_cores` metric. For shared-core machine types, the number of guest-visible vCPUs differs from the number of reserved cores. For example, e2-small instances have two vCPUs visible inside the guest and 0.5 fractional vCPUs reserved. Therefore, for an e2-small instance, `compute.googleapis.com/instance/cpu/guest_visible_vcpus` has a value of 2 and `compute.googleapis.com/instance/cpu/reserved_cores` has a value of 0.5.	Dependent item	gcp.gce.cpu.guestvisiblevcpus Preprocessing JSON Path: `$.guest_visible_vcpus`
Reserved vCPUs	Number of vCPUs reserved on the host of the instance.	Dependent item	gcp.gce.cpu.reserved_cores Preprocessing JSON Path: `$.reserved_cores`
Scheduler wait time	Wait time is the time a vCPU is ready to run, but unexpectedly not scheduled to run. The wait time returned here is the accumulated value for all vCPUs. The time interval for which the value was measured is returned by Monitoring in whole seconds as starttime and endtime. This metric is only available for VMs that belong to the e2 family or to overcommitted VMs on sole-tenant nodes.	Dependent item	gcp.gce.cpu.schedulerwaittime Preprocessing JSON Path: `$.scheduler_wait_time` ⛔️Custom on fail: Discard value
CPU usage time	Delta vCPU usage for all vCPUs, in vCPU-seconds. To compute the per-vCPU utilization fraction, divide this value by (end-start)*N, where end and start define this value's time interval and N is `compute.googleapis.com/instance/cpu/reserved_cores` at the end of the interval. This value is reported by the hypervisor for the VM and can differ from `agent.googleapis.com/cpu/usage_time`, which is reported from inside the VM.	Dependent item	gcp.gce.cpu.usage_time Preprocessing JSON Path: `$.usage_time`
CPU utilization	Fractional utilization of allocated CPU on this instance. This metric is reported by the hypervisor for the VM and can differ from `agent.googleapis.com/cpu/utilization`, which is reported from inside the VM.	Dependent item	gcp.gce.cpu.utilization Preprocessing JSON Path: `$.utilization` Custom multiplier: `100`
Memory size	Total VM memory size. This metric is only available for VMs that belong to the e2 family; returns empty value for different instance types.	Dependent item	gcp.gce.memory.ram_size Preprocessing JSON Path: `$.ram_size` ⛔️Custom on fail: Discard value
Memory used	Memory currently used in the VM. This metric is only available for VMs that belong to the e2 family; returns empty value for different instance types.	Dependent item	gcp.gce.memory.ram_used Preprocessing JSON Path: `$.ram_used` ⛔️Custom on fail: Discard value
Memory usage percentage	Memory usage Percentage. This metric is only available for VMs that belong to the e2 family; returns empty value for different instance types.	Dependent item	gcp.gce.memory.ram_pused Preprocessing JSON Path: `$.ram_pused` ⛔️Custom on fail: Discard value
VM swap in	The amount of memory read into the guest from its own swap space. This metric is only available for VMs that belong to the e2 family; returns empty value for different instance types.	Dependent item	gcp.gce.memory.swapinbytes_count Preprocessing JSON Path: `$.swap_in_bytes_count` ⛔️Custom on fail: Discard value
VM swap out	The amount of memory written from the guest to its own swap space. This metric is only available for VMs that belong to the e2 family; returns empty value for different instance types.	Dependent item	gcp.gce.memory.swapoutbytes_count Preprocessing JSON Path: `$.swap_out_bytes_count` ⛔️Custom on fail: Discard value
Network: Received bytes	Count of bytes received from the network without load-balancing.	Dependent item	gcp.gce.network.lb.receivedbytescount.false Preprocessing JSON Path: `$.received_bytes_count.false`
Network: Received bytes: Load-balanced	Whether traffic was received by an L3 loadbalanced IP address assigned to the VM. Traffic that is externally routed to the VM's standard internal or external IP address, such as L7 loadbalanced traffic, is not considered to be loadbalanced in this metric. The value is empty when load-balancing is not used.	Dependent item	gcp.gce.network.lb.receivedbytescount.true Preprocessing JSON Path: `$.received_bytes_count.true` ⛔️Custom on fail: Discard value
Network: Received packets	Count of packets received from the network without load-balancing.	Dependent item	gcp.gce.network.lb.receivedpacketscount.false Preprocessing JSON Path: `$.received_packets_count.false`
Network: Received packets: Load-balanced	Whether traffic was received by an L3 loadbalanced IP address assigned to the VM. Traffic that is externally routed to the VM's standard internal or external IP address, such as L7 loadbalanced traffic, is not considered to be loadbalanced in this metric. The value is empty when load-balancing is not used.	Dependent item	gcp.gce.network.lb.receivedpacketscount.true Preprocessing JSON Path: `$.received_packets_count.true` ⛔️Custom on fail: Discard value
Network: Sent bytes	Count of bytes sent over the network without load-balancing.	Dependent item	gcp.gce.network.lb.sentbytescount.false Preprocessing JSON Path: `$.sent_bytes_count.false`
Network: Sent bytes: Load-balanced	Whether traffic was received by an L3 loadbalanced IP address assigned to the VM. Traffic that is externally routed to the VM's standard internal or external IP address, such as L7 loadbalanced traffic, is not considered to be loadbalanced in this metric. The value is empty when load-balancing is not used.	Dependent item	gcp.gce.network.lb.sentbytescount.true Preprocessing JSON Path: `$.sent_bytes_count.true` ⛔️Custom on fail: Discard value
Network: Sent packets	Count of packets sent over the network without load-balancing.	Dependent item	gcp.gce.network.lb.sentpacketscount.false Preprocessing JSON Path: `$.sent_packets_count.false`
Network: Sent packets: Load-balanced	Whether traffic was received by an L3 loadbalanced IP address assigned to the VM. Traffic that is externally routed to the VM's standard internal or external IP address, such as L7 loadbalanced traffic, is not considered to be loadbalanced in this metric. The value is empty when load-balancing is not used.	Dependent item	gcp.gce.network.lb.sentpacketscount.true Preprocessing JSON Path: `$.sent_packets_count.true` ⛔️Custom on fail: Discard value
Network: Mirrored bytes	The count of mirrored bytes.	Dependent item	gcp.gce.network.mirroredbytescount Preprocessing JSON Path: `$.mirrored_bytes_count`
Network: Mirrored packets	The count of mirrored packets.	Dependent item	gcp.gce.network.mirroredpacketscount Preprocessing JSON Path: `$.mirrored_packets_count`
Network: Mirrored packets dropped: Out of quota	The count of mirrored packets dropped. Reason - out of quota.	Dependent item	gcp.gce.network.mirrdroppedpackets.outofquota Preprocessing JSON Path: `$.out_of_quota`
Network: Mirrored packets dropped: Unknown	The count of mirrored packets dropped. Reason - unknown.	Dependent item	gcp.gce.network.mirrdroppedpackets.unknown Preprocessing JSON Path: `$.unknown`
Network: Mirrored packets dropped: Invalid	The count of mirrored packets dropped. Reason - invalid.	Dependent item	gcp.gce.network.mirrdroppedpackets.invalid Preprocessing JSON Path: `$.invalid`
Integrity: Early boot validation status	The validation status of early boot integrity policy. Empty value if integrity monitoring isn't enabled.	Dependent item	gcp.gce.integrity.earlybootvalidation_status Preprocessing JSON Path: `$.early_boot_validation_status` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Integrity: Late boot validation status	The validation status of late boot integrity policy. Empty value if integrity monitoring isn't enabled.	Dependent item	gcp.gce.integrity.latebootvalidation_status Preprocessing JSON Path: `$.late_boot_validation_status` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Instance uptime	Elapsed time since the VM was started, in seconds.	Dependent item	gcp.gce.instance.uptime Preprocessing JSON Path: `$.uptime_total`
Instance state	GCP Compute Engine instance state.	HTTP agent	gcp.gce.instance.state Preprocessing JSON Path: `$.status` ⛔️Custom on fail: Set value to: `10` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `10m`
Disks get	Disk entities and metrics related to a particular instance.	Script	gcp.gce.disks.get Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression	Severity
GCP Compute Engine Instance: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/GCP Compute Engine Instance by HTTP/gcp.gce.cpu.utilization,15m) >= {$GCE.CPU.UTIL.MAX}`\|Average	Manual close: Yes
GCP Compute Engine Instance: High memory utilization	RAM utilization is too high. The system might be slow to respond.	`min(/GCP Compute Engine Instance by HTTP/gcp.gce.memory.ram_pused,15m) >= {$GCE.RAM.UTIL.MAX}`\|Average
GCP Compute Engine Instance: Instance is in suspended state	The VM is in a suspended state. You can resume the VM or delete it.	`last(/GCP Compute Engine Instance by HTTP/gcp.gce.instance.state) = 7`\|Info	Manual close: Yes
GCP Compute Engine Instance: The instance is in repairing state	The VM is being repaired. Repairing occurs when the VM encounters an internal error or the underlying machine is unavailable due to maintenance. During this time, the VM is unusable.	`last(/GCP Compute Engine Instance by HTTP/gcp.gce.instance.state) = 4`\|Warning	Manual close: Yes
GCP Compute Engine Instance: The instance is in terminated state	The VM is stopped. You stopped the VM, or the VM encountered a failure.	`last(/GCP Compute Engine Instance by HTTP/gcp.gce.instance.state) = 5`\|Average	Manual close: Yes
GCP Compute Engine Instance: Failed to get the instance state	Failed to get the instance state. Check access permissions to GCP API or service account.	`last(/GCP Compute Engine Instance by HTTP/gcp.gce.instance.state) = 10`\|Average	Manual close: Yes

LLD rule GCP Compute Engine: Physical disks discovery

Name Description Type Key and additional info

GCP Compute Engine: Physical disks discovery

GCP Compute Engine: Physical disks discovery.

Dependent item

gcp.gce.phys.disks.discovery

Preprocessing

Discard unchanged with heartbeat: 3h

Item prototypes for GCP Compute Engine: Physical disks discovery

Name	Description	Type	Key and additional info
Disk [{#GCE.DISK.NAME}]: Raw data	Data in raw format for the disk with the name [{#GCE.DISK.NAME}].	Dependent item	gcp.gce.quota.single.raw[{#GCE.DISK.NAME}] Preprocessing JSON Path: `$[?(@.disk_name == "{#GCE.DISK.NAME}")].metrics.first()`
Disk [{#GCE.DISK.NAME}]: Read bytes	Count of bytes read from [{#GCE.DISK.NAME}] disk.	Dependent item	gcp.gce.disk.readbytescount[{#GCE.DISK.NAME}] Preprocessing JSON Path: `$.read_bytes_count`
Disk [{#GCE.DISK.NAME}]: Read operations	Count of read IO operations from [{#GCE.DISK.NAME}] disk.	Dependent item	gcp.gce.disk.readopscount[{#GCE.DISK.NAME}] Preprocessing JSON Path: `$.read_ops_count`
Disk [{#GCE.DISK.NAME}]: Write bytes	Count of bytes written to {#GCE.DISK.NAME}] disk.	Dependent item	gcp.gce.disk.writebytescount[{#GCE.DISK.NAME}] Preprocessing JSON Path: `$.write_bytes_count`
Disk [{#GCE.DISK.NAME}]: Write operations	Count of write IO operations to [{#GCE.DISK.NAME}] disk.	Dependent item	gcp.gce.disk.writeopscount[{#GCE.DISK.NAME}] Preprocessing JSON Path: `$.write_ops_count`

GCP Cloud SQL MySQL by HTTP

Overview

This template is designed to monitor Google Cloud Platform Cloud SQL MySQL instances by Zabbix.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

GCP Cloud SQL MySQL versions: 8.0, 5.7

Configuration

Setup

This template will be automatically connected to discovered entities with all their required parameters pre-defined.

Macros used

Name	Description	Default
{$GCP.DATA.TIMEOUT}	A response timeout for an API.	`15s`
{$GCP.TIME.WINDOW}	Time interval for the data requests. Supported usage type: 1. The default update interval for most of the items. 2. The minimal time window for the data requested in the Monitoring Query Language REST API request.	`5m`
{$GCP.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$CLOUD_SQL.MYSQL.DISK.UTIL.WARN}	GCP Cloud SQL MySQL instance warning disk usage threshold.	`80`
{$CLOUD_SQL.MYSQL.DISK.UTIL.CRIT}	GCP Cloud SQL MySQL instance critical disk usage threshold.	`90`
{$CLOUD_SQL.MYSQL.CPU.UTIL.MAX}	GCP Cloud SQL MySQL instance CPU usage threshold.	`95`
{$CLOUD_SQL.MYSQL.RAM.UTIL.MAX}	GCP Cloud SQL MySQL instance RAM usage threshold.	`90`

Items

Name	Description	Type	Key and additional info
Metrics get	MySQL metrics in raw format.	Script	gcp.cloudsql.mysql.metrics.get Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Reserved CPU cores	Number of cores reserved for the database.	Dependent item	gcp.cloudsql.mysql.cpu.reserved_cores Preprocessing JSON Path: `$.base_reserved_cores`
CPU usage time	Cumulative CPU usage time in seconds.	Dependent item	gcp.cloudsql.mysql.cpu.usage_time Preprocessing JSON Path: `$.base_usage_time`
CPU utilization	Current CPU utilization represented as a percentage of the reserved CPU that is currently in use.	Dependent item	gcp.cloudsql.mysql.cpu.utilization Preprocessing JSON Path: `$.base_utilization` Custom multiplier: `100`
Disk size	Maximum data disk size in bytes.	Dependent item	gcp.cloudsql.mysql.disk.quota Preprocessing JSON Path: `$.mysql_quota`
Disk bytes used	Data utilization in bytes.	Dependent item	gcp.cloudsql.mysql.disk.bytes_used Preprocessing JSON Path: `$.mysql_bytes_used`
Disk read I/O	Delta count of data disk read I/O operations.	Dependent item	gcp.cloudsql.mysql.disk.readopscount Preprocessing JSON Path: `$.base_read_ops_count`
Disk write I/O	Delta count of data disk write I/O operations.	Dependent item	gcp.cloudsql.mysql.disk.writeopscount Preprocessing JSON Path: `$.base_write_ops_count`
Disk utilization	The fraction of the disk quota that is currently in use. Shown as percentage.	Dependent item	gcp.cloudsql.mysql.disk.utilization Preprocessing JSON Path: `$.mysql_utilization` Custom multiplier: `100`
Memory size	Maximum RAM size in bytes.	Dependent item	gcp.cloudsql.mysql.memory.quota Preprocessing JSON Path: `$.base_quota`
Memory used by DB engine	Total RAM usage in bytes. This metric reports the RAM usage of the database process, including the buffer/cache.	Dependent item	gcp.cloudsql.mysql.memory.total_usage Preprocessing JSON Path: `$.base_total_usage`
Memory usage	The RAM usage in bytes. This metric reports the RAM usage of the server, excluding the buffer/cache.	Dependent item	gcp.cloudsql.mysql.memory.usage Preprocessing JSON Path: `$.base_usage`
Memory utilization	The fraction of the memory quota that is currently in use. Shown as percentage.	Dependent item	gcp.cloudsql.mysql.memory.utilization Preprocessing JSON Path: `$.base_ram_pused`
Network: Received bytes	Delta count of bytes received through the network.	Dependent item	gcp.cloudsql.mysql.network.receivedbytescount Preprocessing JSON Path: `$.base_received_bytes_count`
Network: Sent bytes	Delta count of bytes sent through the network.	Dependent item	gcp.cloudsql.mysql.network.sentbytescount Preprocessing JSON Path: `$.base_sent_bytes_count`
Connections	Number of connections to the databases on the Cloud SQL instance.	Dependent item	gcp.cloudsql.mysql.network.connections Preprocessing JSON Path: `$.base_connections`
Instance state	GCP Cloud SQL MySQL Current instance state.	HTTP agent	gcp.cloudsql.mysql.inst.state Preprocessing JSON Path: `$.timeSeriesData[0].pointData[0].values[0].stringValue` ⛔️Custom on fail: Set value to: `10` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `10m`
DB engine state	GCP Cloud SQL MySQL DB Engine State.	HTTP agent	gcp.cloudsql.mysql.db.state Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value JSON Path: `$.timeSeriesData[0].pointData[0].values[0].int64Value` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `10m`
InnoDB dirty pages	Number of unflushed pages in the InnoDB buffer pool.	Dependent item	gcp.cloudsql.mysql.innodbbufferpoolpagesdirty Preprocessing JSON Path: `$.mysql_innodb_buffer_pool_pages_dirty`
InnoDB free pages	Number of unused pages in the InnoDB buffer pool.	Dependent item	gcp.cloudsql.mysql.innodbbufferpoolpagesfree Preprocessing JSON Path: `$.mysql_innodb_buffer_pool_pages_free`
InnoDB total pages	Total number of pages in the InnoDB buffer pool.	Dependent item	gcp.cloudsql.mysql.innodbbufferpoolpagestotal Preprocessing JSON Path: `$.mysql_innodb_buffer_pool_pages_total`
InnoDB fsync calls	Delta count of InnoDB fsync() calls.	Dependent item	gcp.cloudsql.mysql.innodbdatafsyncs Preprocessing JSON Path: `$.mysql_innodb_data_fsyncs`
InnoDB log fsync calls	Delta count of InnoDB fsync() calls to the log file.	Dependent item	gcp.cloudsql.mysql.innodboslog_fsyncs Preprocessing JSON Path: `$.mysql_innodb_os_log_fsyncs`
InnoDB pages read	Delta count of InnoDB pages read.	Dependent item	gcp.cloudsql.mysql.innodbpagesread Preprocessing JSON Path: `$.mysql_innodb_pages_read`
InnoDB pages written	Delta count of InnoDB pages written.	Dependent item	gcp.cloudsql.mysql.innodbpageswritten Preprocessing JSON Path: `$.mysql_innodb_pages_written`
Open tables	The number of tables that are currently open.	Dependent item	gcp.cloudsql.mysql.open_tables Preprocessing JSON Path: `$.mysql_open_tables`
Open table definitions	The number of table definitions that are currently cached.	Dependent item	gcp.cloudsql.mysql.opentabledefinitions Preprocessing JSON Path: `$.mysql_open_table_definitions`
Queries	Delta of statements executed by the server.	Dependent item	gcp.cloudsql.queries Preprocessing JSON Path: `$.mysql_queries`
Questions	Delta of statements executed by the server sent by the client.	Dependent item	gcp.cloudsql.questions Preprocessing JSON Path: `$.mysql_questions`
Network: Bytes received by MySQL	Delta count of bytes received by MySQL process.	Dependent item	gcp.cloudsql.mysqlreceivedbytes_count Preprocessing JSON Path: `$.mysql_received_bytes_count`
Network: Bytes sent by MySQL	Delta count of bytes sent by MySQL process.	Dependent item	gcp.cloudsql.mysqlsentbytes_count Preprocessing JSON Path: `$.mysql_sent_bytes_count`

Triggers

Name	Description	Expression	Severity
GCP MySQL: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.cpu.utilization,5m) >= {$CLOUD_SQL.MYSQL.CPU.UTIL.MAX}`\|Average
GCP MySQL: Disk space is low	High utilization of the storage space.	`last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.disk.utilization) >= {$CLOUD_SQL.MYSQL.DISK.UTIL.WARN}`\|Warning	Depends on: GCP MySQL: Disk space is critically low
GCP MySQL: Disk space is critically low	Critical utilization of the disk space.	`last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.disk.utilization) >= {$CLOUD_SQL.MYSQL.DISK.UTIL.CRIT}`\|Average
GCP MySQL: High memory utilization	RAM utilization is too high. The system might be slow to respond.	`min(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.memory.utilization,5m) >= {$CLOUD_SQL.MYSQL.RAM.UTIL.MAX}`\|High
GCP MySQL: Instance is in suspended state	The instance is in suspended state. It is not available, for example, due to problems with billing.	`last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.inst.state) = 1`\|Warning
GCP MySQL: Instance is stopped by the owner	The instance has been stopped by the owner. It is not currently running, but it's ready to be restarted.	`last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.inst.state) = 2`\|Info
GCP MySQL: Instance is in maintenance	The instance is down for maintenance.	`last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.inst.state) = 4`\|Info
GCP MySQL: Instance is in failed state	The instance creation failed, or an operation left the instance in an own bad state.	`last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.inst.state) = 5`\|Average
GCP MySQL: Instance is in unknown state	The state of the instance is unknown.	`last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.inst.state) = 6`\|Average
GCP MySQL: Failed to get the instance state	Failed to get the instance state. Check access permissions to GCP API or service account.	`last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.inst.state) = 10`\|Average
GCP MySQL: Database engine is down	Database engine is down. If an instance experiences unplanned (non-maintenance) downtime, the instance state will still be RUNNING, but the database engine state metric will report 0.	`last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.db.state)=0`\|Average	Depends on: GCP MySQL: Instance is stopped by the owner GCP MySQL: Instance is in suspended state GCP MySQL: Instance is in maintenance GCP MySQL: Instance is in failed state GCP MySQL: Instance is in unknown state GCP MySQL: Failed to get the instance state

GCP Cloud SQL MySQL Replica by HTTP

Overview

This template is designed to monitor Google Cloud Platform Cloud SQL metrics for the MySQL read-only replica instances by Zabbix.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

GCP Cloud SQL MySQL read replica versions: 8.0, 5.7

Configuration

Setup

This template will be automatically connected to discovered entities with all their required parameters pre-defined.

Macros used

Name Description Default

{$GCP.DATA.TIMEOUT}

A response timeout for an API.

15s

{$GCP.TIME.WINDOW}

Time interval for the data requests.

Supported usage type:

1. The default update interval for most of the items.

2. The minimal time window for the data requested in the Monitoring Query Language REST API request.

5m

{$GCP.PROXY}

Sets HTTP proxy value. If this macro is empty then no proxy is used.

Items

Name	Description	Type	Key and additional info
Replica metrics get	MySQL replication metrics data in raw format.	Script	gcp.cloudsql.mysql.repl.metrics.get Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Last I/O thread error number	The error number of the most recent error that caused the I/O thread to stop.	Dependent item	gcp.cloudsql.mysql.repl.lastioerrno Preprocessing JSON Path: `$.last_io_errno`
Last SQL thread error number	The error number of the most recent error that caused the SQL thread to stop.	Dependent item	gcp.cloudsql.mysql.repl.lastsqlerrno Preprocessing JSON Path: `$.last_sql_errno`
Replication lag	Number of seconds the read replica is behind its primary (approximation).	Dependent item	gcp.cloudsql.mysql.repl.replica_lag Preprocessing JSON Path: `$.replica_lag`
Network lag	Indicates time taken from primary binary log to IO thread on replica.	Dependent item	gcp.cloudsql.mysql.repl.network_lag Preprocessing JSON Path: `$.network_lag`
Replication state	The current serving state of replication. This metric is only available for the MySQL/PostgreSQL instances.	Dependent item	gcp.cloudsql.mysql.repl.state Preprocessing JSON Path: `$.state` JavaScript: `The text is too long. Please see the template.`
Slave I/O thread running	Indicates whether the I/O thread for reading the primary's binary log is running. Possible values are Yes, No and Connecting.	Dependent item	gcp.cloudsql.mysql.repl.slaveiorunning Preprocessing JSON Path: `$.slave_io_running` JavaScript: `The text is too long. Please see the template.`
Slave SQL thread running	Indicates whether the SQL thread for executing events in the relay log is running.	Dependent item	gcp.cloudsql.mysql.repl.slavesqlrunning Preprocessing JSON Path: `$.slave_sql_running` Boolean to decimal

GCP Cloud SQL PostgreSQL by HTTP

Overview

This template is designed to monitor Google Cloud Platform Cloud SQL PostgreSQL database metrics by Zabbix.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

GCP Cloud SQL PostgreSQL versions: 14, 13, 12

Configuration

Setup

This template will be automatically connected to discovered entities with all their required parameters pre-defined.

Macros used

Name	Description	Default
{$GCP.DATA.TIMEOUT}	A response timeout for an API.	`15s`
{$GCP.TIME.WINDOW}	Time interval for the data requests. Supported usage type: 1. The default update interval for most of the items. 2. The minimal time window for the data requested in the Monitoring Query Language REST API request.	`5m`
{$GCP.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$GCP.CLOUD_SQL.DB.NAME.MATCHES}	The filter to include GCP Cloud SQL PostgreSQL databases by namespace.	`.*`
{$GCP.CLOUDSQL.DB.NAME.NOTMATCHES}	The filter to exclude GCP Cloud SQL PostgreSQL databases by namespace.	`CHANGE_IF_NEEDED`
{$CLOUD_SQL.PGSQL.DISK.UTIL.WARN}	GCP Cloud SQL PostgreSQL instance warning disk usage threshold.	`80`
{$CLOUD_SQL.PGSQL.DISK.UTIL.CRIT}	GCP Cloud SQL PostgreSQL instance critical disk usage threshold.	`90`
{$CLOUD_SQL.PGSQL.CPU.UTIL.MAX}	GCP Cloud SQL PostgreSQL instance CPU usage threshold.	`95`
{$CLOUD_SQL.PGSQL.RAM.UTIL.MAX}	GCP Cloud SQL PostgreSQL instance RAM usage threshold.	`90`

Items

Name	Description	Type	Key and additional info
Metrics get	PostgreSQL metrics data in raw format.	Script	gcp.cloudsql.pgsql.metrics.get Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Reserved CPU cores	Number of cores reserved for the database.	Dependent item	gcp.cloudsql.pgsql.cpu.reserved_cores Preprocessing JSON Path: `$.base_reserved_cores`
CPU usage time	Cumulative CPU usage time in seconds.	Dependent item	gcp.cloudsql.pgsql.cpu.usage_time Preprocessing JSON Path: `$.base_usage_time`
CPU utilization	Current CPU utilization represented as a percentage of the reserved CPU that is currently in use.	Dependent item	gcp.cloudsql.pgsql.cpu.utilization Preprocessing JSON Path: `$.base_utilization` Custom multiplier: `100`
Disk size	Maximum data disk size in bytes.	Dependent item	gcp.cloudsql.pgsql.disk.quota Preprocessing JSON Path: `$.pgsql_quota`
Disk bytes used	Data utilization in bytes.	Dependent item	gcp.cloudsql.pgsql.disk.bytes_used Preprocessing JSON Path: `$.pgsql_bytes_used`
Disk read I/O	Delta count of data disk read I/O operations.	Dependent item	gcp.cloudsql.pgsql.disk.readopscount Preprocessing JSON Path: `$.base_read_ops_count`
Disk write I/O	Delta count of data disk write I/O operations.	Dependent item	gcp.cloudsql.pgsql.disk.writeopscount Preprocessing JSON Path: `$.base_write_ops_count`
Disk utilization	The fraction of the disk quota that is currently in use. Shown as percentage.	Dependent item	gcp.cloudsql.pgsql.disk.utilization Preprocessing JSON Path: `$.pgsql_utilization` Custom multiplier: `100`
Memory size	Maximum RAM size in bytes.	Dependent item	gcp.cloudsql.pgsql.memory.quota Preprocessing JSON Path: `$.base_quota`
Memory used by DB engine	Total RAM usage in bytes. This metric reports the RAM usage of the database process, including the buffer/cache.	Dependent item	gcp.cloudsql.pgsql.memory.total_usage Preprocessing JSON Path: `$.base_total_usage`
Memory usage	The RAM usage in bytes. This metric reports the RAM usage of the server, excluding the buffer/cache.	Dependent item	gcp.cloudsql.pgsql.memory.usage Preprocessing JSON Path: `$.base_usage`
Memory utilization	The fraction of the memory quota that is currently in use. Shown as percentage.	Dependent item	gcp.cloudsql.pgsql.memory.utilization Preprocessing JSON Path: `$.base_ram_pused`
Network: Received bytes	Delta count of bytes received through the network.	Dependent item	gcp.cloudsql.pgsql.network.receivedbytescount Preprocessing JSON Path: `$.base_received_bytes_count`
Network: Sent bytes	Delta count of bytes sent through the network.	Dependent item	gcp.cloudsql.pgsql.network.sentbytescount Preprocessing JSON Path: `$.base_sent_bytes_count`
Instance state	GCP Cloud SQL PostgreSQL Current instance state.	HTTP agent	gcp.cloudsql.pgsql.inst.state Preprocessing JSON Path: `$.timeSeriesData[0].pointData[0].values[0].stringValue` ⛔️Custom on fail: Set value to: `10` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `10m`
DB engine state	GCP Cloud SQL PostgreSQL DB Engine State.	HTTP agent	gcp.cloudsql.pgsql.db.state Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value JSON Path: `$.timeSeriesData[0].pointData[0].values[0].int64Value` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `10m`
Transaction ID utilization	Current utilization represented as a percentage of transaction IDs consumed by the Cloud SQL PostgreSQL instance.	Dependent item	gcp.cloudsql.pgsql.transactionidutilization Preprocessing JSON Path: `$.pgsql_transaction_id_utilization` Custom multiplier: `100`
Assigned transactions	Delta count of assigned transaction IDs.	Dependent item	gcp.cloudsql.pgsql.transactionidcount_assigned Preprocessing JSON Path: `$.pgsql_assigned`
Frozen transactions	Delta count of frozen transaction IDs.	Dependent item	gcp.cloudsql.pgsql.transactionidcount_frozen Preprocessing JSON Path: `$.pgsql_frozen`
Data written to temporary	Total data size (in bytes) written to temporary files by the queries.	Dependent item	gcp.cloudsql.pgsql.tempbyteswritten_count Preprocessing JSON Path: `$.pgsql_temp_bytes_written_count`
Temporary files used for writing data	Total number of temporary files used for writing data while performing algorithms such as join and sort.	Dependent item	gcp.cloudsql.pgsql.tempfileswritten_count Preprocessing JSON Path: `$.pgsql_temp_files_written_count`
Oldest running transaction age	Age of the oldest running transaction yet to be vacuumed in the Cloud SQL PostgreSQL instance, measured in number of transactions that have happened since the oldest transaction. Empty value when there is no such transaction type.	Dependent item	gcp.cloudsql.pgsql.oldest_transaction.running Preprocessing JSON Path: `$.pgsql_running` ⛔️Custom on fail: Discard value
Oldest prepared transaction age	Age of the oldest prepared transaction yet to be vacuumed in the Cloud SQL PostgreSQL instance, measured in number of transactions that have happened since the oldest transaction. Empty value when there is no such transaction type.	Dependent item	gcp.cloudsql.pgsql.oldest_transaction.prepared Preprocessing JSON Path: `$.pgsql_prepared` ⛔️Custom on fail: Discard value
Oldest replication slot transaction age	Age of the oldest replication slot transaction yet to be vacuumed in the Cloud SQL PostgreSQL instance, measured in number of transactions that have happened since the oldest transaction. Empty value when there is no such transaction type.	Dependent item	gcp.cloudsql.pgsql.oldesttransaction.replicationslot Preprocessing JSON Path: `$.pgsql_replication_slot` ⛔️Custom on fail: Discard value
Oldest replica transaction age	Age of the oldest replica transaction yet to be vacuumed in the Cloud SQL PostgreSQL instance, measured in number of transactions that have happened since the oldest transaction. Empty value when there is no such transaction type.	Dependent item	gcp.cloudsql.pgsql.oldest_transaction.replica Preprocessing JSON Path: `$.pgsql_replica` ⛔️Custom on fail: Discard value
Connections	The number of the connections to the Cloud SQL PostgreSQL instance. Includes connections to the system databases, which aren't visible by default.	Dependent item	gcp.cloudsql.pgsql.num_backends Preprocessing JSON Path: `$.pgsql_num_backends`

Triggers

Name	Description	Expression	Severity
GCP PostgreSQL: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.cpu.utilization,5m) >= {$CLOUD_SQL.PGSQL.CPU.UTIL.MAX}`\|Average
GCP PostgreSQL: Disk space is low	High utilization of the storage space.	`last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.disk.utilization) >= {$CLOUD_SQL.PGSQL.DISK.UTIL.WARN}`\|Warning	Depends on: GCP PostgreSQL: Disk space is critically low
GCP PostgreSQL: Disk space is critically low	Critical utilization of the disk space.	`last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.disk.utilization) >= {$CLOUD_SQL.PGSQL.DISK.UTIL.CRIT}`\|Average
GCP PostgreSQL: High memory utilization	RAM utilization is too high. The system might be slow to respond.	`min(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.memory.utilization,5m) >= {$CLOUD_SQL.PGSQL.RAM.UTIL.MAX}`\|High
GCP PostgreSQL: Instance is in suspended state	The instance is in suspended state. It is not available, for example, due to problems with billing.	`last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.inst.state) = 1`\|Warning
GCP PostgreSQL: Instance is stopped by the owner	The instance has been stopped by the owner. It is not currently running, but it's ready to be restarted.	`last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.inst.state) = 2`\|Info
GCP PostgreSQL: Instance is in maintenance	The instance is down for maintenance.	`last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.inst.state) = 4`\|Info
GCP PostgreSQL: Instance is in failed state	The instance creation failed, or an operation left the instance in an own bad state.	`last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.inst.state) = 5`\|Average
GCP PostgreSQL: Instance is in unknown state	The state of the instance is unknown.	`last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.inst.state) = 6`\|Average
GCP PostgreSQL: Failed to get the instance state	Failed to get the instance state. Check access permissions to GCP API or service account.	`last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.inst.state) = 10`\|Average
GCP PostgreSQL: Database engine is down	Database engine is down. If an instance experiences unplanned (non-maintenance) downtime, the instance state will still be RUNNING, but the database engine state metric will report 0.	`last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.db.state)=0`\|Average	Depends on: GCP PostgreSQL: Instance is stopped by the owner GCP PostgreSQL: Instance is in suspended state GCP PostgreSQL: Instance is in maintenance GCP PostgreSQL: Instance is in failed state GCP PostgreSQL: Instance is in unknown state GCP PostgreSQL: Failed to get the instance state

LLD rule GCP Cloud SQL PostgreSQL: Databases discovery

Name Description Type Key and additional info

GCP Cloud SQL PostgreSQL: Databases discovery

Databases discovery for the particular PostgreSQL instance.

HTTP agent

gcp.cloudsql.pgsql.db.discovery

Preprocessing

JSON Path: $.items
Discard unchanged with heartbeat: 3h

Item prototypes for GCP Cloud SQL PostgreSQL: Databases discovery

Name	Description	Type	Key and additional info
Database [{#PGSQL.DB.NAME}]: Metrics raw	PostgreSQL metrics in raw format.	Script	gcp.cloudsql.pgsql.db.metrics.get[{#PGSQL.DB.NAME}] Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Database [{#PGSQL.DB.NAME}]: Deadlocks count	Number of deadlocks detected in the [{#PGSQL.DB.NAME}] database.	Dependent item	gcp.cloudsql.pgsql.deadlock_count[{#PGSQL.DB.NAME}] Preprocessing JSON Path: `$.deadlock_count`
Database [{#PGSQL.DB.NAME}]: Tuples returned	Total number of rows scanned while processing the queries of the [{#PGSQL.DB.NAME}] database.	Dependent item	gcp.cloudsql.pgsql.tuplesreturnedcount[{#PGSQL.DB.NAME}] Preprocessing JSON Path: `$.tuples_returned_count`
Database [{#PGSQL.DB.NAME}]: Tuples fetched	Total number of rows fetched as a result of queries to the [{#PGSQL.DB.NAME}] database.	Dependent item	gcp.cloudsql.pgsql.tuplesfetchedcount[{#PGSQL.DB.NAME}] Preprocessing JSON Path: `$.tuples_fetched_count`
Database [{#PGSQL.DB.NAME}]: Committed transactions	Delta count of number of committed transactions to the [{#PGSQL.DB.NAME}] database.	Dependent item	gcp.cloudsql.pgsql.transactioncountcommit[{#PGSQL.DB.NAME}] Preprocessing JSON Path: `$.commit`
Database [{#PGSQL.DB.NAME}]: Rolled-back transactions	Delta count of number of rolled-back transactions in the [{#PGSQL.DB.NAME}] database.	Dependent item	gcp.cloudsql.pgsql.transactioncountrollback[{#PGSQL.DB.NAME}] Preprocessing JSON Path: `$.rollback`
Database [{#PGSQL.DB.NAME}]: Buffer cache blocks read.	Number of buffer cache blocks read by the [{#PGSQL.DB.NAME}] database.	Dependent item	gcp.cloudsql.pgsql.blocksreadcountbuffercache[{#PGSQL.DB.NAME}] Preprocessing JSON Path: `$.buffer_cache`
Database [{#PGSQL.DB.NAME}]: Disk blocks read.	Number of disk blocks read by the [{#PGSQL.DB.NAME}] database.	Dependent item	gcp.cloudsql.pgsql.blocksreadcount_disk[{#PGSQL.DB.NAME}] Preprocessing JSON Path: `$.disk`
Database [{#PGSQL.DB.NAME}]: Inserted rows processed.	Number of tuples(rows) processed for insert operations for the database with the name [{#PGSQL.DB.NAME}].	Dependent item	gcp.cloudsql.pgsql.tuplesprocessedcount_insert[{#PGSQL.DB.NAME}] Preprocessing JSON Path: `$.insert`
Database [{#PGSQL.DB.NAME}]: Deleted rows processed	Number of tuples(rows) processed for delete operations for the database with the name [{#PGSQL.DB.NAME}].	Dependent item	gcp.cloudsql.pgsql.tuplesprocessedcount_delete[{#PGSQL.DB.NAME}] Preprocessing JSON Path: `$.delete`
Database [{#PGSQL.DB.NAME}]: Updated rows processed	Number of tuples(rows) processed for update operations for the database with the name [{#PGSQL.DB.NAME}].	Dependent item	gcp.cloudsql.pgsql.tuplesprocessedcount_update[{#PGSQL.DB.NAME}] Preprocessing JSON Path: `$.update`
Database [{#PGSQL.DB.NAME}]: Live tuples	Number of live tuples(rows) in the [{#PGSQL.DB.NAME}] database.	Dependent item	gcp.cloudsql.pgsql.tuplesizelive[{#PGSQL.DB.NAME}] Preprocessing JSON Path: `$.live`
Database [{#PGSQL.DB.NAME}]: Dead tuples	Number of live tuples(rows) in the [{#PGSQL.DB.NAME}] database.	Dependent item	gcp.cloudsql.pgsql.tuplesizedead[{#PGSQL.DB.NAME}] Preprocessing JSON Path: `$.dead`

GCP Cloud SQL PostgreSQL Replica by HTTP

Overview

This template is designed to monitor Google Cloud Platform Cloud SQL PostgreSQL read-only replica instances by Zabbix.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

GCP Cloud SQL PostgreSQL read replica versions: 14, 13, 12

Configuration

Setup

This template will be automatically connected to discovered entities with all their required parameters pre-defined.

Macros used

Name Description Default

{$GCP.DATA.TIMEOUT}

A response timeout for an API.

15s

{$GCP.TIME.WINDOW}

Time interval for the data requests.

Supported usage type:

1. The default update interval for most of the items.

2. The minimal time window for the data requested in the Monitoring Query Language REST API request.

5m

{$GCP.PROXY}

Sets HTTP proxy value. If this macro is empty then no proxy is used.

Items

Name	Description	Type	Key and additional info
Replica metrics get	PostgreSQL replica metrics data in raw format.	Script	gcp.cloudsql.pgsql.repl.metrics.get Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Network lag	Indicates time taken from primary binary log to IO thread on replica.	Dependent item	gcp.cloudsql.pgsql.repl.network_lag Preprocessing JSON Path: `$.network_lag`
Replication lag	Number of seconds the read replica is behind its primary (approximation).	Dependent item	gcp.cloudsql.pgsql.repl.replica_lag Preprocessing JSON Path: `$.replica_lag`
Replication state	The current serving state of replication. This metric is only available for the MySQL/PostgreSQL instances.	Dependent item	gcp.cloudsql.pgsql.repl.state Preprocessing JSON Path: `$.state` JavaScript: `The text is too long. Please see the template.`
Replay location lag	Replay location replication lag in bytes.	Dependent item	gcp.cloudsql.pgsql.repl.replay_location Preprocessing JSON Path: `$.replay_location`
Write location lag	Write location replication lag in bytes.	Dependent item	gcp.cloudsql.pgsql.repl.write_location Preprocessing JSON Path: `$.write_location`
Flush location lag	Flush location replication lag in bytes.	Dependent item	gcp.cloudsql.pgsql.repl.flush_location Preprocessing JSON Path: `$.flush_location`
Sent location lag	Sent location replication lag in bytes.	Dependent item	gcp.cloudsql.pgsql.repl.sent_location Preprocessing JSON Path: `$.sent_location`
Number of log archival failures	Number of failed attempts for archiving replication log files.	Dependent item	gcp.cloudsql.pgsql.repl.logarchivefailure_count Preprocessing JSON Path: `$.log_archive_failure_count`
Number of log archival successes	Number of failed attempts for archiving replication log files.	Dependent item	gcp.cloudsql.pgsql.repl.logarchivesuccess_count Preprocessing JSON Path: `$.log_archive_success_count`

GCP Cloud SQL MSSQL by HTTP

Overview

This template is designed to monitor Google Cloud Platform Cloud SQL MSSQL instances by Zabbix.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

GCP Cloud SQL MSSQL versions: 2022 Standard/Enterprise, 2019 Standard/Enterprise, 2017 Standard/Enterprise.

Configuration

Setup

This template will be automatically connected to discovered entities with all their required parameters pre-defined.

Macros used

Name	Description	Default
{$GCP.DATA.TIMEOUT}	A response timeout for an API.	`15s`
{$GCP.TIME.WINDOW}	Time interval for the data requests. Supported usage type: 1. The default update interval for most of the items. 2. The minimal time window for the data requested in the Monitoring Query Language REST API request.	`5m`
{$GCP.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$CLOUD_SQL.MSSQL.RES.NAME.MATCHES}	The filter to include GCP Cloud SQL MSSQL resources by namespace.	`.*`
{$CLOUDSQL.MSSQL.RES.NAME.NOTMATCHES}	The filter to exclude GCP Cloud SQL MSSQL resources by namespace.	`CHANGE_IF_NEEDED`
{$CLOUD_SQL.MSSQL.DB.NAME.MATCHES}	The filter to include GCP Cloud SQL MSSQL databases by namespace.	`.*`
{$CLOUDSQL.MSSQL.DB.NAME.NOTMATCHES}	The filter to exclude GCP Cloud SQL MSSQL databases by namespace.	`CHANGE_IF_NEEDED`
{$CLOUD_SQL.MSSQL.SCHEDULER.ID.MATCHES}	The filter to include GCP Cloud SQL MSSQL schedulers by namespace.	`.*`
{$CLOUDSQL.MSSQL.SCHEDULER.ID.NOTMATCHES}	The filter to exclude GCP Cloud SQL MSSQL schedulers by namespace.	`CHANGE_IF_NEEDED`
{$CLOUD_SQL.MSSQL.DISK.UTIL.WARN}	GCP Cloud SQL MSSQL instance warning disk usage threshold.	`80`
{$CLOUD_SQL.MSSQL.DISK.UTIL.CRIT}	GCP Cloud SQL MSSQL instance critical disk usage threshold.	`90`
{$CLOUD_SQL.MSSQL.CPU.UTIL.MAX}	GCP Cloud SQL MSSQL instance CPU usage threshold.	`95`
{$CLOUD_SQL.MSSQL.RAM.UTIL.MAX}	GCP Cloud SQL MSSQL instance RAM usage threshold.	`90`

Items

Name	Description	Type	Key and additional info
Metrics get	MSSQL metrics data in raw format.	Script	gcp.cloudsql.mssql.metrics.get Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Reserved CPU cores	Number of cores reserved for the database.	Dependent item	gcp.cloudsql.mssql.cpu.reserved_cores Preprocessing JSON Path: `$.base_reserved_cores`
CPU usage time	Cumulative CPU usage time in seconds.	Dependent item	gcp.cloudsql.mssql.cpu.usage_time Preprocessing JSON Path: `$.base_usage_time`
CPU utilization	Current CPU utilization represented as a percentage of the reserved CPU that is currently in use.	Dependent item	gcp.cloudsql.mssql.cpu.utilization Preprocessing JSON Path: `$.base_utilization` Custom multiplier: `100`
Disk size	Maximum data disk size in bytes.	Dependent item	gcp.cloudsql.mssql.disk.quota Preprocessing JSON Path: `$.mssql_quota`
Disk bytes used	Data utilization in bytes.	Dependent item	gcp.cloudsql.mssql.disk.bytes_used Preprocessing JSON Path: `$.mssql_bytes_used`
Disk read I/O	Delta count of data disk read I/O operations.	Dependent item	gcp.cloudsql.mssql.disk.readopscount Preprocessing JSON Path: `$.base_read_ops_count`
Disk write I/O	Delta count of data disk write I/O operations.	Dependent item	gcp.cloudsql.mssql.disk.writeopscount Preprocessing JSON Path: `$.base_write_ops_count`
Disk utilization	The fraction of the disk quota that is currently in use. Shown as percentage.	Dependent item	gcp.cloudsql.mssql.disk.utilization Preprocessing JSON Path: `$.mssql_utilization` Custom multiplier: `100`
Memory size	Maximum RAM size in bytes.	Dependent item	gcp.cloudsql.mssql.memory.quota Preprocessing JSON Path: `$.base_quota`
Memory used by DB engine	Total RAM usage in bytes. This metric reports the RAM usage of the database process, including the buffer/cache.	Dependent item	gcp.cloudsql.mssql.memory.total_usage Preprocessing JSON Path: `$.base_total_usage`
Memory usage	The RAM usage in bytes. This metric reports the RAM usage of the server, excluding the buffer/cache.	Dependent item	gcp.cloudsql.mssql.memory.usage Preprocessing JSON Path: `$.base_usage`
Memory utilization	The fraction of the memory quota that is currently in use. Shown as percentage.	Dependent item	gcp.cloudsql.mssql.memory.utilization Preprocessing JSON Path: `$.base_ram_pused`
Network: Received bytes	Delta count of bytes received through the network.	Dependent item	gcp.cloudsql.mssql.network.receivedbytescount Preprocessing JSON Path: `$.base_received_bytes_count`
Network: Sent bytes	Delta count of bytes sent through the network.	Dependent item	gcp.cloudsql.mssql.network.sentbytescount Preprocessing JSON Path: `$.base_sent_bytes_count`
Connections	Number of connections to the databases on the Cloud SQL instance.	Dependent item	gcp.cloudsql.mssql.network.connections Preprocessing JSON Path: `$.base_connections`
Instance state	GCP Cloud SQL MSSQL Current instance state.	HTTP agent	gcp.cloudsql.mssql.inst.state Preprocessing JSON Path: `$.timeSeriesData[0].pointData[0].values[0].stringValue` ⛔️Custom on fail: Set value to: `10` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `10m`
DB engine state	GCP Cloud SQL MSSQL DB Engine State.	HTTP agent	gcp.cloudsql.mssql.db.state Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value JSON Path: `$.timeSeriesData[0].pointData[0].values[0].int64Value` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `10m`
Connection resets	Total number of login operations started from the connection pool since the last restart of SQL Server service.	Dependent item	gcp.cloudsql.mssql.conn.connectionresetcount Preprocessing JSON Path: `$.mssql_connection_reset_count`
Login attempts	Total number of login attempts since the last restart of SQL Server service. This does not include pooled connections.	Dependent item	gcp.cloudsql.mssql.conn.loginattemptcount Preprocessing JSON Path: `$.mssql_login_attempt_count`
Logouts	Total number of logout operations since the last restart of SQL Server service.	Dependent item	gcp.cloudsql.mssql.conn.logout_count Preprocessing JSON Path: `$.mssql_logout_count`
Processes blocked	Current number of blocked processes.	Dependent item	gcp.cloudsql.mssql.conn.processes_blocked Preprocessing JSON Path: `$.mssql_processes_blocked`
Buffer cache hit ratio	Current percentage of pages found in the buffer cache without having to read from disk. The ratio is the total number of cache hits divided by the total number of cache lookups.	Dependent item	gcp.cloudsql.mssql.memory.buffercachehit_ratio Preprocessing JSON Path: `$.mssql_buffer_cache_hit_ratio`
Checkpoint pages	Total number of pages flushed to disk by a checkpoint or other operation that requires all dirty pages to be flushed.	Dependent item	gcp.cloudsql.mssql.memory.checkpointpagecount Preprocessing JSON Path: `$.mssql_checkpoint_page_count`
Free list stalls	Total number of requests that had to wait for a free page.	Dependent item	gcp.cloudsql.mssql.memory.freeliststall_count Preprocessing JSON Path: `$.mssql_free_list_stall_count`
Lazy writes	Total number of buffers written by the buffer manager's lazy writer. The lazy writer is a system process that flushes out batches of dirty, aged buffers (buffers that contain changes that must be written back to disk before the buffer can be reused for a different page) and makes them available to user processes.	Dependent item	gcp.cloudsql.mssql.memory.lazywritecount Preprocessing JSON Path: `$.mssql_lazy_write_count`
Memory grants pending	Current number of processes waiting for a workspace memory grant.	Dependent item	gcp.cloudsql.mssql.memory.memorygrantspending Preprocessing JSON Path: `$.mssql_memory_grants_pending`
Page life expectancy	Current number of seconds a page will stay in the buffer pool without references.	Dependent item	gcp.cloudsql.mssql.memory.pagelifeexpectancy Preprocessing JSON Path: `$.mssql_page_life_expectancy`
Batch requests	Total number of Transact-SQL command batches received.	Dependent item	gcp.cloudsql.mssql.trans.batchrequestcount Preprocessing JSON Path: `$.mssql_batch_request_count`
Forwarded records	Total number of records fetched through forwarded record pointers.	Dependent item	gcp.cloudsql.mssql.trans.forwardedrecordcount Preprocessing JSON Path: `$.mssql_forwarded_record_count`
Full scans	Total number of unrestricted full scans. These can be either base-table or full-index scans.	Dependent item	gcp.cloudsql.mssql.trans.fullscancount Preprocessing JSON Path: `$.mssql_full_scan_count`
Page splits	Total number of page splits that occur as the result of overflowing index pages.	Dependent item	gcp.cloudsql.mssql.trans.pagesplitcount Preprocessing JSON Path: `$.mssql_page_split_count`
Probe scans	Total number of probe scans that are used to find at least one single qualified row in an index or base table directly.	Dependent item	gcp.cloudsql.mssql.trans.probescancount Preprocessing JSON Path: `$.mssql_probe_scan_count`
SQL compilations	Total number of SQL compilations.	Dependent item	gcp.cloudsql.mssql.trans.sqlcompilationcount Preprocessing JSON Path: `$.mssql_sql_compilation_count`
SQL recompilations	Total number of SQL recompilations.	Dependent item	gcp.cloudsql.mssql.trans.sqlrecompilationcount Preprocessing JSON Path: `$.mssql_sql_recompilation_count`
Read page operations	Total number of physical database page reads. This metric counts physical page reads across all databases.	Dependent item	gcp.cloudsql.mssql.memory.page_ops.read Preprocessing JSON Path: `$.mssql_read`
Write age operations	Total number of physical database page writes. This metric counts physical page writes across all databases.	Dependent item	gcp.cloudsql.mssql.memory.page_ops.write Preprocessing JSON Path: `$.mssql_write`
Audits size	Tracks the size in bytes of stored SQLServer audit files on an instance. Empty value if there are no audits enabled.	Dependent item	gcp.cloudsql.mssql.audits_size Preprocessing JSON Path: `$.base_audits_size` ⛔️Custom on fail: Discard value
Audits successfully uploaded	Tracks the size in bytes of stored SQLServer audit files on an instance. Empty value if there are no audits enabled.	Dependent item	gcp.cloudsql.mssql.auditsuploadcount Preprocessing JSON Path: `$.mssql_success` ⛔️Custom on fail: Discard value
Resources get	MSSQL resources data in raw format.	Script	gcp.cloudsql.mssql.resources.get Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Databases get	MSSQL databases data in raw format.	Script	gcp.cloudsql.mssql.db.get Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Schedulers get	MSSQL schedulers data in raw format.	Script	gcp.cloudsql.mssql.schedulers.get Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression	Severity
GCP MSSQL: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.cpu.utilization,5m) >= {$CLOUD_SQL.MSSQL.CPU.UTIL.MAX}`\|Average
GCP MSSQL: Disk space is low	High utilization of the storage space.	`last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.disk.utilization) >= {$CLOUD_SQL.MSSQL.DISK.UTIL.WARN}`\|Warning	Depends on: GCP MSSQL: Disk space is critically low
GCP MSSQL: Disk space is critically low	Critical utilization of the disk space.	`last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.disk.utilization) >= {$CLOUD_SQL.MSSQL.DISK.UTIL.CRIT}`\|Average
GCP MSSQL: High memory utilization	RAM utilization is too high. The system might be slow to respond.	`min(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.memory.utilization,5m) >= {$CLOUD_SQL.MSSQL.RAM.UTIL.MAX}`\|High
GCP MSSQL: Instance is in suspended state	The instance is in suspended state. It is not available, for example, due to problems with billing.	`last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.inst.state) = 1`\|Warning
GCP MSSQL: Instance is stopped by the owner	The instance has been stopped by the owner. It is not currently running, but it's ready to be restarted.	`last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.inst.state) = 2`\|Info
GCP MSSQL: Instance is in maintenance	The instance is down for maintenance.	`last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.inst.state) = 4`\|Info
GCP MSSQL: Instance is in failed state	The instance creation failed, or an operation left the instance in an own bad state.	`last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.inst.state) = 5`\|Average
GCP MSSQL: Instance is in unknown state	The state of the instance is unknown.	`last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.inst.state) = 6`\|Average
GCP MSSQL: Failed to get the instance state	Failed to get the instance state. Check access permissions to GCP API or service account.	`last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.inst.state) = 10`\|Average
GCP MSSQL: Database engine is down	Database engine is down. If an instance experiences unplanned (non-maintenance) downtime, the instance state will still be RUNNING, but the database engine state metric will report 0.	`last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.db.state)=0`\|Average	Depends on: GCP MSSQL: Instance is stopped by the owner GCP MSSQL: Instance is in suspended state GCP MSSQL: Instance is in maintenance GCP MSSQL: Instance is in failed state GCP MSSQL: Instance is in unknown state GCP MSSQL: Failed to get the instance state

LLD rule Resources discovery

Name Description Type Key and additional info

Resources discovery

Resources discovery.

Dependent item

gcp.cloudsql.resources.discovery

Preprocessing

Discard unchanged with heartbeat: 3h

Item prototypes for Resources discovery

Name	Description	Type	Key and additional info
Resource [{#RESOURCE.NAME}]: Raw data	Data in raw format for the [{#RESOURCE.NAME}] resource.	Dependent item	gcp.cloudsql.mssql.resource.raw[{#RESOURCE.NAME}] Preprocessing JSON Path: `$[?(@.resource == "{#RESOURCE.NAME}")].metrics.first()`
Resource [{#RESOURCE.NAME}]: Deadlocks	Total number of lock requests that resulted in a deadlock for the [{#RESOURCE.NAME}] resource.	Dependent item	gcp.cloudsql.mssql.resource.deadlock_count[{#RESOURCE.NAME}] Preprocessing JSON Path: `$.deadlock_count`
Resource [{#RESOURCE.NAME}]: Lock waits	Total number of lock requests that required the caller to wait for the [{#RESOURCE.NAME}] resource.	Dependent item	gcp.cloudsql.mssql.resource.lockwaitcount[{#RESOURCE.NAME}] Preprocessing JSON Path: `$.lock_wait_count`
Resource [{#RESOURCE.NAME}]: Lock wait time	Total time lock requests were waiting for locks for the [{#RESOURCE.NAME}] resource.	Dependent item	gcp.cloudsql.mssql.resource.lockwaittime[{#RESOURCE.NAME}] Preprocessing JSON Path: `$.lock_wait_time`

LLD rule Databases discovery

Name Description Type Key and additional info

Databases discovery

Databases discovery.

Dependent item

gcp.cloudsql.db.discovery

Preprocessing

Discard unchanged with heartbeat: 3h

Item prototypes for Databases discovery

Name Description Type Key and additional info

Database [{#DB.NAME}]: Raw data

Data in raw format for the [{#DB.NAME}] database.

Dependent item

gcp.cloudsql.mssql.db.raw[{#DB.NAME}]

Preprocessing

JSON Path: $[?(@.database == "{#DB.NAME}")].metrics.first()

Database [{#DB.NAME}]: Log bytes flushed

Total number of log bytes flushed for the [{#DB.NAME}] database.

Dependent item

gcp.cloudsql.mssql.db.logbytesflushed_count[{#DB.NAME}]

Preprocessing

JSON Path: $.log_bytes_flushed_count

Database [{#DB.NAME}]: Transactions started

Total number of transactions started for the [{#DB.NAME}] database.

Dependent item

gcp.cloudsql.mssql.db.transaction_count[{#DB.NAME}]

Preprocessing

JSON Path: $.transaction_count

LLD rule Schedulers discovery

Name Description Type Key and additional info

Schedulers discovery

Schedulers discovery.

Dependent item

gcp.cloudsql.schedulers.discovery

Preprocessing

Discard unchanged with heartbeat: 3h

Item prototypes for Schedulers discovery

Name	Description	Type	Key and additional info
Scheduler [{#SCHEDULER.ID}]: Raw data	Data in raw format associated with the scheduler that goes by its ID [{#SCHEDULER.ID}].	Dependent item	gcp.cloudsql.mssql.scheduler.raw[{#SCHEDULER.ID}] Preprocessing JSON Path: `$[?(@.scheduler == "{#SCHEDULER.ID}")].metrics.first()`
Scheduler [{#SCHEDULER.ID}]: Active workers	Current number of active workers associated with the scheduler that goes by its ID [{#SCHEDULER.ID}]. An active worker is never preemptive, must have an associated task, and is either running, runnable, or suspended.	Dependent item	gcp.cloudsql.mssql.scheduler.active_workers[{#SCHEDULER.ID}] Preprocessing JSON Path: `$.active_workers`
Scheduler [{#SCHEDULER.ID}]: Current tasks	Current number of present tasks associated with the scheduler that goes by its ID [{#SCHEDULER.ID}]. This count includes tasks that are waiting for a worker to execute them and tasks that are currently waiting or running (in SUSPENDED or RUNNABLE state).	Dependent item	gcp.cloudsql.mssql.scheduler.current_tasks[{#SCHEDULER.ID}] Preprocessing JSON Path: `$.current_tasks`
Scheduler [{#SCHEDULER.ID}]: Current workers	Current number of workers associated with the scheduler that goes by its ID [{#SCHEDULER.ID}]. It includes workers that are not assigned any task.	Dependent item	gcp.cloudsql.mssql.scheduler.current_workers[{#SCHEDULER.ID}] Preprocessing JSON Path: `$.current_workers`
Scheduler [{#SCHEDULER.ID}]: Pending I/O operations	Current number of pending I/Os waiting to be completed that are associated with the scheduler that goes by its ID [{#SCHEDULER.ID}]. Each scheduler has a list of pending I/Os that are checked to determine whether they have been completed every time there is a context switch. The count is incremented when the request is inserted. This count is decremented when the request is completed. This number does not indicate the state of the I/Os.	Dependent item	gcp.cloudsql.mssql.scheduler.pendingdiskio[{#SCHEDULER.ID}] Preprocessing JSON Path: `$.pending_disk_io`
Scheduler [{#SCHEDULER.ID}]: Runnable tasks	Current number of workers that are associated with the scheduler that goes by its ID [{#SCHEDULER.ID}] and have assigned tasks waiting to be scheduled on the runnable queue.	Dependent item	gcp.cloudsql.mssql.scheduler.runnable_tasks[{#SCHEDULER.ID}] Preprocessing JSON Path: `$.runnable_tasks`
Scheduler [{#SCHEDULER.ID}]: Work queue	Current number of tasks in the pending queue associated with the scheduler that goes by its ID [{#SCHEDULER.ID}]. These tasks are waiting for a worker to pick them up.	Dependent item	gcp.cloudsql.mssql.scheduler.work_queue[{#SCHEDULER.ID}] Preprocessing JSON Path: `$.work_queue`

GCP Cloud SQL MSSQL Replica by HTTP

Overview

This template is designed to monitor Google Cloud Platform Cloud SQL MSSQL read-only replica instances by Zabbix.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

GCP Cloud SQL MSSQL read replicas versions: 2019 Standard/Enterprise, 2017 Standard/Enterprise

Configuration

Setup

This template will be automatically connected to discovered entities with all their required parameters pre-defined.

Macros used

Name Description Default

{$GCP.DATA.TIMEOUT}

A response timeout for an API.

15s

{$GCP.TIME.WINDOW}

Time interval for the data requests.

Supported usage type:

1. The default update interval for most of the items.

2. The minimal time window for the data requested in the Monitoring Query Language REST API request.

5m

{$GCP.PROXY}

Sets HTTP proxy value. If this macro is empty then no proxy is used.

Items

Name	Description	Type	Key and additional info
Replica metrics get	MSSQL replica metrics data in raw format.	Script	gcp.cloudsql.mssql.repl.metrics.get Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Bytes sent to replica	Total number of bytes sent to the remote availability replica. For an async replica, returns the number of bytes before compression. For a sync replica without compression, returns the actual number of bytes.	Dependent item	gcp.cloudsql.mssql.repl.bytessenttoreplicacount Preprocessing JSON Path: `$.bytes_sent_to_replica_count`
Resent messages	Total count of Always On messages to resend. This includes messages that were attempted to be sent but failed and require resending.	Dependent item	gcp.cloudsql.mssql.repl.resentmessagecount Preprocessing JSON Path: `$.resent_message_count`
Log apply pending queue	Current number of log blocks that are waiting to be applied to replica.	Dependent item	gcp.cloudsql.mssql.repl.logapplypending_queue Preprocessing JSON Path: `$.log_apply_pending_queue`
Log bytes received	Total size of log records received by the replica.	Dependent item	gcp.cloudsql.mssql.repl.logbytesreceived_count Preprocessing JSON Path: `$.log_bytes_received_count`
Recovery queue	Current size of log records in bytes in the replica's log files that have not been redone.	Dependent item	gcp.cloudsql.mssql.repl.recovery_queue Preprocessing JSON Path: `$.recovery_queue` Custom multiplier: `1024`
Redone bytes	Total size in bytes of redone log records.	Dependent item	gcp.cloudsql.mssql.repl.redonebytescount Preprocessing JSON Path: `$.redone_bytes_count`

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

cloud

cloud_azure_http

View README Download JSON

Azure by HTTP

Overview

This template is designed to monitor Microsoft Azure by HTTP.
It works without any external scripts and uses the script item.
Currently, the template supports the discovery of virtual machines (VMs), VM scale sets, Cosmos DB for MongoDB, storage accounts, Microsoft SQL, MySQL, and PostgreSQL servers.

Included Monitoring Templates

Azure Virtual Machine by HTTP
Azure VM Scale Set by HTTP
Azure MySQL Flexible Server by HTTP
Azure MySQL Single Server by HTTP
Azure PostgreSQL Flexible Server by HTTP
Azure PostgreSQL Single Server by HTTP
Azure Microsoft SQL Serverless Database by HTTP
Azure Microsoft SQL DTU Database by HTTP
Azure Microsoft SQL Database by HTTP
Azure SQL Managed Instance by HTTP
Azure Cost Management by HTTP
Azure Backup Jobs by HTTP

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Microsoft Azure

Configuration

Setup

Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.

az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>

Link the template to a host.
Configure the macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, and {$AZURE.SUBSCRIPTION.ID}.

Macros used

Name	Description	Default
{$AZURE.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.
{$AZURE.APP.ID}	The App ID of Microsoft Azure.
{$AZURE.PASSWORD}	Microsoft Azure password.
{$AZURE.DATA.TIMEOUT}	API response timeout.	`15s`
{$AZURE.TENANT.ID}	Microsoft Azure tenant ID.
{$AZURE.SUBSCRIPTION.ID}	Microsoft Azure subscription ID.
{$AZURE.VM.NAME.MATCHES}	This macro is used in virtual machines discovery rule.	`.*`
{$AZURE.VM.NAME.NOT.MATCHES}	This macro is used in virtual machines discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.VM.LOCATION.MATCHES}	This macro is used in virtual machines discovery rule.	`.*`
{$AZURE.VM.LOCATION.NOT.MATCHES}	This macro is used in virtual machines discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.SCALESET.NAME.MATCHES}	This macro is used in virtual machine scale sets discovery rule.	`.*`
{$AZURE.SCALESET.NAME.NOT.MATCHES}	This macro is used in virtual machine scale sets discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.SCALESET.LOCATION.MATCHES}	This macro is used in virtual machine scale sets discovery rule.	`.*`
{$AZURE.SCALESET.LOCATION.NOT.MATCHES}	This macro is used in virtual machine scale sets discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.SQL.INST.NAME.MATCHES}	This macro is used in Azure SQL Managed Instance discovery rule.	`.*`
{$AZURE.SQL.INST.NAME.NOT.MATCHES}	This macro is used in Azure SQL Managed Instance discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.SQL.INST.LOCATION.MATCHES}	This macro is used in Azure SQL Managed Instance discovery rule.	`.*`
{$AZURE.SQL.INST.LOCATION.NOT.MATCHES}	This macro is used in Azure SQL Managed Instance discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.VAULT.NAME.MATCHES}	This macro is used in Azure Vault discovery rule.	`.*`
{$AZURE.VAULT.NAME.NOT.MATCHES}	This macro is used in Azure Vault discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.VAULT.LOCATION.MATCHES}	This macro is used in Azure Vault discovery rule.	`.*`
{$AZURE.VAULT.LOCATION.NOT.MATCHES}	This macro is used in Azure Vault discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.STORAGE.ACC.NAME.MATCHES}	This macro is used in storage accounts discovery rule.	`.*`
{$AZURE.STORAGE.ACC.NAME.NOT.MATCHES}	This macro is used in storage accounts discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.STORAGE.ACC.LOCATION.MATCHES}	This macro is used in storage accounts discovery rule.	`.*`
{$AZURE.STORAGE.ACC.LOCATION.NOT.MATCHES}	This macro is used in storage accounts discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.STORAGE.ACC.AVAILABILITY}	The warning threshold of the storage account availability.	`70`
{$AZURE.STORAGE.ACC.BLOB.AVAILABILITY}	The warning threshold of the storage account blob services availability.	`70`
{$AZURE.STORAGE.ACC.TABLE.AVAILABILITY}	The warning threshold of the storage account table services availability.	`70`
{$AZURE.RESOURCE.GROUP.MATCHES}	This macro is used in discovery rules.	`.*`
{$AZURE.RESOURCE.GROUP.NOT.MATCHES}	This macro is used in discovery rules.	`CHANGE_IF_NEEDED`
{$AZURE.MYSQL.DB.NAME.MATCHES}	This macro is used in MySQL servers discovery rule.	`.*`
{$AZURE.MYSQL.DB.NAME.NOT.MATCHES}	This macro is used in MySQL servers discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.MYSQL.DB.LOCATION.MATCHES}	This macro is used in MySQL servers discovery rule.	`.*`
{$AZURE.MYSQL.DB.LOCATION.NOT.MATCHES}	This macro is used in MySQL servers discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.PGSQL.DB.NAME.MATCHES}	This macro is used in PostgreSQL servers discovery rule.	`.*`
{$AZURE.PGSQL.DB.NAME.NOT.MATCHES}	This macro is used in PostgreSQL servers discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.PGSQL.DB.LOCATION.MATCHES}	This macro is used in PostgreSQL servers discovery rule.	`.*`
{$AZURE.PGSQL.DB.LOCATION.NOT.MATCHES}	This macro is used in PostgreSQL servers discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.MSSQL.DB.NAME.MATCHES}	This macro is used in Microsoft SQL databases discovery rule.	`.*`
{$AZURE.MSSQL.DB.NAME.NOT.MATCHES}	This macro is used in Microsoft SQL databases discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.MSSQL.DB.LOCATION.MATCHES}	This macro is used in Microsoft SQL databases discovery rule.	`.*`
{$AZURE.MSSQL.DB.LOCATION.NOT.MATCHES}	This macro is used in Microsoft SQL databases discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.MSSQL.DB.SIZE.NOT.MATCHES}	This macro is used in Microsoft SQL databases discovery rule.	`^System$`
{$AZURE.COSMOS.MONGO.DB.NAME.MATCHES}	This macro is used in Microsoft Cosmos DB account discovery rule.	`.*`
{$AZURE.COSMOS.MONGO.DB.NAME.NOT.MATCHES}	This macro is used in Microsoft Cosmos DB account discovery rule.	`CHANGE_IF_NEEDED`
{$AZURE.COSMOS.MONGO.DB.LOCATION.MATCHES}	This macro is used in Microsoft Cosmos DB account discovery rule.	`.*`
{$AZURE.COSMOS.MONGO.DB.LOCATION.NOT.MATCHES}	This macro is used in Microsoft Cosmos DB account discovery rule.	`CHANGE_IF_NEEDED`

Items

Name	Description	Type	Key and additional info
Get resources	The result of API requests is expressed in the JSON.	Script	azure.get.resources
Get errors	A list of errors from API requests.	Dependent item	azure.get.errors Preprocessing JSON Path: `$.errors` Discard unchanged with heartbeat: `1h`
Get storage accounts	The result of API requests is expressed in the JSON.	Script	azure.get.storage.acc
Get storage accounts errors	The errors from API requests.	Dependent item	azure.get.storage.acc.errors Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `1h`

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Azure: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure by HTTP/azure.get.errors))>0`\|Average
Azure: There are errors in storages requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure by HTTP/azure.get.storage.acc.errors))>0`\|Average	Depends on: Azure: There are errors in requests to API

LLD rule Storage accounts discovery

Name Description Type Key and additional info

Storage accounts discovery

The list of all storage accounts available under the subscription.

Dependent item

azure.storage.acc.discovery

Preprocessing

Discard unchanged with heartbeat: 6h

Item prototypes for Storage accounts discovery

Name	Description	Type	Key and additional info
Storage account [{#NAME}]: Get data	The HTTP API endpoint that returns storage metrics with the name `[{#NAME}]`.	Script	azure.get.storage.acc[{#NAME}]
Storage account [{#NAME}]: Used Capacity	The amount of storage used by the storage account with the name `[{#NAME}]`, expressed in bytes. For standard storage accounts, it's the sum of capacity used by blob, table, file, and queue. For premium storage accounts and blob storage accounts, it is the same as BlobCapacity or FileCapacity.	Dependent item	azure.storage.used.capacity[{#NAME}] Preprocessing JSON Path: `$.storageAccount.UsedCapacity.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Transactions	The number of requests made to the storage service or a specified API operation. This number includes successful and failed requests and also requests that produced errors. Use `ResponseType` dimension for the number of different types of responses.	Dependent item	azure.storage.transactions[{#NAME}] Preprocessing JSON Path: `$.storageAccount.Transactions.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Ingress	The amount of ingress data, expressed in bytes. This number includes ingress from an external client into Azure Storage and also ingress within Azure.	Dependent item	azure.storage.ingress[{#NAME}] Preprocessing JSON Path: `$.storageAccount.Ingress.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Egress	The amount of egress data. This number includes egress to external client from Azure Storage and also egress within Azure. As a result, this number does not reflect billable egress.	Dependent item	azure.storage.engress[{#NAME}] Preprocessing JSON Path: `$.storageAccount.Egress.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Success Server Latency	The average time used to process a successful request by Azure Storage. This value does not include the network latency specified in `SuccessE2ELatency`.	Dependent item	azure.storage.success.server.latency[{#NAME}] Preprocessing JSON Path: `$.storageAccount.SuccessServerLatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001` Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Success E2E Latency	The average end-to-end latency of successful requests made to a storage service or the specified API operation, expressed in milliseconds. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response.	Dependent item	azure.storage.success.e2e.latency[{#NAME}] Preprocessing JSON Path: `$.storageAccount.SuccessE2ELatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001` Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Availability	The percentage of availability for the storage service or a specified API operation. Availability is calculated by taking the `TotalBillableRequests` value and dividing it by the number of applicable requests, including those that produced unexpected errors. All unexpected errors result in reduced availability for the storage service or the specified API operation.	Dependent item	azure.storage.availability[{#NAME}] Preprocessing JSON Path: `$.storageAccount.Availability.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Blob Capacity	The amount of storage used by the blob service of the storage account with the name `[{#NAME}]`, expressed in bytes.	Dependent item	azure.storage.blob.capacity[{#NAME}] Preprocessing JSON Path: `$.blobServices.BlobCapacity.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Blob Count	The number of blob objects stored in the storage account with the name `[{#NAME}]`.	Dependent item	azure.storage.blob.count[{#NAME}] Preprocessing JSON Path: `$.blobServices.BlobCount.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Blob Container Count	The number of containers in the storage account with the name `[{#NAME}]`.	Dependent item	azure.storage.blob.container.count[{#NAME}] Preprocessing JSON Path: `$.blobServices.ContainerCount.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Blob Index Capacity	The amount of storage with the name `[{#NAME}]` used by the Azure Data Lake Storage Gen2 hierarchical index.	Dependent item	azure.storage.blob.index.capacity[{#NAME}] Preprocessing JSON Path: `$.blobServices.IndexCapacity.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Blob Transactions	The number of requests made to the storage service or a specified API operation. This number includes successful and failed requests and also requests that produced errors. Use `ResponseType` dimension for the number of different types of responses.	Dependent item	azure.storage.blob.transactions[{#NAME}] Preprocessing JSON Path: `$.blobServices.Transactions.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Blob Ingress	The amount of ingress data, expressed in bytes. This number includes ingress from an external client into Azure Storage and also ingress within Azure.	Dependent item	azure.storage.blob.ingress[{#NAME}] Preprocessing JSON Path: `$.blobServices.Ingress.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Blob Egress	The amount of egress data. This number includes egress to external client from Azure Storage and also egress within Azure. As a result, this number does not reflect billable egress.	Dependent item	azure.storage.blob.engress[{#NAME}] Preprocessing JSON Path: `$.blobServices.Egress.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Blob Success Server Latency	The average time used to process a successful request by Azure Storage. This value does not include the network latency specified in `SuccessE2ELatency`.	Dependent item	azure.storage.blob.success.server.latency[{#NAME}] Preprocessing JSON Path: `$.blobServices.SuccessServerLatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001` Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Blob Success E2E Latency	The average end-to-end latency of successful requests made to a storage service or the specified API operation, expressed in milliseconds. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response.	Dependent item	azure.storage.blob.success.e2e.latency[{#NAME}] Preprocessing JSON Path: `$.blobServices.SuccessE2ELatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001` Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Blob Availability	The percentage of availability for the storage service or a specified API operation. Availability is calculated by taking the `TotalBillableRequests` value and dividing it by the number of applicable requests, including those that produced unexpected errors. All unexpected errors result in reduced availability for the storage service or the specified API operation.	Dependent item	azure.storage.blob.availability[{#NAME}] Preprocessing JSON Path: `$.blobServices.Availability.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Table Capacity	The amount of storage used by the table service of the storage account with the name `[{#NAME}]`, expressed in bytes.	Dependent item	azure.storage.table.capacity[{#NAME}] Preprocessing JSON Path: `$.tableServices.TableCapacity.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Table Count	The number of tables in the storage account with the name `[{#NAME}]`.	Dependent item	azure.storage.table.count[{#NAME}] Preprocessing JSON Path: `$.tableServices.TableCount.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Table Entity Count	The number of table entities in the storage account with the name `[{#NAME}]`.	Dependent item	azure.storage.table.entity.count[{#NAME}] Preprocessing JSON Path: `$.tableServices.TableEntityCount.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Table Transactions	The number of requests made to the storage service or a specified API operation. This number includes successful and failed requests and also requests that produced errors. Use `ResponseType` dimension for the number of different types of responses.	Dependent item	azure.storage.table.transactions[{#NAME}] Preprocessing JSON Path: `$.tableServices.Transactions.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Table Ingress	The amount of ingress data, expressed in bytes. This number includes ingress from an external client into Azure Storage and also ingress within Azure.	Dependent item	azure.storage.table.ingress[{#NAME}] Preprocessing JSON Path: `$.tableServices.Ingress.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Table Egress	The amount of egress data. This number includes egress to external client from Azure Storage and also egress within Azure. As a result, this number does not reflect billable egress.	Dependent item	azure.storage.table.engress[{#NAME}] Preprocessing JSON Path: `$.tableServices.Egress.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Table Success Server Latency	The average time used to process a successful request by Azure Storage. This value does not include the network latency specified in `SuccessE2ELatency`.	Dependent item	azure.storage.table.success.server.latency[{#NAME}] Preprocessing JSON Path: `$.tableServices.SuccessServerLatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001` Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Table Success E2E Latency	The average end-to-end latency of successful requests made to a storage service or the specified API operation, expressed in milliseconds. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response.	Dependent item	azure.storage.table.success.e2e.latency[{#NAME}] Preprocessing JSON Path: `$.tableServices.SuccessE2ELatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001` Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Table Availability	The percentage of availability for the storage service or a specified API operation. Availability is calculated by taking the `TotalBillableRequests` value and dividing it by the number of applicable requests, including those that produced unexpected errors. All unexpected errors result in reduced availability for the storage service or the specified API operation.	Dependent item	azure.storage.table.availability[{#NAME}] Preprocessing JSON Path: `$.tableServices.Availability.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: File Capacity	The amount of file storage used by the storage account with the name `[{#NAME}]`, expressed in bytes.	Dependent item	azure.storage.file.capacity[{#NAME}] Preprocessing JSON Path: `$.fileServices.FileCapacity.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: File Count	The number of files in the storage account with the name `[{#NAME}]`.	Dependent item	azure.storage.file.count[{#NAME}] Preprocessing JSON Path: `$.fileServices.FileCount.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: File Share Count	The number of file shares in the storage account.	Dependent item	azure.storage.file.share.count[{#NAME}] Preprocessing JSON Path: `$.fileServices.FileShareCount.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: File Share Snapshot Count	The number of snapshots present on the share in storage account's Files Service.	Dependent item	azure.storage.file.shares.snapshot.count[{#NAME}] Preprocessing JSON Path: `$.fileServices.FileShareSnapshotCount.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: File Share Snapshot Size	The amount of storage used by the snapshots in storage account's File service, in bytes.	Dependent item	azure.storage.file.share.snapshot.size[{#NAME}] Preprocessing JSON Path: `$.fileServices.FileShareSnapshotSize.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: File Share Capacity Quota	The upper limit on the amount of storage that can be used by Azure Files Service, in bytes.	Dependent item	azure.storage.file.share.capacity.quota[{#NAME}] Preprocessing JSON Path: `$.fileServices.FileShareCapacityQuota.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: File Transactions	The number of requests made to the storage service or a specified API operation. This number includes successful and failed requests and also requests that produced errors. Use `ResponseType` dimension for the number of different types of responses.	Dependent item	azure.storage.file.transactions[{#NAME}] Preprocessing JSON Path: `$.fileServices.Transactions.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: File Ingress	The amount of ingress data, expressed in bytes. This number includes ingress from an external client into Azure Storage and also ingress within Azure.	Dependent item	azure.storage.file.ingress[{#NAME}] Preprocessing JSON Path: `$.fileServices.Ingress.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: File Egress	The amount of egress data. This number includes egress to external client from Azure Storage and also egress within Azure. As a result, this number does not reflect billable egress.	Dependent item	azure.storage.file.engress[{#NAME}] Preprocessing JSON Path: `$.fileServices.Egress.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: File Success Server Latency	The average time used to process a successful request by Azure Storage. This value does not include the network latency specified in `SuccessE2ELatency`.	Dependent item	azure.storage.file.success.server.latency[{#NAME}] Preprocessing JSON Path: `$.fileServices.SuccessServerLatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001` Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: File Success E2E Latency	The average end-to-end latency of successful requests made to a storage service or the specified API operation, expressed in milliseconds. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response.	Dependent item	azure.storage.file.success.e2e.latency[{#NAME}] Preprocessing JSON Path: `$.fileServices.file.SuccessE2ELatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001` Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Queue Capacity	The amount of queue storage used by the storage account with the name `[{#NAME}]`, expressed in bytes.	Dependent item	azure.storage.queue.capacity[{#NAME}] Preprocessing JSON Path: `$.queueServices.QueueCapacity.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Queue Count	The number of queues in the storage account with the name `[{#NAME}]`.	Dependent item	azure.storage.queue.count[{#NAME}] Preprocessing JSON Path: `$.queueServices.QueueCount.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Queue Message Count	The number of unexpired queue messages in the storage account with the name `[{#NAME}]`.	Dependent item	azure.storage.queue.message.count[{#NAME}] Preprocessing JSON Path: `$.queueServices.QueueMessageCount.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Queue Transactions	The number of requests made to the storage service or a specified API operation. This number includes successful and failed requests and also requests that produced errors. Use `ResponseType` dimension for the number of different types of responses.	Dependent item	azure.storage.queue.transactions[{#NAME}] Preprocessing JSON Path: `$.queueServices.Transactions.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Queue Ingress	The amount of ingress data, expressed in bytes. This number includes ingress from an external client into Azure Storage and also ingress within Azure.	Dependent item	azure.storage.queue.ingress[{#NAME}] Preprocessing JSON Path: `$.queueServices.Ingress.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Queue Egress	The amount of egress data. This number includes egress to external client from Azure Storage and also egress within Azure. As a result, this number does not reflect billable egress.	Dependent item	azure.storage.queue.engress[{#NAME}] Preprocessing JSON Path: `$.queueServices.Egress.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Queue Success Server Latency	The average time used to process a successful request by Azure Storage. This value does not include the network latency specified in `SuccessE2ELatency`.	Dependent item	azure.storage.queue.success.server.latency[{#NAME}] Preprocessing JSON Path: `$.queueServices.SuccessServerLatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001` Discard unchanged with heartbeat: `3h`
Storage account [{#NAME}]: Queue Success E2E Latency	The average end-to-end latency of successful requests made to a storage service or the specified API operation, expressed in milliseconds. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response.	Dependent item	azure.storage.queue.success.e2e.latency[{#NAME}] Preprocessing JSON Path: `$.queueServices.queue.SuccessE2ELatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001` Discard unchanged with heartbeat: `3h`

Trigger prototypes for Storage accounts discovery

Name	Description	Expression	Severity	Dependencies and additional info
Azure: Storage account [{#NAME}]: Availability is low		`(min(/Azure by HTTP/azure.storage.availability[{#NAME}],#3))<{$AZURE.STORAGE.ACC.AVAILABILITY:"{#NAME}"}`\|Warning
Azure: Storage account [{#NAME}]: Blob Availability is low		`(min(/Azure by HTTP/azure.storage.blob.availability[{#NAME}],#3))<{$AZURE.STORAGE.ACC.BLOB.AVAILABILITY:"{#NAME}"}`\|Warning
Azure: Storage account [{#NAME}]: Table Availability is low		`(min(/Azure by HTTP/azure.storage.table.availability[{#NAME}],#3))<{$AZURE.STORAGE.ACC.TABLE.AVAILABILITY:"{#NAME}"}`\|Warning

LLD rule Virtual machines discovery

Name Description Type Key and additional info

Virtual machines discovery

The list of virtual machines provided by the subscription.

Dependent item

azure.vm.discovery

Preprocessing

JSON Path: $.resources.value
Discard unchanged with heartbeat: 6h

LLD rule Virtual machine scale set discovery

Name Description Type Key and additional info

Virtual machine scale set discovery

The list of virtual machine scale sets provided by the subscription.

Dependent item

azure.scaleset.discovery

Preprocessing

JSON Path: $.resources.value
Discard unchanged with heartbeat: 6h

LLD rule Azure SQL managed instance discovery

Name Description Type Key and additional info

Azure SQL managed instance discovery

The list of Azure SQL managed instances provided by the subscription.

Dependent item

azure.sql_inst.discovery

Preprocessing

JSON Path: $.resources.value
Discard unchanged with heartbeat: 6h

LLD rule Azure Vault discovery

Name Description Type Key and additional info

Azure Vault discovery

The list of Azure Recovery Services and Backup vaults provided by the subscription.

Dependent item

azure.vault.discovery

Preprocessing

JSON Path: $.resources.value
Discard unchanged with heartbeat: 6h

LLD rule MySQL servers discovery

Name Description Type Key and additional info

MySQL servers discovery

The list of MySQL servers provided by the subscription.

Dependent item

azure.mysql.servers.discovery

Preprocessing

JSON Path: $.resources.value
Discard unchanged with heartbeat: 6h

LLD rule PostgreSQL servers discovery

Name Description Type Key and additional info

PostgreSQL servers discovery

The list of PostgreSQL servers provided by the subscription.

Dependent item

azure.pgsql.servers.discovery

Preprocessing

JSON Path: $.resources.value
Discard unchanged with heartbeat: 6h

LLD rule Microsoft SQL databases discovery

Name Description Type Key and additional info

Microsoft SQL databases discovery

The list of Microsoft SQL databases provided by the subscription.

Dependent item

azure.mssql.databases.discovery

Preprocessing

JSON Path: $.resources.value
Discard unchanged with heartbeat: 6h

LLD rule Cosmos DB account discovery

Name Description Type Key and additional info

Cosmos DB account discovery

The list of Cosmos databases provided by the subscription.

Dependent item

azure.cosmos.mongo.db.discovery

Preprocessing

JSON Path: $.resources.value
Discard unchanged with heartbeat: 6h

Azure VM Scale Set by HTTP

Overview

This template is designed to monitor Microsoft Azure virtual machine scale sets by HTTP. It works without any external scripts and uses the script item.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Microsoft Azure virtual machine scale sets

Configuration

Setup

Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.

az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>

Link the template to a host.
Configure the macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.

Macros used

Name	Description	Default
{$AZURE.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.
{$AZURE.APP.ID}	The App ID of Microsoft Azure.
{$AZURE.PASSWORD}	Microsoft Azure password.
{$AZURE.DATA.TIMEOUT}	API response timeout.	`15s`
{$AZURE.TENANT.ID}	Microsoft Azure tenant ID.
{$AZURE.SUBSCRIPTION.ID}	Microsoft Azure subscription ID.
{$AZURE.RESOURCE.ID}	Microsoft Azure virtual machine ID.
{$AZURE.SCALESET.CPU.UTIL.CRIT}	The critical threshold of CPU utilization, expressed in %.	`90`
{$AZURE.SCALESET.VM.COUNT.CRIT}	The critical amount of virtual machines in the scale set.	`100`

Items

Name	Description	Type	Key and additional info
Get data	Gathers data of the virtual machine scale set.	Script	azure.scaleset.data.get
Get errors	A list of errors from API requests.	Dependent item	azure.scaleset.data.errors Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to: ``</p></li><li><p>Discard unchanged with heartbeat:`1h`
Availability state	The availability status of the resource. 0 - Available - no events detected that affect the health of the resource. 1 - Degraded - your resource detected a loss in performance, although it's still available for use. 2 - Unavailable - the service detected an ongoing platform or non-platform event that affects the health of the resource. 3 - Unknown - Resource Health hasn't received information about the resource for more than 10 minutes.	Dependent item	azure.scaleset.availability.state Preprocessing JSON Path: `$.health.availabilityState` ⛔️Custom on fail: Set value to: `3` Replace: `Available -> 0` Replace: `Degraded -> 1` Replace: `Unavailable -> 2` Replace: `Unknown -> 3` In range: `0 -> 3` ⛔️Custom on fail: Set value to: `3` Discard unchanged with heartbeat: `1h`
Availability status detailed	The summary description of availability status.	Dependent item	azure.scaleset.availability.details Preprocessing JSON Path: `$.health.summary` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Virtual machine count	Current amount of virtual machines in the scale set.	Dependent item	azure.scaleset.vm.count Preprocessing JSON Path: `$.capacity` Discard unchanged with heartbeat: `1h`
Available memory	Amount of physical memory, in bytes, immediately available for allocation to a process or for system use in the virtual machine.	Dependent item	azure.scaleset.vm.memory Preprocessing JSON Path: `$.metrics.AvailableMemoryBytes.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
CPU credits consumed	Total number of credits consumed by the virtual machine. Only available on B-series burstable VMs.	Dependent item	azure.scaleset.cpu.credits.consumed Preprocessing JSON Path: `$.metrics.CPUCreditsConsumed.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
CPU credits remaining	Total number of credits available to burst. Only available on B-series burstable VMs.	Dependent item	azure.scaleset.cpu.credits.remaining Preprocessing JSON Path: `$.metrics.CPUCreditsRemaining.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
CPU utilization	The percentage of allocated compute units that are currently in use by the virtual machine(s).	Dependent item	azure.scaleset.cpu.utilization Preprocessing JSON Path: `$.metrics.PercentageCPU.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk bandwidth consumed	Percentage of data disk bandwidth consumed per minute.	Dependent item	azure.scaleset.data.disk.bandwidth.consumed Preprocessing JSON Path: `$.metrics.DataDiskBandwidthConsumedPercentage.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk IOPS consumed	Percentage of data disk I/Os consumed per minute.	Dependent item	azure.scaleset.data.disk.iops.consumed Preprocessing JSON Path: `$.metrics.DataDiskIOPSConsumedPercentage.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk read rate	Bytes/sec read from a single disk during the monitoring period.	Dependent item	azure.scaleset.data.disk.read.bps Preprocessing JSON Path: `$.metrics.DataDiskReadBytessec.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk IOPS read	Read IOPS from a single disk during the monitoring period.	Dependent item	azure.scaleset.data.disk.read.ops Preprocessing JSON Path: `$.metrics.DataDiskReadOperationsSec.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk used burst BPS credits	Percentage of data disk burst bandwidth credits used so far.	Dependent item	azure.scaleset.data.disk.bandwidth.burst.used Preprocessing JSON Path: `$.metrics.DataDiskUsedBurstBPSCreditsPercentage.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk used burst IO credits	Percentage of data disk burst I/O credits used so far.	Dependent item	azure.scaleset.data.disk.iops.burst.used Preprocessing JSON Path: `$.metrics.DataDiskUsedBurstIOCreditsPercentage.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk write rate	Bytes/sec written to a single disk during the monitoring period.	Dependent item	azure.scaleset.data.disk.write.bps Preprocessing JSON Path: `$.metrics.DataDiskWriteBytessec.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk IOPS write	Write IOPS from a single disk during the monitoring period.	Dependent item	azure.scaleset.data.disk.write.ops Preprocessing JSON Path: `$.metrics.DataDiskWriteOperationsSec.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk queue depth	Data disk queue depth (or queue length).	Dependent item	azure.scaleset.data.disk.queue.depth Preprocessing JSON Path: `$.metrics.DataDiskQueueDepth.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk target bandwidth	Baseline byte-per-second throughput the data disk can achieve without bursting.	Dependent item	azure.scaleset.data.disk.bandwidth.target Preprocessing JSON Path: `$.metrics.DataDiskTargetBandwidth.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk target IOPS	Baseline IOPS the data disk can achieve without bursting.	Dependent item	azure.scaleset.data.disk.iops.target Preprocessing JSON Path: `$.metrics.DataDiskTargetIOPS.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk max burst bandwidth	Maximum byte-per-second throughput the data disk can achieve with bursting.	Dependent item	azure.scaleset.data.disk.bandwidth.burst.max Preprocessing JSON Path: `$.metrics.DataDiskMaxBurstBandwidth.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk max burst IOPS	Maximum IOPS the data disk can achieve with bursting.	Dependent item	azure.scaleset.data.disk.iops.burst.max Preprocessing JSON Path: `$.metrics.DataDiskMaxBurstIOPS.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Disk read	Bytes read from the disk during the monitoring period.	Dependent item	azure.scaleset.disk.read Preprocessing JSON Path: `$.metrics.DiskReadBytes.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Disk IOPS read	Disk read IOPS.	Dependent item	azure.scaleset.disk.read.ops Preprocessing JSON Path: `$.metrics.DiskReadOperationsSec.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Disk write	Bytes written to the disk during the monitoring period.	Dependent item	azure.scaleset.disk.write Preprocessing JSON Path: `$.metrics.DiskWriteBytes.total` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Disk IOPS write	Write IOPS from a single disk during the monitoring period.	Dependent item	azure.scaleset.disk.write.ops Preprocessing JSON Path: `$.metrics.DiskWriteOperationsSec.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Inbound flows	Inbound Flows are the number of current flows in the inbound direction (traffic going into the VMs).	Dependent item	azure.scaleset.flows.inbound Preprocessing JSON Path: `$.metrics.InboundFlows.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Outbound flows	Outbound Flows are the number of current flows in the outbound direction (traffic going out of the VMs).	Dependent item	azure.scaleset.flows.outbound Preprocessing JSON Path: `$.metrics.OutboundFlows.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Network in total	The number of bytes received on all network interfaces by the virtual machine(s) (incoming traffic).	Dependent item	azure.scaleset.network.in.total Preprocessing JSON Path: `$.metrics.NetworkInTotal.total` ⛔️Custom on fail: Discard value
Network out total	The number of bytes out on all network interfaces by the virtual machine(s) (outgoing traffic).	Dependent item	azure.scaleset.network.out.total Preprocessing JSON Path: `$.metrics.NetworkOutTotal.total` ⛔️Custom on fail: Discard value
Inbound flow maximum creation rate	The maximum creation rate of inbound flows (traffic going into the VM).	Dependent item	azure.scaleset.flows.inbound.max Preprocessing JSON Path: `$.metrics.InboundFlowsMaximumCreationRate.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Outbound flow maximum creation rate	The maximum creation rate of outbound flows (traffic going out of the VM).	Dependent item	azure.scaleset.flows.outbound.max Preprocessing JSON Path: `$.metrics.OutboundFlowsMaximumCreationRate.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk read rate	Bytes/sec read from a single disk during the monitoring period - for an OS disk.	Dependent item	azure.scaleset.os.disk.read.bps Preprocessing JSON Path: `$.metrics.OSDiskReadBytessec.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk write rate	Bytes/sec written to a single disk during the monitoring period - for an OS disk.	Dependent item	azure.scaleset.os.disk.write.bps Preprocessing JSON Path: `$.metrics.OSDiskWriteBytessec.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk IOPS read	Read IOPS from a single disk during the monitoring period - for an OS disk.	Dependent item	azure.scaleset.os.disk.read.ops Preprocessing JSON Path: `$.metrics.OSDiskReadOperationsSec.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk IOPS write	Write IOPS from a single disk during the monitoring period - for an OS disk.	Dependent item	azure.scaleset.os.disk.write.ops Preprocessing JSON Path: `$.metrics.OSDiskWriteOperationsSec.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk queue depth	OS Disk queue depth (or queue length).	Dependent item	azure.scaleset.os.disk.queue.depth Preprocessing JSON Path: `$.metrics.OSDiskQueueDepth.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk bandwidth consumed	Percentage of operating system disk bandwidth consumed per minute.	Dependent item	azure.scaleset.os.disk.bandwidth.consumed Preprocessing JSON Path: `$.metrics.OSDiskBandwidthConsumedPercentage.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk IOPS consumed	Percentage of operating system disk I/Os consumed per minute.	Dependent item	azure.scaleset.os.disk.iops.consumed Preprocessing JSON Path: `$.metrics.OSDiskIOPSConsumedPercentage.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk target bandwidth	Baseline byte-per-second throughput the OS Disk can achieve without bursting.	Dependent item	azure.scaleset.os.disk.bandwidth.target Preprocessing JSON Path: `$.metrics.OSDiskTargetBandwidth.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk target IOPS	Baseline IOPS the OS disk can achieve without bursting.	Dependent item	azure.scaleset.os.disk.iops.target Preprocessing JSON Path: `$.metrics.OSDiskTargetIOPS.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk max burst bandwidth	Maximum byte-per-second throughput the OS Disk can achieve with bursting.	Dependent item	azure.scaleset.os.disk.bandwidth.max Preprocessing JSON Path: `$.metrics.OSDiskMaxBurstBandwidth.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk max burst IOPS	Maximum IOPS the OS Disk can achieve with bursting.	Dependent item	azure.scaleset.os.disk.iops.max Preprocessing JSON Path: `$.metrics.OSDiskMaxBurstIOPS.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk used burst BPS credits	Percentage of OS Disk burst bandwidth credits used so far.	Dependent item	azure.scaleset.os.disk.bandwidth.burst.used Preprocessing JSON Path: `$.metrics.OSDiskUsedBurstBPSCreditsPercentage.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk used burst IO credits	Percentage of OS Disk burst I/O credits used so far.	Dependent item	azure.scaleset.os.disk.iops.burst.used Preprocessing JSON Path: `$.metrics.OSDiskUsedBurstIOCreditsPercentage.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Premium data disk cache read hit in %	Percentage of premium data disk cache read hit.	Dependent item	azure.scaleset.premium.data.disk.cache.read.hit Preprocessing JSON Path: `$.metrics.PremiumDataDiskCacheReadHit.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Premium data disk cache read miss in %	Percentage of premium data disk cache read miss.	Dependent item	azure.scaleset.premium.data.disk.cache.read.miss Preprocessing JSON Path: `$.metrics.PremiumDataDiskCacheReadMiss.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Premium OS disk cache read hit in %	Percentage of premium OS disk cache read hit.	Dependent item	azure.scaleset.premium.os.disk.cache.read.hit Preprocessing JSON Path: `$.metrics.PremiumOSDiskCacheReadHit.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Premium OS disk cache read miss in %	Percentage of premium OS disk cache read miss.	Dependent item	azure.scaleset.premium.os.disk.cache.read.miss Preprocessing JSON Path: `$.metrics.PremiumOSDiskCacheReadMiss.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
VM cached bandwidth consumed	Percentage of cached disk bandwidth consumed by the VM.	Dependent item	azure.scaleset.vm.cached.bandwidth.consumed Preprocessing JSON Path: `$.metrics.VMCachedBandwidthConsumedPercentage.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
VM cached IOPS consumed	Percentage of cached disk IOPS consumed by the VM.	Dependent item	azure.scaleset.vm.cached.iops.consumed Preprocessing JSON Path: `$.metrics.VMCachedIOPSConsumedPercentage.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
VM uncached bandwidth consumed	Percentage of uncached disk bandwidth consumed by the VM.	Dependent item	azure.scaleset.vm.uncached.bandwidth.consumed Preprocessing JSON Path: `$.metrics.VMUncachedBandwidthConsumedPercentage.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
VM uncached IOPS consumed	Percentage of uncached disk IOPS consumed by the VM.	Dependent item	azure.scaleset.vm.uncached.iops.consumed Preprocessing JSON Path: `$.metrics.VMUncachedIOPSConsumedPercentage.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
VM availability metric	Measure of availability of the virtual machines over time.	Dependent item	azure.scaleset.availability Preprocessing JSON Path: `$.metrics.VmAvailabilityMetric.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`

Triggers

Name	Description	Expression
Azure VM Scale: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure VM Scale Set by HTTP/azure.scaleset.data.errors))>0`\|Average
Azure VM Scale: Virtual machine scale set is unavailable	The resource state is unavailable.	`last(/Azure VM Scale Set by HTTP/azure.scaleset.availability.state)=2`\|High
Azure VM Scale: Virtual machine scale set is degraded	The resource is in a degraded state.	`last(/Azure VM Scale Set by HTTP/azure.scaleset.availability.state)=1`\|Average
Azure VM Scale: Virtual machine scale set is in unknown state	The resource state is unknown.	`last(/Azure VM Scale Set by HTTP/azure.scaleset.availability.state)=3`\|Warning
Azure VM Scale: High amount of VMs in the scale set	High amount of VMs in the scale set.	`min(/Azure VM Scale Set by HTTP/azure.scaleset.vm.count,5m)>{$AZURE.SCALESET.VM.COUNT.CRIT}`\|High
Azure VM Scale: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/Azure VM Scale Set by HTTP/azure.scaleset.cpu.utilization,5m)>{$AZURE.SCALESET.CPU.UTIL.CRIT}`\|High

Azure Virtual Machine by HTTP

Overview

This template is designed to monitor Microsoft Azure virtual machines (VMs) by HTTP. It works without any external scripts and uses the script item.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Microsoft Azure virtual machines

Configuration

Setup

Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.

az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>

Link the template to a host.
Configure the macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.

Macros used

Name	Description	Default
{$AZURE.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.
{$AZURE.APP.ID}	The App ID of Microsoft Azure.
{$AZURE.PASSWORD}	Microsoft Azure password.
{$AZURE.DATA.TIMEOUT}	API response timeout.	`15s`
{$AZURE.TENANT.ID}	Microsoft Azure tenant ID.
{$AZURE.SUBSCRIPTION.ID}	Microsoft Azure subscription ID.
{$AZURE.RESOURCE.ID}	Microsoft Azure virtual machine ID.
{$AZURE.VM.CPU.UTIL.CRIT}	The critical threshold of CPU utilization, expressed in %.	`90`

Items

Name	Description	Type	Key and additional info
Get data	The result of API requests is expressed in the JSON.	Script	azure.vm.data.get
Get errors	A list of errors from API requests.	Dependent item	azure.vm.data.errors Preprocessing JSON Path: `$.errors` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Availability state	The availability status of the resource. 0 - Available - no events detected that affect the health of the resource. 1 - Degraded - your resource detected a loss in performance, although it's still available for use. 2 - Unavailable - the service detected an ongoing platform or non-platform event that affects the health of the resource. 3 - Unknown - Resource Health hasn't received information about the resource for more than 10 minutes.	Dependent item	azure.vm.availability.state Preprocessing JSON Path: `$.health.availabilityState` ⛔️Custom on fail: Set value to: `3` Replace: `Available -> 0` Replace: `Degraded -> 1` Replace: `Unavailable -> 2` Replace: `Unknown -> 3` In range: `0 -> 3` ⛔️Custom on fail: Set value to: `3` Discard unchanged with heartbeat: `1h`
Availability status detailed	The summary description of availability status.	Dependent item	azure.vm.availability.details Preprocessing JSON Path: `$.health.summary` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
CPU utilization	Percentage of allocated compute units that are currently in use by virtual machine.	Dependent item	azure.vm.cpu.utilization Preprocessing JSON Path: `$.metrics.PercentageCPU.average` ⛔️Custom on fail: Discard value
Disk read	Bytes read from the disk during the monitoring period.	Dependent item	azure.vm.disk.read.bytes Preprocessing JSON Path: `$.metrics.DiskReadBytes.total` ⛔️Custom on fail: Discard value
Disk write	Bytes written to the disk during the monitoring period.	Dependent item	azure.vm.disk.write.bytes Preprocessing JSON Path: `$.metrics.DiskWriteBytes.total` ⛔️Custom on fail: Discard value
Disk IOPS read	The count of read operations from the disk per second.	Dependent item	azure.vm.disk.read.ops Preprocessing JSON Path: `$.metrics.DiskReadOperationsSec.average` ⛔️Custom on fail: Discard value
Disk IOPS write	The count of write operations to the disk per second.	Dependent item	azure.vm.disk.write.ops Preprocessing JSON Path: `$.metrics.DiskWriteOperationsSec.average` ⛔️Custom on fail: Discard value
CPU credits remaining	Total number of credits available to burst. Available only on B-series burstable VMs.	Dependent item	azure.vm.cpu.credits.remaining Preprocessing JSON Path: `$.metrics.CPUCreditsRemaining.average` ⛔️Custom on fail: Discard value
CPU credits consumed	Total number of credits consumed by the virtual machine. Only available on B-series burstable VMs.	Dependent item	azure.vm.cpu.credits.consumed Preprocessing JSON Path: `$.metrics.CPUCreditsConsumed.average` ⛔️Custom on fail: Discard value
Data disk read rate	Bytes per second read from a single disk during the monitoring period.	Dependent item	azure.vm.data.disk.read.bps Preprocessing JSON Path: `$.metrics.DataDiskReadBytessec.average` ⛔️Custom on fail: Discard value
Data disk write rate	Bytes per second written to a single disk during the monitoring period.	Dependent item	azure.vm.data.disk.write.bps Preprocessing JSON Path: `$.metrics.DataDiskWriteBytessec.average` ⛔️Custom on fail: Discard value
Data disk IOPS read	Read IOPS from a single disk during the monitoring period.	Dependent item	azure.vm.data.disk.read.ops Preprocessing JSON Path: `$.metrics.DataDiskReadOperationsSec.average` ⛔️Custom on fail: Discard value
Data disk IOPS write	Write IOPS from a single disk during the monitoring period.	Dependent item	azure.vm.data.disk.write.ops Preprocessing JSON Path: `$.metrics.DataDiskWriteOperationsSec.average` ⛔️Custom on fail: Discard value
Data disk queue depth	The number of outstanding IO requests that are waiting to be performed on a disk.	Dependent item	azure.vm.data.disk.queue.depth Preprocessing JSON Path: `$.metrics.DataDiskQueueDepth.average` ⛔️Custom on fail: Discard value
Data disk bandwidth consumed	Percentage of the data disk bandwidth consumed per minute.	Dependent item	azure.vm.data.disk.bandwidth.consumed Preprocessing JSON Path: `$.metrics.DataDiskBandwidthConsumedPercentage.average` ⛔️Custom on fail: Discard value
Data disk IOPS consumed	Percentage of the data disk input/output (I/O) consumed per minute.	Dependent item	azure.vm.data.disk.iops.consumed Preprocessing JSON Path: `$.metrics.DataDiskIOPSConsumedPercentage.average` ⛔️Custom on fail: Discard value
Data disk target bandwidth	Baseline byte-per-second throughput that the data disk can achieve without bursting.	Dependent item	azure.vm.data.disk.bandwidth.target Preprocessing JSON Path: `$.metrics.DataDiskTargetBandwidth.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk target IOPS	Baseline IOPS that the data disk can achieve without bursting.	Dependent item	azure.vm.data.disk.iops.target Preprocessing JSON Path: `$.metrics.DataDiskTargetIOPS.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk max burst bandwidth	Maximum byte-per-second throughput that the data disk can achieve with bursting.	Dependent item	azure.vm.data.disk.bandwidth.max Preprocessing JSON Path: `$.metrics.DataDiskMaxBurstBandwidth.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk max burst IOPS	Maximum IOPS that the data disk can achieve with bursting.	Dependent item	azure.vm.data.disk.iops.max Preprocessing JSON Path: `$.metrics.DataDiskMaxBurstIOPS.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Data disk used burst BPS credits	Percentage of the data disk burst bandwidth credits used so far.	Dependent item	azure.vm.data.disk.bandwidth.burst.used Preprocessing JSON Path: `$.metrics.DataDiskUsedBurstBPSCreditsPercentage.average` ⛔️Custom on fail: Discard value
Data disk used burst IO credits	Percentage of the data disk burst I/O credits used so far.	Dependent item	azure.vm.data.disk.iops.burst.used Preprocessing JSON Path: `$.metrics.DataDiskUsedBurstIOCreditsPercentage.average` ⛔️Custom on fail: Discard value
OS disk read rate	Bytes/sec read from a single disk during the monitoring period - for an OS disk.	Dependent item	azure.vm.os.disk.read.bps Preprocessing JSON Path: `$.metrics.OSDiskReadBytessec.average` ⛔️Custom on fail: Discard value
OS disk write rate	Bytes/sec written to a single disk during the monitoring period - for an OS disk.	Dependent item	azure.vm.os.disk.write.bps Preprocessing JSON Path: `$.metrics.OSDiskWriteBytessec.average` ⛔️Custom on fail: Discard value
OS disk IOPS read	Read IOPS from a single disk during the monitoring period - for an OS disk.	Dependent item	azure.vm.os.disk.read.ops Preprocessing JSON Path: `$.metrics.OSDiskReadOperationsSec.average` ⛔️Custom on fail: Discard value
OS disk IOPS write	Write IOPS from a single disk during the monitoring period - for an OS disk.	Dependent item	azure.vm.os.disk.write.ops Preprocessing JSON Path: `$.metrics.OSDiskWriteOperationsSec.average` ⛔️Custom on fail: Discard value
OS disk queue depth	The OS disk queue depth (or queue length).	Dependent item	azure.vm.os.disk.queue.depth Preprocessing JSON Path: `$.metrics.OSDiskQueueDepth.average` ⛔️Custom on fail: Discard value
OS disk bandwidth consumed	Percentage of the operating system disk bandwidth consumed per minute.	Dependent item	azure.vm.os.disk.bandwidth.consumed Preprocessing JSON Path: `$.metrics.OSDiskBandwidthConsumedPercentage.average` ⛔️Custom on fail: Discard value
OS disk IOPS consumed	Percentage of the operating system disk I/Os consumed per minute.	Dependent item	azure.vm.os.disk.iops.consumed Preprocessing JSON Path: `$.metrics.OSDiskIOPSConsumedPercentage.average` ⛔️Custom on fail: Discard value
OS disk target bandwidth	Baseline byte-per-second throughput that the OS disk can achieve without bursting.	Dependent item	azure.vm.os.disk.bandwidth.target Preprocessing JSON Path: `$.metrics.OSDiskTargetBandwidth.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk target IOPS	Baseline IOPS that the OS disk can achieve without bursting.	Dependent item	azure.vm.os.disk.iops.target Preprocessing JSON Path: `$.metrics.OSDiskTargetIOPS.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk max burst bandwidth	Maximum byte-per-second throughput that the OS disk can achieve with bursting.	Dependent item	azure.vm.os.disk.bandwidth.max Preprocessing JSON Path: `$.metrics.OSDiskMaxBurstBandwidth.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk max burst IOPS	Maximum IOPS that the OS disk can achieve with bursting.	Dependent item	azure.vm.os.disk.iops.max Preprocessing JSON Path: `$.metrics.OSDiskMaxBurstIOPS.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
OS disk used burst BPS credits	Percentage of the OS disk burst bandwidth credits used so far.	Dependent item	azure.vm.os.disk.bandwidth.burst.used Preprocessing JSON Path: `$.metrics.OSDiskUsedBurstBPSCreditsPercentage.average` ⛔️Custom on fail: Discard value
OS disk used burst IO credits	Percentage of the OS disk burst I/O credits used so far.	Dependent item	azure.vm.os.disk.iops.burst.used Preprocessing JSON Path: `$.metrics.OSDiskUsedBurstIOCreditsPercentage.average` ⛔️Custom on fail: Discard value
Inbound flows	The number of current flows in the inbound direction (the traffic going into the VM).	Dependent item	azure.vm.flows.inbound Preprocessing JSON Path: `$.metrics.InboundFlows.average` ⛔️Custom on fail: Discard value
Outbound flows	The number of current flows in the outbound direction (the traffic going out of the VM).	Dependent item	azure.vm.flows.outbound Preprocessing JSON Path: `$.metrics.OutboundFlows.average` ⛔️Custom on fail: Discard value
Inbound flows max creation rate	Maximum creation rate of the inbound flows (the traffic going into the VM).	Dependent item	azure.vm.flows.inbound.max Preprocessing JSON Path: `$.metrics.InboundFlowsMaximumCreationRate.average` ⛔️Custom on fail: Discard value
Outbound flows max creation rate	Maximum creation rate of the outbound flows (the traffic going out of the VM).	Dependent item	azure.vm.flows.outbound.max Preprocessing JSON Path: `$.metrics.OutboundFlowsMaximumCreationRate.average` ⛔️Custom on fail: Discard value
Premium data disk cache read hit in %	Percentage of premium data disk cache read hit.	Dependent item	azure.vm.premium.data.disk.cache.read.hit Preprocessing JSON Path: `$.metrics.PremiumDataDiskCacheReadHit.average` ⛔️Custom on fail: Discard value
Premium data disk cache read miss in %	Percentage of premium data disk cache read miss.	Dependent item	azure.vm.premium.data.disk.cache.read.miss Preprocessing JSON Path: `$.metrics.PremiumDataDiskCacheReadMiss.average` ⛔️Custom on fail: Discard value
Premium OS disk cache read hit in %	Percentage of premium OS disk cache read hit.	Dependent item	azure.vm.premium.os.disk.cache.read.hit Preprocessing JSON Path: `$.metrics.PremiumOSDiskCacheReadHit.average` ⛔️Custom on fail: Discard value
Premium OS disk cache read miss in %	Percentage of premium OS disk cache read miss.	Dependent item	azure.vm.premium.os.disk.cache.read.miss Preprocessing JSON Path: `$.metrics.PremiumOSDiskCacheReadMiss.average` ⛔️Custom on fail: Discard value
VM cached bandwidth consumed	Percentage of the cached disk bandwidth consumed by the VM.	Dependent item	azure.vm.cached.bandwidth.consumed Preprocessing JSON Path: `$.metrics.VMCachedBandwidthConsumedPercentage.average` ⛔️Custom on fail: Discard value
VM cached IOPS consumed	Percentage of the cached disk IOPS consumed by the VM.	Dependent item	azure.vm.cached.iops.consumed Preprocessing JSON Path: `$.metrics.VMCachedIOPSConsumedPercentage.average` ⛔️Custom on fail: Discard value
VM uncached bandwidth consumed	Percentage of the uncached disk bandwidth consumed by the VM.	Dependent item	azure.vm.uncached.bandwidth.consumed Preprocessing JSON Path: `$.metrics.VMUncachedBandwidthConsumedPercentage.average` ⛔️Custom on fail: Discard value
VM uncached IOPS consumed	Percentage of the uncached disk IOPS consumed by the VM.	Dependent item	azure.vm.uncached.iops.consumed Preprocessing JSON Path: `$.metrics.VMUncachedIOPSConsumedPercentage.average` ⛔️Custom on fail: Discard value
Network in total	The number of bytes received by the VM via all network interfaces (incoming traffic).	Dependent item	azure.vm.network.in.total Preprocessing JSON Path: `$.metrics.NetworkInTotal.total` ⛔️Custom on fail: Discard value
Network out total	The number of bytes sent by the VM via all network interfaces (outgoing traffic).	Dependent item	azure.vm.network.out.total Preprocessing JSON Path: `$.metrics.NetworkOutTotal.total` ⛔️Custom on fail: Discard value
Available memory	Amount of physical memory, in bytes, immediately available for the allocation to a process or for a system use in the virtual machine.	Dependent item	azure.vm.memory.available Preprocessing JSON Path: `$.metrics.AvailableMemoryBytes.average` ⛔️Custom on fail: Discard value
Data disk latency	Average time to complete each IO during the monitoring period for Data Disk.	Dependent item	azure.vm.disk.latency Preprocessing JSON Path: `$.metrics.DataDiskLatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001`
OS disk latency	Average time to complete each IO during the monitoring period for OS Disk.	Dependent item	azure.vm.os.disk.latency Preprocessing JSON Path: `$.metrics.OSDiskLatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001`
Temp disk latency	Average time to complete each IO during the monitoring period for temp disk.	Dependent item	azure.vm.temp.disk.latency Preprocessing JSON Path: `$.metrics.TempDiskLatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001`
Temp disk read rate	Bytes/Sec read from a single disk during the monitoring period for temp disk.	Dependent item	azure.vm.temp.disk.read.bps Preprocessing JSON Path: `$.metrics.TempDiskReadBytessec.average` ⛔️Custom on fail: Discard value
Temp disk write rate	Bytes/Sec written to a single disk during the monitoring period for temp disk.	Dependent item	azure.vm.temp.disk.write.bps Preprocessing JSON Path: `$.metrics.TempDiskWriteBytessec.average` ⛔️Custom on fail: Discard value
Temp disk IOPS read	Read IOPS from a single disk during the monitoring period for temp disk.	Dependent item	azure.vm.temp.disk.read.ops Preprocessing JSON Path: `$.metrics.TempDiskReadOperationsSec.average` ⛔️Custom on fail: Discard value
Temp disk IOPS write	Bytes/Sec written to a single disk during the monitoring period for temp disk.	Dependent item	azure.vm.temp.disk.write.ops Preprocessing JSON Path: `$.metrics.TempDiskWriteOperationsSec.average` ⛔️Custom on fail: Discard value
Temp disk queue depth	Temp Disk queue depth (or queue length).	Dependent item	azure.vm.temp.disk.queue.depth Preprocessing JSON Path: `$.metrics.TempDiskQueueDepth.average` ⛔️Custom on fail: Discard value
VM availability metric	Measure of availability of virtual machine over time.	Dependent item	azure.vm.availability Preprocessing JSON Path: `$.metrics.VmAvailabilityMetric.average` ⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression
Azure VM: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure Virtual Machine by HTTP/azure.vm.data.errors))>0`\|Average
Azure VM: Virtual machine is unavailable	The resource state is unavailable.	`last(/Azure Virtual Machine by HTTP/azure.vm.availability.state)=2`\|High
Azure VM: Virtual machine is degraded	The resource is in a degraded state.	`last(/Azure Virtual Machine by HTTP/azure.vm.availability.state)=1`\|Average
Azure VM: Virtual machine is in unknown state	The resource state is unknown.	`last(/Azure Virtual Machine by HTTP/azure.vm.availability.state)=3`\|Warning
Azure VM: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/Azure Virtual Machine by HTTP/azure.vm.cpu.utilization,5m)>{$AZURE.VM.CPU.UTIL.CRIT}`\|High

Azure MySQL Flexible Server by HTTP

Overview

This template is designed to monitor Microsoft Azure MySQL flexible servers by HTTP. It works without any external scripts and uses the script item.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Microsoft Azure MySQL flexible servers

Configuration

Setup

Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.

az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>

Link the template to a host.
Configure the macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.

Macros used

Name	Description	Default
{$AZURE.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.
{$AZURE.APP.ID}	The App ID of Microsoft Azure.
{$AZURE.PASSWORD}	Microsoft Azure password.
{$AZURE.DATA.TIMEOUT}	API response timeout.	`15s`
{$AZURE.TENANT.ID}	Microsoft Azure tenant ID.
{$AZURE.SUBSCRIPTION.ID}	Microsoft Azure subscription ID.
{$AZURE.RESOURCE.ID}	Microsoft Azure MySQL server ID.
{$AZURE.DB.CPU.UTIL.CRIT}	The critical threshold of CPU utilization, expressed in %.	`90`
{$AZURE.DB.STORAGE.PUSED.WARN}	The warning threshold of the storage utilization, expressed in %.	`80`
{$AZURE.DB.STORAGE.PUSED.CRIT}	The critical threshold of the storage utilization, expressed in %.	`90`
{$AZURE.DB.ABORTED.CONN.MAX.WARN}	The number of failed attempts to connect to the MySQL server for a trigger expression.	`25`

Items

Name	Description	Type	Key and additional info
Get data	The result of API requests is expressed in the JSON.	Script	azure.db.mysql.data.get
Get errors	A list of errors from API requests.	Dependent item	azure.db.mysql.data.errors Preprocessing JSON Path: `$.errors` Discard unchanged with heartbeat: `1h`
Availability state	The availability status of the resource.	Dependent item	azure.db.mysql.availability.state Preprocessing JSON Path: `$.health.availabilityState` ⛔️Custom on fail: Set value to: `3` Replace: `Available -> 0` Replace: `Degraded -> 1` Replace: `Unavailable -> 2` Replace: `Unknown -> 3` In range: `0 -> 3` ⛔️Custom on fail: Set value to: `3` Discard unchanged with heartbeat: `1h`
Availability status detailed	The summary description of the availability status.	Dependent item	azure.db.mysql.availability.details Preprocessing JSON Path: `$.health.summary` Discard unchanged with heartbeat: `1h`
Percentage CPU	The CPU percent of a host.	Dependent item	azure.db.mysql.cpu.percentage Preprocessing JSON Path: `$.metrics.cpu_percent.maximum`
Memory utilization	The memory percent of a host.	Dependent item	azure.db.mysql.memory.percentage Preprocessing JSON Path: `$.metrics.memory_percent.maximum`
Network out	Network egress of a host, expressed in bytes.	Dependent item	azure.db.mysql.network.egress Preprocessing JSON Path: `$.metrics.network_bytes_egress.total` Custom multiplier: `0.0088`
Network in	Network ingress of a host, expressed in bytes.	Dependent item	azure.db.mysql.network.ingress Preprocessing JSON Path: `$.metrics.network_bytes_ingress.total` Custom multiplier: `0.0088`
Connections active	The count of active connections.	Dependent item	azure.db.mysql.connections.active Preprocessing JSON Path: `$.metrics.active_connections.maximum`
Connections total	The count of total connections.	Dependent item	azure.db.mysql.connections.total Preprocessing JSON Path: `$.metrics.total_connections.total`
Connections aborted	The count of aborted connections.	Dependent item	azure.db.mysql.connections.aborted Preprocessing JSON Path: `$.metrics.aborted_connections.total`
Queries	The count of queries.	Dependent item	azure.db.mysql.queries Preprocessing JSON Path: `$.metrics.Queries.total`
IO consumption percent	The consumption percent of I/O.	Dependent item	azure.db.mysql.io.consumption.percent Preprocessing JSON Path: `$.metrics.io_consumption_percent.maximum`
Storage percent	The storage utilization, expressed in %.	Dependent item	azure.db.mysql.storage.percent Preprocessing JSON Path: `$.metrics.storage_percent.maximum`
Storage used	Used storage space, expressed in bytes.	Dependent item	azure.db.mysql.storage.used Preprocessing JSON Path: `$.metrics.storage_used.maximum`
Storage limit	The storage limit, expressed in bytes.	Dependent item	azure.db.mysql.storage.limit Preprocessing JSON Path: `$.metrics.storage_limit.maximum`
Backup storage used	Used backup storage, expressed in bytes.	Dependent item	azure.db.mysql.storage.backup.used Preprocessing JSON Path: `$.metrics.backup_storage_used.maximum`
Replication lag	The replication lag, expressed in seconds.	Dependent item	azure.db.mysql.replication.lag Preprocessing JSON Path: `$.metrics.replication_lag.maximum` ⛔️Custom on fail: Discard value
CPU credits remaining	The remaining CPU credits.	Dependent item	azure.db.mysql.cpu.credits.remaining Preprocessing JSON Path: `$.metrics.cpu_credits_remaining.maximum` ⛔️Custom on fail: Discard value
CPU credits consumed	The consumed CPU credits.	Dependent item	azure.db.mysql.cpu.credits.consumed Preprocessing JSON Path: `$.metrics.cpu_credits_consumed.maximum` ⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression
Azure MySQL Flexible: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.data.errors))>0`\|Average
Azure MySQL Flexible: MySQL server is unavailable	The resource state is unavailable.	`last(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.availability.state)=2`\|High
Azure MySQL Flexible: MySQL server is degraded	The resource is in a degraded state.	`last(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.availability.state)=1`\|Average
Azure MySQL Flexible: MySQL server is in unknown state	The resource state is unknown.	`last(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.availability.state)=3`\|Warning
Azure MySQL Flexible: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.cpu.percentage,5m)>{$AZURE.DB.CPU.UTIL.CRIT}`\|High
Azure MySQL Flexible: Server has aborted connections	The number of failed attempts to connect to the MySQL server is more than `{$AZURE.DB.ABORTED.CONN.MAX.WARN}`.	`min(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.connections.aborted,5m)>{$AZURE.DB.ABORTED.CONN.MAX.WARN}`\|Average
Azure MySQL Flexible: Storage space is critically low	Critical utilization of the storage space.	`last(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.CRIT}`\|Average
Azure MySQL Flexible: Storage space is low	High utilization of the storage space.	`last(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.WARN}`\|Warning

Azure MySQL Single Server by HTTP

Overview

This template is designed to monitor Microsoft Azure MySQL single servers by HTTP. It works without any external scripts and uses the script item.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Microsoft Azure MySQL single servers

Configuration

Setup

Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.

az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>

Link the template to a host.
Configure the macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.

Macros used

Name	Description	Default
{$AZURE.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.
{$AZURE.APP.ID}	The App ID of Microsoft Azure.
{$AZURE.PASSWORD}	Microsoft Azure password.
{$AZURE.DATA.TIMEOUT}	API response timeout.	`15s`
{$AZURE.TENANT.ID}	Microsoft Azure tenant ID.
{$AZURE.SUBSCRIPTION.ID}	Microsoft Azure subscription ID.
{$AZURE.RESOURCE.ID}	Microsoft Azure MySQL server ID.
{$AZURE.DB.CPU.UTIL.CRIT}	The critical threshold of CPU utilization, expressed in %.	`90`
{$AZURE.DB.MEMORY.UTIL.CRIT}	The critical threshold of memory utilization, expressed in %.	`90`
{$AZURE.DB.STORAGE.PUSED.WARN}	The warning threshold of storage utilization, expressed in %.	`80`
{$AZURE.DB.STORAGE.PUSED.CRIT}	The critical threshold of storage utilization, expressed in %.	`90`
{$AZURE.DB.FAILED.CONN.MAX.WARN}	The number of failed attempts to connect to the MySQL server for trigger expression.	`25`

Items

Name	Description	Type	Key and additional info
Get data	The result of API requests is expressed in the JSON.	Script	azure.db.mysql.data.get
Get errors	A list of errors from API requests.	Dependent item	azure.db.mysql.data.errors Preprocessing JSON Path: `$.errors` Discard unchanged with heartbeat: `1h`
Availability state	The availability status of the resource.	Dependent item	azure.db.mysql.availability.state Preprocessing JSON Path: `$.health.availabilityState` ⛔️Custom on fail: Set value to: `3` Replace: `Available -> 0` Replace: `Degraded -> 1` Replace: `Unavailable -> 2` Replace: `Unknown -> 3` In range: `0 -> 3` ⛔️Custom on fail: Set value to: `3` Discard unchanged with heartbeat: `1h`
Availability status detailed	The summary description of the availability status.	Dependent item	azure.db.mysql.availability.details Preprocessing JSON Path: `$.health.summary` Discard unchanged with heartbeat: `1h`
Percentage CPU	The CPU percent of a host.	Dependent item	azure.db.mysql.cpu.percentage Preprocessing JSON Path: `$.metrics.cpu_percent.average`
Memory utilization	The memory percent of a host.	Dependent item	azure.db.mysql.memory.percentage Preprocessing JSON Path: `$.metrics.memory_percent.average`
Network out	The network outbound traffic across the active connections.	Dependent item	azure.db.mysql.network.egress Preprocessing JSON Path: `$.metrics.network_bytes_egress.total` ⛔️Custom on fail: Discard value Custom multiplier: `0.0088`
Network in	The network inbound traffic across the active connections.	Dependent item	azure.db.mysql.network.ingress Preprocessing JSON Path: `$.metrics.network_bytes_ingress.total` ⛔️Custom on fail: Discard value Custom multiplier: `0.0088`
Connections active	The count of active connections.	Dependent item	azure.db.mysql.connections.active Preprocessing JSON Path: `$.metrics.active_connections.average`
Connections failed	The count of failed connections.	Dependent item	azure.db.mysql.connections.failed Preprocessing JSON Path: `$.metrics.connections_failed.total` ⛔️Custom on fail: Discard value
IO consumption percent	The consumption percent of I/O.	Dependent item	azure.db.mysql.io.consumption.percent Preprocessing JSON Path: `$.metrics.io_consumption_percent.average`
Storage percent	The storage utilization, expressed in %.	Dependent item	azure.db.mysql.storage.percent Preprocessing JSON Path: `$.metrics.storage_percent.average`
Storage used	Used storage space, expressed in bytes.	Dependent item	azure.db.mysql.storage.used Preprocessing JSON Path: `$.metrics.storage_used.average`
Storage limit	The storage limit, expressed in bytes.	Dependent item	azure.db.mysql.storage.limit Preprocessing JSON Path: `$.metrics.storage_limit.maximum`
Backup storage used	Used backup storage, expressed in bytes.	Dependent item	azure.db.mysql.storage.backup.used Preprocessing JSON Path: `$.metrics.backup_storage_used.average`
Replication lag	The replication lag, expressed in seconds.	Dependent item	azure.db.mysql.replication.lag Preprocessing JSON Path: `$.metrics.seconds_behind_master.maximum` ⛔️Custom on fail: Discard value
Server log storage percent	The storage utilization by server log, expressed in %.	Dependent item	azure.db.mysql.storage.server.log.percent Preprocessing JSON Path: `$.metrics.serverlog_storage_percent.average`
Server log storage used	The storage space used by server log, expressed in bytes.	Dependent item	azure.db.mysql.storage.server.log.used Preprocessing JSON Path: `$.metrics.serverlog_storage_usage.average`
Server log storage limit	The storage limit of server log, expressed in bytes.	Dependent item	azure.db.mysql.storage.server.log.limit Preprocessing JSON Path: `$.metrics.serverlog_storage_limit.maximum`

Triggers

Name	Description	Expression
Azure MySQL Single: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure MySQL Single Server by HTTP/azure.db.mysql.data.errors))>0`\|Average
Azure MySQL Single: MySQL server is unavailable	The resource state is unavailable.	`last(/Azure MySQL Single Server by HTTP/azure.db.mysql.availability.state)=2`\|High
Azure MySQL Single: MySQL server is degraded	The resource is in a degraded state.	`last(/Azure MySQL Single Server by HTTP/azure.db.mysql.availability.state)=1`\|Average
Azure MySQL Single: MySQL server is in unknown state	The resource state is unknown.	`last(/Azure MySQL Single Server by HTTP/azure.db.mysql.availability.state)=3`\|Warning
Azure MySQL Single: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/Azure MySQL Single Server by HTTP/azure.db.mysql.cpu.percentage,5m)>{$AZURE.DB.CPU.UTIL.CRIT}`\|High
Azure MySQL Single: High memory utilization	The system is running out of free memory.	`min(/Azure MySQL Single Server by HTTP/azure.db.mysql.memory.percentage,5m)>{$AZURE.DB.MEMORY.UTIL.CRIT}`\|Average
Azure MySQL Single: Server has failed connections	The number of failed attempts to connect to the MySQL server is more than `{$AZURE.DB.FAILED.CONN.MAX.WARN}`.	`min(/Azure MySQL Single Server by HTTP/azure.db.mysql.connections.failed,5m)>{$AZURE.DB.FAILED.CONN.MAX.WARN}`\|Average
Azure MySQL Single: Storage space is critically low	Critical utilization of the storage space.	`last(/Azure MySQL Single Server by HTTP/azure.db.mysql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.CRIT}`\|Average
Azure MySQL Single: Storage space is low	High utilization of the storage space.	`last(/Azure MySQL Single Server by HTTP/azure.db.mysql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.WARN}`\|Warning

Azure PostgreSQL Flexible Server by HTTP

Overview

This template is designed to monitor Microsoft Azure PostgreSQL flexible servers by HTTP. It works without any external scripts and uses the script item.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Microsoft Azure PostgreSQL flexible servers

Configuration

Setup

Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.

az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>

Link the template to a host.
Configure the macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.

Macros used

Name	Description	Default
{$AZURE.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.
{$AZURE.APP.ID}	The App ID of Microsoft Azure.
{$AZURE.PASSWORD}	Microsoft Azure password.
{$AZURE.DATA.TIMEOUT}	API response timeout.	`15s`
{$AZURE.TENANT.ID}	Microsoft Azure tenant ID.
{$AZURE.SUBSCRIPTION.ID}	Microsoft Azure subscription ID.
{$AZURE.RESOURCE.ID}	Microsoft Azure PostgreSQL server ID.
{$AZURE.DB.CPU.UTIL.CRIT}	The critical threshold of CPU utilization, expressed in %.	`90`
{$AZURE.DB.MEMORY.UTIL.CRIT}	The critical threshold of memory utilization, expressed in %.	`90`
{$AZURE.DB.STORAGE.PUSED.WARN}	The warning threshold of storage utilization, expressed in %.	`80`
{$AZURE.DB.STORAGE.PUSED.CRIT}	The critical threshold of storage utilization, expressed in %.	`90`

Items

Name	Description	Type	Key and additional info
Get data	The result of API requests is expressed in the JSON.	Script	azure.db.pgsql.data.get
Get errors	A list of errors from API requests.	Dependent item	azure.db.pgsql.data.errors Preprocessing JSON Path: `$.errors` Discard unchanged with heartbeat: `1h`
Availability state	The availability status of the resource.	Dependent item	azure.db.pgsql.availability.state Preprocessing JSON Path: `$.health.availabilityState` ⛔️Custom on fail: Set value to: `3` Replace: `Available -> 0` Replace: `Degraded -> 1` Replace: `Unavailable -> 2` Replace: `Unknown -> 3` In range: `0 -> 3` ⛔️Custom on fail: Set value to: `3` Discard unchanged with heartbeat: `1h`
Availability status detailed	The summary description of the availability status.	Dependent item	azure.db.pgsql.availability.details Preprocessing JSON Path: `$.health.summary` Discard unchanged with heartbeat: `1h`
Percentage CPU	The CPU percent of a host.	Dependent item	azure.db.pgsql.cpu.percentage Preprocessing JSON Path: `$.metrics.cpu_percent.average`
Memory utilization	The memory percent of a host.	Dependent item	azure.db.pgsql.memory.percentage Preprocessing JSON Path: `$.metrics.memory_percent.average`
Network out	The network outbound traffic across the active connections.	Dependent item	azure.db.pgsql.network.egress Preprocessing JSON Path: `$.metrics.network_bytes_egress.total` Custom multiplier: `0.1333`
Network in	The network inbound traffic across the active connections.	Dependent item	azure.db.pgsql.network.ingress Preprocessing JSON Path: `$.metrics.network_bytes_ingress.total` Custom multiplier: `0.1333`
Connections active	The count of active connections.	Dependent item	azure.db.pgsql.connections.active Preprocessing JSON Path: `$.metrics.active_connections.average`
Connections succeeded	The count of succeeded connections.	Dependent item	azure.db.pgsql.connections.succeeded Preprocessing JSON Path: `$.metrics.connections_succeeded.total`
Connections failed	The count of failed connections.	Dependent item	azure.db.pgsql.connections.failed Preprocessing JSON Path: `$.metrics.connections_failed.total` ⛔️Custom on fail: Discard value
Storage percent	The storage utilization, expressed in %.	Dependent item	azure.db.pgsql.storage.percent Preprocessing JSON Path: `$.metrics.storage_percent.average`
Storage used	Used storage space, expressed in bytes.	Dependent item	azure.db.pgsql.storage.used Preprocessing JSON Path: `$.metrics.storage_used.average`
Storage free	Free storage space, expressed in bytes.	Dependent item	azure.db.pgsql.storage.free Preprocessing JSON Path: `$.metrics.storage_free.average`
Backup storage used	Used backup storage, expressed in bytes.	Dependent item	azure.db.pgsql.storage.backup.used Preprocessing JSON Path: `$.metrics.backup_storage_used.average` ⛔️Custom on fail: Discard value
CPU credits remaining	The total number of credits available to burst.	Dependent item	azure.db.pgsql.cpu.credits.remaining Preprocessing JSON Path: `$.metrics.cpu_credits_remaining.average` ⛔️Custom on fail: Discard value
CPU credits consumed	The total number of credits consumed by the database server.	Dependent item	azure.db.pgsql.cpu.credits.consumed Preprocessing JSON Path: `$.metrics.cpu_credits_consumed.average` ⛔️Custom on fail: Discard value
Data disk queue depth	The number of outstanding I/O operations to the data disk.	Dependent item	azure.db.pgsql.disk.queue.depth Preprocessing JSON Path: `$.metrics.disk_queue_depth.average` ⛔️Custom on fail: Discard value
Data disk IOPS	I/O operations per second.	Dependent item	azure.db.pgsql.iops Preprocessing JSON Path: `$.metrics.iops.average`
Data disk read IOPS	The number of the data disk I/O read operations per second.	Dependent item	azure.db.pgsql.iops.read Preprocessing JSON Path: `$.metrics.read_iops.average` ⛔️Custom on fail: Discard value
Data disk write IOPS	The number of the data disk I/O write operations per second.	Dependent item	azure.db.pgsql.iops.write Preprocessing JSON Path: `$.metrics.write_iops.average` ⛔️Custom on fail: Discard value
Data disk read Bps	Bytes read per second from the data disk during the monitoring period.	Dependent item	azure.db.pgsql.disk.bps.read Preprocessing JSON Path: `$.metrics.read_throughput.average` ⛔️Custom on fail: Discard value
Data disk write Bps	Bytes written per second to the data disk during the monitoring period.	Dependent item	azure.db.pgsql.disk.bps.write Preprocessing JSON Path: `$.metrics.write_throughput.average` ⛔️Custom on fail: Discard value
Transaction log storage used	The storage space used by transaction log, expressed in bytes.	Dependent item	azure.db.pgsql.storage.txlogs.used Preprocessing JSON Path: `$.metrics.txlogs_storage_used.average`
Maximum used transaction IDs	The maximum number of used transaction IDs.	Dependent item	azure.db.pgsql.txid.used.max Preprocessing JSON Path: `$.metrics.maximum_used_transactionIDs.average`

Triggers

Name	Description	Expression
Azure PostgreSQL Flexible: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.data.errors))>0`\|Average
Azure PostgreSQL Flexible: PostgreSQL server is unavailable	The resource state is unavailable.	`last(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.availability.state)=2`\|High
Azure PostgreSQL Flexible: PostgreSQL server is degraded	The resource is in a degraded state.	`last(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.availability.state)=1`\|Average
Azure PostgreSQL Flexible: PostgreSQL server is in unknown state	The resource state is unknown.	`last(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.availability.state)=3`\|Warning
Azure PostgreSQL Flexible: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.cpu.percentage,5m)>{$AZURE.DB.CPU.UTIL.CRIT}`\|High
Azure PostgreSQL Flexible: High memory utilization	The system is running out of free memory.	`min(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.memory.percentage,5m)>{$AZURE.DB.MEMORY.UTIL.CRIT}`\|Average
Azure PostgreSQL Flexible: Storage space is critically low	Critical utilization of the storage space.	`last(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.CRIT}`\|Average
Azure PostgreSQL Flexible: Storage space is low	High utilization of the storage space.	`last(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.WARN}`\|Warning

Azure PostgreSQL Single Server by HTTP

Overview

This template is designed to monitor Microsoft Azure PostgreSQL servers by HTTP. It works without any external scripts and uses the script item.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Microsoft Azure PostgreSQL servers

Configuration

Setup

Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.

az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>

Link the template to a host.
Configure the macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.

Macros used

Name	Description	Default
{$AZURE.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.
{$AZURE.APP.ID}	The App ID of Microsoft Azure.
{$AZURE.PASSWORD}	Microsoft Azure password.
{$AZURE.DATA.TIMEOUT}	API response timeout.	`15s`
{$AZURE.TENANT.ID}	Microsoft Azure tenant ID.
{$AZURE.SUBSCRIPTION.ID}	Microsoft Azure subscription ID.
{$AZURE.RESOURCE.ID}	Microsoft Azure PostgreSQL server ID.
{$AZURE.DB.CPU.UTIL.CRIT}	The critical threshold of CPU utilization, expressed in %.	`90`
{$AZURE.DB.MEMORY.UTIL.CRIT}	The critical threshold of memory utilization, expressed in %.	`90`
{$AZURE.DB.STORAGE.PUSED.WARN}	The warning threshold of storage utilization, expressed in %.	`80`
{$AZURE.DB.STORAGE.PUSED.CRIT}	The critical threshold of storage utilization, expressed in %.	`90`

Items

Name	Description	Type	Key and additional info
Get data	The result of API requests is expressed in the JSON.	Script	azure.db.pgsql.data.get
Get errors	A list of errors from API requests.	Dependent item	azure.db.pgsql.data.errors Preprocessing JSON Path: `$.errors` Discard unchanged with heartbeat: `1h`
Availability state	The availability status of the resource.	Dependent item	azure.db.pgsql.availability.state Preprocessing JSON Path: `$.health.availabilityState` ⛔️Custom on fail: Set value to: `3` Replace: `Available -> 0` Replace: `Degraded -> 1` Replace: `Unavailable -> 2` Replace: `Unknown -> 3` In range: `0 -> 3` ⛔️Custom on fail: Set value to: `3` Discard unchanged with heartbeat: `1h`
Availability status detailed	The summary description of the availability status.	Dependent item	azure.db.pgsql.availability.details Preprocessing JSON Path: `$.health.summary` Discard unchanged with heartbeat: `1h`
Percentage CPU	The CPU percent of a host.	Dependent item	azure.db.pgsql.cpu.percentage Preprocessing JSON Path: `$.metrics.cpu_percent.average`
Memory utilization	The memory percent of a host.	Dependent item	azure.db.pgsql.memory.percentage Preprocessing JSON Path: `$.metrics.memory_percent.average`
Network out	The network outbound traffic across the active connections.	Dependent item	azure.db.pgsql.network.egress Preprocessing JSON Path: `$.metrics.network_bytes_egress.total` ⛔️Custom on fail: Discard value Custom multiplier: `0.1333`
Network in	The network inbound traffic across the active connections.	Dependent item	azure.db.pgsql.network.ingress Preprocessing JSON Path: `$.metrics.network_bytes_ingress.total` ⛔️Custom on fail: Discard value Custom multiplier: `0.1333`
Connections active	The count of active connections.	Dependent item	azure.db.pgsql.connections.active Preprocessing JSON Path: `$.metrics.active_connections.average`
Connections failed	The count of failed connections.	Dependent item	azure.db.pgsql.connections.failed Preprocessing JSON Path: `$.metrics.connections_failed.total` ⛔️Custom on fail: Discard value
IO consumption percent	The consumption percent of I/O.	Dependent item	azure.db.pgsql.io.consumption.percent Preprocessing JSON Path: `$.metrics.io_consumption_percent.average`
Storage percent	The storage utilization, expressed in %.	Dependent item	azure.db.pgsql.storage.percent Preprocessing JSON Path: `$.metrics.storage_percent.average`
Storage used	Used storage space, expressed in bytes.	Dependent item	azure.db.pgsql.storage.used Preprocessing JSON Path: `$.metrics.storage_used.average`
Storage limit	The storage limit, expressed in bytes.	Dependent item	azure.db.pgsql.storage.limit Preprocessing JSON Path: `$.metrics.storage_limit.maximum`
Backup storage used	Used backup storage, expressed in bytes.	Dependent item	azure.db.pgsql.storage.backup.used Preprocessing JSON Path: `$.metrics.backup_storage_used.average`
Replication lag	The replication lag, expressed in seconds.	Dependent item	azure.db.pgsql.replica.log.delay Preprocessing JSON Path: `$.metrics.pg_replica_log_delay_in_seconds.maximum` ⛔️Custom on fail: Discard value
Max lag across replicas in bytes	Lag for the most lagging replica, expressed in bytes.	Dependent item	azure.db.pgsql.replica.log.delay.bytes Preprocessing JSON Path: `$.metrics.pg_replica_log_delay_in_bytes.maximum` ⛔️Custom on fail: Discard value
Server log storage percent	The storage utilization by server log, expressed in %.	Dependent item	azure.db.pgsql.storage.server.log.percent Preprocessing JSON Path: `$.metrics.serverlog_storage_percent.average`
Server log storage used	The storage space used by server log, expressed in bytes.	Dependent item	azure.db.pgsql.storage.server.log.used Preprocessing JSON Path: `$.metrics.serverlog_storage_usage.average`
Server log storage limit	The storage limit of server log, expressed in bytes.	Dependent item	azure.db.pgsql.storage.server.log.limit Preprocessing JSON Path: `$.metrics.serverlog_storage_limit.maximum`

Triggers

Name	Description	Expression
Azure PostgreSQL Single: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.data.errors))>0`\|Average
Azure PostgreSQL Single: PostgreSQL server is unavailable	The resource state is unavailable.	`last(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.availability.state)=2`\|High
Azure PostgreSQL Single: PostgreSQL server is degraded	The resource is in a degraded state.	`last(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.availability.state)=1`\|Average
Azure PostgreSQL Single: PostgreSQL server is in unknown state	The resource state is unknown.	`last(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.availability.state)=3`\|Warning
Azure PostgreSQL Single: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.cpu.percentage,5m)>{$AZURE.DB.CPU.UTIL.CRIT}`\|High
Azure PostgreSQL Single: High memory utilization	The system is running out of free memory.	`min(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.memory.percentage,5m)>{$AZURE.DB.MEMORY.UTIL.CRIT}`\|Average
Azure PostgreSQL Single: Storage space is critically low	Critical utilization of the storage space.	`last(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.CRIT}`\|Average
Azure PostgreSQL Single: Storage space is low	High utilization of the storage space.	`last(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.WARN}`\|Warning

Azure Microsoft SQL Serverless Database by HTTP

Overview

This template is designed to monitor Microsoft SQL serverless databases by HTTP. It works without any external scripts and uses the script item.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Microsoft Azure SQL serverless databases

Configuration

Setup

Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.

az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>

Link the template to a host.
Configure the macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.

Macros used

Name	Description	Default
{$AZURE.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.
{$AZURE.APP.ID}	The App ID of Microsoft Azure.
{$AZURE.PASSWORD}	Microsoft Azure password.
{$AZURE.DATA.TIMEOUT}	API response timeout.	`15s`
{$AZURE.TENANT.ID}	Microsoft Azure tenant ID.
{$AZURE.SUBSCRIPTION.ID}	Microsoft Azure subscription ID.
{$AZURE.RESOURCE.ID}	Microsoft Azure Microsoft SQL database ID.
{$AZURE.DB.CPU.UTIL.CRIT}	The critical threshold of CPU utilization, expressed in %.	`90`
{$AZURE.DB.MEMORY.UTIL.CRIT}	The critical threshold of memory utilization, expressed in %.	`90`
{$AZURE.DB.STORAGE.PUSED.WARN}	The warning threshold of storage utilization, expressed in %.	`80`
{$AZURE.DB.STORAGE.PUSED.CRIT}	The critical threshold of storage utilization, expressed in %.	`90`

Items

Name	Description	Type	Key and additional info
Get data	The result of API requests is expressed in the JSON.	Script	azure.db.mssql.data.get
Get errors	A list of errors from API requests.	Dependent item	azure.db.mssql.data.errors Preprocessing JSON Path: `$.errors` Discard unchanged with heartbeat: `1h`
Availability state	The availability status of the resource.	Dependent item	azure.db.mssql.availability.state Preprocessing JSON Path: `$.health.availabilityState` ⛔️Custom on fail: Set value to: `3` Replace: `Available -> 0` Replace: `Degraded -> 1` Replace: `Unavailable -> 2` Replace: `Unknown -> 3` In range: `0 -> 3` ⛔️Custom on fail: Set value to: `3` Discard unchanged with heartbeat: `1h`
Availability status detailed	The summary description of the availability status.	Dependent item	azure.db.mssql.availability.details Preprocessing JSON Path: `$.health.summary` Discard unchanged with heartbeat: `1h`
Percentage CPU	The CPU percent of a host.	Dependent item	azure.db.mssql.cpu.percentage Preprocessing JSON Path: `$.metrics.cpu_percent.average`
Data IO percentage	The physical data read percentage.	Dependent item	azure.db.mssql.data.read.percentage Preprocessing JSON Path: `$.metrics.physical_data_read_percent.average`
Log IO percentage	The percentage of I/O log. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.log.write.percentage Preprocessing JSON Path: `$.metrics.log_write_percent.average`
Data space used	Data space used. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.storage.used Preprocessing JSON Path: `$.metrics.storage.maximum` ⛔️Custom on fail: Discard value
Connections successful	The count of successful connections.	Dependent item	azure.db.mssql.connections.successful Preprocessing JSON Path: `$.metrics.connection_successful.total`
Connections failed: System errors	The count of failed connections with system errors.	Dependent item	azure.db.mssql.connections.failed.system Preprocessing JSON Path: `$.metrics.connection_failed.total`
Connections blocked by firewall	The count of connections blocked by firewall.	Dependent item	azure.db.mssql.firewall.blocked Preprocessing JSON Path: `$.metrics.blocked_by_firewall.total`
Deadlocks	The count of deadlocks. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.deadlocks Preprocessing JSON Path: `$.metrics.deadlock.total`
Data space used percent	The percentage of used data space. Not applicable to the data warehouses or Hyperscale databases.	Dependent item	azure.db.mssql.storage.percent Preprocessing JSON Path: `$.metrics.storage_percent.maximum` ⛔️Custom on fail: Discard value
In-Memory OLTP storage percent	In-Memory OLTP storage percent. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.storage.xtp.percent Preprocessing JSON Path: `$.metrics.xtp_storage_percent.average`
Workers percentage	The percentage of workers. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.workers.percent Preprocessing JSON Path: `$.metrics.workers_percent.average`
Sessions percentage	The percentage of sessions. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.sessions.percent Preprocessing JSON Path: `$.metrics.sessions_percent.average`
CPU limit	The CPU limit. Applies to the vCore-based databases.	Dependent item	azure.db.mssql.cpu.limit Preprocessing JSON Path: `$.metrics.cpu_limit.average` ⛔️Custom on fail: Discard value
CPU used	The CPU used. Applies to the vCore-based databases.	Dependent item	azure.db.mssql.cpu.used Preprocessing JSON Path: `$.metrics.cpu_used.average` ⛔️Custom on fail: Discard value
SQL Server process core percent	The CPU usage as a percentage of the SQL DB process. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.server.cpu.percent Preprocessing JSON Path: `$.metrics.sqlserver_process_core_percent.maximum` ⛔️Custom on fail: Discard value
SQL Server process memory percent	Memory usage as a percentage of the SQL DB process. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.server.memory.percent Preprocessing JSON Path: `$.metrics.sqlserver_process_memory_percent.maximum` ⛔️Custom on fail: Discard value
Tempdb data file size	Space used in `tempdb` data files, expressed in bytes. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.tempdb.data.size Preprocessing JSON Path: `$.metrics.tempdb_data_size.maximum` ⛔️Custom on fail: Discard value Custom multiplier: `1024`
Tempdb log file size	Space used in `tempdb` transaction log files, expressed in bytes. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.tempdb.log.size Preprocessing JSON Path: `$.metrics.tempdb_log_size.maximum` ⛔️Custom on fail: Discard value Custom multiplier: `1024`
Tempdb log used percent	The percentage of space used in `tempdb` transaction log files. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.tempdb.log.percent Preprocessing JSON Path: `$.metrics.tempdb_log_used_percent.maximum` ⛔️Custom on fail: Discard value
App CPU billed	App CPU billed. Applies to serverless databases.	Dependent item	azure.db.mssql.app.cpu.billed Preprocessing JSON Path: `$.metrics.app_cpu_billed.total` ⛔️Custom on fail: Discard value
App CPU percentage	App CPU percentage. Applies to serverless databases.	Dependent item	azure.db.mssql.app.cpu.percent Preprocessing JSON Path: `$.metrics.app_cpu_percent.average` ⛔️Custom on fail: Discard value
App memory percentage	App memory percentage. Applies to serverless databases.	Dependent item	azure.db.mssql.app.memory.percent Preprocessing JSON Path: `$.metrics.app_memory_percent.average` ⛔️Custom on fail: Discard value
Data space allocated	The allocated data storage. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.storage.allocated Preprocessing JSON Path: `$.metrics.allocated_data_storage.average` ⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression
Azure MSSQL Serverless: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure Microsoft SQL Serverless Database by HTTP/azure.db.mssql.data.errors))>0`\|Average
Azure MSSQL Serverless: Microsoft SQL database is unavailable	The resource state is unavailable.	`last(/Azure Microsoft SQL Serverless Database by HTTP/azure.db.mssql.availability.state)=2`\|High
Azure MSSQL Serverless: Microsoft SQL database is degraded	The resource is in a degraded state.	`last(/Azure Microsoft SQL Serverless Database by HTTP/azure.db.mssql.availability.state)=1`\|Average
Azure MSSQL Serverless: Microsoft SQL database is in unknown state	The resource state is unknown.	`last(/Azure Microsoft SQL Serverless Database by HTTP/azure.db.mssql.availability.state)=3`\|Warning
Azure MSSQL Serverless: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/Azure Microsoft SQL Serverless Database by HTTP/azure.db.mssql.cpu.percentage,5m)>{$AZURE.DB.CPU.UTIL.CRIT}`\|High
Azure MSSQL Serverless: Storage space is critically low	Critical utilization of the storage space.	`last(/Azure Microsoft SQL Serverless Database by HTTP/azure.db.mssql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.CRIT}`\|Average
Azure MSSQL Serverless: Storage space is low	High utilization of the storage space.	`last(/Azure Microsoft SQL Serverless Database by HTTP/azure.db.mssql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.WARN}`\|Warning

Azure Microsoft SQL DTU Database by HTTP

Overview

This template is designed to monitor Microsoft SQL DTU-based databases via HTTP. It works without any external scripts and uses the script item.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Microsoft Azure SQL DTU-based databases

Configuration

Setup

Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.

az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>

Link the template to a host.
Configure the macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.

Macros used

Name	Description	Default
{$AZURE.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.
{$AZURE.APP.ID}	The App ID of Microsoft Azure.
{$AZURE.PASSWORD}	Microsoft Azure password.
{$AZURE.DATA.TIMEOUT}	API response timeout.	`15s`
{$AZURE.TENANT.ID}	Microsoft Azure tenant ID.
{$AZURE.SUBSCRIPTION.ID}	Microsoft Azure subscription ID.
{$AZURE.RESOURCE.ID}	Microsoft Azure SQL DTU-based database ID.
{$AZURE.DB.DTU.UTIL.CRIT}	The critical threshold of DTU utilization, expressed in %.	`90`
{$AZURE.DB.CPU.UTIL.CRIT}	The critical threshold of CPU utilization, expressed in %.	`90`
{$AZURE.DB.MEMORY.UTIL.CRIT}	The critical threshold of memory utilization, expressed in %.	`90`
{$AZURE.DB.STORAGE.PUSED.WARN}	The warning threshold of storage utilization, expressed in %.	`80`
{$AZURE.DB.STORAGE.PUSED.CRIT}	The critical threshold of storage utilization, expressed in %.	`90`

Items

Name	Description	Type	Key and additional info
Get data	The result of API requests is expressed in JSON format.	Script	azure.db.mssql.data.get
Get errors	A list of errors from API requests.	Dependent item	azure.db.mssql.data.errors Preprocessing JSON Path: `$.errors` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Availability state	The availability status of the resource.	Dependent item	azure.db.mssql.availability.state Preprocessing JSON Path: `$.health.availabilityState` ⛔️Custom on fail: Set value to: `3` Replace: `Available -> 0` Replace: `Degraded -> 1` Replace: `Unavailable -> 2` Replace: `Unknown -> 3` In range: `0 -> 3` ⛔️Custom on fail: Set value to: `3` Discard unchanged with heartbeat: `1h`
Availability status detailed	A detailed summary of the availability status.	Dependent item	azure.db.mssql.availability.details Preprocessing JSON Path: `$.health.summary` Discard unchanged with heartbeat: `1h`
CPU percentage	The average percentage of CPU usage on a host.	Dependent item	azure.db.mssql.cpu.percentage Preprocessing JSON Path: `$.metrics.cpu_percent.average` ⛔️Custom on fail: Discard value
DTU percentage	The average percentage of DTU consumption for a DTU-based database.	Dependent item	azure.db.mssql.dtu.percentage Preprocessing JSON Path: `$.metrics.dtu_consumption_percent.average` ⛔️Custom on fail: Discard value
Data IO percentage	The average percentage of physical data read.	Dependent item	azure.db.mssql.data.read.percentage Preprocessing JSON Path: `$.metrics.physical_data_read_percent.average` ⛔️Custom on fail: Discard value
Log IO percentage	The percentage of I/O used for log writes. Not applicable to data warehouses.	Dependent item	azure.db.mssql.log.write.percentage Preprocessing JSON Path: `$.metrics.log_write_percent.average` ⛔️Custom on fail: Discard value
Data space used	Data space used. Not applicable to data warehouses.	Dependent item	azure.db.mssql.storage.used Preprocessing JSON Path: `$.metrics.storage.maximum` ⛔️Custom on fail: Discard value
Connections successful	The number of successful connections.	Dependent item	azure.db.mssql.connections.successful Preprocessing JSON Path: `$.metrics.connection_successful.total` ⛔️Custom on fail: Discard value
Connections failed: System errors	The number of failed connections with system errors.	Dependent item	azure.db.mssql.connections.failed.system Preprocessing JSON Path: `$.metrics.connection_failed.total` ⛔️Custom on fail: Discard value
Connections blocked by firewall	The number of connections blocked by the firewall.	Dependent item	azure.db.mssql.firewall.blocked Preprocessing JSON Path: `$.metrics.blocked_by_firewall.total` ⛔️Custom on fail: Discard value
Deadlocks	The number of deadlocks. Not applicable to data warehouses.	Dependent item	azure.db.mssql.deadlocks Preprocessing JSON Path: `$.metrics.deadlock.total` ⛔️Custom on fail: Discard value
Data space used percent	Data space used in percent. Not applicable to data warehouses or Hyperscale databases.	Dependent item	azure.db.mssql.storage.percent Preprocessing JSON Path: `$.metrics.storage_percent.maximum` ⛔️Custom on fail: Discard value
In-Memory OLTP storage percent	In-Memory OLTP storage percent. Not applicable to data warehouses.	Dependent item	azure.db.mssql.storage.xtp.percent Preprocessing JSON Path: `$.metrics.xtp_storage_percent.average` ⛔️Custom on fail: Discard value
Workers percentage	The percentage of workers. Not applicable to data warehouses.	Dependent item	azure.db.mssql.workers.percent Preprocessing JSON Path: `$.metrics.workers_percent.average` ⛔️Custom on fail: Discard value
Sessions percentage	The percentage of sessions. Not applicable to data warehouses.	Dependent item	azure.db.mssql.sessions.percent Preprocessing JSON Path: `$.metrics.sessions_percent.average` ⛔️Custom on fail: Discard value
Sessions count	The number of active sessions. Not applicable to Synapse DW Analytics.	Dependent item	azure.db.mssql.sessions.count Preprocessing JSON Path: `$.metrics.sessions_count.average` ⛔️Custom on fail: Discard value
DTU limit	The DTU limit. Applicable to DTU-based databases.	Dependent item	azure.db.mssql.dtu.limit Preprocessing JSON Path: `$.metrics.dtu_limit.average` ⛔️Custom on fail: Discard value
DTU used	The DTU used. Applicable to DTU-based databases.	Dependent item	azure.db.mssql.dtu.used Preprocessing JSON Path: `$.metrics.dtu_used.average` ⛔️Custom on fail: Discard value
SQL instance CPU percent	CPU usage from all user and system workloads. Not applicable to data warehouses.	Dependent item	azure.db.mssql.server.cpu.percent Preprocessing JSON Path: `$.metrics.sql_instance_cpu_percent.maximum` ⛔️Custom on fail: Discard value
SQL instance memory percent	The percentage of memory used by the database engine instance. Not applicable to data warehouses.	Dependent item	azure.db.mssql.server.memory.percent Preprocessing JSON Path: `$.metrics.sql_instance_cpu_percent.maximum` ⛔️Custom on fail: Discard value
Tempdb data file size	The space used in `tempdb` data files, expressed in bytes. Not applicable to data warehouses.	Dependent item	azure.db.mssql.tempdb.data.size Preprocessing JSON Path: `$.metrics.tempdb_data_size.maximum` ⛔️Custom on fail: Discard value Custom multiplier: `1024`
Tempdb log file size	The space used in the `tempdb` transaction log file, expressed in bytes. Not applicable to data warehouses.	Dependent item	azure.db.mssql.tempdb.log.size Preprocessing JSON Path: `$.metrics.tempdb_log_size.maximum` ⛔️Custom on fail: Discard value Custom multiplier: `1024`
Tempdb log used percent	The percentage of space used in the `tempdb` transaction log file. Not applicable to data warehouses.	Dependent item	azure.db.mssql.tempdb.log.percent Preprocessing JSON Path: `$.metrics.tempdb_log_used_percent.maximum` ⛔️Custom on fail: Discard value
Data space allocated	The allocated data storage. Not applicable to data warehouses.	Dependent item	azure.db.mssql.storage.allocated Preprocessing JSON Path: `$.metrics.allocated_data_storage.average` ⛔️Custom on fail: Discard value
Log backup storage size	The cumulative log backup storage size. Applies to vCore-based and Hyperscale databases.	Dependent item	azure.db.mssql.storage.backup.log.size Preprocessing JSON Path: `$.metrics.log_backup_size_bytes.maximum` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1d`

Triggers

Name	Description	Expression	Severity
Azure MSSQL DTU: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure Microsoft SQL DTU Database by HTTP/azure.db.mssql.data.errors))>0`\|Average
Azure MSSQL DTU: Microsoft SQL database is unavailable	The resource state is unavailable.	`last(/Azure Microsoft SQL DTU Database by HTTP/azure.db.mssql.availability.state)=2`\|High
Azure MSSQL DTU: Microsoft SQL database is degraded	The resource is in a degraded state.	`last(/Azure Microsoft SQL DTU Database by HTTP/azure.db.mssql.availability.state)=1`\|Average
Azure MSSQL DTU: Microsoft SQL database is in unknown state	The resource state is unknown.	`last(/Azure Microsoft SQL DTU Database by HTTP/azure.db.mssql.availability.state)=3`\|Warning
Azure MSSQL DTU: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/Azure Microsoft SQL DTU Database by HTTP/azure.db.mssql.cpu.percentage,5m)>{$AZURE.DB.CPU.UTIL.CRIT}`\|High
Azure MSSQL DTU: High DTU utilization	The DTU utilization is too high. The system might be slow to respond.	`min(/Azure Microsoft SQL DTU Database by HTTP/azure.db.mssql.dtu.percentage,5m)>{$AZURE.DB.DTU.UTIL.CRIT}`\|High
Azure MSSQL DTU: Storage space is critically low	Critical utilization of the storage space.	`last(/Azure Microsoft SQL DTU Database by HTTP/azure.db.mssql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.CRIT}`\|Average
Azure MSSQL DTU: Storage space is low	High utilization of the storage space.	`last(/Azure Microsoft SQL DTU Database by HTTP/azure.db.mssql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.WARN}`\|Warning	Depends on: Azure MSSQL DTU: Storage space is critically low

Azure Microsoft SQL Database by HTTP

Overview

This template is designed to monitor Microsoft SQL databases by HTTP. It works without any external scripts and uses the script item.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Microsoft Azure SQL databases

Configuration

Setup

Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.

az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>

Link the template to a host.
Configure the macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.

Macros used

Name	Description	Default
{$AZURE.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.
{$AZURE.APP.ID}	The App ID of Microsoft Azure.
{$AZURE.PASSWORD}	Microsoft Azure password.
{$AZURE.DATA.TIMEOUT}	API response timeout.	`15s`
{$AZURE.TENANT.ID}	Microsoft Azure tenant ID.
{$AZURE.SUBSCRIPTION.ID}	Microsoft Azure subscription ID.
{$AZURE.RESOURCE.ID}	Microsoft Azure Microsoft SQL database ID.
{$AZURE.DB.CPU.UTIL.CRIT}	The critical threshold of CPU utilization, expressed in %.	`90`
{$AZURE.DB.MEMORY.UTIL.CRIT}	The critical threshold of memory utilization, expressed in %.	`90`
{$AZURE.DB.STORAGE.PUSED.WARN}	The warning threshold of storage utilization, expressed in %.	`80`
{$AZURE.DB.STORAGE.PUSED.CRIT}	The critical threshold of storage utilization, expressed in %.	`90`

Items

Name	Description	Type	Key and additional info
Get data	The result of API requests is expressed in the JSON.	Script	azure.db.mssql.data.get
Get errors	A list of errors from API requests.	Dependent item	azure.db.mssql.data.errors Preprocessing JSON Path: `$.errors` Discard unchanged with heartbeat: `1h`
Availability state	The availability status of the resource.	Dependent item	azure.db.mssql.availability.state Preprocessing JSON Path: `$.health.availabilityState` ⛔️Custom on fail: Set value to: `3` Replace: `Available -> 0` Replace: `Degraded -> 1` Replace: `Unavailable -> 2` Replace: `Unknown -> 3` In range: `0 -> 3` ⛔️Custom on fail: Set value to: `3` Discard unchanged with heartbeat: `1h`
Availability status detailed	The summary description of the availability status.	Dependent item	azure.db.mssql.availability.details Preprocessing JSON Path: `$.health.summary` Discard unchanged with heartbeat: `1h`
Percentage CPU	The CPU percent of a host.	Dependent item	azure.db.mssql.cpu.percentage Preprocessing JSON Path: `$.metrics.cpu_percent.average`
Data IO percentage	The percentage of physical data read.	Dependent item	azure.db.mssql.data.read.percentage Preprocessing JSON Path: `$.metrics.physical_data_read_percent.average`
Log IO percentage	The percentage of I/O log. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.log.write.percentage Preprocessing JSON Path: `$.metrics.log_write_percent.average`
Data space used	Data space used. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.storage.used Preprocessing JSON Path: `$.metrics.storage.maximum` ⛔️Custom on fail: Discard value
Connections successful	The count of successful connections.	Dependent item	azure.db.mssql.connections.successful Preprocessing JSON Path: `$.metrics.connection_successful.total`
Connections failed: System errors	The count of failed connections with system errors.	Dependent item	azure.db.mssql.connections.failed.system Preprocessing JSON Path: `$.metrics.connection_failed.total`
Connections blocked by firewall	The count of connections blocked by firewall.	Dependent item	azure.db.mssql.firewall.blocked Preprocessing JSON Path: `$.metrics.blocked_by_firewall.total`
Deadlocks	The count of deadlocks. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.deadlocks Preprocessing JSON Path: `$.metrics.deadlock.total`
Data space used percent	Data space used percent. Not applicable to the data warehouses or Hyperscale databases.	Dependent item	azure.db.mssql.storage.percent Preprocessing JSON Path: `$.metrics.storage_percent.maximum` ⛔️Custom on fail: Discard value
In-Memory OLTP storage percent	In-Memory OLTP storage percent. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.storage.xtp.percent Preprocessing JSON Path: `$.metrics.xtp_storage_percent.average`
Workers percentage	The percentage of workers. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.workers.percent Preprocessing JSON Path: `$.metrics.workers_percent.average`
Sessions percentage	The percentage of sessions. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.sessions.percent Preprocessing JSON Path: `$.metrics.sessions_percent.average`
Sessions count	The number of active sessions. Not applicable to Synapse DW Analytics.	Dependent item	azure.db.mssql.sessions.count Preprocessing JSON Path: `$.metrics.sessions_count.average`
CPU limit	The CPU limit. Applies to the vCore-based databases.	Dependent item	azure.db.mssql.cpu.limit Preprocessing JSON Path: `$.metrics.cpu_limit.average` ⛔️Custom on fail: Discard value
CPU used	The CPU used. Applies to the vCore-based databases.	Dependent item	azure.db.mssql.cpu.used Preprocessing JSON Path: `$.metrics.cpu_used.average` ⛔️Custom on fail: Discard value
SQL Server process core percent	The CPU usage as a percentage of the SQL DB process. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.server.cpu.percent Preprocessing JSON Path: `$.metrics.sqlserver_process_core_percent.maximum` ⛔️Custom on fail: Discard value
SQL Server process memory percent	Memory usage as a percentage of the SQL DB process. Not applicable to data warehouses.	Dependent item	azure.db.mssql.server.memory.percent Preprocessing JSON Path: `$.metrics.sqlserver_process_memory_percent.maximum` ⛔️Custom on fail: Discard value
Tempdb data file size	The space used in `tempdb` data files, expressed in bytes. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.tempdb.data.size Preprocessing JSON Path: `$.metrics.tempdb_data_size.maximum` ⛔️Custom on fail: Discard value Custom multiplier: `1024`
Tempdb log file size	The space used in `tempdb` transaction log file, expressed in bytes. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.tempdb.log.size Preprocessing JSON Path: `$.metrics.tempdb_log_size.maximum` ⛔️Custom on fail: Discard value Custom multiplier: `1024`
Tempdb log used percent	The percentage of space used in `tempdb` transaction log file. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.tempdb.log.percent Preprocessing JSON Path: `$.metrics.tempdb_log_used_percent.maximum` ⛔️Custom on fail: Discard value
Data space allocated	The allocated data storage. Not applicable to the data warehouses.	Dependent item	azure.db.mssql.storage.allocated Preprocessing JSON Path: `$.metrics.allocated_data_storage.average` ⛔️Custom on fail: Discard value
Full backup storage size	Cumulative full backup storage size. Applies to the vCore-based databases. Not applicable to the Hyperscale databases.	Dependent item	azure.db.mssql.storage.backup.size Preprocessing JSON Path: `$.metrics.full_backup_size_bytes.maximum` Discard unchanged with heartbeat: `1d`
Differential backup storage size	Cumulative differential backup storage size. Applies to the vCore-based databases. Not applicable to the Hyperscale databases.	Dependent item	azure.db.mssql.storage.backup.diff.size Preprocessing JSON Path: `$.metrics.diff_backup_size_bytes.maximum` Discard unchanged with heartbeat: `1d`
Log backup storage size	Cumulative log backup storage size. Applies to the vCore-based and Hyperscale databases.	Dependent item	azure.db.mssql.storage.backup.log.size Preprocessing JSON Path: `$.metrics.log_backup_size_bytes.maximum` Discard unchanged with heartbeat: `1d`

Triggers

Name	Description	Expression
Azure MSSQL: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure Microsoft SQL Database by HTTP/azure.db.mssql.data.errors))>0`\|Average
Azure MSSQL: Microsoft SQL database is unavailable	The resource state is unavailable.	`last(/Azure Microsoft SQL Database by HTTP/azure.db.mssql.availability.state)=2`\|High
Azure MSSQL: Microsoft SQL database is degraded	The resource is in a degraded state.	`last(/Azure Microsoft SQL Database by HTTP/azure.db.mssql.availability.state)=1`\|Average
Azure MSSQL: Microsoft SQL database is in unknown state	The resource state is unknown.	`last(/Azure Microsoft SQL Database by HTTP/azure.db.mssql.availability.state)=3`\|Warning
Azure MSSQL: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/Azure Microsoft SQL Database by HTTP/azure.db.mssql.cpu.percentage,5m)>{$AZURE.DB.CPU.UTIL.CRIT}`\|High
Azure MSSQL: Storage space is critically low	Critical utilization of the storage space.	`last(/Azure Microsoft SQL Database by HTTP/azure.db.mssql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.CRIT}`\|Average
Azure MSSQL: Storage space is low	High utilization of the storage space.	`last(/Azure Microsoft SQL Database by HTTP/azure.db.mssql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.WARN}`\|Warning

Azure Cosmos DB for MongoDB by HTTP

Overview

This template is designed for the effortless deployment of Azure Cosmos DB for MongoDB monitoring by Zabbix via HTTP and doesn't require any external scripts.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Microsoft Azure Cosmos DB

Configuration

Setup

Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.

az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>

Link the template to a host.
Configure the macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.

Macros used

Name	Description	Default
{$AZURE.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.
{$AZURE.APP.ID}	The App ID of Microsoft Azure.
{$AZURE.PASSWORD}	Microsoft Azure password.
{$AZURE.DATA.TIMEOUT}	API response timeout.	`15s`
{$AZURE.TENANT.ID}	Microsoft Azure tenant ID.
{$AZURE.SUBSCRIPTION.ID}	Microsoft Azure subscription ID.
{$AZURE.RESOURCE.ID}	Microsoft Azure Cosmos DB ID.
{$AZURE.DB.COSMOS.MONGO.AVAILABILITY}	The warning threshold of the Cosmos DB for MongoDB service availability.	`70`

Items

Name	Description	Type	Key and additional info
Get data	The result of API requests is expressed in the JSON.	Script	azure.cosmosdb.data.get
Get errors	A list of errors from API requests.	Dependent item	azure.cosmosdb.data.errors Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `1h`
Total requests	Number of requests per minute.	Dependent item	azure.cosmosdb.total.requests Preprocessing JSON Path: `$.requests.TotalRequests.count` ⛔️Custom on fail: Discard value
Total request units	The request units consumed per minute.	Dependent item	azure.cosmosdb.total.request.units Preprocessing JSON Path: `$.requests.TotalRequestUnits.total` ⛔️Custom on fail: Discard value
Metadata requests	The count of metadata requests. Cosmos DB maintains system metadata collection for each account, which allows you to enumerate collections, databases, etc., and their configurations, free of charge.	Dependent item	azure.cosmosdb.metadata.requests Preprocessing JSON Path: `$.requests.MetadataRequests.count` ⛔️Custom on fail: Discard value
Mongo requests	The number of Mongo requests made.	Dependent item	azure.cosmosdb.mongo.requests Preprocessing JSON Path: `$.requests.MongoRequests.count` ⛔️Custom on fail: Discard value
Mongo request charge	The Mongo request units consumed.	Dependent item	azure.cosmosdb.mongo.requests.charge Preprocessing JSON Path: `$.requests.MongoRequestCharge.total` ⛔️Custom on fail: Discard value
Server side latency	The server side latency.	Dependent item	azure.cosmosdb.server.side.latency Preprocessing JSON Path: `$.requests.ServerSideLatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001`
Server side latency, gateway	The server side latency in gateway connection mode.	Dependent item	azure.cosmosdb.server.side.latency.gateway Preprocessing JSON Path: `$.requests.ServerSideLatencyGateway.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001`
Server side latency, direct	The server side latency in direct connection mode.	Dependent item	azure.cosmosdb.server.side.latency.direct Preprocessing JSON Path: `$.requests.ServerSideLatencyDirect.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001`
Replication latency, P99	The P99 replication latency across source and target regions for geo-enabled account.	Dependent item	azure.cosmosdb.replication.latency Preprocessing JSON Path: `$.requests.ReplicationLatency.average` ⛔️Custom on fail: Discard value Custom multiplier: `0.001`
Service availability	The account requests availability at one hour granularity.	Dependent item	azure.cosmosdb.service.availability Preprocessing JSON Path: `$.availability.ServiceAvailability.average` ⛔️Custom on fail: Discard value
Data usage	The total data usage.	Dependent item	azure.cosmosdb.data.usage Preprocessing JSON Path: `$.requests.DataUsage.total` ⛔️Custom on fail: Discard value
Index usage	The total index usage.	Dependent item	azure.cosmosdb.index.usage Preprocessing JSON Path: `$.requests.IndexUsage.total` ⛔️Custom on fail: Discard value
Document quota	The total storage quota.	Dependent item	azure.cosmosdb.document.quota Preprocessing JSON Path: `$.requests.DocumentQuota.total` ⛔️Custom on fail: Discard value
Document count	The total document count.	Dependent item	azure.cosmosdb.document.count Preprocessing JSON Path: `$.requests.DocumentCount.total` ⛔️Custom on fail: Discard value
Normalized RU consumption	The max RU consumption percentage per minute.	Dependent item	azure.cosmosdb.normalized.ru.consumption Preprocessing JSON Path: `$.requests.NormalizedRUConsumption.maximum` ⛔️Custom on fail: Discard value
Physical partition throughput	The physical partition throughput.	Dependent item	azure.cosmosdb.physical.partition.throughput Preprocessing JSON Path: `$.requests.PhysicalPartitionThroughputInfo.maximum` ⛔️Custom on fail: Discard value
Autoscale max throughput	The autoscale max throughput.	Dependent item	azure.cosmosdb.autoscale.max.throughput Preprocessing JSON Path: `$.requests.AutoscaleMaxThroughput.maximum` ⛔️Custom on fail: Discard value
Provisioned throughput	The provisioned throughput.	Dependent item	azure.cosmosdb.provisioned.throughput Preprocessing JSON Path: `$.requests.ProvisionedThroughput.maximum` ⛔️Custom on fail: Discard value
Physical partition size	The physical partition size in bytes.	Dependent item	azure.cosmosdb.physical.partition.size Preprocessing JSON Path: `$.requests.PhysicalPartitionSizeInfo.maximum` ⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Azure Cosmos DB: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure Cosmos DB for MongoDB by HTTP/azure.cosmosdb.data.errors))>0`\|Average
Azure Cosmos DB: Cosmos DB for MongoDB account: Availability is low		`(min(/Azure Cosmos DB for MongoDB by HTTP/azure.cosmosdb.service.availability,#3))<{$AZURE.DB.COSMOS.MONGO.AVAILABILITY}`\|Warning

Azure Cost Management by HTTP

Overview

This template is designed to monitor Microsoft Cost Management by HTTP. It works without any external scripts and uses the script item.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Microsoft Azure

Configuration

Setup

Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.

az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>

Link the template to a host.
Configure the macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}.

Macros used

Name	Description	Default
{$AZURE.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.
{$AZURE.APP.ID}	The App ID of Microsoft Azure.
{$AZURE.PASSWORD}	Microsoft Azure password.
{$AZURE.DATA.TIMEOUT}	API response timeout.	`60s`
{$AZURE.TENANT.ID}	Microsoft Azure tenant ID.
{$AZURE.SUBSCRIPTION.ID}	Microsoft Azure subscription ID.
{$AZURE.BILLING.MONTH}	Months to get historical data from Azure Cost Management API, no more than 11 (plus current month). The time period for pulling the data cannot exceed 1 year.	`11`
{$AZURE.LLD.FILTER.SERVICE.MATCHES}	Filter of discoverable services by name.	`.*`
{$AZURE.LLD.FILTER.SERVICE.NOT_MATCHES}	Filter to exclude discovered services by name.	`CHANGE_IF_NEEDED`
{$AZURE.LLD.FILTER.RESOURCE.LOCATION.MATCHES}	Filter of discoverable locations by name.	`.*`
{$AZURE.LLD.FILTER.RESOURCE.LOCATION.NOT_MATCHES}	Filter to exclude discovered locations by name.	`CHANGE_IF_NEEDED`
{$AZURE.LLD.FILTER.RESOURCE.GROUP.MATCHES}	Filter of discoverable resource groups by name.	`.*`
{$AZURE.LLD.FILTER.RESOURCE.GROUP.NOT_MATCHES}	Filter to exclude discovered resource groups by name.	`CHANGE_IF_NEEDED`

Items

Name	Description	Type	Key and additional info
Get monthly costs	The result of API requests is expressed in the JSON.	Script	azure.get.monthly.costs
Get daily costs	The result of API requests is expressed in the JSON.	Script	azure.get.daily.costs
Azure Cost: Get monthly costs errors	A list of errors from API requests.	Dependent item	azure.get.monthly.costs.errors Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `1h`
Azure Cost: Get daily costs errors	A list of errors from API requests.	Dependent item	azure.get.daily.costs.errors Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `1h`

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Azure Cost: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure Cost Management by HTTP/azure.get.monthly.costs.errors))>0`\|Average
Azure Cost: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure Cost Management by HTTP/azure.get.daily.costs.errors))>0`\|Average

LLD rule Azure daily costs by services discovery

Name Description Type Key and additional info

Azure daily costs by services discovery

Discovery of daily costs by services.

Dependent item

azure.daily.services.costs.discovery

Preprocessing

JSON Path: $.data
⛔️Custom on fail: Discard value

Item prototypes for Azure daily costs by services discovery

Name Description Type Key and additional info

Service ["{#AZURE.SERVICE.NAME}"]: Meter ["{#AZURE.BILLING.METER}"]: Subcategory ["{#AZURE.BILLING.METER.SUBCATEGORY}"] daily cost

The daily cost by service {#AZURE.SERVICE.NAME}, meter {#AZURE.BILLING.METER}, subcategory {#AZURE.BILLING.METER.SUBCATEGORY}.

Dependent item

azure.daily.cost["{#AZURE.SERVICE.NAME}", "{#AZURE.BILLING.METER}", "{#AZURE.BILLING.METER.SUBCATEGORY}","{#AZURE.RESOURCE.GROUP}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value

LLD rule Azure monthly costs by services discovery

Name Description Type Key and additional info

Azure monthly costs by services discovery

Discovery of monthly costs by services.

Dependent item

azure.monthly.services.costs.discovery

Preprocessing

JSON Path: $.serviceCost.data
⛔️Custom on fail: Discard value

Item prototypes for Azure monthly costs by services discovery

Name Description Type Key and additional info

Service ["{#AZURE.SERVICE.NAME}"]: Month ["{#AZURE.BILLING.MONTH}"] cost

The monthly cost by service {#AZURE.SERVICE.NAME}.

Dependent item

azure.monthly.service.cost["{#AZURE.SERVICE.NAME}", "{#AZURE.BILLING.MONTH}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value
JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value

LLD rule Azure monthly costs by location discovery

Name Description Type Key and additional info

Azure monthly costs by location discovery

Discovery of monthly costs by location.

Dependent item

azure.monthly.location.costs.discovery

Preprocessing

JSON Path: $.resourceLocationCost.data
⛔️Custom on fail: Discard value

Item prototypes for Azure monthly costs by location discovery

Name Description Type Key and additional info

Location: ["{#AZURE.RESOURCE.LOCATION}"]: Month ["{#AZURE.BILLING.MONTH}"] cost

The monthly cost by location {#AZURE.RESOURCE.LOCATION}.

Dependent item

azure.monthly.location.cost["{#AZURE.RESOURCE.LOCATION}", "{#AZURE.BILLING.MONTH}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value
JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value

LLD rule Azure monthly costs by resource group discovery

Name Description Type Key and additional info

Azure monthly costs by resource group discovery

Discovery of monthly costs by resource group.

Dependent item

azure.monthly.resource.group.costs.discovery

Preprocessing

JSON Path: $.resourceGroupCost.data
⛔️Custom on fail: Discard value

Item prototypes for Azure monthly costs by resource group discovery

Name Description Type Key and additional info

Resource group: ["{#AZURE.RESOURCE.GROUP}"]: Month ["{#AZURE.BILLING.MONTH}"] cost

The monthly cost by resource group {#AZURE.RESOURCE.GROUP}.

Dependent item

azure.monthly.resource.group.cost["{#AZURE.RESOURCE.GROUP}", "{#AZURE.BILLING.MONTH}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value
JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value

LLD rule Azure monthly costs discovery

Name Description Type Key and additional info

Azure monthly costs discovery

Discovery of monthly costs.

Dependent item

azure.monthly.costs.discovery

Preprocessing

JSON Path: $.monthCost.data
⛔️Custom on fail: Discard value

Item prototypes for Azure monthly costs discovery

Name Description Type Key and additional info

Month ["{#AZURE.BILLING.MONTH}"] cost

The monthly cost.

Dependent item

azure.monthly.cost["{#AZURE.BILLING.MONTH}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value

Azure SQL Managed Instance by HTTP

Overview

This template is designed to monitor Microsoft Azure SQL Managed Instance by HTTP. It works without any external scripts and uses the script item.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Azure SQL Managed Instance

Configuration

Setup

Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.

az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>

Link the template to a host.
Configure the macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.

Macros used

Name	Description	Default
{$AZURE.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.
{$AZURE.APP.ID}	The App ID of Microsoft Azure.
{$AZURE.PASSWORD}	Microsoft Azure password.
{$AZURE.DATA.TIMEOUT}	API response timeout.	`15s`
{$AZURE.TENANT.ID}	Microsoft Azure tenant ID.
{$AZURE.SUBSCRIPTION.ID}	Microsoft Azure subscription ID.
{$AZURE.RESOURCE.ID}	Microsoft Azure SQL managed instance ID.
{$AZURE.SQL.INST.SPACE.CRIT}	Storage space critical threshold, expressed in %.	`90`
{$AZURE.SQL.INST.SPACE.WARN}	Storage space warning threshold, expressed in %.	`80`
{$AZURE.SQL.INST.CPU.WARN}	CPU utilization warning threshold, expressed in %.	`80`
{$AZURE.SQL.INST.CPU.CRIT}	CPU utilization critical threshold, expressed in %.	`90`

Items

Name	Description	Type	Key and additional info
Get data	Gathers data of the Azure SQL managed instance.	Script	azure.sql_inst.data.get
Get errors	A list of errors from API requests.	Dependent item	azure.sql_inst.data.errors Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to: ``</p></li><li><p>Discard unchanged with heartbeat:`1h`
Availability state	The availability status of the resource. 0 - Available - no events detected that affect the health of the resource. 1 - Degraded - your resource detected a loss in performance, although it's still available for use. 2 - Unavailable - the service detected an ongoing platform or non-platform event that affects the health of the resource. 3 - Unknown - Resource Health hasn't received information about the resource for more than 10 minutes.	Dependent item	azure.sql_inst.availability.state Preprocessing JSON Path: `$.health.availabilityState` ⛔️Custom on fail: Set value to: `3` Replace: `Available -> 0` Replace: `Degraded -> 1` Replace: `Unavailable -> 2` Replace: `Unknown -> 3` In range: `0 -> 3` ⛔️Custom on fail: Set value to: `3` Discard unchanged with heartbeat: `1h`
Availability status detailed	The summary description of the availability status.	Dependent item	azure.sql_inst.availability.details Preprocessing JSON Path: `$.health.summary` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Average CPU utilization	Average CPU utilization of the instance.	Dependent item	azure.sql_inst.cpu Preprocessing JSON Path: `$.metrics.avg_cpu_percent.average` ⛔️Custom on fail: Discard value
IO bytes read	Bytes read by the managed instance.	Dependent item	azure.sql_inst.bytes.read Preprocessing JSON Path: `$.metrics.io_bytes_read.average` ⛔️Custom on fail: Discard value
IO bytes write	Bytes written by the managed instance.	Dependent item	azure.sql_inst.bytes.write Preprocessing JSON Path: `$.metrics.io_bytes_written.average` ⛔️Custom on fail: Discard value
IO request count	IO request count by the managed instance.	Dependent item	azure.sql_inst.requests Preprocessing JSON Path: `$.metrics.io_requests.average` ⛔️Custom on fail: Discard value
Storage space reserved	Storage space reserved by the managed instance.	Dependent item	azure.sql_inst.storage.reserved Preprocessing JSON Path: `$.metrics.reserved_storage_mb.average` ⛔️Custom on fail: Discard value Custom multiplier: `1048576` Discard unchanged with heartbeat: `1h`
Storage space used	Storage space used by the managed instance.	Dependent item	azure.sql_inst.storage.used Preprocessing JSON Path: `$.metrics.storage_space_used_mb.average` ⛔️Custom on fail: Discard value Custom multiplier: `1048576`
Storage space utilization	Managed instance storage space utilization, in percent.	Calculated	azure.sql_inst.storage.utilization
Virtual core count	Virtual core count available to the managed instance.	Dependent item	azure.sql_inst.core.count Preprocessing JSON Path: `$.metrics.virtual_core_count.average` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Instance state	State of the managed instance.	Dependent item	azure.sql_inst.state Preprocessing JSON Path: `$.instance.properties.state` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Instance collation	Collation of the managed instance.	Dependent item	azure.sql_inst.collation Preprocessing JSON Path: `$.instance.properties.collation` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1d`
Instance provisioning state	Provisioning state of the managed instance.	Dependent item	azure.sql_inst.provision Preprocessing JSON Path: `$.instance.properties.provisioningState` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Triggers

Name	Description	Expression	Severity
Azure SQL instance: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure SQL Managed Instance by HTTP/azure.sql_inst.data.errors))>0`\|Average
Azure SQL instance: Azure SQL managed instance is unavailable	The resource state is unavailable.	`last(/Azure SQL Managed Instance by HTTP/azure.sql_inst.availability.state)=2`\|High
Azure SQL instance: Azure SQL managed instance is degraded	The resource is in a degraded state.	`last(/Azure SQL Managed Instance by HTTP/azure.sql_inst.availability.state)=1`\|Average
Azure SQL instance: Azure SQL managed instance is in unknown state	The resource state is unknown.	`last(/Azure SQL Managed Instance by HTTP/azure.sql_inst.availability.state)=3`\|Warning
Azure SQL instance: Critically high CPU utilization	CPU utilization is critically high.	`min(/Azure SQL Managed Instance by HTTP/azure.sql_inst.cpu, 10m)>={$AZURE.SQL.INST.CPU.CRIT}`\|Average	Depends on: Azure SQL instance: High CPU utilization
Azure SQL instance: High CPU utilization	CPU utilization is too high.	`min(/Azure SQL Managed Instance by HTTP/azure.sql_inst.cpu, 10m)>={$AZURE.SQL.INST.CPU.WARN}`\|Warning
Azure SQL instance: Storage free space is critically low	The free storage space has been less than `{$AZURE.SQL.INST.SPACE.CRIT}`% for 5m.	`min(/Azure SQL Managed Instance by HTTP/azure.sql_inst.storage.utilization,5m)>{$AZURE.SQL.INST.SPACE.CRIT}`\|Average	Manual close: Yes Depends on: Azure SQL instance: Storage free space is low
Azure SQL instance: Storage free space is low	The free storage space has been less than `{$AZURE.SQL.INST.SPACE.WARN}`% for 5m.	`min(/Azure SQL Managed Instance by HTTP/azure.sql_inst.storage.utilization,5m)>{$AZURE.SQL.INST.SPACE.WARN}`\|Warning	Manual close: Yes
Azure SQL instance: Instance state has changed	Azure SQL managed instance state has changed.	`change(/Azure SQL Managed Instance by HTTP/azure.sql_inst.state)=1`\|Warning
Azure SQL instance: Instance collation has changed	Azure SQL managed instance collation has changed.	`change(/Azure SQL Managed Instance by HTTP/azure.sql_inst.collation)=1`\|Average
Azure SQL instance: Instance provisioning state has changed	Azure SQL managed instance provisioning state has changed.	`change(/Azure SQL Managed Instance by HTTP/azure.sql_inst.provision)<>0`\|Warning

Azure Backup Jobs by HTTP

Overview

This template is designed to monitor Microsoft Azure Backup Jobs via HTTP. It works without any external scripts and uses the script item.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Azure Recovery Services vaults

Configuration

Setup

Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.

az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>

Link the template to a host.
Configure the macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.

Macros used

Name	Description	Default
{$AZURE.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.
{$AZURE.APP.ID}	The App ID of Microsoft Azure.
{$AZURE.PASSWORD}	Microsoft Azure password.
{$AZURE.DATA.TIMEOUT}	API response timeout.	`15s`
{$AZURE.TENANT.ID}	Microsoft Azure tenant ID.
{$AZURE.SUBSCRIPTION.ID}	Microsoft Azure subscription ID.
{$AZURE.RESOURCE.ID}	Microsoft Azure vault resource ID.
{$AZURE.JOBS.FRIENDLY.NAME.MATCHES}	Set the regex string to include backup jobs based on `entityFriendlyName`.	`.*`
{$AZURE.JOBS.FRIENDLY.NAME.NOT.MATCHES}	Set the regex string to exclude backup jobs based on `entityFriendlyName`.	`CHANGE_IF_NEEDED`
{$AZURE.JOBS.STATUS.MATCHES}	Set the regex string to include backup jobs based on status.	`.*`
{$AZURE.JOBS.STATUS.NOT.MATCHES}	Set the regex string to exclude backup jobs based on status.	`CHANGE_IF_NEEDED`
{$AZURE.JOBS.OPERATION.MATCHES}	Set the regex string to include backup jobs based on operation type.	`.*`
{$AZURE.JOBS.OPERATION.NOT.MATCHES}	Set the regex string to exclude backup jobs based on operation type.	`CHANGE_IF_NEEDED`
{$AZURE.VAULT.PERIOD}	The number of days over which to retrieve backup jobs.	`7`

Items

Name	Description	Type	Key and additional info
Get data	Gathers data of the Azure vault.	Script	azure.vault.data.get
Get errors	A list of errors from API requests.	Dependent item	azure.vault.data.errors Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to: ``</p></li><li><p>Discard unchanged with heartbeat:`1h`
Availability state	The availability status of the resource. 0 - Available - no events detected that affect the health of the resource. 1 - Degraded - your resource detected a loss in performance, although it's still available for use. 2 - Unavailable - the service detected an ongoing platform or non-platform event that affects the health of the resource. 3 - Unknown - Resource Health hasn't received information about the resource for more than 10 minutes.	Dependent item	azure.vault.availability.state Preprocessing JSON Path: `$.health.availabilityState` ⛔️Custom on fail: Set value to: `3` Replace: `Available -> 0` Replace: `Degraded -> 1` Replace: `Unavailable -> 2` Replace: `Unknown -> 3` In range: `0 -> 3` ⛔️Custom on fail: Set value to: `3` Discard unchanged with heartbeat: `1h`
Availability status detailed	The summary description of the availability status.	Dependent item	azure.vault.availability.details Preprocessing JSON Path: `$.health.summary` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `1h`
Jobs: Total	The number of jobs over the period of `{$AZURE.VAULT.PERIOD}` day(s).	Dependent item	azure.vault.jobs.total Preprocessing JSON Path: `$.jobs.length()`
Jobs: Completed	The number of completed jobs over the period of `{$AZURE.VAULT.PERIOD}` day(s).	Dependent item	azure.vault.jobs.completed Preprocessing JSON Path: `$.jobs[?(@.properties.status == "Completed")].length()`
Jobs: In progress	The number of jobs in progress over the period of `{$AZURE.VAULT.PERIOD}` day(s).	Dependent item	azure.vault.jobs.in_progress Preprocessing JSON Path: `$.jobs[?(@.properties.status == "InProgress")].length()`
Jobs: Failed	The number of failed jobs over the period of `{$AZURE.VAULT.PERIOD}` day(s).	Dependent item	azure.vault.jobs.failed Preprocessing JSON Path: `$.jobs[?(@.properties.status == "Failed")].length()`
Jobs: Completed with warnings	The number of jobs completed with warnings over the period of `{$AZURE.VAULT.PERIOD}` day(s).	Dependent item	azure.vault.jobs.with_warning Preprocessing JSON Path: `The text is too long. Please see the template.`
Jobs: Cancelled	The number of cancelled jobs over the period of `{$AZURE.VAULT.PERIOD}` day(s).	Dependent item	azure.vault.jobs.cancelled Preprocessing JSON Path: `$.jobs[?(@.properties.status == "Cancelled")].length()`
Jobs: Backup	The number of backup jobs over the period of `{$AZURE.VAULT.PERIOD}` day(s).	Dependent item	azure.vault.jobs.backup Preprocessing JSON Path: `$.jobs[?(@.properties.operation == "Backup")].length()`
Jobs: Restore	The number of restore jobs over the period of `{$AZURE.VAULT.PERIOD}` day(s).	Dependent item	azure.vault.jobs.restore Preprocessing JSON Path: `$.jobs[?(@.properties.operation == "Restore")].length()`
Jobs: Deleting backup data	The number of backup data deletion jobs over the period of `{$AZURE.VAULT.PERIOD}` day(s).	Dependent item	azure.vault.jobs.backup.delete Preprocessing JSON Path: `The text is too long. Please see the template.`
Jobs: Configuring backup	The number of backup configuration jobs over the period of `{$AZURE.VAULT.PERIOD}` day(s).	Dependent item	azure.vault.jobs.backup.config Preprocessing JSON Path: `The text is too long. Please see the template.`

Triggers

Name	Description	Expression	Severity
Azure backup jobs: There are errors in requests to API	Zabbix has received errors in response to API requests.	`length(last(/Azure Backup Jobs by HTTP/azure.vault.data.errors))>0`\|Average
Azure backup jobs: Azure vault is unavailable	The resource state is unavailable.	`last(/Azure Backup Jobs by HTTP/azure.vault.availability.state)=2`\|High
Azure backup jobs: Azure vault is degraded	The resource is in a degraded state.	`last(/Azure Backup Jobs by HTTP/azure.vault.availability.state)=1`\|Average
Azure backup jobs: Azure vault is in unknown state	The resource state is unknown.	`last(/Azure Backup Jobs by HTTP/azure.vault.availability.state)=3`\|Warning
Azure backup jobs: Restore job has appeared	New restore job has appeared.	`change(/Azure Backup Jobs by HTTP/azure.vault.jobs.restore)>0`\|Average	Manual close: Yes
Azure backup jobs: Backup data deletion job has appeared	New backup data deletion job has appeared.	`change(/Azure Backup Jobs by HTTP/azure.vault.jobs.backup.delete)>0`\|Warning	Manual close: Yes
Azure backup jobs: Backup configuration job has appeared	New backup configuration job has appeared.	`change(/Azure Backup Jobs by HTTP/azure.vault.jobs.backup.config)>0`\|Info	Manual close: Yes

LLD rule Azure backup job discovery

Name Description Type Key and additional info

Azure backup job discovery

List of backup jobs in the vault.

Dependent item

azure.vault.job.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for Azure backup job discovery

Name Description Type Key and additional info

Job status [{#JOB.FRIENDLY.NAME}, {#JOB.NAME}]

Job status.

Possible values:

0 - Unknown

1 - In progress

2 - Queued

3 - Completed

4 - Completed with warnings

5 - Failed

6 - Cancelled

7 - Expired

Dependent item

azure.vault.job.status[{#JOB.NAME}]

Preprocessing

JSON Path: $.jobs[?(@.name == '{#JOB.NAME}')].properties.status.first()
JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Trigger prototypes for Azure backup job discovery

Name	Description	Expression	Severity
Azure backup jobs: Job failed [{#JOB.NAME}]	Job has received "Failed" status.	`last(/Azure Backup Jobs by HTTP/azure.vault.job.status[{#JOB.NAME}])=5`\|High	Manual close: Yes
Azure backup jobs: Job cancelled [{#JOB.NAME}]	Job has received "Cancelled" status.	`last(/Azure Backup Jobs by HTTP/azure.vault.job.status[{#JOB.NAME}])=6`\|Average	Manual close: Yes
Azure backup jobs: Job completed with warnings [{#JOB.NAME}]	Job has received "Completed with warnings" status.	`last(/Azure Backup Jobs by HTTP/azure.vault.job.status[{#JOB.NAME}])=4`\|Warning	Manual close: Yes
Azure backup jobs: Job expired [{#JOB.NAME}]	Job has received "Expired" status.	`last(/Azure Backup Jobs by HTTP/azure.vault.job.status[{#JOB.NAME}])=7`\|Average	Manual close: Yes
Azure backup jobs: Job status unknown [{#JOB.NAME}]	Job has received "Unknown" status.	`last(/Azure Backup Jobs by HTTP/azure.vault.job.status[{#JOB.NAME}])=0`\|Average	Manual close: Yes

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

cloud

cloud_aws_http

View README Download JSON

AWS by HTTP

Overview

This template is designed for the effortless deployment of AWS monitoring by Zabbix via HTTP and doesn't require any external scripts.

Currently, the template supports the discovery of EC2 and RDS instances, ECS clusters, ELB, Lambda and S3 buckets.

Included Monitoring Templates

AWS EC2 by HTTP
AWS ECS Cluster by HTTP
AWS ECS Serverless Cluster by HTTP
AWS ELB Application Load Balancer by HTTP
AWS ELB Network Load Balancer by HTTP
AWS Lambda by HTTP
AWS RDS instance by HTTP
AWS S3 bucket by HTTP
AWS Cost Explorer by HTTP

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

AWS by HTTP

Configuration

Setup

Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect metrics.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ec2:DescribeInstances",
                "ec2:DescribeVolumes",
                "ec2:DescribeRegions",
                "rds:DescribeEvents",
                "rds:DescribeDBInstances",
                "ecs:DescribeClusters",
                "ecs:ListServices",
                "ecs:ListTasks",
                "ecs:ListClusters",
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation",
                "s3:GetMetricsConfiguration",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeTargetGroups",
                "ec2:DescribeSecurityGroups",
                "lambda:ListFunctions"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume Role Authorization

For using assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ec2:DescribeInstances",
                "ec2:DescribeVolumes",
                "ec2:DescribeRegions",
                "rds:DescribeEvents",
                "rds:DescribeDBInstances",
                "ecs:DescribeClusters",
                "ecs:ListServices",
                "ecs:ListTasks",
                "ecs:ListClusters",
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation",
                "s3:GetMetricsConfiguration",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeTargetGroups",
                "ec2:DescribeSecurityGroups",
                "lambda:ListFunctions"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, add the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ec2:DescribeInstances",
                "ec2:DescribeVolumes",
                "ec2:DescribeRegions",
                "rds:DescribeEvents",
                "rds:DescribeDBInstances",
                "ecs:DescribeClusters",
                "ecs:ListServices",
                "ecs:ListTasks",
                "ecs:ListClusters",
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation",
                "s3:GetMetricsConfiguration",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeTargetGroups",
                "ec2:DescribeSecurityGroups",
                "lambda:ListFunctions"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

To gather Request metrics, enable Requests metrics on your Amazon S3 buckets from the AWS console.

Set the macros: {$AWS.AUTH_TYPE}. Possible values: access_key, assume_role, role_base.

For more information about managing access keys, see official documentation.

Refer to the Macros section for a list of macros used for LLD filters.

Additional information about the metrics and used API methods:

Macros used

Name	Description	Default
{$AWS.DATA.TIMEOUT}	A response timeout for an API.	`60s`
{$AWS.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.REQUEST.REGION}	Region used in GET request `ListBuckets`.	`us-east-1`
{$AWS.DESCRIBE.REGION}	Region used in POST request `DescribeRegions`.	`us-east-1`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.ASSUME.ROLE.ARN}	ARN assume role; add when using the `assume_role` authorization method.
{$AWS.EC2.LLD.FILTER.NAME.MATCHES}	Filter of discoverable EC2 instances by namespace.	`.*`
{$AWS.EC2.LLD.FILTER.NAME.NOT_MATCHES}	Filter to exclude discovered EC2 instances by namespace.	`CHANGE_IF_NEEDED`
{$AWS.EC2.LLD.FILTER.REGION.MATCHES}	Filter of discoverable EC2 instances by region.	`.*`
{$AWS.EC2.LLD.FILTER.REGION.NOT_MATCHES}	Filter to exclude discovered EC2 instances by region.	`CHANGE_IF_NEEDED`
{$AWS.ECS.LLD.FILTER.NAME.MATCHES}	Filter of discoverable ECS clusters by name.	`.*`
{$AWS.ECS.LLD.FILTER.NAME.NOT_MATCHES}	Filter to exclude discovered ECS clusters by name.	`CHANGE_IF_NEEDED`
{$AWS.ECS.LLD.FILTER.STATUS.MATCHES}	Filter of discoverable ECS clusters by status.	`ACTIVE`
{$AWS.ECS.LLD.FILTER.STATUS.NOT_MATCHES}	Filter to exclude discovered ECS clusters by status.	`CHANGE_IF_NEEDED`
{$AWS.S3.LLD.FILTER.NAME.MATCHES}	Filter of discoverable S3 buckets by namespace.	`.*`
{$AWS.S3.LLD.FILTER.NAME.NOT_MATCHES}	Filter to exclude discovered S3 buckets by namespace.	`CHANGE_IF_NEEDED`
{$AWS.RDS.LLD.FILTER.NAME.MATCHES}	Filter of discoverable RDS instances by namespace.	`.*`
{$AWS.RDS.LLD.FILTER.NAME.NOT_MATCHES}	Filter to exclude discovered RDS instances by namespace.	`CHANGE_IF_NEEDED`
{$AWS.RDS.LLD.FILTER.REGION.MATCHES}	Filter of discoverable RDS instances by region.	`.*`
{$AWS.RDS.LLD.FILTER.REGION.NOT_MATCHES}	Filter to exclude discovered RDS instances by region.	`CHANGE_IF_NEEDED`
{$AWS.ECS.LLD.FILTER.REGION.MATCHES}	Filter of discoverable ECS clusters by region.	`.*`
{$AWS.ECS.LLD.FILTER.REGION.NOT_MATCHES}	Filter to exclude discovered ECS clusters by region.	`CHANGE_IF_NEEDED`
{$AWS.ELB.LLD.FILTER.NAME.MATCHES}	Filter of discoverable ELB load balancers by name.	`.*`
{$AWS.ELB.LLD.FILTER.NAME.NOT_MATCHES}	Filter to exclude discovered ELB load balancers by name.	`CHANGE_IF_NEEDED`
{$AWS.ELB.LLD.FILTER.REGION.MATCHES}	Filter of discoverable ELB load balancers by region.	`.*`
{$AWS.ELB.LLD.FILTER.REGION.NOT_MATCHES}	Filter to exclude discovered ELB load balancers by region.	`CHANGE_IF_NEEDED`
{$AWS.ELB.LLD.FILTER.STATE.MATCHES}	Filter of discoverable ELB load balancers by status.	`active`
{$AWS.ELB.LLD.FILTER.STATE.NOT_MATCHES}	Filter to exclude discovered ELB load balancer by status.	`CHANGE_IF_NEEDED`
{$AWS.LAMBDA.LLD.FILTER.REGION.MATCHES}	Filter of discoverable Lambda functions by region.	`.*`
{$AWS.LAMBDA.LLD.FILTER.REGION.NOT_MATCHES}	Filter to exclude discovered Lambda functions by region.	`CHANGE_IF_NEEDED`
{$AWS.LAMBDA.LLD.FILTER.RUNTIME.MATCHES}	Filter of discoverable Lambda functions by Runtime.	`.*`
{$AWS.LAMBDA.LLD.FILTER.RUNTIME.NOT_MATCHES}	Filter to exclude discovered Lambda functions by Runtime.	`CHANGE_IF_NEEDED`
{$AWS.LAMBDA.LLD.FILTER.NAME.MATCHES}	Filter of discoverable Lambda functions by name.	`.*`
{$AWS.LAMBDA.LLD.FILTER.NAME.NOT_MATCHES}	Filter to exclude discovered Lambda functions by name.	`CHANGE_IF_NEEDED`

LLD rule S3 buckets discovery

Name	Description	Type	Key and additional info
S3 buckets discovery	Get S3 bucket instances.	Script	aws.s3.discovery

LLD rule EC2 instances discovery

Name	Description	Type	Key and additional info
EC2 instances discovery	Get EC2 instances.	Script	aws.ec2.discovery

LLD rule RDS instances discovery

Name	Description	Type	Key and additional info
RDS instances discovery	Get RDS instances.	Script	aws.rds.discovery

LLD rule ECS clusters discovery

Name	Description	Type	Key and additional info
ECS clusters discovery	Get ECS clusters.	Script	aws.ecs.discovery

LLD rule ELB load balancers discovery

Name	Description	Type	Key and additional info
ELB load balancers discovery	Get ELB load balancers.	Script	aws.elb.discovery

LLD rule Lambda discovery

Name	Description	Type	Key and additional info
Lambda discovery	Get Lambda functions.	Script	aws.lambda.discovery

AWS EC2 by HTTP

Overview

The template to monitor AWS EC2 and attached AWS EBS volumes by HTTP via Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. NOTE This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about metrics and used API methods:

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

AWS EC2 by HTTP

Configuration

Setup

The template get AWS EC2 and attached AWS EBS volumes metrics and uses the script item to make HTTP requests to the CloudWatch API.

Before using the template, you need to create an IAM policy with the necessary permissions for the Zabbix role in your AWS account.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect Amazon EC2 metrics.

{
    "Version":"2012-10-17",
    "Statement":[
        {
          "Action":[
              "ec2:DescribeVolumes",
              "cloudwatch:"DescribeAlarms",
              "cloudwatch:GetMetricData"
          ],
          "Effect":"Allow",
          "Resource":"*"
        }
    ]
  }

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume Role Authorization

For using assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "cloudwatch:"DescribeAlarms",
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "cloudwatch:"DescribeAlarms",
                "cloudwatch:GetMetricData"
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

For more information, see the EC2 policies on the AWS website.

Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, {$AWS.EC2.INSTANCE.ID}.

For more information about manage access keys, see official documentation

Also, see the Macros section for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.REGION}	Amazon EC2 Region code.	`us-west-1`
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.ASSUME.ROLE.ARN}	ARN assume role; add when using the `assume_role` authorization method.
{$AWS.EC2.INSTANCE.ID}	EC2 instance ID.
{$AWS.EC2.LLD.FILTER.VOLUME_TYPE.MATCHES}	Filter of discoverable volumes by type.	`.*`
{$AWS.EC2.LLD.FILTER.VOLUMETYPE.NOTMATCHES}	Filter to exclude discovered volumes by type.	`CHANGE_IF_NEEDED`
{$AWS.EC2.LLD.FILTER.ALARMSERVICENAMESPACE.MATCHES}	Filter of discoverable alarms by namespace.	`.*`
{$AWS.EC2.LLD.FILTER.ALARMSERVICENAMESPACE.NOT_MATCHES}	Filter to exclude discovered alarms by namespace.	`CHANGE_IF_NEEDED`
{$AWS.EC2.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.EC2.LLD.FILTER.ALARMNAME.NOTMATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`
{$AWS.EC2.CPU.UTIL.WARN.MAX}	The warning threshold of the CPU utilization expressed in %.	`85`
{$AWS.EC2.CPU.CREDIT.BALANCE.MIN.WARN}	Minimum number of free earned CPU credits for trigger expression.	`50`
{$AWS.EC2.CPU.CREDIT.SURPLUS.BALANCE.MAX.WARN}	Maximum number of spent CPU Surplus credits for trigger expression.	`100`
{$AWS.EBS.IO.CREDIT.BALANCE.MIN.WARN}	Minimum percentage of I/O credits remaining for trigger expression.	`20`
{$AWS.EBS.BYTE.CREDIT.BALANCE.MIN.WARN}	Minimum percentage of Byte credits remaining for trigger expression.	`20`
{$AWS.EBS.BURST.CREDIT.BALANCE.MIN.WARN}	Minimum percentage of Byte credits remaining for trigger expression.	`20`

Items

Name	Description	Type	Key and additional info
Get metrics data	Get instance metrics. Full metrics list related to EC2: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewingmetricswith_cloudwatch.html	Script	aws.ec2.get_metrics Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get instance alarms data	DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.ec2.get_alarms Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get volumes data	Get volumes attached to instance. DescribeVolumes API method: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeVolumes.html	Script	aws.ec2.get_volumes Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get metrics check	Check result of the instance metric data has been got correctly.	Dependent item	aws.ec2.metrics.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get alarms check	Check result of the alarm data has been got correctly.	Dependent item	aws.ec2.alarms.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get volumes info check	Check result of the volume information has been got correctly.	Dependent item	aws.ec2.volumes.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Credit CPU: Balance	The number of earned CPU credits that an instance has accrued since it was launched or started. For T2 Standard, the CPUCreditBalance also includes the number of launch credits that have been accrued. Credits are accrued in the credit balance after they are earned, and removed from the credit balance when they are spent. The credit balance has a maximum limit, determined by the instance size. After the limit is reached, any new credits that are earned are discarded. For T2 Standard, launch credits do not count towards the limit. The credits in the CPUCreditBalance are available for the instance to spend to burst beyond its baseline CPU utilization. When an instance is running, credits in the CPUCreditBalance do not expire. When a T3 or T3a instance stops, the CPUCreditBalance value persists for seven days. Thereafter, all accrued credits are lost. When a T2 instance stops, the CPUCreditBalance value does not persist, and all accrued credits are lost.	Dependent item	aws.ec2.cpu.credit_balance Preprocessing JSON Path: `$.[?(@.Label == "CPUCreditBalance")].Values.first().first()` ⛔️Custom on fail: Discard value
Credit CPU: Usage	The number of CPU credits spent by the instance for CPU utilization. One CPU credit equals one vCPU running at 100% utilization for one minute or an equivalent combination of vCPUs, utilization, and time (for example, one vCPU running at 50% utilization for two minutes or two vCPUs running at 25% utilization for two minutes).	Dependent item	aws.ec2.cpu.credit_usage Preprocessing JSON Path: `$.[?(@.Label == "CPUCreditUsage")].Values.first().first()` ⛔️Custom on fail: Discard value
Credit CPU: Surplus balance	The number of surplus credits that have been spent by an unlimited instance when its CPUCreditBalance value is zero. The CPUSurplusCreditBalance value is paid down by earned CPU credits. If the number of surplus credits exceeds the maximum number of credits that the instance can earn in a 24-hour period, the spent surplus credits above the maximum incur an additional charge.	Dependent item	aws.ec2.cpu.surpluscreditbalance Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Credit CPU: Surplus charged	The number of spent surplus credits that are not paid down by earned CPU credits, and which thus incur an additional charge. Spent surplus credits are charged when any of the following occurs: - The spent surplus credits exceed the maximum number of credits that the instance can earn in a 24-hour period. Spent surplus credits above the maximum are charged at the end of the hour; - The instance is stopped or terminated; - The instance is switched from unlimited to standard.	Dependent item	aws.ec2.cpu.surpluscreditcharged Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
CPU: Utilization	The percentage of allocated EC2 compute units that are currently in use on the instance. This metric identifies the processing power required to run an application on a selected instance. Depending on the instance type, tools in your operating system can show a lower percentage than CloudWatch when the instance is not allocated a full processor core.	Dependent item	aws.ec2.cpu_utilization Preprocessing JSON Path: `$.[?(@.Label == "CPUUtilization")].Values.first().first()` ⛔️Custom on fail: Discard value
Disk: Read bytes, rate	Bytes read from all instance store volumes available to the instance. This metric is used to determine the volume of the data the application reads from the hard disk of the instance. This can be used to determine the speed of the application. If there are no instance store volumes, either the value is 0 or the metric is not reported.	Dependent item	aws.ec2.disk.read_bytes.rate Preprocessing JSON Path: `$.[?(@.Label == "DiskReadBytes")].Values.first().first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Disk: Read, rate	Completed read operations from all instance store volumes available to the instance in a specified period of time. If there are no instance store volumes, either the value is 0 or the metric is not reported.	Dependent item	aws.ec2.disk.read_ops.rate Preprocessing JSON Path: `$.[?(@.Label == "DiskReadOps")].Values.first().first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Disk: Write bytes, rate	Bytes written to all instance store volumes available to the instance. This metric is used to determine the volume of the data the application writes onto the hard disk of the instance. This can be used to determine the speed of the application. If there are no instance store volumes, either the value is 0 or the metric is not reported.	Dependent item	aws.ec2.diskwritebytes.rate Preprocessing JSON Path: `$.[?(@.Label == "DiskWriteBytes")].Values.first().first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Disk: Write ops, rate	Completed write operations to all instance store volumes available to the instance in a specified period of time. If there are no instance store volumes, either the value is 0 or the metric is not reported.	Dependent item	aws.ec2.diskwriteops.rate Preprocessing JSON Path: `$.[?(@.Label == "DiskWriteOps")].Values.first().first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
EBS: Byte balance	Percentage of throughput credits remaining in the burst bucket for Nitro-based instances.	Dependent item	aws.ec2.ebs.byte_balance Preprocessing JSON Path: `$.[?(@.Label == "EBSByteBalance%")].Values.first().first()` ⛔️Custom on fail: Discard value
EBS: IO balance	Percentage of I/O credits remaining in the burst bucket for Nitro-based instances.	Dependent item	aws.ec2.ebs.io_balance Preprocessing JSON Path: `$.[?(@.Label == "EBSIOBalance%")].Values.first().first()` ⛔️Custom on fail: Discard value
EBS: Read bytes, rate	Bytes read from all EBS volumes attached to the instance for Nitro-based instances.	Dependent item	aws.ec2.ebs.read_bytes.rate Preprocessing JSON Path: `$.[?(@.Label == "EBSReadBytes")].Values.first().first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
EBS: Read, rate	Completed read operations from all Amazon EBS volumes attached to the instance for Nitro-based instances.	Dependent item	aws.ec2.ebs.read_ops.rate Preprocessing JSON Path: `$.[?(@.Label == "EBSReadOps")].Values.first().first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
EBS: Write bytes, rate	Bytes written to all EBS volumes attached to the instance for Nitro-based instances.	Dependent item	aws.ec2.ebs.write_bytes.rate Preprocessing JSON Path: `$.[?(@.Label == "EBSWriteBytes")].Values.first().first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
EBS: Write, rate	Completed write operations to all EBS volumes attached to the instance in a specified period of time.	Dependent item	aws.ec2.ebs.write_ops.rate Preprocessing JSON Path: `$.[?(@.Label == "EBSWriteOps")].Values.first().first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Metadata: No token	The number of times the instance metadata service was successfully accessed using a method that does not use a token. This metric is used to determine if there are any processes accessing instance metadata that are using Instance Metadata Service Version 1, which does not use a token. If all requests use token-backed sessions, i.e., Instance Metadata Service Version 2, the value is 0.	Dependent item	aws.ec2.metadata.no_token Preprocessing JSON Path: `$.[?(@.Label == "MetadataNoToken")].Values.first().first()` ⛔️Custom on fail: Discard value
Network: Bytes in, rate	The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to a single instance.	Dependent item	aws.ec2.network_in.rate Preprocessing JSON Path: `$.[?(@.Label == "NetworkIn")].Values.first().first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Network: Bytes out, rate	The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic from a single instance.	Dependent item	aws.ec2.network_out.rate Preprocessing JSON Path: `$.[?(@.Label == "NetworkOut")].Values.first().first()` JavaScript: `The text is too long. Please see the template.`
Network: Packets in, rate	The number of packets received on all network interfaces by the instance. This metric identifies the volume of incoming traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only.	Dependent item	aws.ec2.packets_in.rate Preprocessing JSON Path: `$.[?(@.Label == "NetworkPacketsIn")].Values.first().first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Network: Packets out, rate	The number of packets sent out on all network interfaces by the instance. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only.	Dependent item	aws.ec2.packets_out.rate Preprocessing JSON Path: `$.[?(@.Label == "NetworkPacketsOut")].Values.first().first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Status: Check failed	Reports whether the instance has passed both the instance status check and the system status check in the last minute. This metric can be either 0 (passed) or 1 (failed).	Dependent item	aws.ec2.statuscheckfailed Preprocessing JSON Path: `$.[?(@.Label == "StatusCheckFailed")].Values.first().first()` ⛔️Custom on fail: Discard value
Status: Check failed, instance	Reports whether the instance has passed the instance status check in the last minute. This metric can be either 0 (passed) or 1 (failed).	Dependent item	aws.ec2.statuscheckfailed_instance Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Status: Check failed, system	Reports whether the instance has passed the system status check in the last minute. This metric can be either 0 (passed) or 1 (failed).	Dependent item	aws.ec2.statuscheckfailed_system Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression
AWS EC2: Failed to get metrics data	Failed to get CloudWatch metrics for EC2.	`length(last(/AWS EC2 by HTTP/aws.ec2.metrics.check))>0`\|Warning
AWS EC2: Failed to get alarms data	Failed to get CloudWatch alarms for EC2.	`length(last(/AWS EC2 by HTTP/aws.ec2.alarms.check))>0`\|Warning
AWS EC2: Failed to get volumes info	Failed to get CloudWatch volumes for EC2.	`length(last(/AWS EC2 by HTTP/aws.ec2.volumes.check))>0`\|Warning
AWS EC2: Instance CPU Credit balance is too low	The number of earned CPU credits has been less than {$AWS.EC2.CPU.CREDIT.BALANCE.MIN.WARN} in the last 5 minutes.	`max(/AWS EC2 by HTTP/aws.ec2.cpu.credit_balance,5m)<{$AWS.EC2.CPU.CREDIT.BALANCE.MIN.WARN}`\|Warning
AWS EC2: Instance has spent too many CPU surplus credits	The number of spent surplus credits that are not paid down and which thus incur an additional charge is over {$AWS.EC2.CPU.CREDIT.SURPLUS.BALANCE.MAX.WARN}.	`last(/AWS EC2 by HTTP/aws.ec2.cpu.surplus_credit_charged)>{$AWS.EC2.CPU.CREDIT.SURPLUS.BALANCE.MAX.WARN}`\|Warning
AWS EC2: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/AWS EC2 by HTTP/aws.ec2.cpu_utilization,15m)>{$AWS.EC2.CPU.UTIL.WARN.MAX}`\|Warning
AWS EC2: Byte Credit balance is too low		`max(/AWS EC2 by HTTP/aws.ec2.ebs.byte_balance,5m)<{$AWS.EBS.BYTE.CREDIT.BALANCE.MIN.WARN}`\|Warning
AWS EC2: I/O Credit balance is too low		`max(/AWS EC2 by HTTP/aws.ec2.ebs.io_balance,5m)<{$AWS.EBS.IO.CREDIT.BALANCE.MIN.WARN}`\|Warning
AWS EC2: Instance status check failed	These checks detect problems that require your involvement to repair. The following are examples of problems that can cause instance status checks to fail: Failed system status checks Incorrect networking or startup configuration Exhausted memory Corrupted file system Incompatible kernel	`last(/AWS EC2 by HTTP/aws.ec2.status_check_failed_instance)=1`\|Average
AWS EC2: System status check failed	These checks detect underlying problems with your instance that require AWS involvement to repair. The following are examples of problems that can cause system status checks to fail: Loss of network connectivity Loss of system power Software issues on the physical host Hardware issues on the physical host that impact network reachability	`last(/AWS EC2 by HTTP/aws.ec2.status_check_failed_system)=1`\|Average

LLD rule Instance Alarms discovery

Name Description Type Key and additional info

Instance Alarms discovery

Discovery instance and attached EBS volumes alarms.

Dependent item

aws.ec2.alarms.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Instance Alarms discovery

Name Description Type Key and additional info

[{#ALARM_NAME}]: Get metrics

Get alarm metrics about the state and its reason.

Dependent item

aws.ec2.alarm.getmetrics["{#ALARMNAME}"]

Preprocessing

JSON Path: $.[?(@.AlarmName == "{#ALARM_NAME}")].first()
⛔️Custom on fail: Discard value

[{#ALARM_NAME}]: State reason

An explanation for the alarm state, in text format.

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item

aws.ec2.alarm.statereason["{#ALARMNAME}"]

Preprocessing

JSON Path: $.StateReason
⛔️Custom on fail: Discard value
Discard unchanged with heartbeat: 3h

[{#ALARM_NAME}]: State

The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENTDATA), 2 (ALARM).

Alarm description:

{#ALARMDESCRIPTION}

Dependent item

aws.ec2.alarm.state["{#ALARM_NAME}"]

Preprocessing

JSON Path: $.StateValue
⛔️Custom on fail: Set value to: 3
JavaScript: The text is too long. Please see the template.

Trigger prototypes for Instance Alarms discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS EC2: [{#ALARM_NAME}] has 'Alarm' state	Alarm "{#ALARM_NAME}" has 'Alarm' state. Reason: {ITEM.LASTVALUE2}	`last(/AWS EC2 by HTTP/aws.ec2.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS EC2 by HTTP/aws.ec2.alarm.state_reason["{#ALARM_NAME}"]))>0`\|Average
AWS EC2: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS EC2 by HTTP/aws.ec2.alarm.state["{#ALARM_NAME}"])=1`\|Info

LLD rule Instance Volumes discovery

Name Description Type Key and additional info

Instance Volumes discovery

Discovery attached EBS volumes.

Dependent item

aws.ec2.volumes.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Instance Volumes discovery

Name	Description	Type	Key and additional info
[{#VOLUME_ID}]: Get volume data	Get data of the "{#VOLUME_ID}" volume.	Dependent item	aws.ec2.ebs.getvolume["{#VOLUMEID}"] Preprocessing JSON Path: `$.[?(@.volumeId == "{#VOLUME_ID}")].first()` ⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Create time	The time stamp when volume creation was initiated.	Dependent item	aws.ec2.ebs.createtime["{#VOLUMEID}"] Preprocessing JSON Path: `$.createTime` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#VOLUME_ID}]: Status	The state of the volume. Possible values: 0 (creating), 1 (available), 2 (in-use), 3 (deleting), 4 (deleted), 5 (error).	Dependent item	aws.ec2.ebs.status["{#VOLUME_ID}"] Preprocessing JSON Path: `$.status` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
[{#VOLUME_ID}]: Attachment state	The attachment state of the volume. Possible values: 0 (attaching), 1 (attached), 2 (detaching).	Dependent item	aws.ec2.ebs.attachmentstatus["{#VOLUMEID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
[{#VOLUME_ID}]: Attachment time	The time stamp when the attachment initiated.	Dependent item	aws.ec2.ebs.attachmenttime["{#VOLUMEID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#VOLUME_ID}]: Device	The device name specified in the block device mapping (for example, /dev/sda1).	Dependent item	aws.ec2.ebs.device["{#VOLUME_ID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#VOLUME_ID}]: Get metrics	Get metrics of EBS volume. Full metrics list related to EBS: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/usingcloudwatchebs.html	Script	aws.ec2.getebsmetrics["{#VOLUME_ID}"] Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Read, bytes	Provides information on the read operations in a specified period of time. The average size of each read operation during the period, except on volumes attached to a Nitro-based instance, where the average represents the average over the specified period. For Xen instances, data is reported only when there is read activity on the volume.	Dependent item	aws.ec2.ebs.volume.readbytes["{#VOLUMEID}"] Preprocessing JSON Path: `$.[?(@.Label == "VolumeReadBytes")].Values.first().first()` ⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Write, bytes	Provides information on the write operations in a specified period of time. The average size of each write operation during the period, except on volumes attached to a Nitro-based instance, where the average represents the average over the specified period. For Xen instances, data is reported only when there is write activity on the volume.	Dependent item	aws.ec2.ebs.volume.writebytes["{#VOLUMEID}"] Preprocessing JSON Path: `$.[?(@.Label == "VolumeWriteBytes")].Values.first().first()` ⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Write, ops	The total number of write operations in a specified period of time. Note: write operations are counted on completion.	Dependent item	aws.ec2.ebs.volume.writeops["{#VOLUMEID}"] Preprocessing JSON Path: `$.[?(@.Label == "VolumeWriteOps")].Values.first().first()` ⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Read, ops	The total number of read operations in a specified period of time. Note: read operations are counted on completion.	Dependent item	aws.ec2.ebs.volume.readops["{#VOLUMEID}"] Preprocessing JSON Path: `$.[?(@.Label == "VolumeReadOps")].Values.first().first()` ⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Read time, total	This metric is not supported with Multi-Attach enabled volumes. The total number of seconds spent by all read operations that completed in a specified period of time. If multiple requests are submitted at the same time, this total could be greater than the length of the period. For example, for a period of 1 minutes (60 seconds): if 150 operations completed during that period, and each operation took 1 second, the value would be 150 seconds. For Xen instances, data is reported only when there is read activity on the volume.	Dependent item	aws.ec2.ebs.volume.totalreadtime["{#VOLUME_ID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Write time, total	This metric is not supported with Multi-Attach enabled volumes. The total number of seconds spent by all write operations that completed in a specified period of time. If multiple requests are submitted at the same time, this total could be greater than the length of the period. For example, for a period of 1 minute (60 seconds): if 150 operations completed during that period, and each operation took 1 second, the value would be 150 seconds. For Xen instances, data is reported only when there is write activity on the volume.	Dependent item	aws.ec2.ebs.volume.totalwritetime["{#VOLUME_ID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Idle time	This metric is not supported with Multi-Attach enabled volumes. The total number of seconds in a specified period of time when no read or write operations were submitted.	Dependent item	aws.ec2.ebs.volume.idletime["{#VOLUMEID}"] Preprocessing JSON Path: `$.[?(@.Label == "VolumeIdleTime")].Values.first().first()` ⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Queue length	The number of read and write operation requests waiting to be completed in a specified period of time.	Dependent item	aws.ec2.ebs.volume.queuelength["{#VOLUMEID}"] Preprocessing JSON Path: `$.[?(@.Label == "VolumeQueueLength")].Values.first().first()` ⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Throughput, pct	This metric is not supported with Multi-Attach enabled volumes. Used with Provisioned IOPS SSD volumes only. The percentage of I/O operations per second (IOPS) delivered of the total IOPS provisioned for an Amazon EBS volume. Provisioned IOPS SSD volumes deliver their provisioned performance 99.9 percent of the time. During a write, if there are no other pending I/O requests in a minute, the metric value will be 100 percent. Also, a volume's I/O performance may become degraded temporarily due to an action you have taken (for example, creating a snapshot of a volume during peak usage, running the volume on a non-EBS-optimized instance, or accessing data on the volume for the first time).	Dependent item	aws.ec2.ebs.volume.throughputpercentage["{#VOLUMEID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Consumed Read/Write, ops	Used with Provisioned IOPS SSD volumes only. The total amount of read and write operations (normalized to 256K capacity units) consumed in a specified period of time. I/O operations that are smaller than 256K each count as 1 consumed IOPS. I/O operations that are larger than 256K are counted in 256K capacity units. For example, a 1024K I/O would count as 4 consumed IOPS.	Dependent item	aws.ec2.ebs.volume.consumedreadwriteops["{#VOLUMEID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Burst balance	Used with General Purpose SSD (gp2), Throughput Optimized HDD (st1), and Cold HDD (sc1) volumes only. Provides information about the percentage of I/O credits (for gp2) or throughput credits (for st1 and sc1) remaining in the burst bucket. Data is reported to CloudWatch only when the volume is active. If the volume is not attached, no data is reported.	Dependent item	aws.ec2.ebs.volume.burstbalance["{#VOLUMEID}"] Preprocessing JSON Path: `$.[?(@.Label == "BurstBalance")].Values.first().first()` ⛔️Custom on fail: Discard value

Trigger prototypes for Instance Volumes discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS EC2: Volume [{#VOLUME_ID}] has 'error' state		`last(/AWS EC2 by HTTP/aws.ec2.ebs.status["{#VOLUME_ID}"])=5`\|Warning
AWS EC2: Burst balance is too low		`max(/AWS EC2 by HTTP/aws.ec2.ebs.volume.burst_balance["{#VOLUME_ID}"],5m)<{$AWS.EBS.BURST.CREDIT.BALANCE.MIN.WARN}`\|Warning

AWS RDS instance by HTTP

Overview

The template to monitor AWS RDS instance by HTTP via Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. NOTE This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about metrics and used API methods:

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

AWS RDS instance by HTTP

Configuration

Setup

The template get AWS RDS instance metrics and uses the script item to make HTTP requests to the CloudWatch API.

Before using the template, you need to create an IAM policy with the necessary permissions for the Zabbix role in your AWS account.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect Amazon RDS metrics.

{
    "Version":"2012-10-17",
    "Statement":[
        {
          "Action":[
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "rds:DescribeEvents",
                "rds:DescribeDBInstances"
          ],
          "Effect":"Allow",
          "Resource":"*"
        }
    ]
  }

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume Role Authorization

For using assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "rds:DescribeEvents",
                "rds:DescribeDBInstances"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "rds:DescribeEvents",
                "rds:DescribeDBInstances",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, {$AWS.RDS.INSTANCE.ID}.

For more information about manage access keys, see official documentation

Also, see the Macros section for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.REGION}	Amazon RDS Region code.	`us-west-1`
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.ASSUME.ROLE.ARN}	ARN assume role; add when using the `assume_role` authorization method.
{$AWS.RDS.INSTANCE.ID}	RDS DB Instance identifier.
{$AWS.RDS.LLD.FILTER.ALARMSERVICENAMESPACE.MATCHES}	Filter of discoverable alarms by namespace.	`.*`
{$AWS.RDS.LLD.FILTER.ALARMSERVICENAMESPACE.NOT_MATCHES}	Filter to exclude discovered alarms by namespace.	`CHANGE_IF_NEEDED`
{$AWS.RDS.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.RDS.LLD.FILTER.ALARMNAME.NOTMATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`
{$AWS.RDS.LLD.FILTER.EVENT_CATEGORY.MATCHES}	Filter of discoverable events by category.	`.*`
{$AWS.RDS.LLD.FILTER.EVENTCATEGORY.NOTMATCHES}	Filter to exclude discovered events by category.	`CHANGE_IF_NEEDED`
{$AWS.RDS.LLD.FILTER.EVENTSOURCETYPE.MATCHES}	Filter of discoverable events by source type.	`.*`
{$AWS.RDS.LLD.FILTER.EVENTSOURCETYPE.NOT_MATCHES}	Filter to exclude discovered events by source type.	`CHANGE_IF_NEEDED`
{$AWS.RDS.CPU.UTIL.WARN.MAX}	The warning threshold of the CPU utilization expressed in %.	`85`
{$AWS.RDS.CPU.CREDIT.BALANCE.MIN.WARN}	Minimum number of free earned CPU credits for trigger expression.	`50`
{$AWS.EBS.IO.CREDIT.BALANCE.MIN.WARN}	Minimum percentage of I/O credits remaining for trigger expression.	`20`
{$AWS.EBS.BYTE.CREDIT.BALANCE.MIN.WARN}	Minimum percentage of Byte credits remaining for trigger expression.	`20`
{$AWS.RDS.BURST.CREDIT.BALANCE.MIN.WARN}	Minimum percentage of Byte credits remaining for trigger expression.	`20`

Items

Name	Description	Type	Key and additional info
Get metrics data	Get instance metrics. Full metrics list related to RDS: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-metrics.html Full metrics list related to Amazon Aurora: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraMySQL.Monitoring.Metrics.html#Aurora.AuroraMySQL.Monitoring.Metrics.instances	Script	aws.rds.get_metrics Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get instance info	Get instance info. DescribeDBInstances API method: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_DescribeDBInstances.html	Script	aws.rds.getinstanceinfo Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get instance alarms data	DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.rds.get_alarms Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get instance events data	DescribeEvents API method: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_DescribeEvents.html	Script	aws.rds.get_events Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get metrics check	Data collection check.	Dependent item	aws.rds.metrics.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get instance info check	Data collection check.	Dependent item	aws.rds.instance_info.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get alarms check	Data collection check.	Dependent item	aws.rds.alarms.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get events check	Data collection check.	Dependent item	aws.rds.events.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Class	Contains the name of the compute and memory capacity class of the DB instance.	Dependent item	aws.rds.class Preprocessing JSON Path: `$[*].DBInstanceClass.first()` Discard unchanged with heartbeat: `3h`
Engine	Database engine.	Dependent item	aws.rds.engine Preprocessing JSON Path: `$..Engine.first()` Discard unchanged with heartbeat: `3h`
Engine version	Indicates the database engine version.	Dependent item	aws.rds.engine.version Preprocessing JSON Path: `$[*].EngineVersion.first()` Discard unchanged with heartbeat: `3h`
Status	Specifies the current state of this database. All possible status values and their description: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/accessing-monitoring.html#Overview.DBInstance.Status	Dependent item	aws.rds.status Preprocessing JSON Path: `$..DBInstanceStatus.first()` Discard unchanged with heartbeat: `3h`
Storage type	Specifies the storage type associated with DB instance.	Dependent item	aws.rds.storage_type Preprocessing JSON Path: `$[*].StorageType.first()` Discard unchanged with heartbeat: `3h`
Create time	Provides the date and time the DB instance was created.	Dependent item	aws.rds.create_time Preprocessing JSON Path: `$..InstanceCreateTime.first()`
Storage: Allocated	Specifies the allocated storage size specified in gibibytes (GiB).	Dependent item	aws.rds.storage.allocated Preprocessing JSON Path: `$[*].AllocatedStorage.first()` Discard unchanged with heartbeat: `3h`
Storage: Max allocated	The upper limit in gibibytes (GiB) to which Amazon RDS can automatically scale the storage of the DB instance. If limit is not specified returns -1.	Dependent item	aws.rds.storage.max_allocated Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Read replica: State	The status of a read replica. If the instance isn't a read replica, this is blank. Boolean value that is true if the instance is operating normally, or false if the instance is in an error state.	Dependent item	aws.rds.readreplicastate Preprocessing JSON Path: `$..StatusInfos..Normal.first()` ⛔️Custom on fail: Discard value Boolean to decimal Discard unchanged with heartbeat: `3h`
Read replica: Status	The status of a read replica. If the instance isn't a read replica, this is blank. Status of the DB instance. For a StatusType of read replica, the values can be replicating, replication stop point set, replication stop point reached, error, stopped, or terminated.	Dependent item	aws.rds.readreplicastatus Preprocessing JSON Path: `$..StatusInfos..Status.first()` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Swap usage	The amount of swap space used. This metric is available for the Aurora PostgreSQL DB instance classes db.t3.medium, db.t3.large, db.r4.large, db.r4.xlarge, db.r5.large, db.r5.xlarge, db.r6g.large, and db.r6g.xlarge. For Aurora MySQL, this metric applies only to db.t* DB instance classes. This metric is not available for SQL Server.	Dependent item	aws.rds.swap_usage Preprocessing JSON Path: `$.[?(@.Label == "SwapUsage")].Values.first().first()` ⛔️Custom on fail: Discard value
Disk: Write IOPS	The number of write records generated per second. This is more or less the number of log records generated by the database. These do not correspond to 8K page writes, and do not correspond to network packets sent.	Dependent item	aws.rds.write_iops.rate Preprocessing JSON Path: `$.[?(@.Label == "WriteIOPS")].Values.first().first()` ⛔️Custom on fail: Discard value
Disk: Write latency	The average amount of time taken per disk I/O operation.	Dependent item	aws.rds.write_latency Preprocessing JSON Path: `$.[?(@.Label == "WriteLatency")].Values.first().first()` ⛔️Custom on fail: Discard value
Disk: Write throughput	The average number of bytes written to persistent storage every second.	Dependent item	aws.rds.write_throughput.rate Preprocessing JSON Path: `$.[?(@.Label == "WriteThroughput")].Values.first().first()` ⛔️Custom on fail: Discard value
Network: Receive throughput	The incoming (Receive) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication.	Dependent item	aws.rds.networkreceivethroughput.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Burst balance	The percent of General Purpose SSD (gp2) burst-bucket I/O credits available.	Dependent item	aws.rds.burst_balance Preprocessing JSON Path: `$.[?(@.Label == "BurstBalance")].Values.first().first()` ⛔️Custom on fail: Discard value
CPU: Utilization	The percentage of CPU utilization.	Dependent item	aws.rds.cpu.utilization Preprocessing JSON Path: `$.[?(@.Label == "CPUUtilization")].Values.first().first()` ⛔️Custom on fail: Discard value
Credit CPU: Balance	The number of CPU credits that an instance has accumulated, reported at 5-minute intervals. You can use this metric to determine how long a DB instance can burst beyond its baseline performance level at a given rate. When an instance is running, credits in the CPUCreditBalance don't expire. When the instance stops, the CPUCreditBalance does not persist, and all accrued credits are lost. This metric applies only to db.t2.small and db.t2.medium instances for Aurora MySQL, and to db.t3 instances for Aurora PostgreSQL.	Dependent item	aws.rds.cpu.credit_balance Preprocessing JSON Path: `$.[?(@.Label == "CPUCreditBalance")].Values.first().first()` ⛔️Custom on fail: Discard value
Credit CPU: Usage	The number of CPU credits consumed during the specified period, reported at 5-minute intervals. This metric measures the amount of time during which physical CPUs have been used for processing instructions by virtual CPUs allocated to the DB instance. This metric applies only to db.t2.small and db.t2.medium instances for Aurora MySQL, and to db.t3 instances for Aurora PostgreSQL	Dependent item	aws.rds.cpu.credit_usage Preprocessing JSON Path: `$.[?(@.Label == "CPUCreditUsage")].Values.first().first()` ⛔️Custom on fail: Discard value
Connections	The number of client network connections to the database instance. The number of database sessions can be higher than the metric value because the metric value doesn't include the following: - Sessions that no longer have a network connection but which the database hasn't cleaned up - Sessions created by the database engine for its own purposes - Sessions created by the database engine's parallel execution capabilities - Sessions created by the database engine job scheduler - Amazon Aurora/RDS connections	Dependent item	aws.rds.database_connections Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Disk: Queue depth	The number of outstanding read/write requests waiting to access the disk.	Dependent item	aws.rds.diskqueuedepth Preprocessing JSON Path: `$.[?(@.Label == "DiskQueueDepth")].Values.first().first()` ⛔️Custom on fail: Discard value
EBS: Byte balance	The percentage of throughput credits remaining in the burst bucket of your RDS database. This metric is available for basic monitoring only. To find the instance sizes that support this metric, see the instance sizes with an asterisk (*) in the EBS optimized by default table (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html#current) in Amazon RDS User Guide for Linux Instances.	Dependent item	aws.rds.ebsbytebalance Preprocessing JSON Path: `$.[?(@.Label == "EBSByteBalance%")].Values.first().first()` ⛔️Custom on fail: Discard value
EBS: IO balance	The percentage of I/O credits remaining in the burst bucket of your RDS database. This metric is available for basic monitoring only. To find the instance sizes that support this metric, see the instance sizes with an asterisk (*) in the EBS optimized by default table (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html#current) in Amazon RDS User Guide for Linux Instances.	Dependent item	aws.rds.ebsiobalance Preprocessing JSON Path: `$.[?(@.Label == "EBSIOBalance%")].Values.first().first()` ⛔️Custom on fail: Discard value
Memory, freeable	The amount of available random access memory. For MariaDB, MySQL, Oracle, and PostgreSQL DB instances, this metric reports the value of the MemAvailable field of /proc/meminfo.	Dependent item	aws.rds.freeable_memory Preprocessing JSON Path: `$.[?(@.Label == "FreeableMemory")].Values.first().first()` ⛔️Custom on fail: Discard value
Storage: Local free	The amount of local storage available, in bytes. Unlike for other DB engines, for Aurora DB instances this metric reports the amount of storage available to each DB instance. This value depends on the DB instance class. You can increase the amount of free storage space for an instance by choosing a larger DB instance class for your instance. (This doesn't apply to Aurora Serverless v2.)	Dependent item	aws.rds.freelocalstorage Preprocessing JSON Path: `$.[?(@.Label == "FreeLocalStorage")].Values.first().first()` ⛔️Custom on fail: Discard value
Network: Receive throughput	The incoming (receive) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication. For Amazon Aurora: The amount of network throughput received from the Aurora storage subsystem by each instance in the DB cluster.	Dependent item	aws.rds.storagenetworkreceive_throughput Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Network: Transmit throughput	The outgoing (transmit) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication. For Amazon Aurora: The amount of network throughput sent to the Aurora storage subsystem by each instance in the Aurora MySQL DB cluster.	Dependent item	aws.rds.storagenetworktransmit_throughput Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Disk: Read IOPS	The average number of disk I/O operations per second. Aurora PostgreSQL-Compatible Edition reports read and write IOPS separately, in 1-minute intervals.	Dependent item	aws.rds.read_iops.rate Preprocessing JSON Path: `$.[?(@.Label == "ReadIOPS")].Values.first().first()` ⛔️Custom on fail: Discard value
Disk: Read latency	The average amount of time taken per disk I/O operation.	Dependent item	aws.rds.read_latency Preprocessing JSON Path: `$.[?(@.Label == "ReadLatency")].Values.first().first()` ⛔️Custom on fail: Discard value
Disk: Read throughput	The average number of bytes read from disk per second.	Dependent item	aws.rds.read_throughput.rate Preprocessing JSON Path: `$.[?(@.Label == "ReadThroughput")].Values.first().first()` ⛔️Custom on fail: Discard value
Network: Transmit throughput	The outgoing (Transmit) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication.	Dependent item	aws.rds.networktransmitthroughput.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Network: Throughput	The amount of network throughput both received from and transmitted to clients by each instance in the Aurora MySQL DB cluster, in bytes per second. This throughput doesn't include network traffic between instances in the DB cluster and the cluster volume.	Dependent item	aws.rds.network_throughput.rate Preprocessing JSON Path: `$.[?(@.Label == "NetworkThroughput")].Values.first().first()` ⛔️Custom on fail: Discard value
Storage: Space free	The amount of available storage space.	Dependent item	aws.rds.freestoragespace Preprocessing JSON Path: `$.[?(@.Label == "FreeStorageSpace")].Values.first().first()` ⛔️Custom on fail: Discard value
Disk: Read IOPS, local storage	The average number of disk read I/O operations to local storage per second. Only applies to Multi-AZ DB clusters.	Dependent item	aws.rds.readiopslocal_storage.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Disk: Read latency, local storage	The average amount of time taken per disk I/O operation for local storage. Only applies to Multi-AZ DB clusters.	Dependent item	aws.rds.readlatencylocal_storage Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Disk: Read throughput, local storage	The average number of bytes read from disk per second for local storage. Only applies to Multi-AZ DB clusters.	Dependent item	aws.rds.readthroughputlocal_storage.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Replication: Lag	The amount of time a read replica DB instance lags behind the source DB instance. Applies to MySQL, MariaDB, Oracle, PostgreSQL, and SQL Server read replicas.	Dependent item	aws.rds.replica_lag Preprocessing JSON Path: `$.[?(@.Label == "ReplicaLag")].Values.first().first()` ⛔️Custom on fail: Discard value
Disk: Write IOPS, local storage	The average number of disk write I/O operations per second on local storage in a Multi-AZ DB cluster.	Dependent item	aws.rds.writeiopslocal_storage.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Disk: Write latency, local storage	The average amount of time taken per disk I/O operation on local storage in a Multi-AZ DB cluster.	Dependent item	aws.rds.writelatencylocal_storage Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Disk: Write throughput, local storage	The average number of bytes written to disk per second for local storage.	Dependent item	aws.rds.writethroughputlocal_storage.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
SQLServer: Failed agent jobs	The number of failed Microsoft SQL Server Agent jobs during the last minute.	Dependent item	aws.rds.failedsqlserveragentjobs_count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Disk: Binlog Usage	The amount of disk space occupied by binary logs on the master. Applies to MySQL read replicas.	Dependent item	aws.rds.binlogdisk_usage Preprocessing JSON Path: `$.[?(@.Label == "BinLogDiskUsage")].Values.first().first()` ⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression
AWS RDS: Failed to get metrics data	Failed to get CloudWatch metrics for RDS.	`length(last(/AWS RDS instance by HTTP/aws.rds.metrics.check))>0`\|Warning
AWS RDS: Failed to get instance data	Failed to get CloudWatch instance info for RDS.	`length(last(/AWS RDS instance by HTTP/aws.rds.instance_info.check))>0`\|Warning
AWS RDS: Failed to get alarms data	Failed to get CloudWatch alarms for RDS.	`length(last(/AWS RDS instance by HTTP/aws.rds.alarms.check))>0`\|Warning
AWS RDS: Failed to get events data	Failed to get CloudWatch events for RDS.	`length(last(/AWS RDS instance by HTTP/aws.rds.events.check))>0`\|Warning
AWS RDS: Read replica in error state	The status of a read replica. False if the instance is in an error state.	`last(/AWS RDS instance by HTTP/aws.rds.read_replica_state)=0`\|Average
AWS RDS: Burst balance is too low		`max(/AWS RDS instance by HTTP/aws.rds.burst_balance,5m)<{$AWS.RDS.BURST.CREDIT.BALANCE.MIN.WARN}`\|Warning
AWS RDS: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/AWS RDS instance by HTTP/aws.rds.cpu.utilization,15m)>{$AWS.RDS.CPU.UTIL.WARN.MAX}`\|Warning
AWS RDS: Instance CPU Credit balance is too low	The number of earned CPU credits has been less than {$AWS.RDS.CPU.CREDIT.BALANCE.MIN.WARN} in the last 5 minutes.	`max(/AWS RDS instance by HTTP/aws.rds.cpu.credit_balance,5m)<{$AWS.RDS.CPU.CREDIT.BALANCE.MIN.WARN}`\|Warning
AWS RDS: Byte Credit balance is too low		`max(/AWS RDS instance by HTTP/aws.rds.ebs_byte_balance,5m)<{$AWS.EBS.BYTE.CREDIT.BALANCE.MIN.WARN}`\|Warning
AWS RDS: I/O Credit balance is too low		`max(/AWS RDS instance by HTTP/aws.rds.ebs_io_balance,5m)<{$AWS.EBS.IO.CREDIT.BALANCE.MIN.WARN}`\|Warning

LLD rule Instance Alarms discovery

Name Description Type Key and additional info

Instance Alarms discovery

Discovery instance alarms.

Dependent item

aws.rds.alarms.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Instance Alarms discovery

Name Description Type Key and additional info

[{#ALARM_NAME}]: State reason

An explanation for the alarm state, in text format.

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item

aws.rds.alarm.statereason["{#ALARMNAME}"]

Preprocessing

JSON Path: $.[?(@.AlarmName == "{#ALARM_NAME}")].StateReason.first()
⛔️Custom on fail: Discard value
Discard unchanged with heartbeat: 3h

[{#ALARM_NAME}]: State

The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENTDATA), 2 (ALARM).

Alarm description:

{#ALARMDESCRIPTION}

Dependent item

aws.rds.alarm.state["{#ALARM_NAME}"]

Preprocessing

JSON Path: $.[?(@.AlarmName == "{#ALARM_NAME}")].StateValue.first()
⛔️Custom on fail: Set value to: 3
JavaScript: The text is too long. Please see the template.

Trigger prototypes for Instance Alarms discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS RDS: [{#ALARM_NAME}] has 'Alarm' state	Alarm "{#ALARM_NAME}" has 'Alarm' state. Reason: {ITEM.LASTVALUE2}	`last(/AWS RDS instance by HTTP/aws.rds.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS RDS instance by HTTP/aws.rds.alarm.state_reason["{#ALARM_NAME}"]))>0`\|Average
AWS RDS: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS RDS instance by HTTP/aws.rds.alarm.state["{#ALARM_NAME}"])=1`\|Info

LLD rule Aurora metrics discovery

Name Description Type Key and additional info

Aurora metrics discovery

Discovery Amazon Aurora metrics.

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraMySQL.Monitoring.Metrics.html#Aurora.AuroraMySQL.Monitoring.Metrics.instances

Dependent item

aws.rds.aurora.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 6h

Item prototypes for Aurora metrics discovery

Name	Description	Type	Key and additional info
Row lock time	The total time spent acquiring row locks for InnoDB tables.	Dependent item	aws.rds.row_locktime[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "RowLockTime")].Values.first().first()` ⛔️Custom on fail: Discard value
Operations: Select throughput	The average number of select queries per second.	Dependent item	aws.rds.select_throughput.rate[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "SelectThroughput")].Values.first().first()` ⛔️Custom on fail: Discard value
Operations: Select latency	The amount of latency for select queries.	Dependent item	aws.rds.select_latency[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "SelectLatency")].Values.first().first()` ⛔️Custom on fail: Discard value
Replication: Lag, max	The maximum amount of lag between the primary instance and each Aurora DB instance in the DB cluster.	Dependent item	aws.rds.aurorareplicalag.max[{#SINGLETON}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Replication: Lag, min	The minimum amount of lag between the primary instance and each Aurora DB instance in the DB cluster.	Dependent item	aws.rds.aurorareplicalag.min[{#SINGLETON}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Replication: Lag	For an Aurora replica, the amount of lag when replicating updates from the primary instance.	Dependent item	aws.rds.aurorareplicalag[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "AuroraReplicaLag")].Values.first().first()` ⛔️Custom on fail: Discard value
Buffer Cache hit ratio	The percentage of requests that are served by the buffer cache.	Dependent item	aws.rds.buffercachehit_ratio[{#SINGLETON}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Operations: Commit latency	The amount of latency for commit operations.	Dependent item	aws.rds.commit_latency[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "CommitLatency")].Values.first().first()` ⛔️Custom on fail: Discard value
Operations: Commit throughput	The average number of commit operations per second.	Dependent item	aws.rds.commit_throughput.rate[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "CommitThroughput")].Values.first().first()` ⛔️Custom on fail: Discard value
Deadlocks, rate	The average number of deadlocks in the database per second.	Dependent item	aws.rds.deadlocks.rate[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "Deadlocks")].Values.first().first()` ⛔️Custom on fail: Discard value
Engine uptime	The amount of time that the instance has been running.	Dependent item	aws.rds.engine_uptime[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "EngineUptime")].Values.first().first()` ⛔️Custom on fail: Discard value
Rollback segment history list length	The undo logs that record committed transactions with delete-marked records. These records are scheduled to be processed by the InnoDB purge operation.	Dependent item	aws.rds.rollbacksegmenthistorylistlength[{#SINGLETON}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Network: Throughput	The amount of network throughput received from and sent to the Aurora storage subsystem by each instance in the Aurora MySQL DB cluster.	Dependent item	aws.rds.storagenetworkthroughput[{#SINGLETON}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value

LLD rule Aurora MySQL metrics discovery

Name Description Type Key and additional info

Aurora MySQL metrics discovery

Discovery Aurora MySQL metrics.

Storage types:

aurora (for MySQL 5.6-compatible Aurora)

aurora-mysql (for MySQL 5.7-compatible and MySQL 8.0-compatible Aurora)

Dependent item

aws.rds.postgresql.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 6h

Item prototypes for Aurora MySQL metrics discovery

Name	Description	Type	Key and additional info
Operations: Delete latency	The amount of latency for delete queries.	Dependent item	aws.rds.delete_latency[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "DeleteLatency")].Values.first().first()` ⛔️Custom on fail: Discard value
Operations: Delete throughput	The average number of delete queries per second.	Dependent item	aws.rds.delete_throughput.rate[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "DeleteThroughput")].Values.first().first()` ⛔️Custom on fail: Discard value
DML: Latency	The amount of latency for inserts, updates, and deletes.	Dependent item	aws.rds.dml_latency[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "DMLLatency")].Values.first().first()` ⛔️Custom on fail: Discard value
DML: Throughput	The average number of inserts, updates, and deletes per second.	Dependent item	aws.rds.dml_throughput.rate[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "DMLThroughput")].Values.first().first()` ⛔️Custom on fail: Discard value
DDL: Latency	The amount of latency for data definition language (DDL) requests - for example, create, alter, and drop requests.	Dependent item	aws.rds.ddl_latency[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "DDLLatency")].Values.first().first()` ⛔️Custom on fail: Discard value
DDL: Throughput	The average number of DDL requests per second.	Dependent item	aws.rds.ddl_throughput.rate[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "DDLThroughput")].Values.first().first()` ⛔️Custom on fail: Discard value
Backtrack: Window, actual	The difference between the target backtrack window and the actual backtrack window.	Dependent item	aws.rds.backtrackwindowactual[{#SINGLETON}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Backtrack: Window, alert	The number of times that the actual backtrack window is smaller than the target backtrack window for a given period of time.	Dependent item	aws.rds.backtrackwindowalert[{#SINGLETON}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Transactions: Blocked, rate	The average number of transactions in the database that are blocked per second.	Dependent item	aws.rds.blocked_transactions.rate[{#SINGLETON}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Replication: Binlog lag	The amount of time that a binary log replica DB cluster running on Aurora MySQL-Compatible Edition lags behind the binary log replication source. A lag means that the source is generating records faster than the replica can apply them. The metric value indicates the following: A high value: The replica is lagging the replication source. 0 or a value close to 0: The replica process is active and current. -1: Aurora can't determine the lag, which can happen during replica setup or when the replica is in an error state	Dependent item	aws.rds.aurorareplicationbinlog_lag[{#SINGLETON}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Transactions: Active, rate	The average number of current transactions executing on an Aurora database instance per second. By default, Aurora doesn't enable this metric. To begin measuring this value, set innodbmonitorenable='all' in the DB parameter group for a specific DB instance.	Dependent item	aws.rds.auroratransactionsactive.rate[{#SINGLETON}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Connections: Aborted	The number of client connections that have not been closed properly.	Dependent item	aws.rds.auroraclientsaborted[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "AbortedClients")].Values.first().first()` ⛔️Custom on fail: Discard value
Operations: Insert latency	The amount of latency for insert queries, in milliseconds.	Dependent item	aws.rds.insert_latency[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "InsertLatency")].Values.first().first()` ⛔️Custom on fail: Discard value
Operations: Insert throughput	The average number of insert queries per second.	Dependent item	aws.rds.insert_throughput.rate[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "InsertThroughput")].Values.first().first()` ⛔️Custom on fail: Discard value
Login failures, rate	The average number of failed login attempts per second.	Dependent item	aws.rds.login_failures.rate[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "LoginFailures")].Values.first().first()` ⛔️Custom on fail: Discard value
Queries, rate	The average number of queries executed per second.	Dependent item	aws.rds.queries.rate[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "Queries")].Values.first().first()` ⛔️Custom on fail: Discard value
Resultset cache hit ratio	The percentage of requests that are served by the Resultset cache.	Dependent item	aws.rds.resultsetcachehitratio[{#SINGLETON}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Binary log files, number	The number of binlog files generated.	Dependent item	aws.rds.numbinarylog_files[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "NumBinaryLogFiles")].Values.first().first()` ⛔️Custom on fail: Discard value
Binary log files, size	The total size of the binlog files.	Dependent item	aws.rds.sumbinarylog_files[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "SumBinaryLogSize")].Values.first().first()` ⛔️Custom on fail: Discard value
Operations: Update latency	The amount of latency for update queries.	Dependent item	aws.rds.update_latency[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "UpdateLatency")].Values.first().first()` ⛔️Custom on fail: Discard value
Operations: Update throughput	The average number of update queries per second.	Dependent item	aws.rds.update_throughput.rate[{#SINGLETON}] Preprocessing JSON Path: `$.[?(@.Label == "UpdateThroughput")].Values.first().first()` ⛔️Custom on fail: Discard value

LLD rule Instance Events discovery

Name Description Type Key and additional info

Instance Events discovery

Discovery instance events.

Dependent item

aws.rds.events.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Instance Events discovery

Name Description Type Key and additional info

[{#EVENTCATEGORY}]: {#EVENTSOURCETYPE}/{#EVENTSOURCE_ID}: Message

Provides the text of this event.

Dependent item

aws.rds.eventmessage["{#EVENTCATEGORY}/{#EVENTSOURCETYPE}/{#EVENTSOURCEID}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value
JSON Path: $[-1]
Discard unchanged with heartbeat: 3h

[{#EVENTCATEGORY}]: {#EVENTSOURCETYPE}/{#EVENTSOURCE_ID} : Date

Provides the text of this event.

Dependent item

aws.rds.eventdate["{#EVENTCATEGORY}/{#EVENTSOURCETYPE}/{#EVENTSOURCEID}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value
JSON Path: $[-1]
Discard unchanged with heartbeat: 3h

AWS S3 bucket by HTTP

Overview

The template to monitor AWS S3 bucket by HTTP via Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. NOTE This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about metrics and used API methods:

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

AWS S3 bucket by HTTP

Configuration

Setup

The template gets AWS S3 metrics and uses the script item to make HTTP requests to the CloudWatch API.

Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect Amazon S3 metrics.

{
    "Version":"2012-10-17",
    "Statement":[
        {
          "Action":[
              "cloudwatch:DescribeAlarms",
              "cloudwatch:GetMetricData",
              "s3:GetMetricsConfiguration"
          ],
          "Effect":"Allow",
          "Resource":"*"
        }
    ]
  }

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume role authorization

For using assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "s3:GetMetricsConfiguration"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "s3:GetMetricsConfiguration",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

To gather Request metrics, enable Requests metrics on your Amazon S3 buckets from the AWS console.

You can also define a filter for the Request metrics using a shared prefix, object tag, or access point.

Set the macros: {$AWS.AUTH_TYPE}, {$AWS.S3.BUCKET.NAME}.

For more information about manage access keys, see official documentation

Also, see the Macros section for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.REQUEST.REGION}	Region used in GET request `ListBuckets`.	`us-east-1`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.ASSUME.ROLE.ARN}	ARN assume role; add when using the `assume_role` authorization method.
{$AWS.S3.BUCKET.NAME}	S3 bucket name.
{$AWS.S3.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.S3.LLD.FILTER.ALARMNAME.NOTMATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`
{$AWS.S3.LLD.FILTER.ID.NAME.MATCHES}	Filter of discoverable request metrics by filter ID name.	`.*`
{$AWS.S3.LLD.FILTER.ID.NAME.NOT_MATCHES}	Filter to exclude discovered request metrics by filter ID name.	`CHANGE_IF_NEEDED`
{$AWS.S3.UPDATE.INTERVAL}	Interval in seconds for getting request metrics. Used in the metric configuration and in the JavaScript API query. Must be between 1 and 86400 seconds.	`1800`

Items

Name	Description	Type	Key and additional info
Get metrics data	Get bucket metrics. Full metrics list related to S3: https://docs.aws.amazon.com/AmazonS3/latest/userguide/metrics-dimensions.html	Script	aws.s3.get_metrics Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get alarms data	Get alarms data. DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.s3.get_alarms Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get metrics check	Data collection check.	Dependent item	aws.s3.metrics.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get alarms check	Data collection check.	Dependent item	aws.s3.alarms.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Bucket Size	This is a daily metric for the bucket. The amount of data in bytes stored in a bucket in the STANDARD storage class, INTELLIGENTTIERING storage class, Standard-Infrequent Access (STANDARDIA) storage class, OneZone-Infrequent Access (ONEZONE_IA), Reduced Redundancy Storage (RRS) class, S3 Glacier Instant Retrieval storage class, Deep Archive Storage (S3 Glacier Deep Archive) class, or S3 Glacier Flexible Retrieval (GLACIER) storage class. This value is calculated by summing the size of all objects and metadata in the bucket (both current and noncurrent objects), including the size of all parts for all incomplete multipart uploads to the bucket.	Dependent item	aws.s3.bucketsizebytes Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Number of objects	This is a daily metric for the bucket. The total number of objects stored in a bucket for all storage classes. This value is calculated by counting all objects in the bucket (both current and noncurrent objects) and the total number of parts for all incomplete multipart uploads to the bucket.	Dependent item	aws.s3.numberofobjects Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
AWS S3: Failed to get metrics data	Failed to get CloudWatch metrics for S3 bucket.	`length(last(/AWS S3 bucket by HTTP/aws.s3.metrics.check))>0`\|Warning
AWS S3: Failed to get alarms data	Failed to get CloudWatch alarms for S3 bucket.	`length(last(/AWS S3 bucket by HTTP/aws.s3.alarms.check))>0`\|Warning

LLD rule Bucket Alarms discovery

Name Description Type Key and additional info

Bucket Alarms discovery

Discovery of bucket alarms.

Dependent item

aws.s3.alarms.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Bucket Alarms discovery

Name Description Type Key and additional info

[{#ALARM_NAME}]: State reason

An explanation for the alarm state, in text format.

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item

aws.s3.alarm.statereason["{#ALARMNAME}"]

Preprocessing

JSON Path: $.[?(@.AlarmName == "{#ALARM_NAME}")].StateReason.first()
⛔️Custom on fail: Discard value
Discard unchanged with heartbeat: 3h

[{#ALARM_NAME}]: State

The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENTDATA), 2 (ALARM).

Alarm description:

{#ALARMDESCRIPTION}

Dependent item

aws.s3.alarm.state["{#ALARM_NAME}"]

Preprocessing

JSON Path: $.[?(@.AlarmName == "{#ALARM_NAME}")].StateValue.first()
⛔️Custom on fail: Set value to: 3
JavaScript: The text is too long. Please see the template.

Trigger prototypes for Bucket Alarms discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS S3: [{#ALARM_NAME}] has 'Alarm' state	Alarm "{#ALARM_NAME}" has 'Alarm' state. Reason: {ITEM.LASTVALUE2}	`last(/AWS S3 bucket by HTTP/aws.s3.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS S3 bucket by HTTP/aws.s3.alarm.state_reason["{#ALARM_NAME}"]))>0`\|Average
AWS S3: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS S3 bucket by HTTP/aws.s3.alarm.state["{#ALARM_NAME}"])=1`\|Info

LLD rule Request Metrics discovery

Name Description Type Key and additional info

Request Metrics discovery

Discovery of request metrics.

Dependent item

aws.s3.configuration.discovery

Preprocessing

JSON Path: $.filter_id
Discard unchanged with heartbeat: 3h

Item prototypes for Request Metrics discovery

Name	Description	Type	Key and additional info
Filter [{#AWS.S3.FILTER.ID.NAME}]: Get request metrics	Get bucket request metrics filter: '{#AWS.S3.FILTER.ID.NAME}'. Full metrics list related to S3: https://docs.aws.amazon.com/AmazonS3/latest/userguide/metrics-dimensions.html	Script	aws.s3.get_metrics["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: All	The total number of HTTP requests made to an Amazon S3 bucket, regardless of type. If you're using a metrics configuration with a filter, then this metric only returns the HTTP requests that meet the filter's requirements.	Dependent item	aws.s3.all_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "AllRequests")].Values.first().first()` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Get	The number of HTTP GET requests made for objects in an Amazon S3 bucket. This doesn't include list operations. Paginated list-oriented requests, like List Multipart Uploads, List Parts, Get Bucket Object versions, and others, are not included in this metric.	Dependent item	aws.s3.get_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "GetRequests")].Values.first().first()` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Put	The number of HTTP PUT requests made for objects in an Amazon S3 bucket.	Dependent item	aws.s3.put_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "PutRequests")].Values.first().first()` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Delete	The number of HTTP DELETE requests made for objects in an Amazon S3 bucket. This also includes Delete Multiple Objects requests. This metric shows the number of requests, not the number of objects deleted.	Dependent item	aws.s3.delete_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "DeleteRequests")].Values.first().first()` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Head	The number of HTTP HEAD requests made to an Amazon S3 bucket.	Dependent item	aws.s3.head_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "HeadRequests")].Values.first().first()` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Post	The number of HTTP POST requests made to an Amazon S3 bucket. Delete Multiple Objects and SELECT Object Content requests are not included in this metric.	Dependent item	aws.s3.post_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "PostRequests")].Values.first().first()` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Select	The number of Amazon S3 SELECT Object Content requests made for objects in an Amazon S3 bucket.	Dependent item	aws.s3.select_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "SelectRequests")].Values.first().first()` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Select, bytes scanned	The number of bytes of data scanned with Amazon S3 SELECT Object Content requests in an Amazon S3 bucket. Statistic: Average (bytes per request).	Dependent item	aws.s3.selectbytesscanned["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Select, bytes returned	The number of bytes of data returned with Amazon S3 SELECT Object Content requests in an Amazon S3 bucket. Statistic: Average (bytes per request).	Dependent item	aws.s3.selectbytesreturned["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: List	The number of HTTP requests that list the contents of a bucket.	Dependent item	aws.s3.list_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "ListRequests")].Values.first().first()` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Bytes downloaded	The number of bytes downloaded for requests made to an Amazon S3 bucket, where the response includes a body. Statistic: Average (bytes per request).	Dependent item	aws.s3.bytes_downloaded["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "BytesDownloaded")].Values.first().first()` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Bytes uploaded	The number of bytes uploaded that contain a request body, made to an Amazon S3 bucket. Statistic: Average (bytes per request).	Dependent item	aws.s3.bytes_uploaded["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "BytesUploaded")].Values.first().first()` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Errors, 4xx	The number of HTTP 4xx client error status code requests made to an Amazon S3 bucket with a value of either 0 or 1. The average statistic shows the error rate, and the sum statistic shows the count of that type of error, during each period. Statistic: Average (reports per request).	Dependent item	aws.s3.4xx_errors["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "4xxErrors")].Values.first().first()` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Errors, 5xx	The number of HTTP 5xx server error status code requests made to an Amazon S3 bucket with a value of either 0 or 1. The average statistic shows the error rate, and the sum statistic shows the count of that type of error, during each period. Statistic: Average (reports per request).	Dependent item	aws.s3.5xx_errors["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "5xxErrors")].Values.first().first()` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: First byte latency, avg	The per-request time from the complete request being received by an Amazon S3 bucket to when the response starts to be returned. Statistic: Average.	Dependent item	aws.s3.firstbytelatency.avg["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "FirstByteLatency")].Values.first().first()` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: First byte latency, p90	The per-request time from the complete request being received by an Amazon S3 bucket to when the response starts to be returned. Statistic: 90th percentile.	Dependent item	aws.s3.firstbytelatency.p90["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "FirstByteLatency")].Values.first().first()` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Total request latency, avg	The elapsed per-request time from the first byte received to the last byte sent to an Amazon S3 bucket. This includes the time taken to receive the request body and send the response body, which is not included in FirstByteLatency. Statistic: Average.	Dependent item	aws.s3.totalrequestlatency.avg["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Total request latency, p90	The elapsed per-request time from the first byte received to the last byte sent to an Amazon S3 bucket. This includes the time taken to receive the request body and send the response body, which is not included in FirstByteLatency. Statistic: 90th percentile.	Dependent item	aws.s3.totalrequestlatency.p90["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Replication: Latency	The maximum number of seconds by which the replication destination region is behind the source Region for a given replication rule.	Dependent item	aws.s3.replication_latency["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Replication: Bytes pending	The total number of bytes of objects pending replication for a given replication rule.	Dependent item	aws.s3.bytespendingreplication["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Replication: Operations pending	The number of operations pending replication for a given replication rule.	Dependent item	aws.s3.operationspendingreplication["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value

AWS ECS Serverless Cluster by HTTP

Overview

The template to monitor AWS ECS Serverless Cluster by HTTP via Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. NOTE This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about the metrics and used API methods:

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

AWS ECS Cluster by HTTP

Configuration

Setup

The template gets AWS ECS metrics and uses the script item to make HTTP requests to the CloudWatch API.

Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect Amazon ECS metrics.

{
    "Version":"2012-10-17",
    "Statement":[
        {
          "Action":[
              "cloudwatch:DescribeAlarms",
              "cloudwatch:GetMetricData",
              "ecs:ListServices"
          ],
          "Effect":"Allow",
          "Resource":"*"
        }
    ]
  }

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume role authorization

For using assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ecs:ListServices"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ecs:ListServices",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the following macros {$AWS.AUTH_TYPE}, {$AWS.REGION}, {$AWS.ECS.CLUSTER.NAME}.

For more information about managing access keys, see official documentation

Refer to the Macros section for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.REGION}	Amazon ECS Region code.	`us-west-1`
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.ASSUME.ROLE.ARN}	ARN assume role; add when using the `assume_role` authorization method.
{$AWS.ECS.CLUSTER.NAME}	ECS cluster name.
{$AWS.ECS.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.ECS.LLD.FILTER.ALARMNAME.NOTMATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`
{$AWS.ECS.LLD.FILTER.ALARMSERVICENAMESPACE.MATCHES}	Filter of discoverable alarms by namespace.	`.*`
{$AWS.ECS.LLD.FILTER.ALARMSERVICENAMESPACE.NOT_MATCHES}	Filter to exclude discovered alarms by namespace.	`CHANGE_IF_NEEDED`
{$AWS.ECS.LLD.FILTER.SERVICE.MATCHES}	Filter of discoverable services by name.	`.*`
{$AWS.ECS.LLD.FILTER.SERVICE.NOT_MATCHES}	Filter to exclude discovered services by name.	`CHANGE_IF_NEEDED`
{$AWS.ECS.CLUSTER.CPU.UTIL.WARN}	The warning threshold of the cluster CPU utilization expressed in %.	`70`
{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN}	The warning threshold of the cluster memory utilization expressed in %.	`70`
{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN}	The warning threshold of the cluster service CPU utilization expressed in %.	`80`
{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN}	The warning threshold of the cluster service memory utilization expressed in %.	`80`

Items

Name	Description	Type	Key and additional info
Get cluster metrics	Get cluster metrics. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html	Script	aws.ecs.get_metrics Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get cluster services	Get cluster services. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html	Script	aws.ecs.getclusterservices Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get alarms data	Get alarms data. DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.ecs.get_alarms Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get metrics check	Data collection check.	Dependent item	aws.ecs.metrics.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get alarms check	Data collection check.	Dependent item	aws.ecs.alarms.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Container Instance Count	The number of EC2 instances running the Amazon ECS agent that are registered with a cluster.	Dependent item	aws.ecs.containerinstancecount Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Task Count	The number of tasks running in the cluster.	Dependent item	aws.ecs.task_count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Service Count	The number of services in the cluster.	Dependent item	aws.ecs.service_count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
CPU Utilization	Cluster CPU utilization.	Dependent item	aws.ecs.cpu_utilization Preprocessing JSON Path: `$.CPUUtilization` ⛔️Custom on fail: Discard value
Memory Utilization	The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.memory_utilization Preprocessing JSON Path: `$.MemoryUtilization` ⛔️Custom on fail: Discard value
Network rx bytes	The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.network.rx Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Network tx bytes	The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.network.tx Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Ephemeral Storage Reserved	The number of bytes reserved from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later.	Dependent item	aws.ecs.ephemeral.storage.reserved Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Custom multiplier: `1073741824`
Ephemeral Storage Utilized	The number of bytes used from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later.	Dependent item	aws.ecs.ephemeral.storage.utilized Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Custom multiplier: `1073741824`
Ephemeral Storage Utilization	The calculated Disk Utilization.	Dependent item	aws.ecs.disk.utilization Preprocessing JSON Path: `$.DiskUtilization` ⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression
AWS ECS Serverless: Failed to get metrics data	Failed to get CloudWatch metrics for ECS Cluster.	`length(last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.metrics.check))>0`\|Warning
AWS ECS Serverless: Failed to get alarms data	Failed to get CloudWatch alarms for ECS Cluster.	`length(last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.alarms.check))>0`\|Warning
AWS ECS Serverless: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/AWS ECS Serverless Cluster by HTTP/aws.ecs.cpu_utilization,15m)>{$AWS.ECS.CLUSTER.CPU.UTIL.WARN}`\|Warning
AWS ECS Serverless: High memory utilization	The system is running out of free memory.	`min(/AWS ECS Serverless Cluster by HTTP/aws.ecs.memory_utilization,15m)>{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN}`\|Warning

LLD rule Cluster Alarms discovery

Name Description Type Key and additional info

Cluster Alarms discovery

Discovery instance alarms.

Dependent item

aws.ecs.alarms.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Cluster Alarms discovery

Name Description Type Key and additional info

[{#ALARM_NAME}]: Get metrics

Get alarm metrics about the state and its reason.

Dependent item

aws.ecs.alarm.getmetrics["{#ALARMNAME}"]

Preprocessing

JSON Path: $.[?(@.AlarmName == "{#ALARM_NAME}")].first()
⛔️Custom on fail: Discard value

[{#ALARM_NAME}]: State reason

An explanation for the alarm state, in text format.

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item

aws.ecs.alarm.statereason["{#ALARMNAME}"]

Preprocessing

JSON Path: $.StateReason
⛔️Custom on fail: Discard value
Discard unchanged with heartbeat: 3h

[{#ALARM_NAME}]: State

The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENTDATA), 2 (ALARM).

Alarm description:

{#ALARMDESCRIPTION}

Dependent item

aws.ecs.alarm.state["{#ALARM_NAME}"]

Preprocessing

JSON Path: $.StateValue
⛔️Custom on fail: Set value to: 3
JavaScript: The text is too long. Please see the template.

Trigger prototypes for Cluster Alarms discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS ECS Serverless: [{#ALARM_NAME}] has 'Alarm' state	Alarm "{#ALARM_NAME}" has 'Alarm' state. Reason: {ITEM.LASTVALUE2}	`last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.alarm.state_reason["{#ALARM_NAME}"]))>0`\|Average
AWS ECS Serverless: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=1`\|Info

LLD rule Cluster Services discovery

Name Description Type Key and additional info

Cluster Services discovery

Discovery {$AWS.ECS.CLUSTER.NAME} services.

Dependent item

aws.ecs.services.discovery

Preprocessing

Discard unchanged with heartbeat: 3h

Item prototypes for Cluster Services discovery

Name	Description	Type	Key and additional info
[{#AWS.ECS.SERVICE.NAME}]: Running Task	The number of tasks currently in the `running` state.	Dependent item	aws.ecs.services.running.task["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: Pending Task	The number of tasks currently in the `pending` state.	Dependent item	aws.ecs.services.pending.task["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: Desired Task	The desired number of tasks for an {#AWS.ECS.SERVICE.NAME} service.	Dependent item	aws.ecs.services.desired.task["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: Task Set	The number of task sets in the {#AWS.ECS.SERVICE.NAME} service.	Dependent item	aws.ecs.services.task.set["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: CPU Reserved	A number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition.	Dependent item	aws.ecs.services.cpu_reserved["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: CPU Utilization	A number of CPU units used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition.	Dependent item	aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Memory utilized	The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.services.memory_utilized["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Custom multiplier: `1048576`
[{#AWS.ECS.SERVICE.NAME}]: Memory utilization	The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Memory reserved	The memory that is reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.services.memory_reserved["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Custom multiplier: `1048576`
[{#AWS.ECS.SERVICE.NAME}]: Network rx bytes	The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.services.network.rx["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Network tx bytes	The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.services.network.tx["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Ephemeral storage reserved	The number of bytes reserved from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later.	Dependent item	aws.ecs.services.ephemeral.storage.reserved["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Custom multiplier: `1073741824`
[{#AWS.ECS.SERVICE.NAME}]: Ephemeral storage utilized	The number of bytes used from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later.	Dependent item	aws.ecs.services.ephemeral.storage.utilized["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Custom multiplier: `1073741824`
[{#AWS.ECS.SERVICE.NAME}]: Storage read bytes	The number of bytes read from storage in the resource that is specified by the dimensions that you're using.	Dependent item	aws.ecs.services.storage.read.bytes["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Storage write bytes	The number of bytes written to storage in the resource that is specified by the dimensions that you're using.	Dependent item	aws.ecs.services.storage.write.bytes["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Get metrics	Get metrics of ESC services. Full metrics list related to ECS : https://docs.aws.amazon.com/ecs/index.html	Script	aws.ecs.services.get_metrics["{#AWS.ECS.SERVICE.NAME}"] Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value

Trigger prototypes for Cluster Services discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS ECS Serverless: [{#AWS.ECS.SERVICE.NAME}]: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/AWS ECS Serverless Cluster by HTTP/aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN}`\|Warning
AWS ECS Serverless: [{#AWS.ECS.SERVICE.NAME}]: High memory utilization	The system is running out of free memory.	`min(/AWS ECS Serverless Cluster by HTTP/aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN}`\|Warning

AWS ECS Cluster by HTTP

Overview

The template to monitor AWS ECS Cluster by HTTP via Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. NOTE This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about the metrics and used API methods:

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

AWS ECS Cluster by HTTP

Configuration

Setup

The template gets AWS ECS metrics and uses the script item to make HTTP requests to the CloudWatch API.

Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect Amazon ECS metrics.

{
    "Version":"2012-10-17",
    "Statement":[
        {
          "Action":[
              "cloudwatch:DescribeAlarms",
              "cloudwatch:GetMetricData",
              "ecs:ListServices"
          ],
          "Effect":"Allow",
          "Resource":"*"
        }
    ]
  }

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume role authorization

For using assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ecs:ListServices"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ecs:ListServices",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the following macros {$AWS.AUTH_TYPE}, {$AWS.REGION}, {$AWS.ECS.CLUSTER.NAME}.

For more information about managing access keys, see official documentation

Refer to the Macros section for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.REGION}	Amazon ECS Region code.	`us-west-1`
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.ASSUME.ROLE.ARN}	ARN assume role; add when using the `assume_role` authorization method.
{$AWS.ECS.CLUSTER.NAME}	ECS cluster name.
{$AWS.ECS.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.ECS.LLD.FILTER.ALARMNAME.NOTMATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`
{$AWS.ECS.LLD.FILTER.ALARMSERVICENAMESPACE.MATCHES}	Filter of discoverable alarms by namespace.	`.*`
{$AWS.ECS.LLD.FILTER.ALARMSERVICENAMESPACE.NOT_MATCHES}	Filter to exclude discovered alarms by namespace.	`CHANGE_IF_NEEDED`
{$AWS.ECS.LLD.FILTER.SERVICE.MATCHES}	Filter of discoverable services by name.	`.*`
{$AWS.ECS.LLD.FILTER.SERVICE.NOT_MATCHES}	Filter to exclude discovered services by name.	`CHANGE_IF_NEEDED`
{$AWS.ECS.CLUSTER.CPU.UTIL.WARN}	The warning threshold of the cluster CPU utilization expressed in %.	`70`
{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN}	The warning threshold of the cluster memory utilization expressed in %.	`70`
{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN}	The warning threshold of the cluster service CPU utilization expressed in %.	`80`
{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN}	The warning threshold of the cluster service memory utilization expressed in %.	`80`

Items

Name	Description	Type	Key and additional info
Get cluster metrics	Get cluster metrics. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html	Script	aws.ecs.get_metrics Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get cluster services	Get cluster services. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html	Script	aws.ecs.getclusterservices Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get alarms data	Get alarms data. DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.ecs.get_alarms Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get metrics check	Data collection check.	Dependent item	aws.ecs.metrics.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get alarms check	Data collection check.	Dependent item	aws.ecs.alarms.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Container Instance Count	The number of EC2 instances running the Amazon ECS agent that are registered with a cluster.	Dependent item	aws.ecs.containerinstancecount Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Task Count	The number of tasks running in the cluster.	Dependent item	aws.ecs.task_count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Service Count	The number of services in the cluster.	Dependent item	aws.ecs.service_count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
CPU Reserved	A number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition.	Dependent item	aws.ecs.cpu_reserved Preprocessing JSON Path: `$.[?(@.Label == "CpuReserved")].Values.first().first()` ⛔️Custom on fail: Discard value
CPU Utilization	Cluster CPU utilization	Dependent item	aws.ecs.cpu_utilization Preprocessing JSON Path: `$.CPUUtilization` ⛔️Custom on fail: Discard value
Memory Utilization	The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.memory_utilization Preprocessing JSON Path: `$.MemoryUtilization` ⛔️Custom on fail: Discard value
Network rx bytes	The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.network.rx Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Network tx bytes	The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.network.tx Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression
AWS ECS Cluster: Failed to get metrics data	Failed to get CloudWatch metrics for ECS Cluster.	`length(last(/AWS ECS Cluster by HTTP/aws.ecs.metrics.check))>0`\|Warning
AWS ECS Cluster: Failed to get alarms data	Failed to get CloudWatch alarms for ECS Cluster.	`length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarms.check))>0`\|Warning
AWS ECS Cluster: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/AWS ECS Cluster by HTTP/aws.ecs.cpu_utilization,15m)>{$AWS.ECS.CLUSTER.CPU.UTIL.WARN}`\|Warning
AWS ECS Cluster: High memory utilization	The system is running out of free memory.	`min(/AWS ECS Cluster by HTTP/aws.ecs.memory_utilization,15m)>{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN}`\|Warning

LLD rule Cluster Alarms discovery

Name Description Type Key and additional info

Cluster Alarms discovery

Discovery instance alarms.

Dependent item

aws.ecs.alarms.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Cluster Alarms discovery

Name Description Type Key and additional info

[{#ALARM_NAME}]: Get metrics

Get alarm metrics about the state and its reason.

Dependent item

aws.ecs.alarm.getmetrics["{#ALARMNAME}"]

Preprocessing

JSON Path: $.[?(@.AlarmName == "{#ALARM_NAME}")].first()
⛔️Custom on fail: Discard value

[{#ALARM_NAME}]: State reason

An explanation for the alarm state, in text format.

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item

aws.ecs.alarm.statereason["{#ALARMNAME}"]

Preprocessing

JSON Path: $.StateReason
⛔️Custom on fail: Discard value
Discard unchanged with heartbeat: 3h

[{#ALARM_NAME}]: State

The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENTDATA), 2 (ALARM).

Alarm description:

{#ALARMDESCRIPTION}

Dependent item

aws.ecs.alarm.state["{#ALARM_NAME}"]

Preprocessing

JSON Path: $.StateValue
⛔️Custom on fail: Set value to: 3
JavaScript: The text is too long. Please see the template.

Trigger prototypes for Cluster Alarms discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS ECS Cluster: [{#ALARM_NAME}] has 'Alarm' state	Alarm "{#ALARM_NAME}" has `Alarm` state. Reason: {ITEM.LASTVALUE2}	`last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state_reason["{#ALARM_NAME}"]))>0`\|Average
AWS ECS Cluster: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=1`\|Info

LLD rule Cluster Services discovery

Name Description Type Key and additional info

Cluster Services discovery

Discovery {$AWS.ECS.CLUSTER.NAME} services.

Dependent item

aws.ecs.services.discovery

Preprocessing

Discard unchanged with heartbeat: 3h

Item prototypes for Cluster Services discovery

Name	Description	Type	Key and additional info
[{#AWS.ECS.SERVICE.NAME}]: Running Task	The number of tasks currently in the `running` state.	Dependent item	aws.ecs.services.running.task["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: Pending Task	The number of tasks currently in the `pending` state.	Dependent item	aws.ecs.services.pending.task["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: Desired Task	The desired number of tasks for an {#AWS.ECS.SERVICE.NAME} service.	Dependent item	aws.ecs.services.desired.task["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: Task Set	The number of task sets in the {#AWS.ECS.SERVICE.NAME} service.	Dependent item	aws.ecs.services.task.set["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: CPU Reserved	A number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition.	Dependent item	aws.ecs.services.cpu_reserved["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: CPU Utilization	A number of CPU units used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition.	Dependent item	aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Memory utilized	The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.services.memory_utilized["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Custom multiplier: `1048576`
[{#AWS.ECS.SERVICE.NAME}]: Memory utilization	The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Memory reserved	The memory that is reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.services.memory_reserved["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Custom multiplier: `1048576`
[{#AWS.ECS.SERVICE.NAME}]: Network rx bytes	The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.services.network.rx["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Network tx bytes	The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.services.network.tx["{#AWS.ECS.SERVICE.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Get metrics	Get metrics of ESC services. Full metrics list related to ECS : https://docs.aws.amazon.com/ecs/index.html	Script	aws.ecs.services.get_metrics["{#AWS.ECS.SERVICE.NAME}"] Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value

Trigger prototypes for Cluster Services discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS ECS Cluster: [{#AWS.ECS.SERVICE.NAME}]: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/AWS ECS Cluster by HTTP/aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN}`\|Warning
AWS ECS Cluster: [{#AWS.ECS.SERVICE.NAME}]: High memory utilization	The system is running out of free memory.	`min(/AWS ECS Cluster by HTTP/aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN}`\|Warning

AWS ELB Application Load Balancer by HTTP

Overview

Please scroll down for AWS ELB Network Load Balancer by HTTP.

The template is designed to monitor AWS ELB Application Load Balancer by HTTP via Zabbix, and it works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about metrics and API methods used in the template:

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

AWS ELB Application Load Balancer with Target Groups by HTTP

Configuration

Setup

The template gets AWS ELB Application Load Balancer metrics and uses the script item to make HTTP requests to the CloudWatch API.

Before using the template, you need to create an IAM policy with the necessary permissions for the Zabbix role in your AWS account. For more information, visit the ELB policies page on the AWS website.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect AWS ELB Application Load Balancer metrics.

{
    "Version":"2012-10-17",
    "Statement":[
        {
          "Action":[
              "cloudwatch:DescribeAlarms",
              "cloudwatch:GetMetricData",
              "elasticloadbalancing:DescribeTargetGroups"
          ],
          "Effect":"Allow",
          "Resource":"*"
        }
    ]
  }

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume role authorization

For using assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "elasticloadbalancing:DescribeTargetGroups"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "elasticloadbalancing:DescribeTargetGroups",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, and {$AWS.ELB.ARN}.

For more information about managing access keys, see official AWS documentation.

See the section below for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.DATA.TIMEOUT}	API response timeout.	`60s`
{$AWS.PROXY}	Sets the HTTP proxy value. If this macro is empty, no proxy is used.
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.REGION}	AWS Application Load Balancer region code.	`us-west-1`
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.ASSUME.ROLE.ARN}	ARN assume role; add when using the `assume_role` authorization method.
{$AWS.ELB.ARN}	Amazon Resource Names (ARN) of the load balancer.
{$AWS.HTTP.4XX.FAIL.MAX.WARN}	Maximum number of HTTP request failures for a trigger expression.	`5`
{$AWS.HTTP.5XX.FAIL.MAX.WARN}	Maximum number of HTTP request failures for a trigger expression.	`5`
{$AWS.ELB.LLD.FILTER.TARGET.GROUP.MATCHES}	Filter of discoverable target groups by name.	`.*`
{$AWS.ELB.LLD.FILTER.TARGET.GROUP.NOT_MATCHES}	Filter to exclude discovered target groups by name.	`CHANGE_IF_NEEDED`
{$AWS.ELB.LLD.FILTER.ALARMSERVICENAMESPACE.MATCHES}	Filter of discoverable alarms by namespace.	`.*`
{$AWS.ELB.LLD.FILTER.ALARMSERVICENAMESPACE.NOT_MATCHES}	Filter to exclude discovered alarms by namespace.	`CHANGE_IF_NEEDED`
{$AWS.ELB.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.ELB.LLD.FILTER.ALARMNAME.NOTMATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`

Items

Name	Description	Type	Key and additional info
Get metrics data	Get ELB Application Load Balancer metrics. Full metrics list related to Application Load Balancer: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-cloudwatch-metrics.html	Script	aws.elb.alb.get_metrics Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get target groups	Get ELB target group. `DescribeTargetGroups` API method: https://docs.aws.amazon.com/elasticloadbalancing/latest/APIReference/API_DescribeTargetGroups.html	Script	aws.elb.alb.gettargetgroups Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get ELB ALB alarms data	`DescribeAlarms` API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.elb.alb.get_alarms Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get metrics check	Check that the Application Load Balancer metrics data has been received correctly.	Dependent item	aws.elb.alb.metrics.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get alarms check	Check that the alarm data has been received correctly.	Dependent item	aws.elb.alb.alarms.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Active Connection Count	The total number of active concurrent TCP connections from clients to the load balancer and from the load balancer to targets.	Dependent item	aws.elb.alb.activeconnectioncount Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
New Connection Count	The total number of new TCP connections established from clients to the load balancer and from the load balancer to targets.	Dependent item	aws.elb.alb.newconnectioncount Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Rejected Connection Count	The number of connections that were rejected because the load balancer had reached its maximum number of connections.	Dependent item	aws.elb.alb.rejectedconnectioncount Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Requests Count	The number of requests processed over IPv4 and IPv6. This metric is only incremented for requests where the load balancer node was able to choose a target. Requests that are rejected before a target is chosen are not reflected in this metric.	Dependent item	aws.elb.alb.requests_count Preprocessing JSON Path: `$.[?(@.Label == "RequestCount")].Values.first().first()` ⛔️Custom on fail: Discard value
Target Response Time	The time elapsed, in seconds, after the request leaves the load balancer until a response from the target is received. This is equivalent to the `target_processing_time` field in the access logs.	Dependent item	aws.elb.alb.targetresponsetime Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
HTTP Fixed Response Count	The number of fixed-response actions that were successful.	Dependent item	aws.elb.alb.httpfixedresponse_count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Rule Evaluations	The number of rules processed by the load balancer given a request rate averaged over an hour.	Dependent item	aws.elb.alb.rule_evaluations Preprocessing JSON Path: `$.[?(@.Label == "RuleEvaluations")].Values.first().first()` ⛔️Custom on fail: Discard value
Client TLS Negotiation Error Count	The number of TLS connections initiated by the client that did not establish a session with the load balancer due to a TLS error. Possible causes include a mismatch of ciphers or protocols or the client failing to verify the server certificate and closing the connection.	Dependent item	aws.elb.alb.clienttlsnegotiationerrorcount Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Target TLS Negotiation Error Count	The number of TLS connections initiated by the load balancer that did not establish a session with the target. Possible causes include a mismatch of ciphers or protocols. This metric does not apply if the target is a Lambda function.	Dependent item	aws.elb.alb.targettlsnegotiationerrorcount Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Target Connection Error Count	The number of connections that were not successfully established between the load balancer and target. This metric does not apply if the target is a Lambda function.	Dependent item	aws.elb.alb.targetconnectionerror_count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Consumed LCUs	The number of load balancer capacity units (LCU) used by your load balancer. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/	Dependent item	aws.elb.alb.capacity_units Preprocessing JSON Path: `$.[?(@.Label == "ConsumedLCUs")].Values.first().first()` ⛔️Custom on fail: Discard value
Processed Bytes	The total number of bytes processed by the load balancer over IPv4 and IPv6 (HTTP header and HTTP payload). This count includes traffic to and from clients and Lambda functions, and traffic from an Identity Provider (IdP) if user authentication is enabled.	Dependent item	aws.elb.alb.processed_bytes Preprocessing JSON Path: `$.[?(@.Label == "ProcessedBytes")].Values.first().first()` ⛔️Custom on fail: Discard value
Desync Mitigation Mode Non Compliant Request Count	The number of requests that fail to comply with HTTP protocols.	Dependent item	aws.elb.alb.noncompliantrequest_count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
HTTP Redirect Count	The number of redirect actions that were successful.	Dependent item	aws.elb.alb.httpredirectcount Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
HTTP Redirect Url Limit Exceeded Count	The number of redirect actions that could not be completed because the URL in the response location header is larger than 8K bytes.	Dependent item	aws.elb.alb.httpredirecturllimitexceeded_count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
ELB HTTP 3XX Count	The number of HTTP 3XX redirection codes that originate from the load balancer. This count does not include response codes generated by targets.	Dependent item	aws.elb.alb.http3xxcount Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
ELB HTTP 4XX Count	The number of HTTP 4XX client error codes that originate from the load balancer. Client errors are generated when requests are malformed or incomplete. These requests were not received by the target, other than in the case where the load balancer returns an HTTP 460 error code. This count does not include any response codes generated by the targets.	Dependent item	aws.elb.alb.http4xxcount Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
ELB HTTP 5XX Count	The number of HTTP 5XX server error codes that originate from the load balancer. This count does not include any response codes generated by the targets.	Dependent item	aws.elb.alb.http5xxcount Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
ELB HTTP 500 Count	The number of HTTP 500 error codes that originate from the load balancer.	Dependent item	aws.elb.alb.http500count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
ELB HTTP 502 Count	The number of HTTP 502 error codes that originate from the load balancer.	Dependent item	aws.elb.alb.http502count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
ELB HTTP 503 Count	The number of HTTP 503 error codes that originate from the load balancer.	Dependent item	aws.elb.alb.http503count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
ELB HTTP 504 Count	The number of HTTP 504 error codes that originate from the load balancer.	Dependent item	aws.elb.alb.http504count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
ELB Auth Error	The number of user authentications that could not be completed because an authenticate action was misconfigured, the load balancer could not establish a connection with the IdP, or the load balancer could not complete the authentication flow due to an internal error.	Dependent item	aws.elb.alb.auth_error Preprocessing JSON Path: `$.[?(@.Label == "ELBAuthError")].Values.first().first()` ⛔️Custom on fail: Discard value
ELB Auth Failure	The number of user authentications that could not be completed because the IdP denied access to the user or an authorization code was used more than once.	Dependent item	aws.elb.alb.auth_failure Preprocessing JSON Path: `$.[?(@.Label == "ELBAuthFailure")].Values.first().first()` ⛔️Custom on fail: Discard value
ELB Auth User Claims Size Exceeded	The number of times that a configured IdP returned user claims that exceeded 11K bytes in size.	Dependent item	aws.elb.alb.authuserclaimssizeexceeded Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
ELB Auth Latency	The time elapsed, in milliseconds, to query the IdP for the ID token and user info. If one or more of these operations fail, this is the time to failure.	Dependent item	aws.elb.alb.auth_latency Preprocessing JSON Path: `$.[?(@.Label == "ELBAuthLatency")].Values.first().first()` ⛔️Custom on fail: Discard value
ELB Auth Success	The number of authenticate actions that were successful. This metric is incremented at the end of the authentication workflow, after the load balancer has retrieved the user claims from the IdP.	Dependent item	aws.elb.alb.auth_success Preprocessing JSON Path: `$.[?(@.Label == "ELBAuthSuccess")].Values.first().first()` ⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression
AWS ELB ALB: Failed to get metrics data	Failed to get CloudWatch metrics for Application Load Balancer.	`length(last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.metrics.check))>0`\|Warning
AWS ELB ALB: Failed to get alarms data	Failed to get CloudWatch alarms for Application Load Balancer.	`length(last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.alarms.check))>0`\|Warning
AWS ELB ALB: Too many HTTP 4XX error codes	Too many requests failed with HTTP 4XX code.	`min(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.http_4xx_count,5m)>{$AWS.HTTP.4XX.FAIL.MAX.WARN}`\|Warning
AWS ELB ALB: Too many HTTP 5XX error codes	Too many requests failed with HTTP 5XX code.	`min(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.http_5xx_count,5m)>{$AWS.HTTP.5XX.FAIL.MAX.WARN}`\|Warning

LLD rule Load Balancer alarm discovery

Name Description Type Key and additional info

Load Balancer alarm discovery

Used for the discovery of alarm balancers.

Dependent item

aws.elb.alb.alarms.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Load Balancer alarm discovery

Name Description Type Key and additional info

[{#ALARM_NAME}]: Get metrics

Get metrics about the alarm state and its reason.

Dependent item

aws.elb.alb.alarm.getmetrics["{#ALARMNAME}"]

Preprocessing

JSON Path: $.[?(@.AlarmName == "{#ALARM_NAME}")].first()
⛔️Custom on fail: Discard value

[{#ALARM_NAME}]: State reason

An explanation for the alarm state reason in text format.

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item

aws.elb.alb.alarm.statereason["{#ALARMNAME}"]

Preprocessing

JSON Path: $.StateReason
⛔️Custom on fail: Discard value
Discard unchanged with heartbeat: 3h

[{#ALARM_NAME}]: State

The value of the alarm state. Possible values:

0 - OK;

1 - INSUFFICIENT_DATA;

2 - ALARM.

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item

aws.elb.alb.alarm.state["{#ALARM_NAME}"]

Preprocessing

JSON Path: $.StateValue
⛔️Custom on fail: Set value to: 3
JavaScript: The text is too long. Please see the template.

Trigger prototypes for Load Balancer alarm discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS ELB ALB: [{#ALARM_NAME}] has 'Alarm' state	The alarm `{#ALARM_NAME}` is in the ALARM state. Reason: `{ITEM.LASTVALUE2}`	`last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.alarm.state_reason["{#ALARM_NAME}"]))>0`\|Average
AWS ELB ALB: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.alarm.state["{#ALARM_NAME}"])=1`\|Info

LLD rule Target groups discovery

Name Description Type Key and additional info

Target groups discovery

Used for the discovery of {$AWS.ELB.TARGET.GROUP.NAME} target groups.

Dependent item

aws.elb.alb.target_groups.discovery

Preprocessing

Discard unchanged with heartbeat: 3h

Item prototypes for Target groups discovery

Name	Description	Type	Key and additional info
[{#AWS.ELB.TARGET.GROUP.NAME}]: Get metrics	Get the metrics of the ELB target group `{#AWS.ELB.TARGET.GROUP.NAME}`. Full list of metrics related to AWS ELB here: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-cloudwatch-metrics.html#user-authentication-metric-table	Script	aws.elb.alb.targetgroups.getmetrics["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: HTTP Code Target 2XX Count	The number of HTTP response 2XX codes generated by the targets. This does not include any response codes generated by the load balancer.	Dependent item	aws.elb.alb.targetgroups.http2xx_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: HTTP Code Target 3XX Count	The number of HTTP response 3XX codes generated by the targets. This does not include any response codes generated by the load balancer.	Dependent item	aws.elb.alb.targetgroups.http3xx_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: HTTP Code Target 4XX Count	The number of HTTP response 4XX codes generated by the targets. This does not include any response codes generated by the load balancer.	Dependent item	aws.elb.alb.targetgroups.http4xx_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: HTTP Code Target 5XX Count	The number of HTTP response 5XX codes generated by the targets. This does not include any response codes generated by the load balancer.	Dependent item	aws.elb.alb.targetgroups.http5xx_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Healthy Host Count	The number of targets that are considered healthy.	Dependent item	aws.elb.alb.targetgroups.healthyhost_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "HealthyHostCount")].Values.first().first()` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy Host Count	The number of targets that are considered unhealthy.	Dependent item	aws.elb.alb.targetgroups.unhealthyhost_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Healthy State Routing	The number of zones that meet the routing healthy state requirements.	Dependent item	aws.elb.alb.targetgroups.healthystate_routing["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy State Routing	The number of zones that do not meet the routing healthy state requirements, and therefore the load balancer distributes traffic to all targets in the zone, including the unhealthy targets.	Dependent item	aws.elb.alb.targetgroups.unhealthystate_routing["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Request Count Per Target	The average request count per target, in a target group. You must specify the target group using the TargetGroup dimension.	Dependent item	aws.elb.alb.target_groups.request["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy Routing Request Count	The average request count per target, in a target group.	Dependent item	aws.elb.alb.targetgroups.unhealthyroutingrequestcount["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Mitigated Host Count	The number of targets under mitigation.	Dependent item	aws.elb.alb.targetgroups.mitigatedhost_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Anomalous Host Count	The number of hosts detected with anomalies.	Dependent item	aws.elb.alb.targetgroups.anomaloushost_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Healthy State DNS	The number of zones that meet the DNS healthy state requirements.	Dependent item	aws.elb.alb.targetgroups.healthystate_dns["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "HealthyStateDNS")].Values.first().first()` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy State DNS	The number of zones that do not meet the DNS healthy state requirements and therefore were marked unhealthy in DNS.	Dependent item	aws.elb.alb.targetgroups.unhealthystate_dns["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing JSON Path: `$.[?(@.Label == "UnhealthyStateDNS")].Values.first().first()` ⛔️Custom on fail: Discard value

AWS ELB Network Load Balancer by HTTP

Overview

The template is designed to monitor AWS ELB Network Load Balancer by HTTP via Zabbix, and it works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about metrics and API methods used in the template:

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

AWS ELB Network Load Balancer with Target Groups by HTTP

Configuration

Setup

The template gets AWS ELB Network Load Balancer metrics and uses the script item to make HTTP requests to the CloudWatch API.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect AWS ELB Network Load Balancer metrics.

{
    "Version":"2012-10-17",
    "Statement":[
        {
          "Action":[
              "cloudwatch:DescribeAlarms",
              "cloudwatch:GetMetricData",
              "elasticloadbalancing:DescribeTargetGroups"
          ],
          "Effect":"Allow",
          "Resource":"*"
        }
    ]
  }

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume role authorization

For using assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "elasticloadbalancing:DescribeTargetGroups"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "elasticloadbalancing:DescribeTargetGroups",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, and {$AWS.ELB.ARN}.

For more information about managing access keys, see official AWS documentation.

See the section below for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.DATA.TIMEOUT}	API response timeout.	`60s`
{$AWS.PROXY}	Sets the HTTP proxy value. If this macro is empty, no proxy is used.
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.REGION}	AWS Network Load Balancer region code.	`us-west-1`
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.ASSUME.ROLE.ARN}	ARN assume role; add when using the `assume_role` authorization method.
{$AWS.ELB.ARN}	Amazon Resource Names (ARN) of the load balancer.
{$AWS.ELB.LLD.FILTER.TARGET.GROUP.MATCHES}	Filter of discoverable target groups by name.	`.*`
{$AWS.ELB.LLD.FILTER.TARGET.GROUP.NOT_MATCHES}	Filter to exclude discovered target groups by name.	`CHANGE_IF_NEEDED`
{$AWS.ELB.LLD.FILTER.ALARMSERVICENAMESPACE.MATCHES}	Filter of discoverable alarms by namespace.	`.*`
{$AWS.ELB.LLD.FILTER.ALARMSERVICENAMESPACE.NOT_MATCHES}	Filter to exclude discovered alarms by namespace.	`CHANGE_IF_NEEDED`
{$AWS.ELB.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.ELB.LLD.FILTER.ALARMNAME.NOTMATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`
{$AWS.ELB.UNHEALTHY.HOST.MAX}	Maximum number of unhealthy hosts for a trigger expression.	`0`

Items

Name	Description	Type	Key and additional info
Get metrics data	Get ELB Network Load Balancer metrics. Full metrics list related to Network Load Balancer: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-cloudwatch-metrics.html	Script	aws.elb.nlb.get_metrics Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get target groups	Get ELB target group. `DescribeTargetGroups` API method: https://docs.aws.amazon.com/elasticloadbalancing/latest/APIReference/API_DescribeTargetGroups.html	Script	aws.elb.nlb.gettargetgroups Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get ELB NLB alarms data	`DescribeAlarms` API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.elb.nlb.get_alarms Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get metrics check	Check that the Network Load Balancer metrics data has been received correctly.	Dependent item	aws.elb.nlb.metrics.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get alarms check	Check that the alarm data has been received correctly.	Dependent item	aws.elb.nlb.alarms.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Active Flow Count	The total number of concurrent flows (or connections) from clients to targets. This metric includes connections in the `SYN_SENT` and `ESTABLISHED` states. TCP connections are not terminated at the load balancer, so a client opening a TCP connection to a target counts as a single flow.	Dependent item	aws.elb.nlb.activeflowcount Preprocessing JSON Path: `$.[?(@.Label == "ActiveFlowCount")].Values.first().first()` ⛔️Custom on fail: Discard value
Active Flow Count TCP	The total number of concurrent TCP flows (or connections) from clients to targets. This metric includes connections in the `SYN_SENT` and `ESTABLISHED` states. TCP connections are not terminated at the load balancer, so a client opening a TCP connection to a target counts as a single flow.	Dependent item	aws.elb.nlb.activeflowcount_tcp Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Active Flow Count TLS	The total number of concurrent TLS flows (or connections) from clients to targets. This metric includes connections in the `SYN_SENT` and `ESTABLISHED` states.	Dependent item	aws.elb.nlb.activeflowcount_tls Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Active Flow Count UDP	The total number of concurrent UDP flows (or connections) from clients to targets.	Dependent item	aws.elb.nlb.activeflowcount_udp Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Client TLS Negotiation Error Count	The total number of TLS handshakes that failed during negotiation between a client and a TLS listener.	Dependent item	aws.elb.nlb.clienttlsnegotiationerrorcount Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Consumed LCUs	The number of load balancer capacity units (LCU) used by your load balancer. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/	Dependent item	aws.elb.nlb.capacity_units Preprocessing JSON Path: `$.[?(@.Label == "ConsumedLCUs")].Values.first().first()` ⛔️Custom on fail: Discard value
Consumed LCUs TCP	The number of load balancer capacity units (LCU) used by your load balancer for TCP. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/	Dependent item	aws.elb.nlb.capacityunitstcp Preprocessing JSON Path: `$.[?(@.Label == "ConsumedLCUs_TCP")].Values.first().first()` ⛔️Custom on fail: Discard value
Consumed LCUs TLS	The number of load balancer capacity units (LCU) used by your load balancer for TLS. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/	Dependent item	aws.elb.nlb.capacityunitstls Preprocessing JSON Path: `$.[?(@.Label == "ConsumedLCUs_TLS")].Values.first().first()` ⛔️Custom on fail: Discard value
Consumed LCUs UDP	The number of load balancer capacity units (LCU) used by your load balancer for UDP. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/	Dependent item	aws.elb.nlb.capacityunitsudp Preprocessing JSON Path: `$.[?(@.Label == "ConsumedLCUs_UDP")].Values.first().first()` ⛔️Custom on fail: Discard value
New Flow Count	The total number of new flows (or connections) established from clients to targets in the specified time period.	Dependent item	aws.elb.nlb.newflowcount Preprocessing JSON Path: `$.[?(@.Label == "NewFlowCount")].Values.first().first()` ⛔️Custom on fail: Discard value
New Flow Count TCP	The total number of new TCP flows (or connections) established from clients to targets in the specified time period.	Dependent item	aws.elb.nlb.newflowcount_tcp Preprocessing JSON Path: `$.[?(@.Label == "NewFlowCount_TCP")].Values.first().first()` ⛔️Custom on fail: Discard value
New Flow Count TLS	The total number of new TLS flows (or connections) established from clients to targets in the specified time period.	Dependent item	aws.elb.nlb.newflowcount_tls Preprocessing JSON Path: `$.[?(@.Label == "NewFlowCount_TLS")].Values.first().first()` ⛔️Custom on fail: Discard value
New Flow Count UDP	The total number of new UDP flows (or connections) established from clients to targets in the specified time period.	Dependent item	aws.elb.nlb.newflowcount_udp Preprocessing JSON Path: `$.[?(@.Label == "NewFlowCount_UDP")].Values.first().first()` ⛔️Custom on fail: Discard value
Peak Packets per second	Highest average packet rate (packets processed per second), calculated every 10 seconds during the sampling window. This metric includes health check traffic.	Dependent item	aws.elb.nlb.peak_packets.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Port Allocation Error Count	The total number of ephemeral port allocation errors during a client IP translation operation. A non-zero value indicates dropped client connections. Note: Network Load Balancers support 55,000 simultaneous connections or about 55,000 connections per minute to each unique target (IP address and port) when performing client address translation. To fix port allocation errors, add more targets to the target group.	Dependent item	aws.elb.nlb.portallocationerror_count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Processed Bytes	The total number of bytes processed by the load balancer, including TCP/IP headers. This count includes traffic to and from targets, minus health check traffic.	Dependent item	aws.elb.nlb.processed_bytes Preprocessing JSON Path: `$.[?(@.Label == "ProcessedBytes")].Values.first().first()` ⛔️Custom on fail: Discard value
Processed Bytes TCP	The total number of bytes processed by TCP listeners.	Dependent item	aws.elb.nlb.processedbytestcp Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Processed Bytes TLS	The total number of bytes processed by TLS listeners.	Dependent item	aws.elb.nlb.processedbytestls Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Processed Bytes UDP	The total number of bytes processed by UDP listeners.	Dependent item	aws.elb.nlb.processedbytesudp Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Processed Packets	The total number of packets processed by the load balancer. This count includes traffic to and from targets, including health check traffic.	Dependent item	aws.elb.nlb.processed_packets Preprocessing JSON Path: `$.[?(@.Label == "ProcessedPackets")].Values.first().first()` ⛔️Custom on fail: Discard value
Security Group Blocked Flow Count Inbound ICMP	The number of new ICMP messages rejected by the inbound rules of the load balancer security groups.	Dependent item	aws.elb.nlb.sgblockedinbound_icmp Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Security Group Blocked Flow Count Inbound TCP	The number of new TCP flows rejected by the inbound rules of the load balancer security groups.	Dependent item	aws.elb.nlb.sgblockedinbound_tcp Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Security Group Blocked Flow Count Inbound UDP	The number of new UDP flows rejected by the inbound rules of the load balancer security groups.	Dependent item	aws.elb.nlb.sgblockedinbound_udp Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Security Group Blocked Flow Count Outbound ICMP	The number of new ICMP messages rejected by the outbound rules of the load balancer security groups.	Dependent item	aws.elb.nlb.sgblockedoutbound_icmp Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Security Group Blocked Flow Count Outbound TCP	The number of new TCP flows rejected by the outbound rules of the load balancer security groups.	Dependent item	aws.elb.nlb.sgblockedoutbound_tcp Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Security Group Blocked Flow Count Outbound UDP	The number of new UDP flows rejected by the outbound rules of the load balancer security groups.	Dependent item	aws.elb.nlb.sgblockedoutbound_udp Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Target TLS Negotiation Error Count	The total number of TLS handshakes that failed during negotiation between a TLS listener and a target.	Dependent item	aws.elb.nlb.targettlsnegotiationerrorcount Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
TCP Client Reset Count	The total number of reset (RST) packets sent from a client to a target. These resets are generated by the client and forwarded by the load balancer.	Dependent item	aws.elb.nlb.tcpclientreset_count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
TCP ELB Reset Count	The total number of reset (RST) packets generated by the load balancer. For more information, see: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-troubleshooting.html#elb-reset-count-metric	Dependent item	aws.elb.nlb.tcpelbreset_count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
TCP Target Reset Count	The total number of reset (RST) packets sent from a target to a client. These resets are generated by the target and forwarded by the load balancer.	Dependent item	aws.elb.nlb.tcptargetreset_count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Unhealthy Routing Flow Count	The number of flows (or connections) that are routed using the routing failover action (fail open).	Dependent item	aws.elb.nlb.unhealthyroutingflow_count Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
AWS ELB NLB: Failed to get metrics data	Failed to get CloudWatch metrics for Network Load Balancer.	`length(last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.metrics.check))>0`\|Warning
AWS ELB NLB: Failed to get alarms data	Failed to get CloudWatch alarms for Network Load Balancer.	`length(last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.alarms.check))>0`\|Warning

LLD rule Load Balancer alarm discovery

Name Description Type Key and additional info

Load Balancer alarm discovery

Used for the discovery of alarm balancers.

Dependent item

aws.elb.nlb.alarms.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Load Balancer alarm discovery

Name Description Type Key and additional info

[{#ALARM_NAME}]: Get metrics

Get metrics about the alarm state and its reason.

Dependent item

aws.elb.nlb.alarm.getmetrics["{#ALARMNAME}"]

Preprocessing

JSON Path: $.[?(@.AlarmName == "{#ALARM_NAME}")].first()
⛔️Custom on fail: Discard value

[{#ALARM_NAME}]: State reason

An explanation for the alarm state reason in text format.

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item

aws.elb.nlb.alarm.statereason["{#ALARMNAME}"]

Preprocessing

JSON Path: $.StateReason
⛔️Custom on fail: Discard value
Discard unchanged with heartbeat: 3h

[{#ALARM_NAME}]: State

The value of the alarm state. Possible values:

0 - OK;

1 - INSUFFICIENT_DATA;

2 - ALARM.

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item

aws.elb.nlb.alarm.state["{#ALARM_NAME}"]

Preprocessing

JSON Path: $.StateValue
⛔️Custom on fail: Set value to: 3
JavaScript: The text is too long. Please see the template.

Trigger prototypes for Load Balancer alarm discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS ELB NLB: [{#ALARM_NAME}] has 'Alarm' state	The alarm `{#ALARM_NAME}` is in the ALARM state. Reason: `{ITEM.LASTVALUE2}`	`last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.alarm.state_reason["{#ALARM_NAME}"]))>0`\|Average
AWS ELB NLB: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.alarm.state["{#ALARM_NAME}"])=1`\|Info

LLD rule Target groups discovery

Name Description Type Key and additional info

Target groups discovery

Used for the discovery of {$AWS.ELB.TARGET.GROUP.NAME} target groups.

Dependent item

aws.elb.nlb.target_groups.discovery

Preprocessing

Discard unchanged with heartbeat: 3h

Item prototypes for Target groups discovery

Name Description Type Key and additional info

[{#AWS.ELB.TARGET.GROUP.NAME}]: Get metrics

Get the metrics of the ELB target group {#AWS.ELB.TARGET.GROUP.NAME}.

Full list of metrics related to AWS ELB here: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-cloudwatch-metrics.html#user-authentication-metric-table

Script

aws.elb.nlb.targetgroups.getmetrics["{#AWS.ELB.TARGET.GROUP.NAME}"]

Preprocessing

Check for not supported value: any error
⛔️Custom on fail: Discard value

[{#AWS.ELB.TARGET.GROUP.NAME}]: Healthy Host Count

The number of targets that are considered healthy.

Dependent item

aws.elb.nlb.targetgroups.healthyhost_count["{#AWS.ELB.TARGET.GROUP.NAME}"]

Preprocessing

JSON Path: $.[?(@.Label == "HealthyHostCount")].Values.first().first()
⛔️Custom on fail: Discard value

[{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy Host Count

The number of targets that are considered unhealthy.

Dependent item

aws.elb.nlb.targetgroups.unhealthyhost_count["{#AWS.ELB.TARGET.GROUP.NAME}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value

Trigger prototypes for Target groups discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS ELB NLB: [{#AWS.ELB.TARGET.GROUP.NAME}]: Target have become unhealthy	This trigger helps in identifying when your targets have become unhealthy.	`last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.target_groups.healthy_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"]) = 0`\|Average
AWS ELB NLB: [{#AWS.ELB.TARGET.GROUP.NAME}]: Target have unhealthy host	This trigger allows you to become aware when there are no more registered targets.	`last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.target_groups.unhealthy_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"]) > {$AWS.ELB.UNHEALTHY.HOST.MAX}`\|Warning	Depends on: AWS ELB NLB: [{#AWS.ELB.TARGET.GROUP.NAME}]: Target have become unhealthy

AWS Lambda by HTTP

Overview

This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about metrics and API methods used in the template:

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

AWS Lambda by HTTP

Configuration

Setup

The template gets AWS Lambda metrics and uses the script item to make HTTP requests to the CloudWatch API.

Before using the template, you need to create an IAM policy with the necessary permissions for the Zabbix role in your AWS account. For more information, visit the Lambda permissions page on the AWS website.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect AWS Lambda metrics.

{
    "Version":"2012-10-17",
    "Statement":[
        {
          "Action":[
              "cloudwatch:DescribeAlarms",
              "cloudwatch:GetMetricData"
          ],
          "Effect":"Allow",
          "Resource":"*"
        }
    ]
  }

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume role authorization

For using assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, and {$AWS.LAMBDA.ARN}.

For more information about managing access keys, see the official AWS documentation.

See the section below for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.DATA.TIMEOUT}	API response timeout.	`60s`
{$AWS.PROXY}	Sets the HTTP proxy value. If this macro is empty, no proxy is used.
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.REGION}	AWS Lambda function region code.	`us-west-1`
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.ASSUME.ROLE.ARN}	ARN assume role; add when using the `assume_role` authorization method.
{$AWS.LAMBDA.ARN}	The Amazon Resource Names (ARN) of the Lambda function.
{$AWS.LAMBDA.LLD.FILTER.ALARMSERVICENAMESPACE.MATCHES}	Filter of discoverable alarms by namespace.	`.*`
{$AWS.LAMBDA.LLD.FILTER.ALARMSERVICENAMESPACE.NOT_MATCHES}	Filter to exclude discovered alarms by namespace.	`CHANGE_IF_NEEDED`
{$AWS.LAMBDA.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.LAMBDA.LLD.FILTER.ALARMNAME.NOTMATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`

Items

Name	Description	Type	Key and additional info
Get metrics data	Get Lambda function metrics. Full metrics list related to the Lambda function: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-metrics.html	Script	aws.lambda.get_metrics Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get Lambda alarms data	`DescribeAlarms` API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.lambda.get_alarms Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get metrics check	Check that the Lambda function metrics data has been received correctly.	Dependent item	aws.lambda.metrics.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get alarms check	Check that the alarm data has been received correctly.	Dependent item	aws.lambda.alarms.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Async events received sum	The number of events that Lambda successfully queues for processing. This metric provides insight into the number of events that a Lambda function receives.	Dependent item	aws.lambda.asynceventsreceived.sum Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Async event age average	The time between when Lambda successfully queues the event and when the function is invoked. The value of this metric increases when events are being retried due to invocation failures or throttling.	Dependent item	aws.lambda.asynceventage.avg Preprocessing JSON Path: `$.[?(@.Label == "AsyncEventAge")].Values.first().first()` ⛔️Custom on fail: Discard value Custom multiplier: `0.001`
Async events dropped sum	The number of events that are dropped without successfully executing the function. If you configure a dead-letter queue (DLQ) or an `OnFailure` destination, events are sent there before they're dropped.	Dependent item	aws.lambda.asynceventsdropped.sum Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Total concurrent executions	The number of function instances that are processing events. If this number reaches your concurrent executions quota for the Region or the reserved concurrency limit on the function, then Lambda will throttle additional invocation requests.	Dependent item	aws.lambda.concurrent_executions.max Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Unreserved concurrent executions maximum	For a Region, the number of events that function without reserved concurrency are processing.	Dependent item	aws.lambda.unreservedconcurrentexecutions.max Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Invocations sum	The number of times that your function code is invoked, including successful invocations and invocations that result in a function error. Invocations aren't recorded if the invocation request is throttled or otherwise results in an invocation error. The value of `Invocations` equals the number of requests billed.	Dependent item	aws.lambda.invocations.sum Preprocessing JSON Path: `$.[?(@.Label == "Invocations")].Values.first().first()` ⛔️Custom on fail: Discard value
Errors sum	The number of invocations that result in a function error. Function errors include exceptions that your code throws and exceptions that the Lambda runtime throws. The runtime returns errors for issues such as timeouts and configuration errors.	Dependent item	aws.lambda.errors.sum Preprocessing JSON Path: `$.[?(@.Label == "Errors")].Values.first().first()` ⛔️Custom on fail: Discard value
Dead letter errors sum	For asynchronous invocation, the number of times that Lambda attempts to send an event to a dead-letter queue (DLQ) but fails. Dead-letter errors can occur due to misconfigured resources or size limits.	Dependent item	aws.lambda.deadlettererrors.sum Preprocessing JSON Path: `$.[?(@.Label == "DeadLetterErrors")].Values.first().first()` ⛔️Custom on fail: Discard value
Throttles sum	The number of invocation requests that are throttled. When all function instances are processing requests and no concurrency is available to scale up, Lambda rejects additional requests with a `TooManyRequestsException` error.	Dependent item	aws.lambda.throttles.sum Preprocessing JSON Path: `$.[?(@.Label == "Throttles")].Values.first().first()` ⛔️Custom on fail: Discard value
Duration average	The amount of time that your function code spends processing an event. The billed duration for an invocation is the value of `Duration` rounded up to the nearest millisecond. Duration does not include cold start time.	Dependent item	aws.lambda.duration.avg Preprocessing JSON Path: `$.[?(@.Label == "Duration")].Values.first().first()` ⛔️Custom on fail: Discard value Custom multiplier: `0.001`

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
AWS Lambda: Failed to get metrics data	Failed to get CloudWatch metrics for the Lambda function.	`length(last(/AWS Lambda by HTTP/aws.lambda.metrics.check))>0`\|Warning
AWS Lambda: Failed to get alarms data	Failed to get CloudWatch alarms for the Lambda function.	`length(last(/AWS Lambda by HTTP/aws.lambda.alarms.check))>0`\|Warning

LLD rule Lambda alarm discovery

Name Description Type Key and additional info

Lambda alarm discovery

Used for the discovery of alarm Lambda functions.

Dependent item

aws.lambda.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Lambda alarm discovery

Name Description Type Key and additional info

[{#ALARM_NAME}]: Get metrics

Get metrics about the alarm state and its reason.

Dependent item

aws.lambda.alarm.getmetrics["{#ALARMNAME}"]

Preprocessing

JSON Path: $.[?(@.AlarmName == "{#ALARM_NAME}")].first()
⛔️Custom on fail: Discard value

[{#ALARM_NAME}]: State reason

An explanation for the alarm state reason in text format.

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item

aws.lambda.alarm.statereason["{#ALARMNAME}"]

Preprocessing

JSON Path: $.StateReason
⛔️Custom on fail: Discard value
Discard unchanged with heartbeat: 3h

[{#ALARM_NAME}]: State

The value of the alarm state. Possible values:

0 - OK;

1 - INSUFFICIENT_DATA;

2 - ALARM.

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item

aws.lambda.alarm.state["{#ALARM_NAME}"]

Preprocessing

JSON Path: $.StateValue
⛔️Custom on fail: Set value to: 3
JavaScript: The text is too long. Please see the template.

Trigger prototypes for Lambda alarm discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS Lambda: [{#ALARM_NAME}] has 'Alarm' state	The alarm `{#ALARM_NAME}` is in the ALARM state. Reason: `{ITEM.LASTVALUE2}`	`last(/AWS Lambda by HTTP/aws.lambda.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS Lambda by HTTP/aws.lambda.alarm.state_reason["{#ALARM_NAME}"]))>0`\|Average
AWS Lambda: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS Lambda by HTTP/aws.lambda.alarm.state["{#ALARM_NAME}"])=1`\|Info

AWS Cost Explorer by HTTP

Overview

The template to monitor AWS Cost Explorer by HTTP via Zabbix, which works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. NOTE This template uses the Cost Explorer API calls to list and retrieve metrics. For more information, please refer to the Cost Explorer pricing page.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

AWS by HTTP

Configuration

Setup

Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.

IAM policies for AWS Cost Management

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect metrics.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ce:GetDimensionValues",
                "ce:GetCostAndUsage"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume Role Authorization

For using assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ce:GetDimensionValues",
                "ce:GetCostAndUsage"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, add the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ce:GetDimensionValues",
                "ce:GetCostAndUsage",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the macros: {$AWS.AUTH_TYPE}. Possible values: access_key, assume_role, role_base.

For more information about managing access keys, see the official documentation.

Also, see the Macros section for a list of macros used in LLD filters.

Additional information about metrics and used API methods:

Describe AWS Cost Explore API actions

Macros used

Name	Description	Default
{$AWS.PROXY}	Sets HTTP proxy value. If this macro is empty, then no proxy is used.
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.BILLING.REGION}	Amazon Billing region code.	`us-east-1`
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.ASSUME.ROLE.ARN}	ARN assume role; add when using the `assume_role` authorization method.
{$AWS.BILLING.MONTH}	Months to get historical data from AWS Cost Explore API, no more than 12 months.	`11`
{$AWS.BILLING.LLD.FILTER.SERVICE.MATCHES}	Filter of discoverable discovered billing service by name.	`.*`
{$AWS.BILLING.LLD.FILTER.SERVICE.NOT_MATCHES}	Filter to exclude discovered billing service by name.	`CHANGE_IF_NEEDED`

Items

Name Description Type Key and additional info

Get monthly costs

Get raw data on the monthly costs by service.

Script

aws.get.monthly.costs

Preprocessing

Check for not supported value: any error
⛔️Custom on fail: Discard value

Get daily costs

Get raw data on the daily costs by service.

Script

aws.get.daily.costs

Preprocessing

Check for not supported value: any error
⛔️Custom on fail: Discard value

LLD rule AWS daily costs by services discovery

Name Description Type Key and additional info

AWS daily costs by services discovery

Discovery of daily blended costs by services.

Dependent item

aws.daily.services.costs.discovery

Preprocessing

JSON Path: $..Groups.first()

Item prototypes for AWS daily costs by services discovery

Name Description Type Key and additional info

Service [{#AWS.BILLING.SERVICE.NAME}]: Blended daily cost

The daily blended cost of the {#AWS.BILLING.SERVICE.NAME} service for the previous day.

Dependent item

aws.daily.service.cost["{#AWS.BILLING.SERVICE.NAME}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value

LLD rule AWS monthly costs by services discovery

Name Description Type Key and additional info

AWS monthly costs by services discovery

Discovery of monthly costs by services.

Dependent item

aws.cost.service.monthly.discovery

Preprocessing

JSON Path: $.monthly_service_costs

Item prototypes for AWS monthly costs by services discovery

Name Description Type Key and additional info

[{#AWS.BILLING.SERVICE.NAME}]: Month [{#AWS.BILLING.MONTH}] Blended cost

The monthly cost by service {#AWS.BILLING.SERVICE.NAME}.

Dependent item

aws.monthly.service.cost["{#AWS.BILLING.SERVICE.NAME}", "{#AWS.BILLING.MONTH}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value
JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value

LLD rule AWS monthly costs discovery

Name Description Type Key and additional info

AWS monthly costs discovery

Discovery of monthly costs.

Dependent item

aws.monthly.cost.discovery

Preprocessing

JSON Path: $.monthly_costs

Item prototypes for AWS monthly costs discovery

Name Description Type Key and additional info

[{#AWS.BILLING.MONTH}]: Blended cost per month

The blended cost by month {#AWS.BILLING.MONTH}.

Dependent item

aws.monthly.cost["{#AWS.BILLING.MONTH}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
⛔️Custom on fail: Discard value
Discard unchanged with heartbeat: 3h

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums