This template is designed as a master template that discovers various Oracle Cloud Infrastructure (OCI) services and resources, such as:

- OCI Compute;
- OCI Autonomous Database (serverless);
- OCI Object Storage;
- OCI Virtual Cloud Networks (VCNs);
- OCI Block Volumes;
- OCI Boot Volumes.
For communication with OCI, this template uses script items that execute HTTP `GET` and `POST` requests. `POST` requests are required for the OCI Monitoring API because its Monitoring Query Language (MQL) queries are sent in the HTTP request body.
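As an illustration of why `POST` is needed, the following Python sketch builds the kind of JSON body accepted by the OCI Monitoring API's `SummarizeMetricsData` operation (assumed here to be `POST /20180401/metrics/actions/summarizeMetricsData?compartmentId=...` on the `{$OCI.API.TELEMETRY.HOST}` endpoint). The `oci_computeagent` namespace, the MQL query, and the instance OCID are examples only; this is not the exact request that the template's scripts send.

```python
import json
from datetime import datetime, timedelta, timezone


def build_mql_body(namespace: str, query: str, window: timedelta,
                   now: datetime) -> str:
    """Return a JSON request body for a SummarizeMetricsData-style POST.

    The MQL query is part of the body, which is why the scripts use POST
    rather than GET for these metric queries.
    """
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"  # RFC 3339 timestamp in UTC
    body = {
        "namespace": namespace,  # metric namespace, e.g. compute agent metrics
        "query": query,          # MQL expression
        "startTime": (now - window).strftime(fmt),
        "endTime": now.strftime(fmt),
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Hypothetical instance OCID, used only to show an MQL dimension filter.
    query = 'CpuUtilization[1m]{resourceId = "ocid1.instance.oc1..example"}.mean()'
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(build_mql_body("oci_computeagent", query, timedelta(minutes=5), now))
```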
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
For this template to work, it needs authentication details to use in its requests. To acquire this information, follow these steps:

1. Log into your administrator account in Oracle Cloud Console.
2. Create a new user that will be used by Zabbix for monitoring.
3. Create a new security policy and assign the previously created user to it. OCI policy statements grant permissions to groups, so the user must be a member of the group named in the policy rules.
4. This policy will contain a set of rules that give the monitoring user access to specific resources in your OCI tenancy. Make sure to add the following rules to the policy:
   - `Allow group 'zabbix_api' to read metrics in tenancy`
   - `Allow group 'zabbix_api' to read instances in tenancy`
   - `Allow group 'zabbix_api' to read subnets in tenancy`
   - `Allow group 'zabbix_api' to read vcns in tenancy`
   - `Allow group 'zabbix_api' to read vnic-attachments in tenancy`
   - `Allow group 'zabbix_api' to read volumes in tenancy`
   - `Allow group 'zabbix_api' to read objectstorage-namespaces in tenancy`
   - `Allow group 'zabbix_api' to read buckets in tenancy`
   - `Allow group 'zabbix_api' to read autonomous-databases in tenancy`

   In this example, `zabbix_api` is the name of the group that contains the monitoring user. Replace it with the name of your own group.
5. Generate an API key pair for your monitoring user: open the monitoring user's profile, click "API keys" on the left side, and then click "Add API key". If you generate a new key pair, do not forget to save the private key.
6. After this, Oracle Cloud Console will provide additional information that is required for access, such as:
   - Tenancy OCID;
   - User OCID;
   - Fingerprint;
   - Region.

   Save this information somewhere or keep this window open, as it will be required in later steps.
7. In Zabbix, create a new host and assign this template (Oracle Cloud by HTTP) to it.
8. Open the "Macros" section of the host you created and set the following user macro values according to the OCI configuration file (from step #6):
   - `{$OCI.API.TENANCY}` - set the tenancy OCID value;
   - `{$OCI.API.USER}` - set the user OCID value;
   - `{$OCI.API.FINGERPRINT}` - set the fingerprint value;
   - `{$OCI.API.PRIVATE.KEY}` - copy and paste the contents of the private key file here.
9. After the authentication credentials are entered, identify the OCI API endpoints that match your region (as provided by Oracle Cloud Console in step #6). To do so, you can use the OCI API Reference and Endpoints list, where each API service has a dedicated page with the respective API endpoints.

   The required API service endpoints are: Core Services API, Database Service API, Object Storage Service API, and Monitoring API.
10. When the API endpoints are identified, set them in Zabbix as user macros on the host that the template is attached to (similarly to step #8):
    - `{$OCI.API.CORE.HOST}` - Core Services API endpoint, for example, `iaas.eu-stockholm-1.oraclecloud.com`;
    - `{$OCI.API.AUTONOMOUS.DB.HOST}` - Database Service API endpoint, for example, `database.eu-stockholm-1.oraclecloud.com`;
    - `{$OCI.API.OBJECT.STORAGE.HOST}` - Object Storage Service API endpoint, for example, `objectstorage.eu-stockholm-1.oraclecloud.com`;
    - `{$OCI.API.TELEMETRY.HOST}` - Monitoring API endpoint, for example, `telemetry.eu-stockholm-1.oraclecloud.com`.

    IMPORTANT! API endpoint URLs must be entered without the HTTP scheme (`https://`).
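In the commercial OCI realm, the four endpoint hosts follow the pattern `<service>.<region>.oraclecloud.com`, as the example endpoints show. The Python sketch below derives the macro values from a region identifier and strips an accidentally pasted `https://` scheme. The `oraclecloud.com` domain is an assumption that may not hold for every realm, so confirm the result against the OCI endpoints list.

```python
from typing import Dict
from urllib.parse import urlsplit

# Service prefixes taken from the example endpoints for each macro.
SERVICE_PREFIXES = {
    "{$OCI.API.CORE.HOST}": "iaas",
    "{$OCI.API.AUTONOMOUS.DB.HOST}": "database",
    "{$OCI.API.OBJECT.STORAGE.HOST}": "objectstorage",
    "{$OCI.API.TELEMETRY.HOST}": "telemetry",
}


def endpoint_macros(region: str, domain: str = "oraclecloud.com") -> Dict[str, str]:
    """Derive endpoint macro values for a region.

    Assumes the commercial realm domain; other realms may use a different one.
    """
    return {macro: f"{prefix}.{region}.{domain}"
            for macro, prefix in SERVICE_PREFIXES.items()}


def strip_scheme(value: str) -> str:
    """Remove a pasted scheme and trailing slash, leaving only the host."""
    value = value.strip()
    if "://" in value:
        value = urlsplit(value).netloc
    return value.rstrip("/")
```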
Once the host has been added to Zabbix, it will automatically discover the services and start monitoring them.
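The `{$OCI.API.TENANCY}`, `{$OCI.API.USER}`, `{$OCI.API.FINGERPRINT}` and `{$OCI.API.PRIVATE.KEY}` macros are what OCI API key authentication needs: requests are signed with the private key using the HTTP Signatures scheme, and the signature's `keyId` is `<tenancy OCID>/<user OCID>/<fingerprint>`. The following Python sketch, which is not the template's own script, shows how a signing string and `Authorization` header could be assembled. The RSA-SHA256 signing step is passed in as a `sign` callable so the example needs only the standard library; the API path used in testing it is illustrative.

```python
import base64
import hashlib
from email.utils import formatdate
from typing import Callable, Dict, Optional


def signing_headers(method: str, host: str, path: str,
                    body: Optional[bytes] = None,
                    date: Optional[str] = None) -> Dict[str, str]:
    """Collect the headers covered by the request signature, in order."""
    headers = {
        "date": date or formatdate(usegmt=True),          # HTTP date format
        "(request-target)": f"{method.lower()} {path}",   # pseudo-header
        "host": host,
    }
    if body is not None:
        # Requests with a body (such as MQL POSTs) also sign the body hash.
        headers["content-length"] = str(len(body))
        headers["content-type"] = "application/json"
        headers["x-content-sha256"] = base64.b64encode(
            hashlib.sha256(body).digest()).decode()
    return headers


def authorization_header(headers: Dict[str, str], tenancy: str, user: str,
                         fingerprint: str,
                         sign: Callable[[bytes], bytes]) -> str:
    """Build the Authorization header value from the signed headers."""
    signing_string = "\n".join(f"{name}: {value}"
                               for name, value in headers.items())
    signature = base64.b64encode(sign(signing_string.encode())).decode()
    key_id = f"{tenancy}/{user}/{fingerprint}"
    header_names = " ".join(headers)
    return (f'Signature version="1",keyId="{key_id}",'
            f'algorithm="rsa-sha256",headers="{header_names}",'
            f'signature="{signature}"')
```

With the third-party `cryptography` package, `sign` could be `lambda data: key.sign(data, padding.PKCS1v15(), hashes.SHA256())`, where `key` is the private key loaded from the `{$OCI.API.PRIVATE.KEY}` value.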
Every LLD rule has predefined filters to avoid discovering unwanted resources, such as terminated OCI compute instances. Most of these filters are based on resource names and states, and their values are defined by the `{$....MATCHES}` and `{$....NOT_MATCHES}` user macros.
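As a rough model of how these predefined filters behave, the Python sketch below applies the compute instance defaults of the `{$OCI.COMPUTE.DISCOVERY.*}` macros (`.*` to allow, `TERMINATED` and `CHANGE_IF_NEEDED` to ignore). It assumes that each condition is an unanchored regular expression search and that all conditions must hold (AND); the resource dictionaries are hypothetical discovery rows.

```python
import re
from typing import Dict

# Default values of the {$OCI.COMPUTE.DISCOVERY.*} macros.
COMPUTE_DEFAULTS = {
    "STATE.MATCHES": ".*",
    "STATE.NOT_MATCHES": "TERMINATED",
    "NAME.MATCHES": ".*",
    "NAME.NOT_MATCHES": "CHANGE_IF_NEEDED",
}


def is_discovered(resource: Dict[str, str],
                  macros: Dict[str, str] = COMPUTE_DEFAULTS) -> bool:
    """Return True if a resource passes all MATCHES/NOT_MATCHES conditions."""
    state, name = resource["state"], resource["name"]
    return (re.search(macros["STATE.MATCHES"], state) is not None
            and re.search(macros["STATE.NOT_MATCHES"], state) is None
            and re.search(macros["NAME.MATCHES"], name) is not None
            and re.search(macros["NAME.NOT_MATCHES"], name) is None)
```

With the defaults, only the state filter excludes anything, since `CHANGE_IF_NEEDED` is a placeholder that real names are unlikely to contain. Because the search is unanchored, setting a `NOT_MATCHES` macro to `test` would also drop a resource named `my-test-vm`; `^test$` excludes only the exact name.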
To allow additional filtering, every discovery script (except VCN discovery) gathers free-form tag data about each resource. Since free-form tags are completely custom, and their format and usage vary between users, free-form tag filters are not included in the LLD filters by default. However, they can be easily added, as the tag data is already being collected by the scripts. For example:
1. In Oracle Cloud Console, add a free-form tag to a resource, for example, a compute instance. The tag key will be `location_group` and the tag value will be `eu-north-1`.
2. Open the Oracle Cloud by HTTP template in Zabbix and go to "Discovery rules". Find "Compute instances discovery" and open it.
3. Under "LLD macros", add a new macro that will represent this location group tag, for example, `{#LOCATION_GROUP}` with the JSONPath `$.tags.location_group`.
4. Under the "Filters" tab, there will already be filters regarding the compute instance name and state. Click "Add" to add a new filter, select the previously created LLD macro, and define a matching operator and pattern, for example, `{#LOCATION_GROUP}` matches `eu-north-*`.
The next time "Compute instances discovery" is executed, it will only discover OCI compute instances that have the free-form tag `location_group` with a value matching the regex `eu-north-*`. You can also experiment with the LLD filter pattern to get different matching results.
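The filter value is a regular expression, not a shell-style wildcard. The Python sketch below mimics the example: it reads the tag the way the `$.tags.location_group` path would and applies the pattern as an unanchored search. As a regex, `eu-north-*` means "`eu-north` followed by zero or more hyphens", so it also matches a value such as `eu-northeast-2`; anchoring it as `^eu-north-` is stricter. The tag values are hypothetical, and treating a missing tag as non-matching is an assumption of this sketch.

```python
import re
from typing import Optional


def location_group(resource: dict) -> Optional[str]:
    """Mimic the LLD macro path $.tags.location_group on one discovered row."""
    tags = resource.get("tags") or {}
    return tags.get("location_group")


def matches(pattern: str, value: Optional[str]) -> bool:
    # Assumption: a missing tag is treated as an empty, non-matching value.
    return re.search(pattern, value or "") is not None


rows = [
    {"name": "a", "tags": {"location_group": "eu-north-1"}},
    {"name": "b", "tags": {"location_group": "eu-northeast-2"}},
    {"name": "c", "tags": {"location_group": "us-east-1"}},
    {"name": "d", "tags": {}},
]
loose = [r["name"] for r in rows if matches(r"eu-north-*", location_group(r))]
strict = [r["name"] for r in rows if matches(r"^eu-north-", location_group(r))]
print(loose)   # ['a', 'b']
print(strict)  # ['a']
```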
If needed, you can specify an HTTP proxy for the template to use by changing the value of the `{$OCI.HTTP.PROXY}` user macro.

If a proxy is used, the HTTP status code of a successful response could change from "200" to a different value. In that case, adjust the `{$OCI.HTTP.RETURN.CODE.OK}` user macro.
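The two macros play separate roles: one routes requests through a proxy, the other decides which status code counts as success. The Python sketch below mirrors those roles with the standard `urllib.request` module; it is an illustration under that assumption, not a copy of the template's JavaScript, and the proxy URL used to test it is hypothetical.

```python
import urllib.request
from typing import Optional


def make_opener(proxy: Optional[str]) -> urllib.request.OpenerDirector:
    """Build an opener that routes HTTP(S) through {$OCI.HTTP.PROXY} if set."""
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler(
            {"http": proxy, "https": proxy}))
    return urllib.request.build_opener(*handlers)


def check_status(status: int, expected: str = "200") -> None:
    """Raise unless the status equals {$OCI.HTTP.RETURN.CODE.OK}."""
    if status != int(expected):
        raise RuntimeError(
            f"Unexpected HTTP status {status}, expected {expected}")
```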
LLD filter values and trigger threshold values can be changed with the respective user macros.
Name | Description | Default |
---|---|---|
{$OCI.API.CORE.HOST} | Host for the OCI Core Services API endpoint. | |
{$OCI.API.TELEMETRY.HOST} | Host for the OCI Monitoring API endpoint. | |
{$OCI.API.OBJECT.STORAGE.HOST} | Host for the OCI Object Storage API endpoint. | |
{$OCI.API.AUTONOMOUS.DB.HOST} | Host for the OCI Autonomous Database API endpoint. | |
{$OCI.API.COMPARTMENT.COMPUTE} | Compartment OCIDs for compute instances. Can be a single value or a comma-separated list of values. | |
{$OCI.API.COMPARTMENT.VCN} | Compartment OCIDs for virtual cloud networks. Can be a single value or a comma-separated list of values. | |
{$OCI.API.COMPARTMENT.VOLUME.BLOCK} | Compartment OCIDs for block volumes. Can be a single value or a comma-separated list of values. | |
{$OCI.API.COMPARTMENT.VOLUME.BOOT} | Compartment OCIDs for boot volumes. Can be a single value or a comma-separated list of values. | |
{$OCI.API.COMPARTMENT.OBJECT.STORAGE} | Compartment OCIDs for object storage buckets. Can be a single value or a comma-separated list of values. | |
{$OCI.API.COMPARTMENT.AUTONOMOUS.DB} | Compartment OCIDs for autonomous databases. Can be a single value or a comma-separated list of values. | |
{$OCI.API.TENANCY} | OCID of the tenancy. | |
{$OCI.API.USER} | OCID of the user. | |
{$OCI.API.PRIVATE.KEY} | Entire private key for API access. | |
{$OCI.API.FINGERPRINT} | Fingerprint of the API key. | |
{$OCI.COMPUTE.DISCOVERY.STATE.MATCHES} | Sets the regex string of compute instance states to allow in discovery. | `.*` |
{$OCI.COMPUTE.DISCOVERY.STATE.NOT_MATCHES} | Sets the regex string of compute instance states to ignore in discovery. | `TERMINATED` |
{$OCI.COMPUTE.DISCOVERY.NAME.MATCHES} | Sets the regex string of compute instance names to allow in discovery. | `.*` |
{$OCI.COMPUTE.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of compute instance names to ignore in discovery. | `CHANGE_IF_NEEDED` |
{$OCI.VCN.DISCOVERY.STATE.MATCHES} | Sets the regex string of virtual cloud network states to allow in discovery. | `.*` |
{$OCI.VCN.DISCOVERY.STATE.NOT_MATCHES} | Sets the regex string of virtual cloud network states to ignore in discovery. | `CHANGE_IF_NEEDED` |
{$OCI.VCN.DISCOVERY.NAME.MATCHES} | Sets the regex string of virtual cloud network names to allow in discovery. | `.*` |
{$OCI.VCN.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of virtual cloud network names to ignore in discovery. | `CHANGE_IF_NEEDED` |
{$OCI.VOLUME.BLOCK.DISCOVERY.STATE.MATCHES} | Sets the regex string of block volume states to allow in discovery. | `.*` |
{$OCI.VOLUME.BLOCK.DISCOVERY.STATE.NOT_MATCHES} | Sets the regex string of block volume states to ignore in discovery. | `CHANGE_IF_NEEDED` |
{$OCI.VOLUME.BLOCK.DISCOVERY.NAME.MATCHES} | Sets the regex string of block volume names to allow in discovery. | `.*` |
{$OCI.VOLUME.BLOCK.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of block volume names to ignore in discovery. | `CHANGE_IF_NEEDED` |
{$OCI.VOLUME.BOOT.DISCOVERY.STATE.MATCHES} | Sets the regex string of boot volume states to allow in discovery. | `.*` |
{$OCI.VOLUME.BOOT.DISCOVERY.STATE.NOT_MATCHES} | Sets the regex string of boot volume states to ignore in discovery. | `CHANGE_IF_NEEDED` |
{$OCI.VOLUME.BOOT.DISCOVERY.NAME.MATCHES} | Sets the regex string of boot volume names to allow in discovery. | `.*` |
{$OCI.VOLUME.BOOT.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of boot volume names to ignore in discovery. | `CHANGE_IF_NEEDED` |
{$OCI.OBJECT.STORAGE.DISCOVERY.NAME.MATCHES} | Sets the regex string of object storage bucket names to allow in discovery. | `.*` |
{$OCI.OBJECT.STORAGE.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of object storage bucket names to ignore in discovery. | `CHANGE_IF_NEEDED` |
{$OCI.AUTONOMOUS.DB.DISCOVERY.STATE.MATCHES} | Sets the regex string of autonomous database states to allow in discovery. | `.*` |
{$OCI.AUTONOMOUS.DB.DISCOVERY.STATE.NOT_MATCHES} | Sets the regex string of autonomous database states to ignore in discovery. | `CHANGE_IF_NEEDED` |
{$OCI.AUTONOMOUS.DB.DISCOVERY.NAME.MATCHES} | Sets the regex string of autonomous database names to allow in discovery. | `.*` |
{$OCI.AUTONOMOUS.DB.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of autonomous database names to ignore in discovery. | `CHANGE_IF_NEEDED` |
{$OCI.HTTP.PROXY} | Set an HTTP proxy for OCI API requests if needed. | |
{$OCI.HTTP.RETURN.CODE.OK} | Set the HTTP return code that represents an OK response from the API. The default is "200", but it can vary, for example, if a proxy is used. | `200` |
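The `{$OCI.API.COMPARTMENT.*}` macros accept one OCID or a comma-separated list. OCI list operations such as `ListInstances` take a single `compartmentId` query parameter, so a list of compartments implies one request per compartment. The Python sketch below splits a macro value and builds one Core Services API URL per compartment; the `/20160918/instances` path and the OCIDs are illustrative assumptions, not a description of the template's scripts.

```python
from typing import List
from urllib.parse import urlencode


def parse_compartments(macro_value: str) -> List[str]:
    """Split a comma-separated macro value, ignoring blanks and spaces."""
    return [part.strip() for part in macro_value.split(",") if part.strip()]


def list_instance_urls(core_host: str, macro_value: str) -> List[str]:
    """Build one ListInstances URL per compartment OCID (hypothetical usage)."""
    return [
        f"https://{core_host}/20160918/instances?"
        + urlencode({"compartmentId": ocid})
        for ocid in parse_compartments(macro_value)
    ]
```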
Name | Description | Type | Key and additional info |
---|---|---|---|
Compute instances discovery | Discover compute instances. | Script | oci.compute.discovery |
Virtual cloud networks discovery | Discover virtual cloud networks (VCNs). | Script | oci.vcn.discovery |
Block volumes discovery | Discover block volumes. | Script | oci.block.volumes.discovery |
Boot volumes discovery | Discover boot volumes. | Script | oci.boot.volumes.discovery |
Object storage discovery | Discover object storage. | Script | oci.object.storage.discovery |
Autonomous database discovery | Discover autonomous databases. | Script | oci.object.autonomous.db.discovery |
This template monitors the resources of a single Oracle Cloud Infrastructure (OCI) compute instance, and discovers and monitors its attached virtual network interface cards (VNICs).
This template is not meant to be used independently, but together with Oracle Cloud by HTTP as a template for LLD host prototypes.
For communication with OCI, this template uses script items that execute HTTP `GET` and `POST` requests. `POST` requests are required for the OCI Monitoring API because its Monitoring Query Language (MQL) queries are sent in the HTTP request body.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is not meant to be used independently. A host with the Oracle Cloud by HTTP template will discover OCI compute instances automatically, create a host from a host prototype for each discovered instance, and link this template to it.
If needed, you can specify an HTTP proxy for the template to use by changing the value of the `{$OCI.HTTP.PROXY}` user macro.

If a proxy is used, the HTTP status code of a successful response could change from "200" to a different value. In that case, adjust the `{$OCI.HTTP.RETURN.CODE.OK}` user macro.
LLD filter values and trigger threshold values can be changed with the respective user macros.
Name | Description | Default |
---|---|---|
{$OCI.HTTP.PROXY} | Set an HTTP proxy for OCI API requests if needed. | |
{$OCI.HTTP.RETURN.CODE.OK} | Set the HTTP return code that represents an OK response from the API. The default is "200", but it can vary, for example, if a proxy is used. | `200` |
{$OCI.COMPUTE.VNIC.DISCOVERY.STATE.MATCHES} | Sets the regex string of VNIC states to allow in discovery. | `.*` |
{$OCI.COMPUTE.VNIC.DISCOVERY.STATE.NOT_MATCHES} | Sets the regex string of VNIC states to ignore in discovery. | `CHANGE_IF_NEEDED` |
{$OCI.COMPUTE.VNIC.DISCOVERY.NAME.MATCHES} | Sets the regex string of VNIC names to allow in discovery. | `.*` |
{$OCI.COMPUTE.VNIC.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of VNIC names to ignore in discovery. | `CHANGE_IF_NEEDED` |
{$OCI.COMPUTE.CPU.UTIL.WARN} | Sets the percentage threshold for creating a "warning" severity event about CPU resource utilization. | `75` |
{$OCI.COMPUTE.CPU.UTIL.HIGH} | Sets the percentage threshold for creating a "high" severity event about CPU resource utilization. | `90` |
{$OCI.COMPUTE.MEM.UTIL.WARN} | Sets the percentage threshold for creating a "warning" severity event about memory resource utilization. | `75` |
{$OCI.COMPUTE.MEM.UTIL.HIGH} | Sets the percentage threshold for creating a "high" severity event about memory resource utilization. | `90` |
{$OCI.COMPUTE.VNIC.CONNTRACK.UTIL.WARN} | Sets the percentage threshold for creating a "warning" severity event about VNIC connection tracking table utilization. | `75` |
{$OCI.COMPUTE.VNIC.CONNTRACK.UTIL.HIGH} | Sets the percentage threshold for creating a "high" severity event about VNIC connection tracking table utilization. | `90` |
Name | Description | Type | Key and additional info |
---|---|---|---|
OCI Compute: Get instance availability | The accessibility status of a virtual machine instance. A value of "1" indicates that the instance is unresponsive due to an issue with the infrastructure or the instance itself. A value of "0" indicates that an accessibility issue has not been detected. If the instance is stopped, then the metric does not have a value. | Script | oci.compute.availability.get |
OCI Compute: State | The current state of the instance. | Script | oci.compute.state.get Preprocessing |
OCI Compute: Get VNICs | Gets information about all virtual network interface cards attached to the instance. | Script | oci.compute.vnic.get |
OCI Compute: Get compute metrics | Gets compute instance metrics. | Script | oci.compute.metrics.get |
OCI Compute: CPU utilization, in % | Activity level from the CPU. Expressed as a percentage of the total time. | Dependent item | oci.compute.cpu.util Preprocessing |
OCI Compute: Memory utilization, in % | Space currently in use, measured in pages. Expressed as a percentage of used pages. | Dependent item | oci.compute.mem.util Preprocessing |
OCI Compute: Memory allocation stalls | Number of times page reclaim was called directly. | Dependent item | oci.compute.mem.stalls Preprocessing |
OCI Compute: Load average | Average system load calculated over a 1-minute period. Expressed as a number of processes. | Dependent item | oci.compute.load.avg Preprocessing |
OCI Compute: Disk bytes read | Read throughput. Expressed as bytes read per interval. | Dependent item | oci.compute.disk.read Preprocessing |
OCI Compute: Disk bytes written | Write throughput. Expressed as bytes written per interval. | Dependent item | oci.compute.disk.written Preprocessing |
OCI Compute: Disk read I/O | Activity level from I/O reads. Expressed as reads per interval. | Dependent item | oci.compute.disk.io.read Preprocessing |
OCI Compute: Disk write I/O | Activity level from I/O writes. Expressed as writes per interval. | Dependent item | oci.compute.disk.io.write Preprocessing |
OCI Compute: Network bytes in | Network bytes in for the compute instance. | Dependent item | oci.compute.network.in Preprocessing |
OCI Compute: Network bytes out | Network bytes out for the compute instance. | Dependent item | oci.compute.network.out Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OCI Compute: Compute instance is not available | The instance availability metric reports that the instance is unresponsive. | `last(/Oracle Cloud Compute by HTTP/oci.compute.availability.get) = 1` | High | |
OCI Compute: State has changed | Compute instance state has changed. | `last(/Oracle Cloud Compute by HTTP/oci.compute.state.get,#1)<>last(/Oracle Cloud Compute by HTTP/oci.compute.state.get,#2)` | Info | Manual close: Yes |
OCI Compute: Current CPU utilization is too high | CPU utilization has been at or above `{$OCI.COMPUTE.CPU.UTIL.HIGH}`% for the last 5 minutes. | `min(/Oracle Cloud Compute by HTTP/oci.compute.cpu.util,5m) >= {$OCI.COMPUTE.CPU.UTIL.HIGH}` | High | |
OCI Compute: Current CPU utilization is high | CPU utilization has been at or above `{$OCI.COMPUTE.CPU.UTIL.WARN}`% for the last 5 minutes. | `min(/Oracle Cloud Compute by HTTP/oci.compute.cpu.util,5m) >= {$OCI.COMPUTE.CPU.UTIL.WARN}` | Warning | Depends on: |
OCI Compute: Current memory utilization is too high | Memory utilization has been at or above `{$OCI.COMPUTE.MEM.UTIL.HIGH}`% for the last 5 minutes. | `min(/Oracle Cloud Compute by HTTP/oci.compute.mem.util,5m) >= {$OCI.COMPUTE.MEM.UTIL.HIGH}` | High | |
OCI Compute: Current memory utilization is high | Memory utilization has been at or above `{$OCI.COMPUTE.MEM.UTIL.WARN}`% for the last 5 minutes. | `min(/Oracle Cloud Compute by HTTP/oci.compute.mem.util,5m) >= {$OCI.COMPUTE.MEM.UTIL.WARN}` | Warning | Depends on: |
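The CPU and memory triggers use `min(...,5m) >= threshold`, so they fire only when every value in the last 5 minutes is at or above the threshold; a single dip below it keeps the trigger from firing. The Python sketch below models that evaluation for the default `75`/`90` thresholds. It assumes the warning trigger depends on the high one, as the "Depends on:" entries suggest, so only the higher severity is reported; treating an empty window as "no problem" is a simplification.

```python
from typing import Optional, Sequence


def utilization_severity(window: Sequence[float], warn: float = 75,
                         high: float = 90) -> Optional[str]:
    """Model min(item,5m) >= WARN/HIGH thresholds with an assumed dependency."""
    if not window:
        return None  # simplification: no data in the window, no event
    lowest = min(window)  # what min(...,5m) evaluates to
    if lowest >= high:
        return "High"  # the Warning trigger is suppressed by the dependency
    if lowest >= warn:
        return "Warning"
    return None
```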
Name | Description | Type | Key and additional info |
---|---|---|---|
VNIC discovery | Discover compute instance VNICs. | Dependent item | oci.compute.vnic.discovery Preprocessing |
Name | Description | Type | Key and additional info |
---|---|---|---|
VNIC [{#NAME}]: Attachment state | Current attachment state of the VNIC. | Dependent item | oci.compute.vnic.attachment[{#ID}] Preprocessing |
VNIC [{#NAME}]: Get metrics | Gets virtual network interface card metrics. | Script | oci.compute.vnic.metrics.get[{#ID}] |
VNIC [{#NAME}]: Egress packets dropped by security list | Packets sent by the VNIC, destined for the network, dropped due to security rule violations. | Dependent item | oci.compute.vnic.egress.packets.dropped[{#ID}] Preprocessing |
VNIC [{#NAME}]: Ingress packets dropped by security list | Packets received from the network, destined for the VNIC, dropped due to security rule violations. | Dependent item | oci.compute.vnic.ingress.packets.dropped[{#ID}] Preprocessing |
VNIC [{#NAME}]: Bytes from network | Bytes received at the VNIC from the network, after drops. | Dependent item | oci.compute.vnic.net.bytes.ingr[{#ID}] Preprocessing |
VNIC [{#NAME}]: Bytes to network | Bytes sent from the VNIC to the network, before drops. | Dependent item | oci.compute.vnic.net.bytes.egr[{#ID}] Preprocessing |
VNIC [{#NAME}]: Packets from network | Packets received at the VNIC from the network, after drops. | Dependent item | oci.compute.vnic.net.packets.ingr[{#ID}] Preprocessing |
VNIC [{#NAME}]: Packets to network | Packets sent from the VNIC to the network, before drops. | Dependent item | oci.compute.vnic.net.packets.egr[{#ID}] Preprocessing |
VNIC [{#NAME}]: Throttled ingress packets | Packets received from the network, destined for the VNIC, dropped due to throttling. | Dependent item | oci.compute.vnic.net.packets.ingr.throttled[{#ID}] Preprocessing |
VNIC [{#NAME}]: Throttled egress packets | Packets sent from the VNIC, destined for the network, dropped due to throttling. | Dependent item | oci.compute.vnic.net.packets.egr.throttled[{#ID}] Preprocessing |
VNIC [{#NAME}]: Ingress packets dropped by full connection tracking table | Packets received from the network, destined for the VNIC, dropped due to the full connection tracking table. | Dependent item | oci.compute.vnic.net.packets.ingr.drop[{#ID}] Preprocessing |
VNIC [{#NAME}]: Egress packets dropped by full connection tracking table | Packets sent from the VNIC, destined for the network, dropped due to the full connection tracking table. | Dependent item | oci.compute.vnic.net.packets.egr.drop[{#ID}] Preprocessing |
VNIC [{#NAME}]: Connection tracking table utilization, in % | Total utilization percentage (0-100%) of the connection tracking table. | Dependent item | oci.compute.vnic.net.conntrack.util[{#ID}] Preprocessing |
VNIC [{#NAME}]: Connection tracking table full | Boolean (0/false, 1/true) that indicates whether the connection tracking table is full. | Dependent item | oci.compute.vnic.net.conntrack.full[{#ID}] Preprocessing |
VNIC [{#NAME}]: Smartnic buffer drops from network | Number of packets dropped in the SmartNIC from the network due to buffer exhaustion. This metric is available only for bare metal instances; for virtual machines, its values are zero. | Dependent item | oci.compute.vnic.net.smartnic.drops[{#ID}] Preprocessing |
VNIC [{#NAME}]: Smartnic buffer drops from host | Number of packets dropped in the SmartNIC from the host due to buffer exhaustion. This metric is available only for bare metal instances; for virtual machines, its values are zero. | Dependent item | oci.compute.vnic.host.smartnic.drops[{#ID}] Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VNIC [{#NAME}]: VNIC is not attached | Virtual network interface card attachment status. | `min(/Oracle Cloud Compute by HTTP/oci.compute.vnic.attachment[{#ID}],5m) >= 3` | High | |
VNIC [{#NAME}]: Current conntrack table utilization is too high | Connection tracking table utilization has been at or above `{$OCI.COMPUTE.VNIC.CONNTRACK.UTIL.HIGH}`% for the last 5 minutes. | `min(/Oracle Cloud Compute by HTTP/oci.compute.vnic.net.conntrack.util[{#ID}],5m) >= {$OCI.COMPUTE.VNIC.CONNTRACK.UTIL.HIGH}` | High | |
VNIC [{#NAME}]: Current conntrack table utilization is high | Connection tracking table utilization has been at or above `{$OCI.COMPUTE.VNIC.CONNTRACK.UTIL.WARN}`% for the last 5 minutes. | `min(/Oracle Cloud Compute by HTTP/oci.compute.vnic.net.conntrack.util[{#ID}],5m) >= {$OCI.COMPUTE.VNIC.CONNTRACK.UTIL.WARN}` | Warning | Depends on: |
VNIC [{#NAME}]: Conntrack table full | Virtual network interface card connection tracking table is full. | `min(/Oracle Cloud Compute by HTTP/oci.compute.vnic.net.conntrack.full[{#ID}],5m) = 1` | High | |
This template monitors Oracle Cloud Infrastructure (OCI) object storage resources.
This template is not meant to be used independently, but together with Oracle Cloud by HTTP as a template for LLD host prototypes.
For communication with OCI, this template uses script items that execute HTTP `GET` and `POST` requests. `POST` requests are required for the OCI Monitoring API because its Monitoring Query Language (MQL) queries are sent in the HTTP request body.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is not meant to be used independently. A host with the Oracle Cloud by HTTP template will discover OCI object storage buckets automatically, create a host from a host prototype for each discovered bucket, and link this template to it.
If needed, you can specify an HTTP proxy for the template to use by changing the value of the `{$OCI.HTTP.PROXY}` user macro.

If a proxy is used, the HTTP status code of a successful response could change from "200" to a different value. In that case, adjust the `{$OCI.HTTP.RETURN.CODE.OK}` user macro.
LLD filter values and trigger threshold values can be changed with the respective user macros.
Name | Description | Default |
---|---|---|
{$OCI.HTTP.PROXY} | Set an HTTP proxy for OCI API requests if needed. | |
{$OCI.HTTP.RETURN.CODE.OK} | Set the HTTP return code that represents an OK response from the API. The default is "200", but it can vary, for example, if a proxy is used. | `200` |
Name | Description | Type | Key and additional info |
---|---|---|---|
OCI Object Storage: Get frequent metrics | Gets all metrics related to a specific bucket that have a frequent update time (100 milliseconds). | Script | oci.obj.storage.metrics.frequent.get |
OCI Object Storage: All requests count | The total number of all HTTP requests made in a bucket. | Dependent item | oci.obj.storage.requests Preprocessing |
OCI Object Storage: Client-side error count | The total number of 4xx errors for requests made in a bucket. | Dependent item | oci.obj.storage.client.errors Preprocessing |
OCI Object Storage: First byte latency time | The per-request time measured from the time Object Storage receives the complete request to when Object Storage returns the first byte of the response. | Dependent item | oci.obj.storage.latency.byte Preprocessing |
OCI Object Storage: Post object request count | The total number of HTTP POST requests made in a bucket. | Dependent item | oci.obj.storage.requests.post Preprocessing |
OCI Object Storage: Put object request count | The total number of HTTP PUT requests made in a bucket. | Dependent item | oci.obj.storage.requests.put Preprocessing |
OCI Object Storage: Overall latency time | The per-request time from the first byte received by Object Storage to the last byte sent from Object Storage. | Dependent item | oci.obj.storage.latency.overall Preprocessing |
OCI Object Storage: Get hourly metrics | Gets all metrics related to a specific bucket that have an update time of 1 hour. | Script | oci.obj.storage.metrics.hourly.get |
OCI Object Storage: Number of objects | The count of objects in the bucket, excluding any multipart upload parts that have not been discarded (aborted) or committed. | Dependent item | oci.obj.storage.objects Preprocessing |
OCI Object Storage: Bucket size | The size of the bucket, excluding any multipart upload parts that have not been discarded (aborted) or committed. | Dependent item | oci.obj.storage.size Preprocessing |
OCI Object Storage: Incomplete multipart upload size | The size of any multipart upload parts that have not been discarded (aborted) or committed. | Dependent item | oci.obj.storage.size.incomplete Preprocessing |
OCI Object Storage: Get enabled object lifecycle management | Indicates whether a bucket has any executable Object Lifecycle Management policies configured: 1 if policies are configured, 0 if no policies are configured. | Script | oci.obj.storage.metrics.olm.get Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OCI Object Storage: Object lifecycle management policy has changed | The object lifecycle management policy configuration has changed. | `last(/Oracle Cloud Object Storage by HTTP/oci.obj.storage.metrics.olm.get,#1)<>last(/Oracle Cloud Object Storage by HTTP/oci.obj.storage.metrics.olm.get,#2) and length(last(/Oracle Cloud Object Storage by HTTP/oci.obj.storage.metrics.olm.get))>0` | Info | |
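This trigger fires when the two most recent values of the OLM item differ and the newest value is not empty. The Python sketch below restates that expression with the item history ordered newest first, so that index 0 corresponds to `#1` (latest) and index 1 to `#2` (previous) in `last()`; the sample values used to test it are hypothetical. The compute "State has changed" trigger follows the same pattern without the length check.

```python
from typing import Sequence


def olm_policy_changed(history: Sequence[str]) -> bool:
    """Model last(#1)<>last(#2) and length(last())>0, history newest first."""
    if len(history) < 2:
        return False  # no previous value for last(...,#2) to compare against
    latest, previous = history[0], history[1]
    return latest != previous and len(latest) > 0
```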
This template monitors Oracle Cloud Infrastructure (OCI) autonomous database (serverless) resources.
This template is not meant to be used independently, but together with Oracle Cloud by HTTP as a template for LLD host prototypes.
For communication with OCI, this template uses script items that execute HTTP `GET` and `POST` requests. `POST` requests are required for the OCI Monitoring API because its Monitoring Query Language (MQL) queries are sent in the HTTP request body.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is not meant to be used independently. A host with the Oracle Cloud by HTTP template will discover OCI autonomous databases automatically, create a host from a host prototype for each discovered database, and link this template to it.
If needed, you can specify an HTTP proxy for the template to use by changing the value of the `{$OCI.HTTP.PROXY}` user macro.

If a proxy is used, the HTTP status code of a successful response could change from "200" to a different value. In that case, adjust the `{$OCI.HTTP.RETURN.CODE.OK}` user macro.
The LLD filter values and trigger threshold values can be changed with the respective user macros.
Name | Description | Default |
---|---|---|
{$OCI.HTTP.PROXY} | Set an HTTP proxy for OCI API requests if needed. | |
{$OCI.HTTP.RETURN.CODE.OK} | Set the HTTP return code that represents an OK response from the API. The default is "200", but it can vary, for example, if a proxy is used. | `200` |
{$OCI.AUTONOMOUS.DB.CPU.UTIL.WARN} | Sets the percentage threshold for creating a "warning" severity event about CPU resource utilization. | `75` |
{$OCI.AUTONOMOUS.DB.CPU.UTIL.HIGH} | Sets the percentage threshold for creating a "high" severity event about CPU resource utilization. | `90` |
{$OCI.AUTONOMOUS.DB.STORAGE.UTIL.WARN} | Sets the percentage threshold for creating a "warning" severity event about storage resource utilization. | `75` |
{$OCI.AUTONOMOUS.DB.STORAGE.UTIL.HIGH} | Sets the percentage threshold for creating a "high" severity event about storage resource utilization. | `90` |
Name | Description | Type | Key and additional info |
---|---|---|---|
OCI Autonomous DB: State | Gets the autonomous database state. |
Script | oci.aut.db.state Preprocessing
|
OCI Autonomous DB: Get frequent metrics | Gets all metrics related to the database that have a collection frequency of 1 minute. |
Script | oci.aut.db.metrics.frequent.get |
OCI Autonomous DB: CPU time | Average rate of accumulation of CPU time by foreground sessions in the database over the selected time interval. |
Dependent item | oci.aut.db.cpu.time Preprocessing
|
OCI Autonomous DB: CPU utilization, in % | The CPU usage expressed as a percentage, aggregated across all consumer groups. The utilization percentage is reported with respect to the number of CPUs the database is allowed to use. | Dependent item | oci.aut.db.cpu.util |
OCI Autonomous DB: Current logons | The number of successful logons during the selected time interval. | Dependent item | oci.aut.db.logons |
OCI Autonomous DB: DB block changes | The number of changes that were part of an update or delete operation that were made to all blocks in the SGA. Such changes generate redo log entries and thus become permanent changes to the database if the transaction is committed. This statistic approximates total database work and indicates the rate at which buffers are being dirtied during the selected time interval. | Dependent item | oci.aut.db.block.changes |
OCI Autonomous DB: DB time | The amount of time database user sessions spend executing database code (CPU time + wait time). Database time is used to infer database call latency as it increases in direct proportion to both database call latency (response time) and call volume. It is calculated as the average rate of accumulation of database time by foreground sessions in the database over the selected time interval. | Dependent item | oci.aut.db.time |
OCI Autonomous DB: Execute count | The number of user and recursive calls that executed SQL statements during the selected time interval. | Dependent item | oci.aut.db.exec.count |
OCI Autonomous DB: Failed connections | The number of failed database connections. | Dependent item | oci.aut.db.conn.failed |
OCI Autonomous DB: Failed logons | The number of logons that failed because of an invalid user name and/or password during the selected time interval. | Dependent item | oci.aut.db.logons.failed |
OCI Autonomous DB: Parse count (hard) | The number of parse calls (real parses) during the selected time interval. A hard parse is an expensive operation in terms of memory use as it requires Oracle to allocate a workheap and other memory structures and then build a parse tree. | Dependent item | oci.aut.db.parse.count.hard |
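OCI Autonomous DB: Session logical reads | The sum of "db block gets" plus "consistent gets" during the selected time interval. This includes logical reads of database blocks from either the buffer cache or process private memory. | Dependent item | oci.aut.db.logical.reads.session |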
OCI Autonomous DB: Parse count (total) | The number of hard and soft parses during the selected time interval. | Dependent item | oci.aut.db.parse.count.total |
OCI Autonomous DB: Parse count (failures) | The number of parse failures during the selected time interval. | Dependent item | oci.aut.db.parse.count.failed |
OCI Autonomous DB: Physical reads | The number of data blocks read from disk during the selected time interval. | Dependent item | oci.aut.db.physical.reads |
OCI Autonomous DB: Physical read total bytes | The size in bytes of disk reads by all database instance activity including application reads, backup and recovery, and other utilities during the selected time interval. | Dependent item | oci.aut.db.physical.read.bytes |
OCI Autonomous DB: Physical writes | The number of data blocks written to disk during the selected time interval. | Dependent item | oci.aut.db.physical.writes |
OCI Autonomous DB: Physical write total bytes | The size in bytes of all disk writes for the database instance including application activity, backup and recovery, and other utilities during the selected time interval. | Dependent item | oci.aut.db.physical.write.bytes |
OCI Autonomous DB: Queued statements | The number of queued SQL statements aggregated across all consumer groups during the selected time interval. | Dependent item | oci.aut.db.queued.statements |
OCI Autonomous DB: Redo generated | Amount of redo generated in bytes during the selected time interval. | Dependent item | oci.aut.db.redo.gen |
OCI Autonomous DB: Running statements | The number of running SQL statements aggregated across all consumer groups during the selected time interval. | Dependent item | oci.aut.db.statements.running |
OCI Autonomous DB: Sessions | The number of sessions in the database. | Dependent item | oci.aut.db.sessions |
OCI Autonomous DB: Bytes received via SQL*Net from client | The number of bytes received from the client over Oracle Net Services during the selected time interval. | Dependent item | oci.aut.db.sqlnet.bytes.recv.client |
OCI Autonomous DB: Bytes received via SQL*Net from DBLink | The number of bytes received from a database link over Oracle Net Services during the selected time interval. | Dependent item | oci.aut.db.sqlnet.bytes.recv.dblink |
OCI Autonomous DB: Bytes sent via SQL*Net to client | The number of bytes sent to the client from the foreground processes during the selected time interval. | Dependent item | oci.aut.db.sqlnet.bytes.sent.client |
OCI Autonomous DB: Bytes sent via SQL*Net to DBLink | The number of bytes sent over a database link during the selected time interval. | Dependent item | oci.aut.db.sqlnet.bytes.sent.dblink |
OCI Autonomous DB: Transaction count | The combined number of user commits and user rollbacks during the selected time interval. | Dependent item | oci.aut.db.transaction.count |
OCI Autonomous DB: User calls | The combined number of logons, parses, and execute calls during the selected time interval. | Dependent item | oci.aut.db.user.calls |
OCI Autonomous DB: User commits | The number of user commits during the selected time interval. When a user commits a transaction, the generated redo that reflects the changes made to database blocks must be written to disk. Commits often represent the closest thing to a user transaction rate. | Dependent item | oci.aut.db.user.commits |
OCI Autonomous DB: User rollbacks | Number of times users manually issue the ROLLBACK statement or an error occurs during a user's transactions, during the selected time interval. | Dependent item | oci.aut.db.user.rollbacks |
OCI Autonomous DB: Wait time | Average rate of accumulation of non-idle wait time by foreground sessions in the database over the selected time interval. | Dependent item | oci.aut.db.wait.time |
OCI Autonomous DB: Get database stats | Gets all metrics related to a specific database that have a collection frequency of 5 minutes. | Script | oci.aut.db.metrics.stats |
OCI Autonomous DB: Database availability | The database is available for connections in the given minute. | Dependent item | oci.aut.db.availability |
OCI Autonomous DB: Connection latency | The time taken to connect to an Oracle Autonomous Database Serverless instance in each region from a Compute service virtual machine in the same region. | Dependent item | oci.aut.db.latency.conn |
OCI Autonomous DB: Query latency | The time taken to display the results of a simple query on the user's screen. | Dependent item | oci.aut.db.latency.query |
OCI Autonomous DB: Get storage stats | Gets all storage metrics related to a specific database that have a collection frequency of 60 minutes. | Script | oci.aut.db.metrics.storage.stats |
OCI Autonomous DB: Storage space allocated | Amount of space allocated to the database for all tablespaces during the selected time interval. | Dependent item | oci.aut.db.storage.space.alloc |
OCI Autonomous DB: Maximum storage space | Maximum amount of storage reserved for the database during the selected time interval. | Dependent item | oci.aut.db.storage.space.max |
OCI Autonomous DB: Storage space used | Maximum amount of space used during the selected time interval. | Dependent item | oci.aut.db.storage.space.used |
OCI Autonomous DB: Storage utilization, in % | The percentage of the reserved maximum storage currently allocated for all database tablespaces. Represents the total reserved space for all tablespaces. | Dependent item | oci.aut.db.storage.space.util |
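The Storage utilization item is described as the share of the reserved maximum storage that is currently allocated. As a rough cross-check, assuming both values use the same unit (the template may read this percentage directly from OCI rather than compute it), the relationship can be sketched as:

```python
def storage_utilization_pct(allocated: float, maximum: float) -> float:
    """Percentage of the reserved maximum storage that is currently allocated.

    Both arguments must use the same unit (for example, bytes).
    """
    if maximum <= 0:
        raise ValueError("maximum storage must be a positive number")
    # Multiply first so that simple ratios such as 256/1024 stay exact.
    return allocated * 100.0 / maximum


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical example: 256 GiB allocated out of a 1 TiB reservation.
    print(storage_utilization_pct(256 * GIB, 1024 * GIB))  # 25.0
```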
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OCI Autonomous DB: Restore has failed | Autonomous database restore has failed. | last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.state) = 9 | Warning | |
OCI Autonomous DB: Database is not available or accessible | Autonomous database is not available or accessible. | last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.state) = 19 or last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.state) = 20 | High | |
OCI Autonomous DB: Available, needs attention | Autonomous database is available, but needs attention. | last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.state) = 12 | Warning | |
OCI Autonomous DB: State unknown | Autonomous database state is unknown. | last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.state) = 0 | Warning | |
OCI Autonomous DB: State has changed | Autonomous database state has changed. | last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.state,#1)<>last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.state,#2) | Info | Manual close: Yes Depends on: |
OCI Autonomous DB: Current CPU utilization is too high | Current CPU utilization has exceeded {$OCI.AUTONOMOUS.DB.CPU.UTIL.HIGH}%. | min(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.cpu.util,5m) >= {$OCI.AUTONOMOUS.DB.CPU.UTIL.HIGH} | High | |
OCI Autonomous DB: Current CPU utilization is high | Current CPU utilization has exceeded {$OCI.AUTONOMOUS.DB.CPU.UTIL.WARN}%. | min(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.cpu.util,5m) >= {$OCI.AUTONOMOUS.DB.CPU.UTIL.WARN} | Warning | Depends on: |
OCI Autonomous DB: Database is not available | Autonomous database is not available. | last(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.availability) = 0 | High | Depends on: |
OCI Autonomous DB: Current storage utilization is too high | Current storage utilization has exceeded {$OCI.AUTONOMOUS.DB.STORAGE.UTIL.HIGH}%. | min(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.storage.space.util,5m) >= {$OCI.AUTONOMOUS.DB.STORAGE.UTIL.HIGH} | High | |
OCI Autonomous DB: Current storage utilization is high | Current storage utilization has exceeded {$OCI.AUTONOMOUS.DB.STORAGE.UTIL.WARN}%. | min(/Oracle Cloud Autonomous Database by HTTP/oci.aut.db.storage.space.util,5m) >= {$OCI.AUTONOMOUS.DB.STORAGE.UTIL.WARN} | Warning | Depends on: |
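The CPU triggers above evaluate min(...,5m), so they fire only when every value collected in the last five minutes reaches the threshold, and the warning-level trigger is listed with a dependency. The sketch below approximates that logic, assuming the warning trigger depends on the high-severity one so that only one problem is shown at a time. The window boundaries and the threshold values 80 and 90 are illustrative assumptions, not the template's macro defaults.

```python
from typing import Optional, Sequence, Tuple

# (clock, value) pairs: Unix timestamp in seconds and CPU utilization in %.
Sample = Tuple[int, float]


def window_min(samples: Sequence[Sample], now: int, seconds: int) -> Optional[float]:
    """Approximate min(item,<seconds>): lowest value collected in (now - seconds, now]."""
    values = [value for clock, value in samples if now - seconds < clock <= now]
    return min(values) if values else None


def cpu_severity(samples: Sequence[Sample], now: int,
                 warn: float, high: float) -> Optional[str]:
    """Return the single active severity, letting "High" suppress "Warning"."""
    lowest = window_min(samples, now, 300)  # 5m window
    if lowest is None:
        return None          # no data in the window: nothing to evaluate
    if lowest >= high:
        return "High"        # the dependent Warning trigger is suppressed
    if lowest >= warn:
        return "Warning"
    return None


if __name__ == "__main__":
    now = 1_000
    busy = [(760, 95.0), (820, 92.0), (880, 91.0), (940, 96.0), (1_000, 93.0)]
    print(cpu_severity(busy, now, warn=80, high=90))   # High
    spike = [(760, 40.0), (820, 99.0), (880, 35.0), (940, 38.0), (1_000, 41.0)]
    print(cpu_severity(spike, now, warn=80, high=90))  # None: a single spike is ignored
```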
This template monitors Oracle Cloud Infrastructure (OCI) block volume resources.
This template is not meant to be used independently, but together with Oracle Cloud by HTTP as a template for LLD host prototypes.
For communication with OCI, this template utilizes script items which execute HTTP GET and POST requests.
POST requests are required for the OCI Monitoring API, as it uses Monitoring Query Language (MQL), which passes queries in the HTTP request body.
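To illustrate why POST is needed, the sketch below builds the URL and JSON body of an OCI Monitoring SummarizeMetricsData request that carries an MQL query. The telemetry endpoint path, the oci_blockstore namespace, the VolumeReadThroughput metric name, and the resourceId dimension are assumptions based on the public OCI Monitoring API, not values taken from this template. Real requests must also be signed with the API key; signing is omitted here.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import quote


def utc_iso(ts: datetime) -> str:
    """RFC 3339 timestamp with a trailing Z, e.g. 2024-05-15T13:45:10Z."""
    return ts.astimezone(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z")


def build_summarize_request(region: str, compartment_id: str, metric: str,
                            resource_id: str, interval: str = "1m",
                            statistic: str = "sum", minutes: int = 5):
    """Return (url, body) for a hypothetical SummarizeMetricsData POST call."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    # MQL: MetricName[interval]{dimension = "value"}.statistic()
    query = f'{metric}[{interval}]{{resourceId = "{resource_id}"}}.{statistic}()'
    url = (f"https://telemetry.{region}.oraclecloud.com/20180401/metrics/actions/"
           f"summarizeMetricsData?compartmentId={quote(compartment_id)}")
    body = {
        "namespace": "oci_blockstore",  # assumed namespace for block volume metrics
        "query": query,
        "startTime": utc_iso(start),
        "endTime": utc_iso(end),
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    url, body = build_summarize_request(
        "eu-frankfurt-1", "ocid1.compartment.oc1..example",
        "VolumeReadThroughput", "ocid1.volume.oc1..example")
    print(url)
    print(body)
```

Because the query string lives in the request body rather than in the URL, a plain GET request cannot carry it.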
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is not meant to be used independently. A host with the Oracle Cloud by HTTP template will discover OCI block volumes automatically, create host prototypes for each discovered block volume, and apply this template to them.
If needed, you can specify an HTTP proxy for the template to use by changing the value of the {$OCI.HTTP.PROXY} user macro.
If a proxy is used, the HTTP status code returned for a successful response may differ from "200". In that case, adjust the {$OCI.HTTP.RETURN.CODE.OK} user macro.
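A script item has to decide whether a response counts as OK. The sketch below shows one way to do that by comparing the status against the macro value. The function name and the error format are hypothetical; the macro is compared as text because Zabbix user macro values are strings.

```python
def check_response(status: int, body: str, ok_code: str = "200") -> str:
    """Return the body if status equals ok_code (the {$OCI.HTTP.RETURN.CODE.OK} value).

    Raises RuntimeError otherwise, including a short excerpt of the body
    to help diagnose proxy or authentication problems.
    """
    if str(status) != ok_code.strip():
        raise RuntimeError(
            f"unexpected HTTP status {status} (expected {ok_code.strip()}): {body[:200]}")
    return body


if __name__ == "__main__":
    print(check_response(200, '{"items": []}'))
    try:
        check_response(407, "Proxy Authentication Required")
    except RuntimeError as err:
        print(err)
```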
LLD filter values and trigger threshold values can be changed with respective user macros.
Name | Description | Default |
---|---|---|
{$OCI.HTTP.PROXY} | Set an HTTP proxy for OCI API requests if needed. | |
{$OCI.HTTP.RETURN.CODE.OK} | Set the HTTP return code that represents an OK response from the API. The default is "200", but can vary, for example, if a proxy is used. | 200 |
Name | Description | Type | Key and additional info |
---|---|---|---|
OCI Block Volume: State | Gets the block volume state. | Script | oci.block.volume.state |
OCI Block Volume: Get metrics | Gets block volume metrics. | Script | oci.block.volume.metrics.get |
OCI Block Volume: Volume read throughput | Read throughput. Expressed as bytes read per interval. | Dependent item | oci.block.volume.read |
OCI Block Volume: Volume write throughput | Write throughput. Expressed as bytes written per interval. | Dependent item | oci.block.volume.write |
OCI Block Volume: Volume read operations | Activity level from I/O reads. Expressed as reads per interval. | Dependent item | oci.block.volume.read.ops |
OCI Block Volume: Volume write operations | Activity level from I/O writes. Expressed as writes per interval. | Dependent item | oci.block.volume.write.ops |
OCI Block Volume: Volume throttled operations | Total sum of all the I/O operations that were throttled during a given time interval. | Dependent item | oci.block.volume.throttled.ops |
OCI Block Volume: Volume guaranteed VPUs/GB | Rate of change for currently active VPUs/GB. Expressed as the average of active VPUs/GB during a given time interval. | Dependent item | oci.block.volume.vpu |
OCI Block Volume: Volume guaranteed IOPS | Rate of change for guaranteed IOPS per SLA. Expressed as the average of guaranteed IOPS during a given time interval. | Dependent item | oci.block.volume.iops |
OCI Block Volume: Volume guaranteed throughput | Rate of change for guaranteed throughput per SLA. Expressed as megabytes per interval. | Dependent item | oci.block.volume.throughput |
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OCI Block Volume: Block volume terminated or faulty | Block volume state is "terminated"/"terminating" or "faulty". | min(/Oracle Cloud Block Volume by HTTP/oci.block.volume.state,5m) >= 4 | High | |
OCI Block Volume: Block volume state unknown | Block volume state is unknown. | min(/Oracle Cloud Block Volume by HTTP/oci.block.volume.state,5m) = 0 | Warning | Depends on: |
This template monitors Oracle Cloud Infrastructure (OCI) boot volume resources.
This template is not meant to be used independently, but together with Oracle Cloud by HTTP as a template for LLD host prototypes.
For communication with OCI, this template utilizes script items which execute HTTP GET and POST requests.
POST requests are required for the OCI Monitoring API, as it uses Monitoring Query Language (MQL), which passes queries in the HTTP request body.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is not meant to be used independently. A host with the Oracle Cloud by HTTP template will discover OCI boot volumes automatically, create host prototypes for each discovered boot volume, and apply this template to them.
If needed, you can specify an HTTP proxy for the template to use by changing the value of the {$OCI.HTTP.PROXY} user macro.
If a proxy is used, the HTTP status code returned for a successful response may differ from "200". In that case, adjust the {$OCI.HTTP.RETURN.CODE.OK} user macro.
LLD filter values and trigger threshold values can be changed with respective user macros.
Name | Description | Default |
---|---|---|
{$OCI.HTTP.PROXY} | Set an HTTP proxy for OCI API requests if needed. | |
{$OCI.HTTP.RETURN.CODE.OK} | Set the HTTP return code that represents an OK response from the API. The default is "200", but can vary, for example, if a proxy is used. | 200 |
Name | Description | Type | Key and additional info |
---|---|---|---|
OCI Boot Volume: State | Gets the boot volume state. | Script | oci.boot.volume.state |
OCI Boot Volume: Get metrics | Gets boot volume metrics. | Script | oci.boot.volume.metrics.get |
OCI Boot Volume: Volume read throughput | Read throughput. Expressed as bytes read per interval. | Dependent item | oci.boot.volume.read |
OCI Boot Volume: Volume write throughput | Write throughput. Expressed as bytes written per interval. | Dependent item | oci.boot.volume.write |
OCI Boot Volume: Volume read operations | Activity level from I/O reads. Expressed as reads per interval. | Dependent item | oci.boot.volume.read.ops |
OCI Boot Volume: Volume write operations | Activity level from I/O writes. Expressed as writes per interval. | Dependent item | oci.boot.volume.write.ops |
OCI Boot Volume: Volume throttled operations | Total sum of all the I/O operations that were throttled during a given time interval. | Dependent item | oci.boot.volume.throttled.ops |
OCI Boot Volume: Volume guaranteed VPUs/GB | Rate of change for currently active VPUs/GB. Expressed as the average of active VPUs/GB during a given time interval. | Dependent item | oci.boot.volume.vpu |
OCI Boot Volume: Volume guaranteed IOPS | Rate of change for guaranteed IOPS per SLA. Expressed as the average of guaranteed IOPS during a given time interval. | Dependent item | oci.boot.volume.iops |
OCI Boot Volume: Volume guaranteed throughput | Rate of change for guaranteed throughput per SLA. Expressed as megabytes per interval. | Dependent item | oci.boot.volume.throughput |
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OCI Boot Volume: Boot volume terminated or faulty | Boot volume state is "terminated"/"terminating" or "faulty". | min(/Oracle Cloud Boot Volume by HTTP/oci.boot.volume.state,5m) >= 4 | High | |
OCI Boot Volume: Boot volume state unknown | Boot volume state is unknown. | min(/Oracle Cloud Boot Volume by HTTP/oci.boot.volume.state,5m) = 0 | Warning | Depends on: |
This template monitors the availability of a single Oracle Cloud Infrastructure (OCI) virtual cloud network (VCN), discovers its attached subnets, and monitors their availability.
This template is not meant to be used independently, but together with Oracle Cloud by HTTP as a template for LLD host prototypes.
For communication with OCI, this template utilizes script items which execute HTTP GET requests.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is not meant to be used independently. A host with the Oracle Cloud by HTTP template will discover OCI virtual cloud networks (VCNs) automatically, create host prototypes for each discovered VCN, and apply this template to them.
If needed, you can specify an HTTP proxy for the template to use by changing the value of the {$OCI.HTTP.PROXY} user macro.
If a proxy is used, the HTTP status code returned for a successful response may differ from "200". In that case, adjust the {$OCI.HTTP.RETURN.CODE.OK} user macro.
LLD filter values and trigger threshold values can be changed with respective user macros.
Name | Description | Default |
---|---|---|
{$OCI.HTTP.PROXY} | Set an HTTP proxy for OCI API requests if needed. | |
{$OCI.HTTP.RETURN.CODE.OK} | Set the HTTP return code that represents an OK response from the API. The default is "200", but can vary, for example, if a proxy is used. | 200 |
{$OCI.VCN.SUBNET.DISCOVERY.STATE.MATCHES} | Sets the regex string of VCN subnet states to allow in discovery. | .* |
{$OCI.VCN.SUBNET.DISCOVERY.STATE.NOT_MATCHES} | Sets the regex string of VCN subnet states to ignore in discovery. | CHANGE_IF_NEEDED |
{$OCI.VCN.SUBNET.DISCOVERY.NAME.MATCHES} | Sets the regex string of VCN subnet names to allow in discovery. | .* |
{$OCI.VCN.SUBNET.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of VCN subnet names to ignore in discovery. | CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
OCI VCN: Get VCN state | State of the virtual cloud network. | Script | oci.vcn.state.get |
OCI VCN: Get subnets | Gets data about the subnets linked to the VCN. | Script | oci.vcn.subnets.get |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OCI VCN: VCN state terminated | Virtual cloud network state is "terminated" or "terminating". | min(/Oracle Cloud Networking by HTTP/oci.vcn.state.get,5m) = 3 or min(/Oracle Cloud Networking by HTTP/oci.vcn.state.get,5m) = 4 | High | |
OCI VCN: VCN state unknown | Virtual cloud network state is unknown. | min(/Oracle Cloud Networking by HTTP/oci.vcn.state.get,5m) = 0 | Warning | |
Name | Description | Type | Key and additional info |
---|---|---|---|
Subnet discovery | Discovers the subnets linked to the VCN. | Dependent item | oci.vcn.subnet.discovery |
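Subnet discovery turns the subnet list into low-level discovery rows that fill the {#ID} and {#NAME} macros used by the prototypes below. A minimal sketch, assuming the OCI subnet objects carry the Core API fields id, displayName, and lifecycleState; the {#STATE} macro is a hypothetical addition that the subnet state filter macros could match against.

```python
import json
from typing import Dict, List


def subnets_to_lld(subnets: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Map OCI subnet objects to Zabbix LLD rows (one row per subnet)."""
    return [
        {
            "{#ID}": subnet["id"],
            "{#NAME}": subnet["displayName"],
            "{#STATE}": subnet["lifecycleState"],  # hypothetical macro name
        }
        for subnet in subnets
    ]


if __name__ == "__main__":
    sample = [{"id": "ocid1.subnet.oc1..example", "displayName": "public-subnet",
               "lifecycleState": "AVAILABLE"}]
    # Zabbix 4.2 and later accept a plain JSON array as LLD data.
    print(json.dumps(subnets_to_lld(sample)))
```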
Name | Description | Type | Key and additional info |
---|---|---|---|
Subnet [{#NAME}]: Get subnet state | Current state of the subnet. | Dependent item | oci.vcn.subnet.state[{#ID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Subnet [{#NAME}]: Subnet state terminated | Virtual cloud network subnet state is "terminated" or "terminating". | min(/Oracle Cloud Networking by HTTP/oci.vcn.subnet.state[{#ID}],5m) = 3 or min(/Oracle Cloud Networking by HTTP/oci.vcn.subnet.state[{#ID}],5m) = 4 | High | |
Subnet [{#NAME}]: Subnet state unknown | Virtual cloud network subnet state is unknown. | min(/Oracle Cloud Networking by HTTP/oci.vcn.subnet.state[{#ID}],5m) = 0 | Warning | |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of OpenStack monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This is a master template that needs to be assigned to a host, and it will discover all OpenStack services supported by Zabbix automatically.
Before using this template, it is recommended to create a separate monitoring user on OpenStack that will have access to specific API resources. Zabbix uses OpenStack application credentials for authorization, as they are more secure than username and password-based authentication.
Below are instructions and examples on how to set up a user on OpenStack that will be used by Zabbix. Examples use the OpenStack CLI (command-line interface) tool, but this can also be done from OpenStack Horizon (web interface).
If using the CLI tool, make sure you have the OpenStack RC file for your project with a user that has rights to create other users, roles, etc., and source it, for example:
. zabbix-admin-openrc.sh
The OpenStack RC file can be obtained from Horizon.
The project that needs to be monitored is assumed to be already present in OpenStack. In the following examples, a project named zabbix is used:
# openstack project list
+----------------------------------+--------------------+
| ID | Name |
+----------------------------------+--------------------+
| 28d6bb25d62b4e7e8c2d59ce056a0334 | service |
| 4688a19e02324c42a34220e9b6a2407e | admin |
| bc78db4bb2044148a0abf90be512fa12 | zabbix |
+----------------------------------+--------------------+
Create a monitoring user with the openstack user create command:
# openstack user create --project zabbix --password-prompt zabbix-monitoring
User Password:
Repeat User Password:
+---------------------+----------------------------------+
| Field | Value |
+---------------------+----------------------------------+
| default_project_id | bc78db4bb2044148a0abf90be512fa12 |
| domain_id | default |
| enabled | True |
| id | abd3eda9a29244568b1801e4825b6d71 |
| name | zabbix-monitoring |
| options | {} |
| password_expires_at | None |
+---------------------+----------------------------------+
# openstack role create --description "A role for Zabbix monitoring user" monitoring
+-------------+-----------------------------------+
| Field | Value |
+-------------+-----------------------------------+
| description | A role for Zabbix monitoring user |
| domain_id | None |
| id | 93577a7f13184cf7af76f7bdecf7f6ee |
| name | monitoring |
| options | {} |
+-------------+-----------------------------------+
# openstack role add --user zabbix-monitoring --project zabbix monitoring
# openstack role assignment list --user zabbix-monitoring --project zabbix --names
+------------+---------------------------+-------+----------------+--------+--------+-----------+
| Role | User | Group | Project | Domain | System | Inherited |
+------------+---------------------------+-------+----------------+--------+--------+-----------+
| monitoring | zabbix-monitoring@Default | | zabbix@Default | | | False |
+------------+---------------------------+-------+----------------+--------+--------+-----------+
# openstack application credential create --description "Application credential for Zabbix monitoring" zabbix-app-cred
+--------------+----------------------------------------------------------------------------------------+
| Field | Value |
+--------------+----------------------------------------------------------------------------------------+
| description | Application credential for Zabbix monitoring |
| expires_at | None |
| id | c8087b91354249f3b157a50fc5ecfb3c |
| name | zabbix-app-cred |
| project_id | bc78db4bb2044148a0abf90be512fa12 |
| roles | monitoring |
| secret | E1kC-s8QTWUaIpmexF18GW-FL3TI9-HXoexdExvGsw7uOhb3SEFW1zDa1qTs80Vqn-2xgviIPRuYOCDp2NDVUg |
| system | None |
| unrestricted | False |
| user_id | abd3eda9a29244568b1801e4825b6d71 |
+--------------+----------------------------------------------------------------------------------------+
While creating the application credential, it is also possible to define access rules using the --access-rules flag, which offers even more fine-grained access to various API endpoints. Such rules are optional; it is up to the user to decide whether they are needed.
Once the application credential is created, the values of id and secret need to be set as user macro values in Zabbix:
id in the {$OPENSTACK.APP.CRED.ID} user macro;
secret in the {$OPENSTACK.APP.CRED.SECRET} user macro.
At this point, the monitoring user will not be able to access any resources on OpenStack, so access rights need to be defined. Access rights are set using policies. Each service has its own policy file, so further steps for setting up policies are described in the template documentation of each supported service, e.g., OpenStack Nova by HTTP.
Name | Description | Default |
---|---|---|
{$OPENSTACK.KEYSTONE.API.ENDPOINT} | API endpoint for Identity Service, e.g., https://local.openstack:5000. | |
{$OPENSTACK.AUTH.INTERVAL} | API token regeneration interval, in minutes. By default, OpenStack API tokens expire after 60m. | 50m |
{$OPENSTACK.HTTP.PROXY} | Sets the HTTP proxy for the authorization item. Host prototypes will also use this value for HTTP proxy. If this parameter is empty, then no proxy is used. | |
{$OPENSTACK.APP.CRED.ID} | Application credential ID for monitoring user access. | |
{$OPENSTACK.APP.CRED.SECRET} | Application credential password for monitoring user access. | |
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenStack: Get access token and service catalog | Authorizes user on the OpenStack Identity service and gets the service catalog. | Script | openstack.identity.auth |
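The authorization item signs in with the application credential. A minimal sketch of the corresponding request, assuming the standard Identity API v3 POST /v3/auth/tokens call with the application_credential method and an endpoint macro value without a version suffix. On success, Keystone returns the token in the X-Subject-Token response header and the service catalog in the response body. The helper name is hypothetical, and no network call is made.

```python
import json
from typing import Dict, Tuple


def keystone_app_cred_request(endpoint: str, cred_id: str,
                              cred_secret: str) -> Tuple[str, Dict[str, str], str]:
    """Build (url, headers, body) for an application-credential token request."""
    # Assumes the endpoint has no "/v3" suffix, e.g. https://local.openstack:5000
    url = endpoint.rstrip("/") + "/v3/auth/tokens"
    body = {
        "auth": {
            "identity": {
                "methods": ["application_credential"],
                "application_credential": {"id": cred_id, "secret": cred_secret},
            }
        }
    }
    return url, {"Content-Type": "application/json"}, json.dumps(body)


if __name__ == "__main__":
    # Values correspond to {$OPENSTACK.KEYSTONE.API.ENDPOINT},
    # {$OPENSTACK.APP.CRED.ID} and {$OPENSTACK.APP.CRED.SECRET}.
    url, headers, body = keystone_app_cred_request(
        "https://local.openstack:5000", "c8087b91354249f3b157a50fc5ecfb3c", "<secret>")
    print(url)
    print(body)
```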
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenStack: Nova discovery | Discovers OpenStack services from the monitoring user's services catalog. | Dependent item | openstack.services.nova.discovery |
This template is designed for the effortless deployment of OpenStack Nova monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is not meant to be used independently. A host with the OpenStack by HTTP template will discover the Nova service automatically and create a host prototype with this template assigned to it.
If needed, you can specify an HTTP proxy for the template to use by changing the value of the {$OPENSTACK.HTTP.PROXY} user macro.
For tenant usage statistics, it is possible to choose a custom time period for which the data will be queried. This can be set with the {$OPENSTACK.NOVA.TENANT.PERIOD} macro value.
The value can be one of the following:
y - current year until now;
m - current month until now (default value);
w - current week until now;
d - current day until now.
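One way to turn the period letter into a query window is sketched below. It assumes the tenant usage API accepts start and end timestamps in the CCYY-MM-DDThh:mm:ss form and that weeks start on Monday; the helper names and the week convention are assumptions, not details taken from the template.

```python
from datetime import datetime, timedelta


def period_start(period: str, now: datetime) -> datetime:
    """Start of the current year, month, week, or day for 'y', 'm', 'w', or 'd'."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "y":
        return midnight.replace(month=1, day=1)
    if period == "m":
        return midnight.replace(day=1)
    if period == "w":
        # Assumption: weeks start on Monday (datetime.weekday() == 0).
        return midnight - timedelta(days=midnight.weekday())
    if period == "d":
        return midnight
    raise ValueError(f"unsupported period: {period!r}")


def tenant_usage_query(period: str, now: datetime) -> str:
    """Hypothetical query string for the tenant usage request."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    return f"?start={period_start(period, now).strftime(fmt)}&end={now.strftime(fmt)}"


if __name__ == "__main__":
    now = datetime(2024, 5, 15, 13, 45, 10)  # a Wednesday
    for p in "ymwd":
        print(p, tenant_usage_query(p, now))
```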
This template discovers servers (instances) present in the project and monitors their statuses, but depending on the use case, it is most likely not necessary to monitor all of them.
To filter which servers to monitor, set the {$OPENSTACK.SERVER.DISCOVERY.NAME.MATCHES} and {$OPENSTACK.SERVER.DISCOVERY.NAME.NOT_MATCHES} macro values accordingly. This logic also applies to other low-level discovery rules.
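The MATCHES/NOT_MATCHES macro pairs act as an include filter and an exclude filter. A rough sketch, assuming unanchored regular expression search (a pattern may match anywhere in the name); the default CHANGE_IF_NEEDED exclude value is a placeholder that ordinary names do not contain, so nothing is excluded until it is changed. Python's re module stands in for Zabbix's own regular expression engine here, so some advanced syntax may behave differently.

```python
import re
from typing import Iterable, List


def lld_keep(name: str, matches: str = ".*",
             not_matches: str = "CHANGE_IF_NEEDED") -> bool:
    """True if name matches the include regex and does not match the exclude regex."""
    return re.search(matches, name) is not None and re.search(not_matches, name) is None


def filter_names(names: Iterable[str], matches: str = ".*",
                 not_matches: str = "CHANGE_IF_NEEDED") -> List[str]:
    """Apply lld_keep to every discovered name, preserving order."""
    return [n for n in names if lld_keep(n, matches, not_matches)]


if __name__ == "__main__":
    servers = ["web-01", "web-02", "db-01", "web-test"]
    print(filter_names(servers))                      # defaults keep everything
    print(filter_names(servers, r"^web-", r"test$"))  # ['web-01', 'web-02']
```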
OpenStack configuration
For the OpenStack monitoring user to be able to access the API resources used in this template, the policy file for OpenStack Nova needs to be configured.
On the OpenStack server, open the /etc/nova/policy.json file in your favorite text editor.
In this file, assign the following target resources to the role that the monitoring user uses:
{
"os_compute_api:servers:index": "role:monitoring",
"os_compute_api:servers:show": "role:monitoring",
"os_compute_api:os-services:list": "role:monitoring",
"os_compute_api:os-hypervisors:list-detail": "role:monitoring",
"os_compute_api:os-availability-zone:detail": "role:monitoring",
"os_compute_api:os-simple-tenant-usage:list": "role:monitoring"
}
If a role is already assigned to a target, you can add another role by joining them with "or", for example, "role:firstRole or role:monitoring".
Note that a restart of OpenStack Nova services might be needed for these new changes to be applied.
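If you prefer to script the policy change described above, the sketch below adds role:monitoring to each target and joins it with an existing rule using "or". It is a hypothetical helper working on an in-memory dictionary (back up /etc/nova/policy.json before writing changes back). It assumes oslo.policy semantics, where an empty rule (or "@") already allows everyone, so such rules are left untouched rather than narrowed.

```python
import json
from typing import Dict, Iterable

ALWAYS_ALLOWED = {"", "@"}  # oslo.policy rules that already grant access to everyone


def grant_role(policy: Dict[str, str], targets: Iterable[str],
               role: str = "monitoring") -> Dict[str, str]:
    """Add role:<role> to each target, OR-ing it with any existing rule."""
    rule = f"role:{role}"
    for target in targets:
        if target not in policy:
            policy[target] = rule
            continue
        existing = policy[target].strip()
        if existing in ALWAYS_ALLOWED:
            continue  # already open to everyone; replacing it would restrict access
        if rule not in (part.strip() for part in existing.split(" or ")):
            policy[target] = f"{existing} or {rule}"
    return policy


if __name__ == "__main__":
    policy = {"os_compute_api:servers:index": "role:firstRole"}
    grant_role(policy, ["os_compute_api:servers:index", "os_compute_api:servers:show"])
    print(json.dumps(policy, indent=4))
```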
Name | Description | Default |
---|---|---|
{$OPENSTACK.NOVA.SERVICE.URL} | API endpoint for Nova Service, e.g., https://local.openstack:8774/v2.1. | |
{$OPENSTACK.TOKEN} | API token for the monitoring user. | |
{$OPENSTACK.HTTP.PROXY} | Sets the HTTP proxy for script and HTTP agent items. If this parameter is empty, then no proxy is used. | |
{$OPENSTACK.NOVA.TENANT.PERIOD} | Period for which tenant usage statistics will be queried. Possible values are: 'y' - current year until now, 'm' - current month until now, 'w' - current week until now, 'd' - current day until now. | m |
{$OPENSTACK.NOVA.INTERVAL.LIMITS} | Interval for absolute limit HTTP agent item query. | 3m |
{$OPENSTACK.NOVA.INTERVAL.SERVERS} | Interval for server HTTP agent item queries. | 3m |
{$OPENSTACK.NOVA.INTERVAL.SERVICES} | Interval for service HTTP agent item query. | 3m |
{$OPENSTACK.NOVA.INTERVAL.HYPERVISOR} | Interval for hypervisor HTTP agent item query. | 3m |
{$OPENSTACK.NOVA.INTERVAL.AVAILABILITY_ZONE} | Interval for availability zone HTTP agent item query. | 3m |
{$OPENSTACK.NOVA.INTERVAL.TENANTS} | Interval for tenant HTTP agent item query. | 3m |
{$OPENSTACK.NOVA.INSTANCES.UTIL.WARN} | Sets the percentage threshold for creating a warning severity event about instances resource count. | 75 |
{$OPENSTACK.NOVA.INSTANCES.UTIL.HIGH} | Sets the percentage threshold for creating a high severity event about instances resource count. | 90 |
{$OPENSTACK.NOVA.CPU.UTIL.WARN} | Sets the percentage threshold for creating a warning severity event about vCPU resource usage. | 75 |
{$OPENSTACK.NOVA.CPU.UTIL.HIGH} | Sets the percentage threshold for creating a high severity event about vCPU resource usage. | 90 |
{$OPENSTACK.NOVA.RAM.UTIL.WARN} | Sets the percentage threshold for creating a warning severity event about RAM resource usage. | 75 |
{$OPENSTACK.NOVA.RAM.UTIL.HIGH} | Sets the percentage threshold for creating a high severity event about RAM resource usage. | 90 |
{$OPENSTACK.SERVER.DISCOVERY.NAME.MATCHES} | Sets the server name regex filter to use in server discovery for including. | .* |
{$OPENSTACK.SERVER.DISCOVERY.NAME.NOT_MATCHES} | Sets the server name regex filter to use in server discovery for excluding. | CHANGE_IF_NEEDED |
{$OPENSTACK.SERVICES.DISCOVERY.HOST.MATCHES} | Sets the host name regex filter to use in compute services discovery for including. | .* |
{$OPENSTACK.SERVICES.DISCOVERY.HOST.NOT_MATCHES} | Sets the host name regex filter to use in compute services discovery for excluding. | CHANGE_IF_NEEDED |
{$OPENSTACK.SERVICES.DISCOVERY.BINARY.MATCHES} | Sets the binary name regex filter to use in compute services discovery for including. | .* |
{$OPENSTACK.SERVICES.DISCOVERY.BINARY.NOT_MATCHES} | Sets the binary name regex filter to use in compute services discovery for excluding. | CHANGE_IF_NEEDED |
{$OPENSTACK.HYPERVISOR.DISCOVERY.HOSTNAME.MATCHES} | Sets the hostname regex filter to use in hypervisor discovery for including. | .* |
{$OPENSTACK.HYPERVISOR.DISCOVERY.HOSTNAME.NOT_MATCHES} | Sets the hostname regex filter to use in hypervisor discovery for excluding. | CHANGE_IF_NEEDED |
{$OPENSTACK.HYPERVISOR.DISCOVERY.TYPE.MATCHES} | Sets the type regex filter to use in hypervisor discovery for including. | .* |
{$OPENSTACK.HYPERVISOR.DISCOVERY.TYPE.NOT_MATCHES} | Sets the type regex filter to use in hypervisor discovery for excluding. | CHANGE_IF_NEEDED |
{$OPENSTACK.HYPERVISOR.DISCOVERY.IP.MATCHES} | Sets the host IP address regex filter to use in hypervisor discovery for including. | .* |
{$OPENSTACK.HYPERVISOR.DISCOVERY.IP.NOT_MATCHES} | Sets the host IP address regex filter to use in hypervisor discovery for excluding. | CHANGE_IF_NEEDED |
{$OPENSTACK.AVAILABILITY_ZONE.DISCOVERY.NAME.MATCHES} | Sets the zone name regex filter to use in availability zone discovery for including. | .* |
{$OPENSTACK.AVAILABILITY_ZONE.DISCOVERY.NAME.NOT_MATCHES} | Sets the zone name regex filter to use in availability zone discovery for excluding. | CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nova: Get absolute limits | Gets absolute limits for the project. |
HTTP agent | openstack.nova.limits.get Preprocessing
|
Nova: Get servers | Gets a list of servers. |
HTTP agent | openstack.nova.servers.get Preprocessing
|
Nova: Get compute services | Gets a list of compute services and their data. |
HTTP agent | openstack.nova.services.get Preprocessing
|
Nova: Get hypervisors | Gets a list of hypervisors and their data. |
HTTP agent | openstack.nova.hypervisors.get Preprocessing
|
Nova: Get availability zones | Gets a list of availability zones and their data. |
HTTP agent | openstack.nova.availability_zone.get Preprocessing
|
Nova: Get tenants | Gets a list of tenants and their data. |
Script | openstack.nova.tenant.get Preprocessing
|
Nova: Instances count, current | Number of servers in each tenant. |
Dependent item | openstack.nova.limits.instances.current Preprocessing
|
Nova: Instances count, max | Number of allowed servers for each tenant. |
Dependent item | openstack.nova.limits.instances.max Preprocessing
|
Nova: Instances count, free | Number of available servers for each tenant. |
Calculated | openstack.nova.limits.instances.free Preprocessing
|
Nova: vCPUs usage, current | Number of used server cores in each tenant. |
Dependent item | openstack.nova.limits.vcpu.current Preprocessing
|
Nova: vCPUs usage, max | Number of allowed server cores for each tenant. |
Dependent item | openstack.nova.limits.vcpu.max Preprocessing
|
Nova: vCPUs usage, free | Number of available server cores for each tenant. |
Calculated | openstack.nova.limits.vcpu.free Preprocessing
|
Nova: RAM usage, current | Amount of used server RAM. |
Dependent item | openstack.nova.limits.ram.current Preprocessing
|
Nova: RAM usage, max | Amount of allowed server RAM. |
Dependent item | openstack.nova.limits.ram.max Preprocessing
|
Nova: RAM usage, free | Amount of available server RAM. |
Calculated | openstack.nova.limits.ram.free Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nova: Current instances count is too high | Current instances count has exceeded {$OPENSTACK.NOVA.INSTANCES.UTIL.HIGH}% of the max available instances count. |
last(/OpenStack Nova by HTTP/openstack.nova.limits.instances.current) >= ({$OPENSTACK.NOVA.INSTANCES.UTIL.HIGH} / 100 * last(/OpenStack Nova by HTTP/openstack.nova.limits.instances.max)) |High |
||
Nova: Current instances count is high | Current instances count has exceeded {$OPENSTACK.NOVA.INSTANCES.UTIL.WARN}% of the max available instances count. |
last(/OpenStack Nova by HTTP/openstack.nova.limits.instances.current) >= ({$OPENSTACK.NOVA.INSTANCES.UTIL.WARN} / 100 * last(/OpenStack Nova by HTTP/openstack.nova.limits.instances.max)) |Warning |
Depends on:
|
|
Nova: Current vCPU usage is too high | Current vCPU usage has exceeded {$OPENSTACK.NOVA.CPU.UTIL.HIGH}% of the max available vCPU usage. |
last(/OpenStack Nova by HTTP/openstack.nova.limits.vcpu.current) >= ({$OPENSTACK.NOVA.CPU.UTIL.HIGH} / 100 * last(/OpenStack Nova by HTTP/openstack.nova.limits.vcpu.max)) |High |
||
Nova: Current vCPU usage is high | Current vCPU usage has exceeded {$OPENSTACK.NOVA.CPU.UTIL.WARN}% of the max available vCPU usage. |
last(/OpenStack Nova by HTTP/openstack.nova.limits.vcpu.current) >= ({$OPENSTACK.NOVA.CPU.UTIL.WARN} / 100 * last(/OpenStack Nova by HTTP/openstack.nova.limits.vcpu.max)) |Warning |
Depends on:
|
|
Nova: Current RAM usage is too high | Current RAM usage has exceeded {$OPENSTACK.NOVA.RAM.UTIL.HIGH}% of the max available RAM usage. |
last(/OpenStack Nova by HTTP/openstack.nova.limits.ram.current) >= ({$OPENSTACK.NOVA.RAM.UTIL.HIGH} / 100 * last(/OpenStack Nova by HTTP/openstack.nova.limits.ram.max)) |High |
||
Nova: Current RAM usage is high | Current RAM usage has exceeded {$OPENSTACK.NOVA.RAM.UTIL.WARN}% of the max available RAM usage. |
last(/OpenStack Nova by HTTP/openstack.nova.limits.ram.current) >= ({$OPENSTACK.NOVA.RAM.UTIL.WARN} / 100 * last(/OpenStack Nova by HTTP/openstack.nova.limits.ram.max)) |Warning |
Depends on:
|
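The instance, vCPU, and RAM triggers above all compare the current value with a percentage of the limit: `current >= (threshold / 100 * max)`. The Python sketch below shows how the two severities interact. It assumes that each Warning trigger depends on the matching High trigger (the dependency target is not preserved in this text), so only the more severe problem is reported. The quota and threshold values in the usage lines are hypothetical.

```python
from typing import Optional


def limit_severity(current: float, maximum: float,
                   warn_pct: float, high_pct: float) -> Optional[str]:
    """Evaluate current >= (pct / 100 * max) for the High and Warning thresholds.

    Assumes the Warning trigger depends on the High one, so only the more
    severe problem is returned when both conditions hold.
    """
    if current >= high_pct / 100 * maximum:
        return "High"
    if current >= warn_pct / 100 * maximum:
        return "Warning"
    return None


# Hypothetical quota of 50 instances with 80% (Warning) and 90% (High) thresholds.
print(limit_severity(46, 50, warn_pct=80, high_pct=90))  # High
print(limit_severity(42, 50, warn_pct=80, high_pct=90))  # Warning
print(limit_severity(10, 50, warn_pct=80, high_pct=90))  # None
```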
Name | Description | Type | Key and additional info |
---|---|---|---|
Nova: Servers discovery | Discovers OpenStack Nova servers. |
Dependent item | openstack.nova.server.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Server [{#SERVER_ID}]:[{#SERVER_NAME}]: Status | Server status. |
HTTP agent | openstack.nova.server.status.get[{#SERVER_ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Server [{#SERVER_ID}]:[{#SERVER_NAME}]: Status is "ERROR" | Server is in "ERROR" status. |
last(/OpenStack Nova by HTTP/openstack.nova.server.status.get[{#SERVER_ID}])=5 |High |
Manual close: Yes | |
Server [{#SERVER_ID}]:[{#SERVER_NAME}]: Status has changed | Status of the server has changed. Acknowledge to close the problem manually. |
last(/OpenStack Nova by HTTP/openstack.nova.server.status.get[{#SERVER_ID}])<>last(/OpenStack Nova by HTTP/openstack.nova.server.status.get[{#SERVER_ID}],#2) and length(last(/OpenStack Nova by HTTP/openstack.nova.server.status.get[{#SERVER_ID}]))>0 |Info |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nova: Compute services discovery | Discovers OpenStack compute services. |
Dependent item | openstack.nova.services.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Compute service [{#HOST}]:[{#BINARY}]:[{#ID}]: Raw data | Raw data of the service. |
Dependent item | openstack.nova.services.raw[{#ID}] Preprocessing
|
Compute service [{#HOST}]:[{#BINARY}]:[{#ID}]: State | State of the service. |
Dependent item | openstack.nova.services.state[{#ID}] Preprocessing
|
Compute service [{#HOST}]:[{#BINARY}]:[{#ID}]: Status | Status of the service. |
Dependent item | openstack.nova.services.status[{#ID}] Preprocessing
|
Compute service [{#HOST}]:[{#BINARY}]:[{#ID}]: Disabling reason | Reason for disabling a service. |
Dependent item | openstack.nova.services.disabled.reason[{#ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Compute service [{#HOST}]:[{#BINARY}]:[{#ID}]: State is "down" | State of the service is "down". |
last(/OpenStack Nova by HTTP/openstack.nova.services.state[{#ID}])=0 |Warning |
Manual close: Yes | |
Compute service [{#HOST}]:[{#BINARY}]:[{#ID}]: Status is "disabled" | Status of the service is "disabled". Acknowledge to close the problem manually. |
last(/OpenStack Nova by HTTP/openstack.nova.services.status[{#ID}])=0 and length(last(/OpenStack Nova by HTTP/openstack.nova.services.disabled.reason[{#ID}]))>=0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nova: Hypervisor discovery | Discovers OpenStack Nova hypervisors. |
Dependent item | openstack.nova.hypervisors.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Hypervisor [{#ID}]:[{#HOSTNAME}]: Raw data | Raw data of the hypervisor. |
Dependent item | openstack.nova.hypervisors.raw[{#ID}] Preprocessing
|
Hypervisor [{#ID}]:[{#HOSTNAME}]: State | State of the hypervisor. |
Dependent item | openstack.nova.hypervisors.state[{#ID}] Preprocessing
|
Hypervisor [{#ID}]:[{#HOSTNAME}]: Status | Status of the hypervisor. |
Dependent item | openstack.nova.hypervisors.status[{#ID}] Preprocessing
|
Hypervisor [{#ID}]:[{#HOSTNAME}]: Version | Hypervisor version. |
Dependent item | openstack.nova.hypervisors.version[{#ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Hypervisor [{#ID}]:[{#HOSTNAME}]: State is "down" | State of the hypervisor is "down". |
last(/OpenStack Nova by HTTP/openstack.nova.hypervisors.state[{#ID}])=0 |Warning |
Manual close: Yes | |
Hypervisor [{#ID}]:[{#HOSTNAME}]: Status is "disabled" | Status of the hypervisor is disabled. |
last(/OpenStack Nova by HTTP/openstack.nova.hypervisors.status[{#ID}])=0 |Info |
Manual close: Yes | |
Hypervisor [{#ID}]:[{#HOSTNAME}]: Version has changed | Version of the hypervisor has changed. Acknowledge to close the problem manually. |
last(/OpenStack Nova by HTTP/openstack.nova.hypervisors.version[{#ID}])<>last(/OpenStack Nova by HTTP/openstack.nova.hypervisors.version[{#ID}],#2) and length(last(/OpenStack Nova by HTTP/openstack.nova.hypervisors.version[{#ID}]))>0 |Info |
Manual close: Yes |
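The "Version has changed" trigger (like the server "Status has changed" trigger) compares the latest value with the previous one and ignores an empty latest value. A minimal Python approximation, assuming the item history is available as a list of strings ordered oldest first; the version values are made up for illustration.

```python
from typing import Sequence


def value_changed(history: Sequence[str]) -> bool:
    """Approximate: last(item) <> last(item,#2) and length(last(item)) > 0.

    `history` is ordered oldest first. With fewer than two values the
    comparison cannot be made, and this sketch returns False.
    """
    if len(history) < 2:
        return False
    latest, previous = history[-1], history[-2]
    return latest != previous and len(latest) > 0


# Hypothetical hypervisor version values.
print(value_changed(["6002000", "7000000"]))  # True
print(value_changed(["7000000", "7000000"]))  # False
print(value_changed(["7000000", ""]))         # False: empty value ignored
```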
Name | Description | Type | Key and additional info |
---|---|---|---|
Nova: Availability zones discovery | Discovers OpenStack Nova availability zones. |
Dependent item | openstack.nova.availability_zone.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Availability zone [{#ZONE_NAME}]: Raw data | Raw data of the availability zone. |
Dependent item | openstack.nova.availability_zone.raw[{#ZONE_NAME}] Preprocessing
|
Availability zone [{#ZONE_NAME}]: State | Current state of the availability zone. |
Dependent item | openstack.nova.availability_zone.state[{#ZONE_NAME}] Preprocessing
|
Availability zone [{#ZONE_NAME}]: Host count | Count of hosts and service objects under a single availability zone. |
Dependent item | openstack.nova.availability_zone.host_count[{#ZONE_NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Availability zone [{#ZONE_NAME}]: Zone is unavailable | Availability zone is not available. |
last(/OpenStack Nova by HTTP/openstack.nova.availability_zone.state[{#ZONE_NAME}])=0 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nova: Tenant discovery | Discovers tenants and their usage data. |
Dependent item | openstack.nova.tenant.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Tenant [{#TENANT_ID}]: Raw data | Raw data of the tenant. |
Dependent item | openstack.nova.tenant.raw[{#TENANT_ID}] Preprocessing
|
Tenant [{#TENANT_ID}]: Total hours | Total time, in hours, that the tenant's servers have existed. |
Dependent item | openstack.nova.tenant.total_hours[{#TENANT_ID}] Preprocessing
|
Tenant [{#TENANT_ID}]: Total vCPUs usage | Total vCPU usage hours for the current tenant (project), calculated by multiplying each server's number of virtual CPUs by the hours the server has existed and summing the results over all servers. |
Dependent item | openstack.nova.tenant.total_vcpu[{#TENANT_ID}] Preprocessing
|
Tenant [{#TENANT_ID}]: Total disk usage | Total disk usage hours for the current tenant (project), calculated by multiplying each server's disk size (in GiB) by the hours the server has existed and summing the results over all servers. |
Dependent item | openstack.nova.tenant.disk_usage[{#TENANT_ID}] Preprocessing
|
Tenant [{#TENANT_ID}]: Total memory usage | Total memory usage hours for the current tenant (project), calculated by multiplying each server's memory size (in MiB) by the hours the server has existed and summing the results over all servers. |
Dependent item | openstack.nova.tenant.total_memory_mb_usage[{#TENANT_ID}] Preprocessing
|
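The tenant usage items are size-times-hours totals summed over all servers. The sketch below reproduces that arithmetic for a hypothetical list of servers; the `ServerUsage` record and its field names are illustrative assumptions, not the Nova API schema, and the template itself reads these totals from the API rather than computing them.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ServerUsage:
    # Hypothetical per-server record; field names are illustrative only.
    vcpus: int
    memory_mb: int
    disk_gb: int
    hours: float


def tenant_totals(servers: Iterable[ServerUsage]) -> Dict[str, float]:
    """Sum hours and size * hours over all servers of a tenant."""
    totals = {"total_hours": 0.0, "vcpu_hours": 0.0,
              "disk_gb_hours": 0.0, "memory_mb_hours": 0.0}
    for s in servers:
        totals["total_hours"] += s.hours
        totals["vcpu_hours"] += s.vcpus * s.hours
        totals["disk_gb_hours"] += s.disk_gb * s.hours
        totals["memory_mb_hours"] += s.memory_mb * s.hours
    return totals


# Two hypothetical servers: 2 vCPUs for 10 h and 4 vCPUs for 5 h.
usage = tenant_totals([ServerUsage(2, 4096, 40, 10.0), ServerUsage(4, 8192, 80, 5.0)])
print(usage["vcpu_hours"])  # 40.0
```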
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor Google Cloud Platform (GCP) with Zabbix. It works without any external scripts and uses script items. The template currently supports the discovery of Compute Engine and Cloud SQL instances, as well as Compute Engine project quota metrics.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the Stackdriver Monitoring API for the GCP project you wish to monitor. Refer to the vendor documentation.
Copy the project_id, private_key_id, private_key, and client_email values from the JSON key file and add them to their corresponding macros {$GCP.PROJECT.ID}, {$GCP.PRIVATE.KEY.ID}, {$GCP.PRIVATE.KEY}, and {$GCP.CLIENT.EMAIL} on the template/host.
Additional information:
Make sure you create the service account using credentials with the `Project Owner`, `Project IAM Admin`, or `Service Account Admin` role.
The service account JSON key file can only be downloaded once: regenerate it if the previous key has been lost.
The service account should have `Project Viewer` permissions or granular permissions for the GCP Compute Engine API/GCP Cloud SQL.
You can copy the private_key string from the service account JSON key file as is, or replace each newline escape sequence (\n) with an actual line break.
Refer to the vendor documentation for details on managing service accounts.
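The JSON key file stores the line breaks of private_key as `\n` escape sequences. The Python sketch below, using a dummy truncated key, shows that replacing each literal `\n` with a real line break gives the same text a JSON parser produces when it reads the file. It only illustrates why both forms carry the same key; it is not how the template parses the macro.

```python
import json

# Dummy, truncated key material for illustration only.
# In this Python literal, "\\n" is a backslash followed by "n", as seen in the file.
RAW_FIELD = "-----BEGIN PRIVATE KEY-----\\nAAAA\\n-----END PRIVATE KEY-----\\n"


def normalize_private_key(raw: str) -> str:
    """Replace each literal backslash-n sequence with a real line break."""
    return raw.replace("\\n", "\n")


# A JSON parser decodes the same escape sequences while reading the key file.
decoded = json.loads('{"private_key": "' + RAW_FIELD + '"}')["private_key"]
print(normalize_private_key(RAW_FIELD) == decoded)  # True
print(normalize_private_key(RAW_FIELD).splitlines())
```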
IMPORTANT!
By default, the secret authorization token is defined as plain text in the host prototype settings because of Zabbix template export/import limitations. It is therefore highly recommended to change the value type of the user macro `{$GCP.AUTH.TOKEN}` to `SECRET` for all host prototypes after importing the `GCP by HTTP` template.
All the instances/quotas/metrics discovered are related to a particular GCP project.
To monitor several GCP projects, create a separate service account and Zabbix host for each project.
The GCP access token is valid for 1 hour (3600 seconds) after it is generated.
To avoid inconsistencies between the GCP token stored in the Zabbix database and the one in the Zabbix server configuration cache, do not set the Zabbix server configuration parameter CacheUpdateFrequency above 45 minutes, and do not set the update interval of the GCP Authorization item above 1 hour (the maximum CacheUpdateFrequency value).
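For context, a token like this comes from Google's OAuth 2.0 service-account flow: a JWT signed with the service account's private key is exchanged at the token endpoint for an access token that expires after one hour. The Python sketch below builds that request body. It is a hedged illustration, not the template's script: the `cloud-platform` scope is an assumption (the template's predefined scopes are not listed here), using the private key ID from `{$GCP.PRIVATE.KEY.ID}` as the JWT `kid` header is likewise assumed, and RS256 signing is delegated to a caller-supplied function because the Python standard library has no RSA signer.

```python
import base64
import json
import time
import urllib.parse
from typing import Callable, Optional

TOKEN_URI = "https://oauth2.googleapis.com/token"
# Assumed scope for illustration; the template's own scopes may differ.
DEFAULT_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def b64url(data: bytes) -> str:
    """Base64url encoding without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_token_request(client_email: str, private_key_id: str,
                        sign_rs256: Callable[[bytes], bytes],
                        scope: str = DEFAULT_SCOPE,
                        now: Optional[int] = None,
                        lifetime_s: int = 3600) -> str:
    """Return the form-encoded body for a JWT-bearer token request.

    `sign_rs256` must return an RS256 (RSASSA-PKCS1-v1_5 with SHA-256)
    signature made with the service account private key.
    """
    iat = int(time.time()) if now is None else now
    header = {"alg": "RS256", "typ": "JWT", "kid": private_key_id}  # kid: assumed
    claims = {"iss": client_email, "scope": scope, "aud": TOKEN_URI,
              "iat": iat, "exp": iat + lifetime_s}  # Google allows at most iat + 1 hour
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims))
    signature = b64url(sign_rs256(signing_input.encode("ascii")))
    return urllib.parse.urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": f"{signing_input}.{signature}",
    })


# Usage with a placeholder signer: this shows the request shape only;
# Google would reject the fake signature.
body = build_token_request("monitor@my-project.iam.gserviceaccount.com", "my-key-id",
                           sign_rs256=lambda data: b"placeholder", now=1_700_000_000)
# POST `body` to TOKEN_URI with Content-Type: application/x-www-form-urlencoded.
```

In practice `sign_rs256` would wrap an RSA library. The token endpoint's JSON response carries `access_token` and `expires_in`, which is where the 1-hour lifetime mentioned above comes from.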
Additional information about metrics and used API methods:
Name | Description | Default |
---|---|---|
{$GCP.PROJECT.ID} | GCP project ID. | |
{$GCP.CLIENT.EMAIL} | Service account client e-mail. | |
{$GCP.PRIVATE.KEY.ID} | Service account private key ID. | |
{$GCP.PRIVATE.KEY} | Service account private key data. | |
{$GCP.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. | |
{$GCP.DATA.TIMEOUT} | API response timeout. | 15s |
{$GCP.AUTH.FREQUENCY} | The update interval for the GCP Authorization item, which also equals the access token regeneration request frequency. Check the template documentation notes carefully for more details. | 45m |
{$GCP.GCE.INST.NAME.MATCHES} | The filter to include GCP Compute Engine instances by name. | .* |
{$GCP.GCE.INST.NAME.NOT_MATCHES} | The filter to exclude GCP Compute Engine instances by name. | CHANGE_IF_NEEDED |
{$GCP.GCE.ZONE.MATCHES} | The filter to include GCP Compute Engine instances by zone. | .* |
{$GCP.GCE.ZONE.NOT_MATCHES} | The filter to exclude GCP Compute Engine instances by zone. | CHANGE_IF_NEEDED |
{$GCP.MYSQL.INST.NAME.MATCHES} | The filter to include GCP Cloud SQL MySQL instances by name. | .* |
{$GCP.MYSQL.INST.NAME.NOT_MATCHES} | The filter to exclude GCP Cloud SQL MySQL instances by name. | CHANGE_IF_NEEDED |
{$GCP.MYSQL.ZONE.MATCHES} | The filter to include GCP Cloud SQL MySQL instances by zone. | .* |
{$GCP.MYSQL.ZONE.NOT_MATCHES} | The filter to exclude GCP Cloud SQL MySQL instances by zone. | CHANGE_IF_NEEDED |
{$GCP.MYSQL.INST.TYPE.MATCHES} | The filter to include GCP Cloud SQL MySQL instances by type (standalone/replica). | .* |
{$GCP.MYSQL.INST.TYPE.NOT_MATCHES} | The filter to exclude GCP Cloud SQL MySQL instances by type (standalone/replica). Set the macro value to 'CLOUD_SQL_INSTANCE' to exclude standalone instances or to 'READ_REPLICA_INSTANCE' to exclude read-only replicas. | CHANGE_IF_NEEDED |
{$GCP.PGSQL.INST.NAME.MATCHES} | The filter to include GCP Cloud SQL PostgreSQL instances by name. | .* |
{$GCP.PGSQL.INST.NAME.NOT_MATCHES} | The filter to exclude GCP Cloud SQL PostgreSQL instances by name. | CHANGE_IF_NEEDED |
{$GCP.PGSQL.ZONE.MATCHES} | The filter to include GCP Cloud SQL PostgreSQL instances by zone. | .* |
{$GCP.PGSQL.ZONE.NOT_MATCHES} | The filter to exclude GCP Cloud SQL PostgreSQL instances by zone. | CHANGE_IF_NEEDED |
{$GCP.PGSQL.INST.TYPE.MATCHES} | The filter to include GCP Cloud SQL PostgreSQL instances by type (standalone/replica). | .* |
{$GCP.PGSQL.INST.TYPE.NOT_MATCHES} | The filter to exclude GCP Cloud SQL PostgreSQL instances by type (standalone/replica). Set the macro value to 'CLOUD_SQL_INSTANCE' to exclude standalone instances or to 'READ_REPLICA_INSTANCE' to exclude read-only replicas. | CHANGE_IF_NEEDED |
{$GCP.MSSQL.INST.NAME.MATCHES} | The filter to include GCP Cloud SQL MSSQL instances by name. | .* |
{$GCP.MSSQL.INST.NAME.NOT_MATCHES} | The filter to exclude GCP Cloud SQL MSSQL instances by name. | CHANGE_IF_NEEDED |
{$GCP.MSSQL.ZONE.MATCHES} | The filter to include GCP Cloud SQL MSSQL instances by zone. | .* |
{$GCP.MSSQL.ZONE.NOT_MATCHES} | The filter to exclude GCP Cloud SQL MSSQL instances by zone. | CHANGE_IF_NEEDED |
{$GCP.MSSQL.INST.TYPE.MATCHES} | The filter to include GCP Cloud SQL MSSQL instances by type (standalone/replica). | .* |
{$GCP.MSSQL.INST.TYPE.NOT_MATCHES} | The filter to exclude GCP Cloud SQL MSSQL instances by type (standalone/replica). Set the macro value to 'CLOUD_SQL_INSTANCE' to exclude standalone instances or to 'READ_REPLICA_INSTANCE' to exclude read-only replicas. | CHANGE_IF_NEEDED |
{$GCP.GCE.QUOTA.MATCHES} | The filter to include GCP Compute Engine project quotas by name. | .* |
{$GCP.GCE.QUOTA.NOT_MATCHES} | The filter to exclude GCP Compute Engine project quotas by name. | CHANGE_IF_NEEDED |
{$GCP.GCE.QUOTA.PUSED.MIN.WARN} | GCP Compute Engine project quota warning utilization threshold, in %. | 80 |
{$GCP.GCE.QUOTA.PUSED.MIN.CRIT} | GCP Compute Engine project quota critical utilization threshold, in %. | 95 |
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP: Authorization | Authorizes Google Cloud Platform REST API requests using the service account authentication parameters and a temporarily generated RSA-signed JWT. The necessary scopes are predefined. Returns a signed authorization token with a 1-hour lifetime; it is requested once and used by all the dependent script items. Check the template documentation for details. |
Script | gcp.authorization |
GCP Compute Engine: Instances get | Get GCP Compute Engine instances. |
Dependent item | gcp.gce.instances.get Preprocessing
|
GCP: Authorization errors check | A list of errors from API requests. |
Dependent item | gcp.auth.err.check Preprocessing
|
GCP Cloud SQL: Instances get | Gets GCP Cloud SQL instances. |
Dependent item | gcp.cloudsql.instances.get Preprocessing
|
GCP Cloud SQL: Instances total | GCP Cloud SQL instances total count. |
Dependent item | gcp.cloudsql.instances.total Preprocessing
|
GCP Cloud SQL MSSQL: Instances count | GCP Cloud SQL MSSQL instances count. |
Dependent item | gcp.cloudsql.instances.mssql_count Preprocessing
|
GCP Cloud SQL MySQL: Instances count | GCP Cloud SQL MySQL instances count. |
Dependent item | gcp.cloudsql.instances.mysql_count Preprocessing
|
GCP Cloud SQL PostgreSQL: Instances count | GCP Cloud SQL PostgreSQL instances count. |
Dependent item | gcp.cloudsql.instances.pgsql_count Preprocessing
|
GCP Compute Engine: Instances total | GCP Compute Engine instances total count. |
Dependent item | gcp.gce.instances.total Preprocessing
|
GCP Compute Engine: Regular instances count | GCP Compute Engine: Regular instances count. |
Dependent item | gcp.gce.instances.regular_count Preprocessing
|
GCP Compute Engine: Container-Optimized instances count | GCP Compute Engine: count of instances with Container-Optimized OS used. |
Dependent item | gcp.gce.instances.cos_count Preprocessing
|
GCP Compute Engine: Project quotas get | GCP Compute Engine resource quotas available for the particular project. |
Dependent item | gcp.gce.quotas.get Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GCP: Authorization has failed | GCP: Authorization has failed. |
length(last(/GCP by HTTP/gcp.auth.err.check)) > 0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Compute Engine: Instances discovery | GCP Compute Engine: Instances discovery. |
Dependent item | gcp.gce.inst.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Cloud SQL: PostgreSQL instances discovery | GCP Cloud SQL: PostgreSQL instances discovery. |
Dependent item | gcp.cloudsql.pgsql.inst.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Cloud SQL: MSSQL instances discovery | GCP Cloud SQL: MSSQL instances discovery. |
Dependent item | gcp.cloudsql.mssql.inst.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Cloud SQL: MySQL instances discovery | GCP Cloud SQL: MySQL instances discovery. |
Dependent item | gcp.cloudsql.mysql.inst.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Compute Engine: Project quotas discovery | GCP Compute Engine: Quotas discovery. |
Dependent item | gcp.gce.quotas.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Compute Engine: Quota [{#GCE.QUOTA.NAME}]: Raw data | GCP Compute Engine: Get metrics for [{#GCE.QUOTA.NAME}] quota. |
Dependent item | gcp.gce.quota.single.raw[{#GCE.QUOTA.NAME}] Preprocessing
|
GCP Compute Engine: Quota [{#GCE.QUOTA.NAME}]: Usage | GCP Compute Engine: The current usage value for [{#GCE.QUOTA.NAME}] quota. |
Dependent item | gcp.gce.quota.usage[{#GCE.QUOTA.NAME}] Preprocessing
|
GCP Compute Engine: Quota [{#GCE.QUOTA.NAME}]: Limit | GCP Compute Engine: The current limit value for [{#GCE.QUOTA.NAME}] quota. |
Dependent item | gcp.gce.quota.limit[{#GCE.QUOTA.NAME}] Preprocessing
|
GCP Compute Engine: Quota [{#GCE.QUOTA.NAME}]: Percentage used | GCP Compute Engine: Percentage usage for [{#GCE.QUOTA.NAME}] quota. |
Dependent item | gcp.gce.quota.pused[{#GCE.QUOTA.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GCP Compute Engine: Quota [{#GCE.QUOTA.NAME}] limit has been changed | GCP Compute Engine: The limit for the [{#GCE.QUOTA.NAME}] quota has been changed. |
change(/GCP by HTTP/gcp.gce.quota.limit[{#GCE.QUOTA.NAME}]) <> 0 |Info |
Manual close: Yes | |
GCP Compute Engine: Quota [{#GCE.QUOTA.NAME}] usage is close to reaching the limit | GCP Compute Engine: The usage percentage for the [{#GCE.QUOTA.NAME}] quota is close to reaching the limit. |
last(/GCP by HTTP/gcp.gce.quota.pused[{#GCE.QUOTA.NAME}]) >= {$GCP.GCE.QUOTA.PUSED.MIN.WARN:"{#GCE.QUOTA.NAME}"} |Warning |
Manual close: Yes Depends on:
|
|
GCP Compute Engine: Quota [{#GCE.QUOTA.NAME}] usage is critically close to reaching the limit | GCP Compute Engine: The usage percentage for the [{#GCE.QUOTA.NAME}] quota is critically close to reaching the limit. |
last(/GCP by HTTP/gcp.gce.quota.pused[{#GCE.QUOTA.NAME}]) >= {$GCP.GCE.QUOTA.PUSED.MIN.CRIT:"{#GCE.QUOTA.NAME}"} |Average |
Manual close: Yes |
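The quota items and triggers reduce to a percentage and two thresholds, and the `{$GCP.GCE.QUOTA.PUSED.MIN.WARN:"{#GCE.QUOTA.NAME}"}` syntax lets a user macro with context override a threshold for a single quota. The Python sketch below models that logic with a plain dictionary standing in for macro context. The Warning-depends-on-Average behavior and the handling of a zero limit are assumptions, and the quota name and numbers are hypothetical; the defaults 80 and 95 come from the macro table.

```python
from typing import Dict, Optional


def quota_pused(usage: float, limit: float) -> Optional[float]:
    """Percentage of the quota limit in use; None if the limit is not positive (assumption)."""
    if limit <= 0:
        return None
    return usage / limit * 100


def threshold_for(quota_name: str, default: float, overrides: Dict[str, float]) -> float:
    """Stand-in for a user macro with context: a per-quota value wins over the default."""
    return overrides.get(quota_name, default)


def quota_severity(pused: float, warn: float = 80, crit: float = 95) -> Optional[str]:
    """Report only the more severe of the two quota triggers (assumed dependency)."""
    if pused >= crit:
        return "Average"
    if pused >= warn:
        return "Warning"
    return None


# Hypothetical quota: 21 of 24 units used, with a per-quota warning override for "CPUS".
pused = quota_pused(usage=21, limit=24)
print(pused)                                                                # 87.5
print(quota_severity(pused))                                                # Warning
print(quota_severity(pused, warn=threshold_for("CPUS", 80, {"CPUS": 90})))  # None
```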
This template is designed to monitor Google Cloud Platform Compute Engine instances with Zabbix.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template will be automatically connected to discovered entities with all their required parameters pre-defined.
Name | Description | Default |
---|---|---|
{$GCP.DATA.TIMEOUT} | API response timeout. | 15s |
{$GCP.TIME.WINDOW} | Time interval for the data requests. It is used as: 1) the default update interval for most of the items; 2) the minimal time window for the data requested in the Monitoring Query Language REST API request. | 5m |
{$GCP.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. | |
{$GCE.DISK.NAME.MATCHES} | The filter to include GCP Compute Engine disks by name. | .* |
{$GCE.DISK.NAME.NOT_MATCHES} | The filter to exclude GCP Compute Engine disks by name. | CHANGE_IF_NEEDED |
{$GCE.DISK.DEV_TYPE.MATCHES} | The filter to include GCP Compute Engine disks by device type. | .* |
{$GCE.DISK.DEV_TYPE.NOT_MATCHES} | The filter to exclude GCP Compute Engine disks by device type. | CHANGE_IF_NEEDED |
{$GCE.DISK.STOR_TYPE.MATCHES} | The filter to include GCP Compute Engine disks by storage type. | .* |
{$GCE.DISK.STOR_TYPE.NOT_MATCHES} | The filter to exclude GCP Compute Engine disks by storage type. | CHANGE_IF_NEEDED |
{$GCE.CPU.UTIL.MAX} | GCP Compute Engine instance CPU utilization threshold, in %. | 95 |
{$GCE.RAM.UTIL.MAX} | GCP Compute Engine instance RAM utilization threshold, in %. | 90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Compute Engine: Metrics get | Gets GCP Compute Engine metrics in raw format. |
Script | gcp.gce.metrics.get Preprocessing
|
GCP Compute Engine: Firewall: Dropped packets | Count of incoming packets dropped by the firewall. |
Dependent item | gcp.gce.firewall.dropped_packets_count Preprocessing
|
GCP Compute Engine: Firewall: Dropped bytes | Count of incoming bytes dropped by the firewall. |
Dependent item | gcp.gce.firewall.dropped_bytes_count Preprocessing
|
GCP Compute Engine: Guest visible vCPUs | Number of vCPUs visible inside the guest. For many GCE machine types, the number of vCPUs visible inside the guest is equal to the number of reserved vCPUs. For shared-core machine types, the number of guest-visible vCPUs differs from the number of reserved cores. For example, e2-small instances have two vCPUs visible inside the guest and 0.5 fractional vCPUs reserved. Therefore, for an e2-small instance, the guest-visible vCPU count is 2, while the reserved vCPU count is 0.5. |
Dependent item | gcp.gce.cpu.guest_visible_vcpus Preprocessing
|
GCP Compute Engine: Reserved vCPUs | Number of vCPUs reserved on the host of the instance. |
Dependent item | gcp.gce.cpu.reserved_cores Preprocessing
|
GCP Compute Engine: Scheduler wait time | Wait time is the time a vCPU is ready to run but is unexpectedly not scheduled to run. The wait time returned here is the accumulated value for all vCPUs. The time interval over which the value was measured is returned by Monitoring in whole seconds as start_time and end_time. This metric is only available for VMs that belong to the e2 family or for overcommitted VMs on sole-tenant nodes. |
Dependent item | gcp.gce.cpu.scheduler_wait_time Preprocessing
|
GCP Compute Engine: CPU usage time | Delta vCPU usage for all vCPUs, in vCPU-seconds. To compute the per-vCPU utilization fraction, divide this value by (end-start)*N, where end and start define this value's time interval and N is the number of reserved vCPUs at the end of the interval. This value is reported by the hypervisor for the VM and can differ from the CPU usage reported from inside the VM. |
Dependent item | gcp.gce.cpu.usage_time Preprocessing
|
GCP Compute Engine: CPU utilization | Fractional utilization of allocated CPU on this instance. This metric is reported by the hypervisor for the VM and can differ from the CPU utilization reported from inside the VM. |
Dependent item | gcp.gce.cpu.utilization Preprocessing
|
GCP Compute Engine: Memory size | Total VM memory size. This metric is only available for VMs that belong to the e2 family; it returns an empty value for other instance types. |
Dependent item | gcp.gce.memory.ram_size Preprocessing
|
GCP Compute Engine: Memory used | Memory currently used in the VM. This metric is only available for VMs that belong to the e2 family; it returns an empty value for other instance types. |
Dependent item | gcp.gce.memory.ram_used Preprocessing
|
GCP Compute Engine: Memory usage percentage | Memory usage percentage. This metric is only available for VMs that belong to the e2 family; it returns an empty value for other instance types. |
Dependent item | gcp.gce.memory.ram_pused Preprocessing
|
GCP Compute Engine: VM swap in | The amount of memory read into the guest from its own swap space. This metric is only available for VMs that belong to the e2 family; it returns an empty value for other instance types. |
Dependent item | gcp.gce.memory.swap_in_bytes_count Preprocessing
|
GCP Compute Engine: VM swap out | The amount of memory written from the guest to its own swap space. This metric is only available for VMs that belong to the e2 family; it returns an empty value for other instance types. |
Dependent item | gcp.gce.memory.swap_out_bytes_count Preprocessing
|
GCP Compute Engine: Network: Received bytes | Count of bytes received from the network without load-balancing. |
Dependent item | gcp.gce.network.lb.received_bytes_count.false Preprocessing
|
GCP Compute Engine: Network: Received bytes: Load-balanced | Count of bytes received through an L3 load-balanced IP address assigned to the VM. Traffic that is externally routed to the VM's standard internal or external IP address, such as L7 load-balanced traffic, is not considered load-balanced in this metric. The value is empty when load balancing is not used. |
Dependent item | gcp.gce.network.lb.received_bytes_count.true Preprocessing
|
GCP Compute Engine: Network: Received packets | Count of packets received from the network without load balancing. |
Dependent item | gcp.gce.network.lb.received_packets_count.false Preprocessing
|
GCP Compute Engine: Network: Received packets: Load-balanced | Count of packets received through an L3 load-balanced IP address assigned to the VM. Traffic that is externally routed to the VM's standard internal or external IP address, such as L7 load-balanced traffic, is not considered load-balanced in this metric. The value is empty when load balancing is not used. |
Dependent item | gcp.gce.network.lb.received_packets_count.true Preprocessing
|
GCP Compute Engine: Network: Sent bytes | Count of bytes sent over the network without load balancing. |
Dependent item | gcp.gce.network.lb.sent_bytes_count.false Preprocessing
|
GCP Compute Engine: Network: Sent bytes: Load-balanced | Count of bytes sent from an L3 load-balanced IP address assigned to the VM. Traffic that is externally routed from the VM's standard internal or external IP address, such as L7 load-balanced traffic, is not considered load-balanced in this metric. The value is empty when load balancing is not used. |
Dependent item | gcp.gce.network.lb.sent_bytes_count.true Preprocessing
|
GCP Compute Engine: Network: Sent packets | Count of packets sent over the network without load balancing. |
Dependent item | gcp.gce.network.lb.sent_packets_count.false Preprocessing
|
GCP Compute Engine: Network: Sent packets: Load-balanced | Count of packets sent from an L3 load-balanced IP address assigned to the VM. Traffic that is externally routed from the VM's standard internal or external IP address, such as L7 load-balanced traffic, is not considered load-balanced in this metric. The value is empty when load balancing is not used. |
Dependent item | gcp.gce.network.lb.sent_packets_count.true Preprocessing
|
GCP Compute Engine: Network: Mirrored bytes | The count of mirrored bytes. |
Dependent item | gcp.gce.network.mirrored_bytes_count Preprocessing
|
GCP Compute Engine: Network: Mirrored packets | The count of mirrored packets. |
Dependent item | gcp.gce.network.mirrored_packets_count Preprocessing
|
GCP Compute Engine: Network: Mirrored packets dropped: Out of quota | The count of mirrored packets dropped. Reason: out of quota. |
Dependent item | gcp.gce.network.mirr_dropped_packets.out_of_quota Preprocessing
|
GCP Compute Engine: Network: Mirrored packets dropped: Unknown | The count of mirrored packets dropped. Reason: unknown. |
Dependent item | gcp.gce.network.mirr_dropped_packets.unknown Preprocessing
|
GCP Compute Engine: Network: Mirrored packets dropped: Invalid | The count of mirrored packets dropped. Reason: invalid. |
Dependent item | gcp.gce.network.mirr_dropped_packets.invalid Preprocessing
|
GCP Compute Engine: Integrity: Early boot validation status | The validation status of the early boot integrity policy. The value is empty if integrity monitoring is not enabled. |
Dependent item | gcp.gce.integrity.early_boot_validation_status Preprocessing
|
GCP Compute Engine: Integrity: Late boot validation status | The validation status of the late boot integrity policy. The value is empty if integrity monitoring is not enabled. |
Dependent item | gcp.gce.integrity.late_boot_validation_status Preprocessing
|
GCP Compute Engine: Instance uptime | Elapsed time since the VM was started, in seconds. |
Dependent item | gcp.gce.instance.uptime Preprocessing
|
GCP Compute Engine: Instance state | GCP Compute Engine instance state. |
HTTP agent | gcp.gce.instance.state Preprocessing
|
GCP Compute Engine: Disks get | Disk entities and metrics related to a particular instance. |
Script | gcp.gce.disks.get Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GCP Compute Engine: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/GCP Compute Engine Instance by HTTP/gcp.gce.cpu.utilization,15m) >= {$GCE.CPU.UTIL.MAX} | Average | Manual close: Yes |
GCP Compute Engine: High memory utilization | RAM utilization is too high. The system might be slow to respond. | min(/GCP Compute Engine Instance by HTTP/gcp.gce.memory.ram_pused,15m) >= {$GCE.RAM.UTIL.MAX} | Average | |
GCP Compute Engine: Instance is in suspended state | The VM is in a suspended state. You can resume the VM or delete it. | last(/GCP Compute Engine Instance by HTTP/gcp.gce.instance.state) = 7 | Info | Manual close: Yes |
GCP Compute Engine: The instance is in repairing state | The VM is being repaired. | last(/GCP Compute Engine Instance by HTTP/gcp.gce.instance.state) = 4 | Warning | Manual close: Yes |
GCP Compute Engine: The instance is in terminated state | The VM is stopped, either because it was stopped by a user or because it encountered a failure. | last(/GCP Compute Engine Instance by HTTP/gcp.gce.instance.state) = 5 | Average | Manual close: Yes |
GCP Compute Engine: Failed to get the instance state | Failed to get the instance state. | last(/GCP Compute Engine Instance by HTTP/gcp.gce.instance.state) = 10 | Average | Manual close: Yes |
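The CPU and memory triggers above use min() over a 15-minute window, so they fire only when every value collected in that window is at or above the threshold; a single dip below it keeps the trigger in the OK state. The following Python sketch models that evaluation under simplifying assumptions: samples are hypothetical (timestamp, value) pairs, the window is treated as (now - 15m, now], and an empty window is reported as "no problem" (in Zabbix, a function with no data in its period cannot be evaluated normally). The threshold of 95 in the usage lines is an arbitrary example, not the default of {$GCE.CPU.UTIL.MAX}.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (Unix timestamp in seconds, value)


def min_over_window(samples: Sequence[Sample], now: float, window_s: float) -> Optional[float]:
    """Smallest value whose timestamp lies in (now - window_s, now], or None if there is none."""
    in_window = [value for ts, value in samples if now - window_s < ts <= now]
    return min(in_window) if in_window else None


def high_utilization(samples: Sequence[Sample], now: float, threshold: float,
                     window_s: float = 15 * 60) -> bool:
    """Sketch of `min(/host/key,15m) >= threshold`."""
    lowest = min_over_window(samples, now, window_s)
    # Using min() means every sample in the window must reach the threshold.
    return lowest is not None and lowest >= threshold


if __name__ == "__main__":
    samples = [(0.0, 96.0), (300.0, 97.5), (600.0, 99.0)]
    print(high_utilization(samples, now=600.0, threshold=95.0))  # True
    print(high_utilization(samples + [(450.0, 40.0)], now=600.0, threshold=95.0))  # False
```

Using min() instead of last() trades detection speed for fewer false alarms: a short CPU spike is not enough to raise the problem.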
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Compute Engine: Physical disks discovery | Discovery of the physical disks related to the instance. | Dependent item | gcp.gce.phys.disks.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Compute Engine: Disk [{#GCE.DISK.NAME}]: Raw data | Data in raw format for the disk with the name [{#GCE.DISK.NAME}]. | Dependent item | gcp.gce.quota.single.raw[{#GCE.DISK.NAME}] |
GCP Compute Engine: Disk [{#GCE.DISK.NAME}]: Read bytes | Count of bytes read from the [{#GCE.DISK.NAME}] disk. | Dependent item | gcp.gce.disk.readbytescount[{#GCE.DISK.NAME}] |
GCP Compute Engine: Disk [{#GCE.DISK.NAME}]: Read operations | Count of read I/O operations from the [{#GCE.DISK.NAME}] disk. | Dependent item | gcp.gce.disk.readopscount[{#GCE.DISK.NAME}] |
GCP Compute Engine: Disk [{#GCE.DISK.NAME}]: Write bytes | Count of bytes written to the [{#GCE.DISK.NAME}] disk. | Dependent item | gcp.gce.disk.writebytescount[{#GCE.DISK.NAME}] |
GCP Compute Engine: Disk [{#GCE.DISK.NAME}]: Write operations | Count of write I/O operations to the [{#GCE.DISK.NAME}] disk. | Dependent item | gcp.gce.disk.writeopscount[{#GCE.DISK.NAME}] |
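The discovery rule produces one row per disk, and Zabbix substitutes the {#GCE.DISK.NAME} LLD macro into each item prototype's name and key. The Python sketch below illustrates that flow under stated assumptions: the input list and its `name` field are hypothetical stand-ins for whatever the template's preprocessing extracts, the output uses the plain JSON-array LLD format accepted by current Zabbix versions, and the key expansion is plain text substitution without any quoting Zabbix may apply to special characters.

```python
import json
from typing import Dict, List

LLD_MACRO = "{#GCE.DISK.NAME}"


def disks_to_lld(disks: List[Dict[str, str]]) -> str:
    """Build LLD JSON (an array of objects keyed by LLD macros), one row per distinct disk name."""
    seen = set()
    rows = []
    for disk in disks:
        name = disk["name"]  # hypothetical field name
        if name not in seen:  # skip duplicates so each disk yields one set of items
            seen.add(name)
            rows.append({LLD_MACRO: name})
    return json.dumps(rows)


def expand_prototype(key_prototype: str, row: Dict[str, str]) -> str:
    """Replace every LLD macro in a prototype key with the value from one discovered row."""
    for macro, value in row.items():
        key_prototype = key_prototype.replace(macro, value)
    return key_prototype


if __name__ == "__main__":
    lld = disks_to_lld([{"name": "boot-disk"}, {"name": "data-1"}, {"name": "data-1"}])
    print(lld)
    for row in json.loads(lld):
        print(expand_prototype("gcp.gce.quota.single.raw[{#GCE.DISK.NAME}]", row))
```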
This template is designed to monitor Google Cloud Platform (GCP) Cloud SQL MySQL instances with Zabbix.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template will be automatically connected to discovered entities with all their required parameters pre-defined.
Name | Description | Default |
---|---|---|
{$GCP.DATA.TIMEOUT} | The response timeout for API requests. | 15s |
{$GCP.TIME.WINDOW} | Time interval for the data requests. Supported usage types: 1. The default update interval for most of the items. 2. The minimal time window for the data requested in the Monitoring Query Language REST API request. | 5m |
{$GCP.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. | |
{$CLOUD_SQL.MYSQL.DISK.UTIL.WARN} | GCP Cloud SQL MySQL instance warning disk usage threshold. | 80 |
{$CLOUD_SQL.MYSQL.DISK.UTIL.CRIT} | GCP Cloud SQL MySQL instance critical disk usage threshold. | 90 |
{$CLOUD_SQL.MYSQL.CPU.UTIL.MAX} | GCP Cloud SQL MySQL instance CPU usage threshold. | 95 |
{$CLOUD_SQL.MYSQL.RAM.UTIL.MAX} | GCP Cloud SQL MySQL instance RAM usage threshold. | 90 |
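{$GCP.TIME.WINDOW} uses Zabbix time-suffix notation (5m by default). A sketch of how such a value could be turned into a query interval, assuming only the s/m/h/d/w suffixes and UTC timestamps in RFC 3339 form, which is the timestamp format Google Cloud REST APIs use. The function names are illustrative and not part of the template.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Zabbix time suffixes; a bare number means seconds.
_SUFFIX_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def duration_to_seconds(value: str) -> int:
    """Convert a value such as '5m' or '90' into seconds."""
    match = re.fullmatch(r"\s*(\d+)([smhdw]?)\s*", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    number, suffix = match.groups()
    return int(number) * _SUFFIX_SECONDS[suffix or "s"]


def query_interval(window: str, now: datetime) -> Tuple[str, str]:
    """Return (start, end) as UTC RFC 3339 strings covering the last `window`."""
    now = now.astimezone(timezone.utc)
    start = now - timedelta(seconds=duration_to_seconds(window))
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), now.strftime(fmt)


if __name__ == "__main__":
    print(duration_to_seconds("5m"))  # 300
    print(query_interval("5m", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)))
```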
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Cloud SQL MySQL: Metrics get | MySQL metrics in raw format. | Script | gcp.cloudsql.mysql.metrics.get |
GCP Cloud SQL MySQL: Reserved CPU cores | Number of cores reserved for the database. | Dependent item | gcp.cloudsql.mysql.cpu.reserved_cores |
GCP Cloud SQL MySQL: CPU usage time | Cumulative CPU usage time in seconds. | Dependent item | gcp.cloudsql.mysql.cpu.usage_time |
GCP Cloud SQL MySQL: CPU utilization | Current CPU utilization, represented as a percentage of the reserved CPU that is currently in use. | Dependent item | gcp.cloudsql.mysql.cpu.utilization |
GCP Cloud SQL MySQL: Disk size | Maximum data disk size in bytes. | Dependent item | gcp.cloudsql.mysql.disk.quota |
GCP Cloud SQL MySQL: Disk bytes used | Data utilization in bytes. | Dependent item | gcp.cloudsql.mysql.disk.bytes_used |
GCP Cloud SQL MySQL: Disk read I/O | Delta count of data disk read I/O operations. | Dependent item | gcp.cloudsql.mysql.disk.readopscount |
GCP Cloud SQL MySQL: Disk write I/O | Delta count of data disk write I/O operations. | Dependent item | gcp.cloudsql.mysql.disk.writeopscount |
GCP Cloud SQL MySQL: Disk utilization | The fraction of the disk quota that is currently in use, shown as a percentage. | Dependent item | gcp.cloudsql.mysql.disk.utilization |
GCP Cloud SQL MySQL: Memory size | Maximum RAM size in bytes. | Dependent item | gcp.cloudsql.mysql.memory.quota |
GCP Cloud SQL MySQL: Memory used by DB engine | Total RAM usage in bytes. This metric reports the RAM usage of the database process, including the buffer/cache. | Dependent item | gcp.cloudsql.mysql.memory.total_usage |
GCP Cloud SQL MySQL: Memory usage | The RAM usage in bytes. This metric reports the RAM usage of the server, excluding the buffer/cache. | Dependent item | gcp.cloudsql.mysql.memory.usage |
GCP Cloud SQL MySQL: Memory utilization | The fraction of the memory quota that is currently in use, shown as a percentage. | Dependent item | gcp.cloudsql.mysql.memory.utilization |
GCP Cloud SQL MySQL: Network: Received bytes | Delta count of bytes received through the network. | Dependent item | gcp.cloudsql.mysql.network.receivedbytescount |
GCP Cloud SQL MySQL: Network: Sent bytes | Delta count of bytes sent through the network. | Dependent item | gcp.cloudsql.mysql.network.sentbytescount |
GCP Cloud SQL MySQL: Connections | Number of connections to the databases on the Cloud SQL instance. | Dependent item | gcp.cloudsql.mysql.network.connections |
GCP Cloud SQL MySQL: Instance state | Current state of the GCP Cloud SQL MySQL instance. | HTTP agent | gcp.cloudsql.mysql.inst.state |
GCP Cloud SQL MySQL: DB engine state | State of the GCP Cloud SQL MySQL database engine. | HTTP agent | gcp.cloudsql.mysql.db.state |
GCP Cloud SQL MySQL: InnoDB dirty pages | Number of unflushed pages in the InnoDB buffer pool. | Dependent item | gcp.cloudsql.mysql.innodbbufferpoolpagesdirty |
GCP Cloud SQL MySQL: InnoDB free pages | Number of unused pages in the InnoDB buffer pool. | Dependent item | gcp.cloudsql.mysql.innodbbufferpoolpagesfree |
GCP Cloud SQL MySQL: InnoDB total pages | Total number of pages in the InnoDB buffer pool. | Dependent item | gcp.cloudsql.mysql.innodbbufferpoolpagestotal |
GCP Cloud SQL MySQL: InnoDB fsync calls | Delta count of InnoDB fsync() calls. | Dependent item | gcp.cloudsql.mysql.innodbdatafsyncs |
GCP Cloud SQL MySQL: InnoDB log fsync calls | Delta count of InnoDB fsync() calls to the log file. | Dependent item | gcp.cloudsql.mysql.innodboslog_fsyncs |
GCP Cloud SQL MySQL: InnoDB pages read | Delta count of InnoDB pages read. | Dependent item | gcp.cloudsql.mysql.innodbpagesread |
GCP Cloud SQL MySQL: InnoDB pages written | Delta count of InnoDB pages written. | Dependent item | gcp.cloudsql.mysql.innodbpageswritten |
GCP Cloud SQL MySQL: Open tables | The number of tables that are currently open. | Dependent item | gcp.cloudsql.mysql.open_tables |
GCP Cloud SQL MySQL: Open table definitions | The number of table definitions that are currently cached. | Dependent item | gcp.cloudsql.mysql.opentabledefinitions |
GCP Cloud SQL MySQL: Queries | Delta count of statements executed by the server. | Dependent item | gcp.cloudsql.queries |
GCP Cloud SQL MySQL: Questions | Delta count of statements executed by the server that were sent by the client. | Dependent item | gcp.cloudsql.questions |
GCP Cloud SQL MySQL: Network: Bytes received by MySQL | Delta count of bytes received by the MySQL process. | Dependent item | gcp.cloudsql.mysqlreceivedbytes_count |
GCP Cloud SQL MySQL: Network: Bytes sent by MySQL | Delta count of bytes sent by the MySQL process. | Dependent item | gcp.cloudsql.mysqlsentbytes_count |
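Many items above report a delta count: the number of events (statements, bytes, I/O operations) within one sampling period, not a running total. To compare such values across different window lengths, they can be normalized to a per-second rate. A minimal sketch, assuming the hypothetical input is a list of consecutive delta points that each cover the same period; whether the template itself applies such a conversion is not stated here.

```python
from typing import Sequence


def delta_points_to_rate(points: Sequence[float], period_s: float) -> float:
    """Average per-second rate from consecutive DELTA points, each covering `period_s` seconds."""
    if not points:
        raise ValueError("no data points")
    if period_s <= 0:
        raise ValueError("period must be positive")
    # Total events divided by the total time the points cover.
    return sum(points) / (len(points) * period_s)


if __name__ == "__main__":
    # Five hypothetical 60-second points of the Queries item.
    print(delta_points_to_rate([300, 240, 360, 300, 300], period_s=60))  # 5.0
```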
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GCP Cloud SQL MySQL: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.cpu.utilization,5m) >= {$CLOUD_SQL.MYSQL.CPU.UTIL.MAX} | Average | |
GCP Cloud SQL MySQL: Disk space is low | High utilization of the storage space. | last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.disk.utilization) >= {$CLOUD_SQL.MYSQL.DISK.UTIL.WARN} | Warning | Depends on: |
GCP Cloud SQL MySQL: Disk space is critically low | Critical utilization of the disk space. | last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.disk.utilization) >= {$CLOUD_SQL.MYSQL.DISK.UTIL.CRIT} | Average | |
GCP Cloud SQL MySQL: High memory utilization | RAM utilization is too high. The system might be slow to respond. | min(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.memory.utilization,5m) >= {$CLOUD_SQL.MYSQL.RAM.UTIL.MAX} | High | |
GCP Cloud SQL MySQL: Instance is in suspended state | The instance is in a suspended state. | last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.inst.state) = 1 | Warning | |
GCP Cloud SQL MySQL: Instance is stopped by the owner | The instance has been stopped by the owner. | last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.inst.state) = 2 | Info | |
GCP Cloud SQL MySQL: Instance is in maintenance | The instance is down for maintenance. | last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.inst.state) = 4 | Info | |
GCP Cloud SQL MySQL: Instance is in failed state | The instance creation failed, or an operation left the instance in a bad state. | last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.inst.state) = 5 | Average | |
GCP Cloud SQL MySQL: Instance is in unknown state | The state of the instance is unknown. | last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.inst.state) = 6 | Average | |
GCP Cloud SQL MySQL: Failed to get the instance state | Failed to get the instance state. | last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.inst.state) = 10 | Average | |
GCP Cloud SQL MySQL: Database engine is down | The database engine is down. | last(/GCP Cloud SQL MySQL by HTTP/gcp.cloudsql.mysql.db.state)=0 | Average | Depends on: |
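The instance-state triggers compare the numeric value of gcp.cloudsql.mysql.inst.state with fixed codes. The Python sketch below collects the codes, trigger names, and severities from the table above into a lookup and adds the engine-down check (db.state = 0). It is a reading aid, not the template's logic: this page does not say which code represents a healthy instance, and trigger dependencies are ignored.

```python
from typing import Dict, List, Tuple

Problem = Tuple[str, str]  # (trigger name, severity)

# Codes copied from the instance-state triggers in the table above.
INSTANCE_STATE_TRIGGERS: Dict[int, Problem] = {
    1: ("Instance is in suspended state", "Warning"),
    2: ("Instance is stopped by the owner", "Info"),
    4: ("Instance is in maintenance", "Info"),
    5: ("Instance is in failed state", "Average"),
    6: ("Instance is in unknown state", "Average"),
    10: ("Failed to get the instance state", "Average"),
}


def active_problems(inst_state: int, db_state: int) -> List[Problem]:
    """Triggers that would be in the PROBLEM state for the given item values (dependencies ignored)."""
    problems: List[Problem] = []
    if inst_state in INSTANCE_STATE_TRIGGERS:
        problems.append(INSTANCE_STATE_TRIGGERS[inst_state])
    if db_state == 0:
        problems.append(("Database engine is down", "Average"))
    return problems


if __name__ == "__main__":
    print(active_problems(inst_state=5, db_state=0))
```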
This template is designed to monitor Google Cloud Platform (GCP) Cloud SQL metrics for MySQL read-only replica instances with Zabbix.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template will be automatically connected to discovered entities with all their required parameters pre-defined.
Name | Description | Default |
---|---|---|
{$GCP.DATA.TIMEOUT} | The response timeout for API requests. | 15s |
{$GCP.TIME.WINDOW} | Time interval for the data requests. Supported usage types: 1. The default update interval for most of the items. 2. The minimal time window for the data requested in the Monitoring Query Language REST API request. | 5m |
{$GCP.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. | |
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Cloud SQL MySQL: Replica metrics get | MySQL replication metrics data in raw format. | Script | gcp.cloudsql.mysql.repl.metrics.get |
GCP Cloud SQL MySQL: Last I/O thread error number | The error number of the most recent error that caused the I/O thread to stop. | Dependent item | gcp.cloudsql.mysql.repl.lastioerrno |
GCP Cloud SQL MySQL: Last SQL thread error number | The error number of the most recent error that caused the SQL thread to stop. | Dependent item | gcp.cloudsql.mysql.repl.lastsqlerrno |
GCP Cloud SQL MySQL: Replication lag | Number of seconds the read replica is behind its primary (approximation). | Dependent item | gcp.cloudsql.mysql.repl.replica_lag |
GCP Cloud SQL MySQL: Network lag | The time taken from the primary's binary log to the I/O thread on the replica. | Dependent item | gcp.cloudsql.mysql.repl.network_lag |
GCP Cloud SQL MySQL: Replication state | The current serving state of replication. This metric is only available for MySQL/PostgreSQL instances. | Dependent item | gcp.cloudsql.mysql.repl.state |
GCP Cloud SQL MySQL: Slave I/O thread running | Indicates whether the I/O thread for reading the primary's binary log is running. Possible values are Yes, No, and Connecting. | Dependent item | gcp.cloudsql.mysql.repl.slaveiorunning |
GCP Cloud SQL MySQL: Slave SQL thread running | Indicates whether the SQL thread for executing events in the relay log is running. | Dependent item | gcp.cloudsql.mysql.repl.slavesqlrunning |
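This section lists the raw replication values but no replication triggers. As an illustration of how the thread-state and lag items relate, the following Python sketch classifies a replica. The classification labels and the `max_lag_s` threshold are hypothetical, and the SQL-thread values are assumed to be Yes/No like the I/O thread's; if the template maps these strings to numbers in preprocessing (not shown on this page), compare against those numbers instead.

```python
def replica_health(io_running: str, sql_running: str, lag_s: float, max_lag_s: float = 60.0) -> str:
    """Classify a MySQL read replica from its thread states and replication lag (hypothetical labels)."""
    if io_running == "No" or sql_running == "No":
        return "stopped"      # at least one replication thread is not running
    if io_running == "Connecting":
        return "connecting"   # the I/O thread is still trying to reach the primary
    if lag_s > max_lag_s:
        return "lagging"      # threads run, but the replica is behind the primary
    return "ok"


if __name__ == "__main__":
    print(replica_health("Yes", "Yes", lag_s=2.0))
```

Thread state is checked before lag because the lag value is not meaningful while replication is stopped.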
This template is designed to monitor Google Cloud Platform (GCP) Cloud SQL PostgreSQL database metrics with Zabbix.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template will be automatically connected to discovered entities with all their required parameters pre-defined.
Name | Description | Default |
---|---|---|
{$GCP.DATA.TIMEOUT} | The response timeout for API requests. | 15s |
{$GCP.TIME.WINDOW} | Time interval for the data requests. Supported usage types: 1. The default update interval for most of the items. 2. The minimal time window for the data requested in the Monitoring Query Language REST API request. | 5m |
{$GCP.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. | |
{$GCP.CLOUD_SQL.DB.NAME.MATCHES} | The filter to include GCP Cloud SQL PostgreSQL databases by namespace. | .* |
{$GCP.CLOUD_SQL.DB.NAME.NOT_MATCHES} | The filter to exclude GCP Cloud SQL PostgreSQL databases by namespace. | CHANGE_IF_NEEDED |
{$CLOUD_SQL.PGSQL.DISK.UTIL.WARN} | GCP Cloud SQL PostgreSQL instance warning disk usage threshold. | 80 |
{$CLOUD_SQL.PGSQL.DISK.UTIL.CRIT} | GCP Cloud SQL PostgreSQL instance critical disk usage threshold. | 90 |
{$CLOUD_SQL.PGSQL.CPU.UTIL.MAX} | GCP Cloud SQL PostgreSQL instance CPU usage threshold. | 95 |
{$CLOUD_SQL.PGSQL.RAM.UTIL.MAX} | GCP Cloud SQL PostgreSQL instance RAM usage threshold. | 90 |
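The two database-name filter macros above act as an include/exclude pair for discovery: a database is kept when its name matches the include pattern and does not match the exclude pattern. The default exclude value, CHANGE_IF_NEEDED, is a placeholder that real database names are unlikely to match, so by default every database is discovered. A Python sketch of that logic, assuming unanchored regular-expression matching (Zabbix uses PCRE, approximated here with Python's `re` module); the database names in the usage lines are only examples.

```python
import re
from typing import Iterable, List


def keep_name(name: str, matches: str = ".*", not_matches: str = "CHANGE_IF_NEEDED") -> bool:
    """True if `name` matches the include pattern and does not match the exclude pattern."""
    return re.search(matches, name) is not None and re.search(not_matches, name) is None


def filter_names(names: Iterable[str], matches: str = ".*",
                 not_matches: str = "CHANGE_IF_NEEDED") -> List[str]:
    """Apply the include/exclude pair to every discovered name, preserving order."""
    return [n for n in names if keep_name(n, matches, not_matches)]


if __name__ == "__main__":
    dbs = ["postgres", "orders", "cloudsqladmin", "template1"]
    print(filter_names(dbs))  # defaults: everything is kept
    print(filter_names(dbs, not_matches=r"^(cloudsqladmin|template\d)$"))
```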
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Cloud SQL PostgreSQL: Metrics get | PostgreSQL metrics data in raw format. | Script | gcp.cloudsql.pgsql.metrics.get |
GCP Cloud SQL PostgreSQL: Reserved CPU cores | Number of cores reserved for the database. | Dependent item | gcp.cloudsql.pgsql.cpu.reserved_cores |
GCP Cloud SQL PostgreSQL: CPU usage time | Cumulative CPU usage time in seconds. | Dependent item | gcp.cloudsql.pgsql.cpu.usage_time |
GCP Cloud SQL PostgreSQL: CPU utilization | Current CPU utilization, represented as a percentage of the reserved CPU that is currently in use. | Dependent item | gcp.cloudsql.pgsql.cpu.utilization |
GCP Cloud SQL PostgreSQL: Disk size | Maximum data disk size in bytes. | Dependent item | gcp.cloudsql.pgsql.disk.quota |
GCP Cloud SQL PostgreSQL: Disk bytes used | Data utilization in bytes. | Dependent item | gcp.cloudsql.pgsql.disk.bytes_used |
GCP Cloud SQL PostgreSQL: Disk read I/O | Delta count of data disk read I/O operations. | Dependent item | gcp.cloudsql.pgsql.disk.readopscount |
GCP Cloud SQL PostgreSQL: Disk write I/O | Delta count of data disk write I/O operations. | Dependent item | gcp.cloudsql.pgsql.disk.writeopscount |
GCP Cloud SQL PostgreSQL: Disk utilization | The fraction of the disk quota that is currently in use, shown as a percentage. | Dependent item | gcp.cloudsql.pgsql.disk.utilization |
GCP Cloud SQL PostgreSQL: Memory size | Maximum RAM size in bytes. | Dependent item | gcp.cloudsql.pgsql.memory.quota |
GCP Cloud SQL PostgreSQL: Memory used by DB engine | Total RAM usage in bytes. This metric reports the RAM usage of the database process, including the buffer/cache. | Dependent item | gcp.cloudsql.pgsql.memory.total_usage |
GCP Cloud SQL PostgreSQL: Memory usage | The RAM usage in bytes. This metric reports the RAM usage of the server, excluding the buffer/cache. | Dependent item | gcp.cloudsql.pgsql.memory.usage |
GCP Cloud SQL PostgreSQL: Memory utilization | The fraction of the memory quota that is currently in use, shown as a percentage. | Dependent item | gcp.cloudsql.pgsql.memory.utilization |
GCP Cloud SQL PostgreSQL: Network: Received bytes | Delta count of bytes received through the network. | Dependent item | gcp.cloudsql.pgsql.network.receivedbytescount |
GCP Cloud SQL PostgreSQL: Network: Sent bytes | Delta count of bytes sent through the network. | Dependent item | gcp.cloudsql.pgsql.network.sentbytescount |
GCP Cloud SQL PostgreSQL: Instance state | Current state of the GCP Cloud SQL PostgreSQL instance. | HTTP agent | gcp.cloudsql.pgsql.inst.state |
GCP Cloud SQL PostgreSQL: DB engine state | State of the GCP Cloud SQL PostgreSQL database engine. | HTTP agent | gcp.cloudsql.pgsql.db.state |
GCP Cloud SQL PostgreSQL: Transaction ID utilization | Current utilization, represented as a percentage of transaction IDs consumed by the Cloud SQL PostgreSQL instance. | Dependent item | gcp.cloudsql.pgsql.transactionidutilization |
GCP Cloud SQL PostgreSQL: Assigned transactions | Delta count of assigned transaction IDs. | Dependent item | gcp.cloudsql.pgsql.transactionidcount_assigned |
GCP Cloud SQL PostgreSQL: Frozen transactions | Delta count of frozen transaction IDs. | Dependent item | gcp.cloudsql.pgsql.transactionidcount_frozen |
GCP Cloud SQL PostgreSQL: Data written to temporary | Total data size (in bytes) written to temporary files by the queries. | Dependent item | gcp.cloudsql.pgsql.tempbyteswritten_count |
GCP Cloud SQL PostgreSQL: Temporary files used for writing data | Total number of temporary files used for writing data while performing algorithms such as join and sort. | Dependent item | gcp.cloudsql.pgsql.tempfileswritten_count |
GCP Cloud SQL PostgreSQL: Oldest running transaction age | Age of the oldest running transaction yet to be vacuumed in the Cloud SQL PostgreSQL instance, measured in the number of transactions that have happened since the oldest transaction. The value is empty when there is no transaction of this type. | Dependent item | gcp.cloudsql.pgsql.oldest_transaction.running |
GCP Cloud SQL PostgreSQL: Oldest prepared transaction age | Age of the oldest prepared transaction yet to be vacuumed in the Cloud SQL PostgreSQL instance, measured in the number of transactions that have happened since the oldest transaction. The value is empty when there is no transaction of this type. | Dependent item | gcp.cloudsql.pgsql.oldest_transaction.prepared |
GCP Cloud SQL PostgreSQL: Oldest replication slot transaction age | Age of the oldest replication slot transaction yet to be vacuumed in the Cloud SQL PostgreSQL instance, measured in the number of transactions that have happened since the oldest transaction. The value is empty when there is no transaction of this type. | Dependent item | gcp.cloudsql.pgsql.oldesttransaction.replicationslot |
GCP Cloud SQL PostgreSQL: Oldest replica transaction age | Age of the oldest replica transaction yet to be vacuumed in the Cloud SQL PostgreSQL instance, measured in the number of transactions that have happened since the oldest transaction. The value is empty when there is no transaction of this type. | Dependent item | gcp.cloudsql.pgsql.oldest_transaction.replica |
GCP Cloud SQL PostgreSQL: Connections | The number of connections to the Cloud SQL PostgreSQL instance. Includes connections to the system databases, which aren't visible by default. | Dependent item | gcp.cloudsql.pgsql.num_backends |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GCP Cloud SQL PostgreSQL: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.cpu.utilization,5m) >= {$CLOUD_SQL.PGSQL.CPU.UTIL.MAX} | Average | |
GCP Cloud SQL PostgreSQL: Disk space is low | High utilization of the storage space. | last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.disk.utilization) >= {$CLOUD_SQL.PGSQL.DISK.UTIL.WARN} | Warning | Depends on: |
GCP Cloud SQL PostgreSQL: Disk space is critically low | Critical utilization of the disk space. | last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.disk.utilization) >= {$CLOUD_SQL.PGSQL.DISK.UTIL.CRIT} | Average | |
GCP Cloud SQL PostgreSQL: High memory utilization | RAM utilization is too high. The system might be slow to respond. | min(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.memory.utilization,5m) >= {$CLOUD_SQL.PGSQL.RAM.UTIL.MAX} | High | |
GCP Cloud SQL PostgreSQL: Instance is in suspended state | The instance is in a suspended state. | last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.inst.state) = 1 | Warning | |
GCP Cloud SQL PostgreSQL: Instance is stopped by the owner | The instance has been stopped by the owner. | last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.inst.state) = 2 | Info | |
GCP Cloud SQL PostgreSQL: Instance is in maintenance | The instance is down for maintenance. | last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.inst.state) = 4 | Info | |
GCP Cloud SQL PostgreSQL: Instance is in failed state | The instance creation failed, or an operation left the instance in a bad state. | last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.inst.state) = 5 | Average | |
GCP Cloud SQL PostgreSQL: Instance is in unknown state | The state of the instance is unknown. | last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.inst.state) = 6 | Average | |
GCP Cloud SQL PostgreSQL: Failed to get the instance state | Failed to get the instance state. | last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.inst.state) = 10 | Average | |
GCP Cloud SQL PostgreSQL: Database engine is down | The database engine is down. | last(/GCP Cloud SQL PostgreSQL by HTTP/gcp.cloudsql.pgsql.db.state)=0 | Average | Depends on: |
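The two disk triggers share one item and differ only in their thresholds: {$CLOUD_SQL.PGSQL.DISK.UTIL.WARN} (default 80) and {$CLOUD_SQL.PGSQL.DISK.UTIL.CRIT} (default 90). Zabbix templates commonly make the warning trigger depend on the critical one so that only the more severe problem is shown; the dependency target is not recoverable from this page, so that behavior is an assumption here. A Python sketch of the resulting classification:

```python
from typing import Optional


def disk_problem(utilization_pct: float, warn: float = 80.0, crit: float = 90.0) -> Optional[str]:
    """Return the most severe disk trigger that would be active, or None.

    Checking the critical threshold first mimics a warning trigger that depends on the
    critical one (an assumption), so at most one disk problem is reported.
    """
    if utilization_pct >= crit:
        return "Disk space is critically low"
    if utilization_pct >= warn:
        return "Disk space is low"
    return None


if __name__ == "__main__":
    for pct in (50.0, 85.0, 93.0):
        print(pct, disk_problem(pct))
```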
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Cloud SQL PostgreSQL: Databases discovery | Discovery of the databases on the particular PostgreSQL instance. | HTTP agent | gcp.cloudsql.pgsql.db.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Cloud SQL PostgreSQL: Database [{#PGSQL.DB.NAME}]: Metrics raw | PostgreSQL metrics in raw format. | Script | gcp.cloudsql.pgsql.db.metrics.get[{#PGSQL.DB.NAME}] |
GCP Cloud SQL PostgreSQL: Database [{#PGSQL.DB.NAME}]: Deadlocks count | Number of deadlocks detected in the [{#PGSQL.DB.NAME}] database. | Dependent item | gcp.cloudsql.pgsql.deadlock_count[{#PGSQL.DB.NAME}] |
GCP Cloud SQL PostgreSQL: Database [{#PGSQL.DB.NAME}]: Tuples returned | Total number of rows scanned while processing the queries of the [{#PGSQL.DB.NAME}] database. | Dependent item | gcp.cloudsql.pgsql.tuplesreturnedcount[{#PGSQL.DB.NAME}] |
GCP Cloud SQL PostgreSQL: Database [{#PGSQL.DB.NAME}]: Tuples fetched | Total number of rows fetched as a result of queries to the [{#PGSQL.DB.NAME}] database. | Dependent item | gcp.cloudsql.pgsql.tuplesfetchedcount[{#PGSQL.DB.NAME}] |
GCP Cloud SQL PostgreSQL: Database [{#PGSQL.DB.NAME}]: Committed transactions | Delta count of committed transactions in the [{#PGSQL.DB.NAME}] database. | Dependent item | gcp.cloudsql.pgsql.transactioncountcommit[{#PGSQL.DB.NAME}] |
GCP Cloud SQL PostgreSQL: Database [{#PGSQL.DB.NAME}]: Rolled-back transactions | Delta count of rolled-back transactions in the [{#PGSQL.DB.NAME}] database. | Dependent item | gcp.cloudsql.pgsql.transactioncountrollback[{#PGSQL.DB.NAME}] |
GCP Cloud SQL PostgreSQL: Database [{#PGSQL.DB.NAME}]: Buffer cache blocks read. | Number of buffer cache blocks read by the [{#PGSQL.DB.NAME}] database. | Dependent item | gcp.cloudsql.pgsql.blocksreadcountbuffercache[{#PGSQL.DB.NAME}] |
GCP Cloud SQL PostgreSQL: Database [{#PGSQL.DB.NAME}]: Disk blocks read. | Number of disk blocks read by the [{#PGSQL.DB.NAME}] database. | Dependent item | gcp.cloudsql.pgsql.blocksreadcount_disk[{#PGSQL.DB.NAME}] |
GCP Cloud SQL PostgreSQL: Database [{#PGSQL.DB.NAME}]: Inserted rows processed. | Number of tuples (rows) processed for insert operations in the [{#PGSQL.DB.NAME}] database. | Dependent item | gcp.cloudsql.pgsql.tuplesprocessedcount_insert[{#PGSQL.DB.NAME}] |
GCP Cloud SQL PostgreSQL: Database [{#PGSQL.DB.NAME}]: Deleted rows processed | Number of tuples (rows) processed for delete operations in the [{#PGSQL.DB.NAME}] database. | Dependent item | gcp.cloudsql.pgsql.tuplesprocessedcount_delete[{#PGSQL.DB.NAME}] |
GCP Cloud SQL PostgreSQL: Database [{#PGSQL.DB.NAME}]: Updated rows processed | Number of tuples (rows) processed for update operations in the [{#PGSQL.DB.NAME}] database. | Dependent item | gcp.cloudsql.pgsql.tuplesprocessedcount_update[{#PGSQL.DB.NAME}] |
GCP Cloud SQL PostgreSQL: Database [{#PGSQL.DB.NAME}]: Live tuples | Number of live tuples (rows) in the [{#PGSQL.DB.NAME}] database. | Dependent item | gcp.cloudsql.pgsql.tuplesizelive[{#PGSQL.DB.NAME}] |
GCP Cloud SQL PostgreSQL: Database [{#PGSQL.DB.NAME}]: Dead tuples | Number of dead tuples (rows) in the [{#PGSQL.DB.NAME}] database. | Dependent item | gcp.cloudsql.pgsql.tuplesizedead[{#PGSQL.DB.NAME}] |
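The Live tuples and Dead tuples prototypes are often read together: a high share of dead tuples suggests that vacuuming is falling behind. The template does not define such a calculated metric in this section; the Python sketch below only shows the arithmetic, with a hypothetical 20% limit that is not a template macro.

```python
def dead_tuple_ratio(live: int, dead: int) -> float:
    """Fraction of all tuples that are dead; 0.0 for an empty database."""
    total = live + dead
    return dead / total if total else 0.0


def needs_vacuum_attention(live: int, dead: int, max_ratio: float = 0.2) -> bool:
    # max_ratio is a hypothetical limit, not part of the template.
    return dead_tuple_ratio(live, dead) > max_ratio


if __name__ == "__main__":
    print(dead_tuple_ratio(live=900, dead=100))  # 0.1
```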
This template is designed to monitor Google Cloud Platform (GCP) Cloud SQL PostgreSQL read-only replica instances with Zabbix.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template will be automatically connected to discovered entities with all their required parameters pre-defined.
Name | Description | Default |
---|---|---|
{$GCP.DATA.TIMEOUT} | The response timeout for API requests. | 15s |
{$GCP.TIME.WINDOW} | Time interval for the data requests. Supported usage types: 1. The default update interval for most of the items. 2. The minimal time window for the data requested in the Monitoring Query Language REST API request. | 5m |
{$GCP.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. | |
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Cloud SQL PostgreSQL: Replica metrics get | PostgreSQL replica metrics data in raw format. | Script | gcp.cloudsql.pgsql.repl.metrics.get |
GCP Cloud SQL PostgreSQL: Network lag | The time taken from the primary's binary log to the I/O thread on the replica. | Dependent item | gcp.cloudsql.pgsql.repl.network_lag |
GCP Cloud SQL PostgreSQL: Replication lag | Number of seconds the read replica is behind its primary (approximation). | Dependent item | gcp.cloudsql.pgsql.repl.replica_lag |
GCP Cloud SQL PostgreSQL: Replication state | The current serving state of replication. This metric is only available for MySQL/PostgreSQL instances. | Dependent item | gcp.cloudsql.pgsql.repl.state |
GCP Cloud SQL PostgreSQL: Replay location lag | Replay location replication lag in bytes. | Dependent item | gcp.cloudsql.pgsql.repl.replay_location |
GCP Cloud SQL PostgreSQL: Write location lag | Write location replication lag in bytes. | Dependent item | gcp.cloudsql.pgsql.repl.write_location |
GCP Cloud SQL PostgreSQL: Flush location lag | Flush location replication lag in bytes. | Dependent item | gcp.cloudsql.pgsql.repl.flush_location |
GCP Cloud SQL PostgreSQL: Sent location lag | Sent location replication lag in bytes. | Dependent item | gcp.cloudsql.pgsql.repl.sent_location |
GCP Cloud SQL PostgreSQL: Number of log archival failures | Number of failed attempts to archive replication log files. | Dependent item | gcp.cloudsql.pgsql.repl.logarchivefailure_count |
GCP Cloud SQL PostgreSQL: Number of log archival successes | Number of successful attempts to archive replication log files. | Dependent item | gcp.cloudsql.pgsql.repl.logarchivesuccess_count |
This template is designed to monitor Google Cloud Platform (GCP) Cloud SQL MSSQL instances with Zabbix.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template will be automatically connected to discovered entities with all their required parameters pre-defined.
Name | Description | Default |
---|---|---|
{$GCP.DATA.TIMEOUT} | API response timeout. | 15s |
{$GCP.TIME.WINDOW} | Time interval for the data requests. Supported usage types: 1) the default update interval for most of the items; 2) the minimal time window for the data requested in the Monitoring Query Language REST API request. | 5m |
{$GCP.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. | |
{$CLOUD_SQL.MSSQL.RES.NAME.MATCHES} | The filter to include GCP Cloud SQL MSSQL resources by name. | .* |
{$CLOUDSQL.MSSQL.RES.NAME.NOTMATCHES} | The filter to exclude GCP Cloud SQL MSSQL resources by name. | CHANGE_IF_NEEDED |
{$CLOUD_SQL.MSSQL.DB.NAME.MATCHES} | The filter to include GCP Cloud SQL MSSQL databases by name. | .* |
{$CLOUDSQL.MSSQL.DB.NAME.NOTMATCHES} | The filter to exclude GCP Cloud SQL MSSQL databases by name. | CHANGE_IF_NEEDED |
{$CLOUD_SQL.MSSQL.SCHEDULER.ID.MATCHES} | The filter to include GCP Cloud SQL MSSQL schedulers by ID. | .* |
{$CLOUDSQL.MSSQL.SCHEDULER.ID.NOTMATCHES} | The filter to exclude GCP Cloud SQL MSSQL schedulers by ID. | CHANGE_IF_NEEDED |
{$CLOUD_SQL.MSSQL.DISK.UTIL.WARN} | GCP Cloud SQL MSSQL instance warning disk usage threshold, in %. | 80 |
{$CLOUD_SQL.MSSQL.DISK.UTIL.CRIT} | GCP Cloud SQL MSSQL instance critical disk usage threshold, in %. | 90 |
{$CLOUD_SQL.MSSQL.CPU.UTIL.MAX} | GCP Cloud SQL MSSQL instance CPU usage threshold, in %. | 95 |
{$CLOUD_SQL.MSSQL.RAM.UTIL.MAX} | GCP Cloud SQL MSSQL instance RAM usage threshold, in %. | 90 |
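The {$GCP.TIME.WINDOW} macro bounds the data requested through the Monitoring Query Language (MQL) REST API. The template's script items run JavaScript inside Zabbix and their exact queries are not reproduced here; the following is a minimal Python sketch, assuming the Cloud Monitoring v3 projects.timeSeries.query endpoint and an MQL within clause fed by the macro value. The helper build_mql_request, the project ID, the instance name, and the metric type are illustrative placeholders, not names taken from the template.

```python
import json

# Assumed endpoint: Cloud Monitoring API v3 projects.timeSeries.query, which
# takes an MQL string in the "query" field of a POST body.
QUERY_URL = "https://monitoring.googleapis.com/v3/projects/{project}/timeSeries:query"


def build_mql_request(project_id: str, instance_id: str, metric_type: str,
                      window: str = "5m") -> tuple[str, str]:
    """Hypothetical helper: return the URL and JSON body of an MQL query over `window`."""
    # Cloud SQL sets the database_id resource label to "<project>:<instance>".
    query = (
        "fetch cloudsql_database"
        f" | metric '{metric_type}'"
        f" | filter resource.database_id == '{project_id}:{instance_id}'"
        f" | within {window}"  # e.g. the value of {$GCP.TIME.WINDOW}
    )
    url = QUERY_URL.format(project=project_id)
    return url, json.dumps({"query": query})


if __name__ == "__main__":
    # Placeholder project and instance names.
    url, body = build_mql_request(
        "my-project", "my-mssql-instance",
        "cloudsql.googleapis.com/database/cpu/utilization",
    )
    print("POST", url)
    print(body)
```

The sketch only builds the request; actually sending it would also require an OAuth 2.0 access token for the service account, which is left out here.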
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Cloud SQL MSSQL: Metrics get | MSSQL metrics data in raw format. | Script | gcp.cloudsql.mssql.metrics.get |
GCP Cloud SQL MSSQL: Reserved CPU cores | Number of cores reserved for the database. | Dependent item | gcp.cloudsql.mssql.cpu.reserved_cores |
GCP Cloud SQL MSSQL: CPU usage time | Cumulative CPU usage time in seconds. | Dependent item | gcp.cloudsql.mssql.cpu.usage_time |
GCP Cloud SQL MSSQL: CPU utilization | Current CPU utilization, as a percentage of the reserved CPU that is currently in use. | Dependent item | gcp.cloudsql.mssql.cpu.utilization |
GCP Cloud SQL MSSQL: Disk size | Maximum data disk size in bytes. | Dependent item | gcp.cloudsql.mssql.disk.quota |
GCP Cloud SQL MSSQL: Disk bytes used | Data utilization in bytes. | Dependent item | gcp.cloudsql.mssql.disk.bytes_used |
GCP Cloud SQL MSSQL: Disk read I/O | Delta count of data disk read I/O operations. | Dependent item | gcp.cloudsql.mssql.disk.read_ops_count |
GCP Cloud SQL MSSQL: Disk write I/O | Delta count of data disk write I/O operations. | Dependent item | gcp.cloudsql.mssql.disk.write_ops_count |
GCP Cloud SQL MSSQL: Disk utilization | The fraction of the disk quota that is currently in use, shown as a percentage. | Dependent item | gcp.cloudsql.mssql.disk.utilization |
GCP Cloud SQL MSSQL: Memory size | Maximum RAM size in bytes. | Dependent item | gcp.cloudsql.mssql.memory.quota |
GCP Cloud SQL MSSQL: Memory used by DB engine | Total RAM usage in bytes. This metric reports the RAM usage of the database process, including the buffer/cache. | Dependent item | gcp.cloudsql.mssql.memory.total_usage |
GCP Cloud SQL MSSQL: Memory usage | RAM usage in bytes. This metric reports the RAM usage of the server, excluding the buffer/cache. | Dependent item | gcp.cloudsql.mssql.memory.usage |
GCP Cloud SQL MSSQL: Memory utilization | The fraction of the memory quota that is currently in use, shown as a percentage. | Dependent item | gcp.cloudsql.mssql.memory.utilization |
GCP Cloud SQL MSSQL: Network: Received bytes | Delta count of bytes received through the network. | Dependent item | gcp.cloudsql.mssql.network.received_bytes_count |
GCP Cloud SQL MSSQL: Network: Sent bytes | Delta count of bytes sent through the network. | Dependent item | gcp.cloudsql.mssql.network.sent_bytes_count |
GCP Cloud SQL MSSQL: Connections | Number of connections to the databases on the Cloud SQL instance. | Dependent item | gcp.cloudsql.mssql.network.connections |
GCP Cloud SQL MSSQL: Instance state | Current state of the GCP Cloud SQL MSSQL instance. | HTTP agent | gcp.cloudsql.mssql.inst.state |
GCP Cloud SQL MSSQL: DB engine state | Current state of the GCP Cloud SQL MSSQL database engine. | HTTP agent | gcp.cloudsql.mssql.db.state |
GCP Cloud SQL MSSQL: Connection resets | Total number of login operations started from the connection pool since the last restart of the SQL Server service. | Dependent item | gcp.cloudsql.mssql.conn.connection_reset_count |
GCP Cloud SQL MSSQL: Login attempts | Total number of login attempts since the last restart of the SQL Server service. This does not include pooled connections. | Dependent item | gcp.cloudsql.mssql.conn.login_attempt_count |
GCP Cloud SQL MSSQL: Logouts | Total number of logout operations since the last restart of the SQL Server service. | Dependent item | gcp.cloudsql.mssql.conn.logout_count |
GCP Cloud SQL MSSQL: Processes blocked | Current number of blocked processes. | Dependent item | gcp.cloudsql.mssql.conn.processes_blocked |
GCP Cloud SQL MSSQL: Buffer cache hit ratio | Current percentage of pages found in the buffer cache without having to read from disk. The ratio is the total number of cache hits divided by the total number of cache lookups. | Dependent item | gcp.cloudsql.mssql.memory.buffer_cache_hit_ratio |
GCP Cloud SQL MSSQL: Checkpoint pages | Total number of pages flushed to disk by a checkpoint or other operation that requires all dirty pages to be flushed. | Dependent item | gcp.cloudsql.mssql.memory.checkpoint_page_count |
GCP Cloud SQL MSSQL: Free list stalls | Total number of requests that had to wait for a free page. | Dependent item | gcp.cloudsql.mssql.memory.free_list_stall_count |
GCP Cloud SQL MSSQL: Lazy writes | Total number of buffers written by the buffer manager's lazy writer. The lazy writer is a system process that flushes out batches of dirty, aged buffers (buffers that contain changes that must be written back to disk before the buffer can be reused for a different page) and makes them available to user processes. | Dependent item | gcp.cloudsql.mssql.memory.lazy_write_count |
GCP Cloud SQL MSSQL: Memory grants pending | Current number of processes waiting for a workspace memory grant. | Dependent item | gcp.cloudsql.mssql.memory.memory_grants_pending |
GCP Cloud SQL MSSQL: Page life expectancy | Current number of seconds a page will stay in the buffer pool without references. | Dependent item | gcp.cloudsql.mssql.memory.page_life_expectancy |
GCP Cloud SQL MSSQL: Batch requests | Total number of Transact-SQL command batches received. | Dependent item | gcp.cloudsql.mssql.trans.batch_request_count |
GCP Cloud SQL MSSQL: Forwarded records | Total number of records fetched through forwarded record pointers. | Dependent item | gcp.cloudsql.mssql.trans.forwarded_record_count |
GCP Cloud SQL MSSQL: Full scans | Total number of unrestricted full scans. These can be either base-table or full-index scans. | Dependent item | gcp.cloudsql.mssql.trans.full_scan_count |
GCP Cloud SQL MSSQL: Page splits | Total number of page splits that occur as the result of overflowing index pages. | Dependent item | gcp.cloudsql.mssql.trans.page_split_count |
GCP Cloud SQL MSSQL: Probe scans | Total number of probe scans that are used to find at least one single qualified row in an index or base table directly. | Dependent item | gcp.cloudsql.mssql.trans.probe_scan_count |
GCP Cloud SQL MSSQL: SQL compilations | Total number of SQL compilations. | Dependent item | gcp.cloudsql.mssql.trans.sql_compilation_count |
GCP Cloud SQL MSSQL: SQL recompilations | Total number of SQL recompilations. | Dependent item | gcp.cloudsql.mssql.trans.sql_recompilation_count |
GCP Cloud SQL MSSQL: Read page operations | Total number of physical database page reads. This metric counts physical page reads across all databases. | Dependent item | gcp.cloudsql.mssql.memory.page_ops.read |
GCP Cloud SQL MSSQL: Write page operations | Total number of physical database page writes. This metric counts physical page writes across all databases. | Dependent item | gcp.cloudsql.mssql.memory.page_ops.write |
GCP Cloud SQL MSSQL: Audits size | Tracks the size in bytes of stored SQL Server audit files on an instance. Empty value if there are no audits enabled. | Dependent item | gcp.cloudsql.mssql.audits_size |
GCP Cloud SQL MSSQL: Audits successfully uploaded | Number of SQL Server audit files successfully uploaded to a Cloud Storage bucket. Empty value if there are no audits enabled. | Dependent item | gcp.cloudsql.mssql.audits_upload_count |
GCP Cloud SQL MSSQL: Resources get | MSSQL resources data in raw format. | Script | gcp.cloudsql.mssql.resources.get |
GCP Cloud SQL MSSQL: Databases get | MSSQL databases data in raw format. | Script | gcp.cloudsql.mssql.db.get |
GCP Cloud SQL MSSQL: Schedulers get | MSSQL schedulers data in raw format. | Script | gcp.cloudsql.mssql.schedulers.get |
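Two of the items above are ratios: Disk utilization relates Disk bytes used to Disk size, and Buffer cache hit ratio divides cache hits by cache lookups. GCP reports both values directly, so the template does not compute them itself; the Python sketch below only illustrates the arithmetic behind the descriptions. The function names are hypothetical, and a value computed from the two disk items should only roughly agree with the reported utilization, since the samples are not taken at exactly the same moment.

```python
from typing import Optional


def utilization_pct(used_bytes: float, quota_bytes: float) -> float:
    """Share of a quota in use, expressed as a percentage (fraction * 100)."""
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    return used_bytes / quota_bytes * 100.0


def buffer_cache_hit_ratio(cache_hits: int, cache_lookups: int) -> Optional[float]:
    """Cache hits divided by cache lookups, as a percentage; None if there were no lookups."""
    if cache_lookups == 0:
        return None  # the ratio is undefined when nothing was looked up
    return cache_hits / cache_lookups * 100.0


if __name__ == "__main__":
    gib = 2 ** 30
    # Hypothetical instance: 85 GiB used on a 100 GiB data disk.
    print(f"disk utilization: {utilization_pct(85 * gib, 100 * gib):.1f}%")
    print(f"buffer cache hit ratio: {buffer_cache_hit_ratio(990, 1000):.1f}%")
```

The same utilization_pct arithmetic applies to Memory utilization against Memory size.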
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GCP Cloud SQL MSSQL: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.cpu.utilization,5m) >= {$CLOUD_SQL.MSSQL.CPU.UTIL.MAX} | Average | |
GCP Cloud SQL MSSQL: Disk space is low | High utilization of the storage space. | last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.disk.utilization) >= {$CLOUD_SQL.MSSQL.DISK.UTIL.WARN} | Warning | Depends on: |
GCP Cloud SQL MSSQL: Disk space is critically low | Critical utilization of the disk space. | last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.disk.utilization) >= {$CLOUD_SQL.MSSQL.DISK.UTIL.CRIT} | Average | |
GCP Cloud SQL MSSQL: High memory utilization | RAM utilization is too high. The system might be slow to respond. | min(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.memory.utilization,5m) >= {$CLOUD_SQL.MSSQL.RAM.UTIL.MAX} | High | |
GCP Cloud SQL MSSQL: Instance is in suspended state | The instance is in suspended state. | last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.inst.state) = 1 | Warning | |
GCP Cloud SQL MSSQL: Instance is stopped by the owner | The instance has been stopped by the owner. | last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.inst.state) = 2 | Info | |
GCP Cloud SQL MSSQL: Instance is in maintenance | The instance is down for maintenance. | last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.inst.state) = 4 | Info | |
GCP Cloud SQL MSSQL: Instance is in failed state | The instance creation failed, or an operation left the instance in a bad state. | last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.inst.state) = 5 | Average | |
GCP Cloud SQL MSSQL: Instance is in unknown state | The state of the instance is unknown. | last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.inst.state) = 6 | Average | |
GCP Cloud SQL MSSQL: Failed to get the instance state | Failed to get the instance state. | last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.inst.state) = 10 | Average | |
GCP Cloud SQL MSSQL: Database engine is down | The database engine is down. | last(/GCP Cloud SQL MSSQL by HTTP/gcp.cloudsql.mssql.db.state)=0 | Average | Depends on: |
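The triggers above rely on two history functions with different semantics: min(...,5m) >= threshold fires only when every value collected in the last five minutes reaches the threshold, whereas last(...) compares only the newest value. The instance state triggers map specific numeric states to named problems. The Python sketch below models that logic for illustration only; it is not how Zabbix evaluates triggers internally, and the helper names are hypothetical.

```python
from typing import Optional, Sequence

# State codes, trigger names and severities copied from the trigger table.
STATE_TRIGGERS = {
    1: ("Instance is in suspended state", "Warning"),
    2: ("Instance is stopped by the owner", "Info"),
    4: ("Instance is in maintenance", "Info"),
    5: ("Instance is in failed state", "Average"),
    6: ("Instance is in unknown state", "Average"),
    10: ("Failed to get the instance state", "Average"),
}


def min_window_fires(window_values: Sequence[float], threshold: float) -> bool:
    """Like min(/host/key,5m)>=threshold: every value in the window must reach it."""
    # Simplification: an empty window simply returns False here; Zabbix's own
    # handling of missing data differs.
    return bool(window_values) and min(window_values) >= threshold


def last_fires(values: Sequence[float], threshold: float) -> bool:
    """Like last(/host/key)>=threshold: only the newest value is compared."""
    return bool(values) and values[-1] >= threshold


def state_trigger(state: int) -> Optional[tuple[str, str]]:
    """Return (trigger name, severity) for a state code, or None if no trigger matches."""
    return STATE_TRIGGERS.get(state)


if __name__ == "__main__":
    cpu_5m = [97.0, 96.5, 94.0]  # one dip below 95 keeps the CPU trigger quiet
    print(min_window_fires(cpu_5m, 95))
    disk = [78.0, 86.0]  # crosses the warning threshold but not the critical one
    print(last_fires(disk, 80), last_fires(disk, 90))
    print(state_trigger(5))
```

With the default macros, a single CPU sample below 95% inside the window is enough to keep the High CPU utilization trigger from firing, while a single disk reading above 80% is enough to raise Disk space is low.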
Name | Description | Type | Key and additional info |
---|---|---|---|
Resources discovery | Resources discovery. | Dependent item | gcp.cloudsql.resources.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Cloud SQL MSSQL: Resource [{#RESOURCE.NAME}]: Raw data | Data in raw format for the [{#RESOURCE.NAME}] resource. | Dependent item | gcp.cloudsql.mssql.resource.raw[{#RESOURCE.NAME}] |
GCP Cloud SQL MSSQL: Resource [{#RESOURCE.NAME}]: Deadlocks | Total number of lock requests that resulted in a deadlock for the [{#RESOURCE.NAME}] resource. | Dependent item | gcp.cloudsql.mssql.resource.deadlock_count[{#RESOURCE.NAME}] |
GCP Cloud SQL MSSQL: Resource [{#RESOURCE.NAME}]: Lock waits | Total number of lock requests that required the caller to wait for the [{#RESOURCE.NAME}] resource. | Dependent item | gcp.cloudsql.mssql.resource.lock_wait_count[{#RESOURCE.NAME}] |
GCP Cloud SQL MSSQL: Resource [{#RESOURCE.NAME}]: Lock wait time | Total time lock requests were waiting for locks for the [{#RESOURCE.NAME}] resource. | Dependent item | gcp.cloudsql.mssql.resource.lock_wait_time[{#RESOURCE.NAME}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases discovery | Databases discovery. | Dependent item | gcp.cloudsql.db.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Cloud SQL MSSQL: Database [{#DB.NAME}]: Raw data | Data in raw format for the [{#DB.NAME}] database. | Dependent item | gcp.cloudsql.mssql.db.raw[{#DB.NAME}] |
GCP Cloud SQL MSSQL: Database [{#DB.NAME}]: Log bytes flushed | Total number of log bytes flushed for the [{#DB.NAME}] database. | Dependent item | gcp.cloudsql.mssql.db.log_bytes_flushed_count[{#DB.NAME}] |
GCP Cloud SQL MSSQL: Database [{#DB.NAME}]: Transactions started | Total number of transactions started for the [{#DB.NAME}] database. | Dependent item | gcp.cloudsql.mssql.db.transaction_count[{#DB.NAME}] |
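Each discovery rule above produces one row per discovered entity, and every item prototype key is expanded once per row. Zabbix low-level discovery accepts a JSON array of objects whose keys are the LLD macros. The Python sketch below shows that row format for {#DB.NAME} and the resulting prototype expansion; the helpers build_lld and expand_prototype and the database names are hypothetical, and the template's actual preprocessing may emit additional fields per row.

```python
import json
from typing import Iterable


def build_lld(macro: str, values: Iterable[str]) -> str:
    """Serialize discovered values as LLD rows, e.g. [{"{#DB.NAME}": "..."}]."""
    return json.dumps([{macro: value} for value in values])


def expand_prototype(key_prototype: str, row: dict) -> str:
    """Substitute LLD macros in an item prototype key with values from one row."""
    # Simplification: real item keys may need quoting when a value contains
    # characters such as commas or square brackets; this sketch does not quote.
    for macro, value in row.items():
        key_prototype = key_prototype.replace(macro, value)
    return key_prototype


if __name__ == "__main__":
    lld = build_lld("{#DB.NAME}", ["sales", "inventory"])  # placeholder names
    print(lld)
    for row in json.loads(lld):
        print(expand_prototype(
            "gcp.cloudsql.mssql.db.transaction_count[{#DB.NAME}]", row))
```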
Name | Description | Type | Key and additional info |
---|---|---|---|
Schedulers discovery | Schedulers discovery. | Dependent item | gcp.cloudsql.schedulers.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Cloud SQL MSSQL: Scheduler [{#SCHEDULER.ID}]: Raw data | Data in raw format for the scheduler with ID [{#SCHEDULER.ID}]. | Dependent item | gcp.cloudsql.mssql.scheduler.raw[{#SCHEDULER.ID}] |
GCP Cloud SQL MSSQL: Scheduler [{#SCHEDULER.ID}]: Active workers | Current number of active workers associated with the scheduler with ID [{#SCHEDULER.ID}]. An active worker is never preemptive, must have an associated task, and is either running, runnable, or suspended. | Dependent item | gcp.cloudsql.mssql.scheduler.active_workers[{#SCHEDULER.ID}] |
GCP Cloud SQL MSSQL: Scheduler [{#SCHEDULER.ID}]: Current tasks | Current number of tasks associated with the scheduler with ID [{#SCHEDULER.ID}]. This count includes tasks that are waiting for a worker to execute them and tasks that are currently waiting or running (in SUSPENDED or RUNNABLE state). | Dependent item | gcp.cloudsql.mssql.scheduler.current_tasks[{#SCHEDULER.ID}] |
GCP Cloud SQL MSSQL: Scheduler [{#SCHEDULER.ID}]: Current workers | Current number of workers associated with the scheduler with ID [{#SCHEDULER.ID}], including workers that are not assigned any task. | Dependent item | gcp.cloudsql.mssql.scheduler.current_workers[{#SCHEDULER.ID}] |
GCP Cloud SQL MSSQL: Scheduler [{#SCHEDULER.ID}]: Pending I/O operations | Current number of pending I/Os waiting to be completed that are associated with the scheduler with ID [{#SCHEDULER.ID}]. Each scheduler has a list of pending I/Os that are checked every time there is a context switch to determine whether they have been completed. The count is incremented when the request is inserted and decremented when the request is completed. This number does not indicate the state of the I/Os. | Dependent item | gcp.cloudsql.mssql.scheduler.pending_disk_io[{#SCHEDULER.ID}] |
GCP Cloud SQL MSSQL: Scheduler [{#SCHEDULER.ID}]: Runnable tasks | Current number of workers that are associated with the scheduler with ID [{#SCHEDULER.ID}] and have assigned tasks waiting to be scheduled on the runnable queue. | Dependent item | gcp.cloudsql.mssql.scheduler.runnable_tasks[{#SCHEDULER.ID}] |
GCP Cloud SQL MSSQL: Scheduler [{#SCHEDULER.ID}]: Work queue | Current number of tasks in the pending queue associated with the scheduler with ID [{#SCHEDULER.ID}]. These tasks are waiting for a worker to pick them up. | Dependent item | gcp.cloudsql.mssql.scheduler.work_queue[{#SCHEDULER.ID}] |
This template is designed to monitor Google Cloud Platform (GCP) Cloud SQL MSSQL read-only replica instances with Zabbix.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template will be automatically connected to discovered entities with all their required parameters pre-defined.
Name | Description | Default |
---|---|---|
{$GCP.DATA.TIMEOUT} | API response timeout. | 15s |
{$GCP.TIME.WINDOW} | Time interval for the data requests. Supported usage types: 1) the default update interval for most of the items; 2) the minimal time window for the data requested in the Monitoring Query Language REST API request. | 5m |
{$GCP.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. | |
Name | Description | Type | Key and additional info |
---|---|---|---|
GCP Cloud SQL MSSQL: Replica metrics get | MSSQL replica metrics data in raw format. | Script | gcp.cloudsql.mssql.repl.metrics.get |
GCP Cloud SQL MSSQL: Bytes sent to replica | Total number of bytes sent to the remote availability replica. For an async replica, returns the number of bytes before compression. For a sync replica without compression, returns the actual number of bytes. | Dependent item | gcp.cloudsql.mssql.repl.bytes_sent_to_replica_count |
GCP Cloud SQL MSSQL: Resent messages | Total count of Always On messages to resend. This includes messages that were attempted to be sent but failed and require resending. | Dependent item | gcp.cloudsql.mssql.repl.resent_message_count |
GCP Cloud SQL MSSQL: Log apply pending queue | Current number of log blocks that are waiting to be applied to the replica. | Dependent item | gcp.cloudsql.mssql.repl.log_apply_pending_queue |
GCP Cloud SQL MSSQL: Log bytes received | Total size of log records received by the replica. | Dependent item | gcp.cloudsql.mssql.repl.log_bytes_received_count |
GCP Cloud SQL MSSQL: Recovery queue | Current size, in bytes, of log records in the replica's log files that have not been redone. | Dependent item | gcp.cloudsql.mssql.repl.recovery_queue |
GCP Cloud SQL MSSQL: Redone bytes | Total size in bytes of redone log records. | Dependent item | gcp.cloudsql.mssql.repl.redone_bytes_count |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.
az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>
See Azure documentation for more details.
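The create-for-rbac command prints the new service principal as JSON; in current Azure CLI versions the output contains the appId, displayName, password, and tenant fields. The Python sketch below shows how those fields map onto the template's credential macros. The sample values are dummies and sp_to_macros is a hypothetical helper; the subscription ID does not appear in the output and has to be supplied separately.

```python
import json

# Dummy output in the shape printed by `az ad sp create-for-rbac`;
# all values are placeholders, not real credentials.
SAMPLE_SP_OUTPUT = """
{
  "appId": "00000000-0000-0000-0000-000000000001",
  "displayName": "zabbix",
  "password": "placeholder-secret",
  "tenant": "00000000-0000-0000-0000-000000000002"
}
"""


def sp_to_macros(sp_output: str, subscription_id: str) -> dict[str, str]:
    """Map service principal fields onto the Azure template's credential macros."""
    sp = json.loads(sp_output)
    return {
        "{$AZURE.APP.ID}": sp["appId"],
        "{$AZURE.PASSWORD}": sp["password"],
        "{$AZURE.TENANT.ID}": sp["tenant"],
        # Not part of the command output: it is the same <subscription_id>
        # that was passed to --scope.
        "{$AZURE.SUBSCRIPTION.ID}": subscription_id,
    }


if __name__ == "__main__":
    for name, value in sp_to_macros(SAMPLE_SP_OUTPUT, "<subscription_id>").items():
        print(f"{name} = {value}")
```

The password is shown only once when the service principal is created, so it should be stored right away and handled as a secret.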
Set the macros {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, and {$AZURE.SUBSCRIPTION.ID}.
Name | Description | Default |
---|---|---|
{$AZURE.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. | |
{$AZURE.APP.ID} | The App ID of Microsoft Azure. | |
{$AZURE.PASSWORD} | Microsoft Azure password. | |
{$AZURE.DATA.TIMEOUT} | API response timeout. | 15s |
{$AZURE.TENANT.ID} | Microsoft Azure tenant ID. | |
{$AZURE.SUBSCRIPTION.ID} | Microsoft Azure subscription ID. | |
{$AZURE.VM.NAME.MATCHES} | The filter to include virtual machines by name. Used in the virtual machines discovery rule. | .* |
{$AZURE.VM.NAME.NOT.MATCHES} | The filter to exclude virtual machines by name. Used in the virtual machines discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.VM.LOCATION.MATCHES} | The filter to include virtual machines by location. Used in the virtual machines discovery rule. | .* |
{$AZURE.VM.LOCATION.NOT.MATCHES} | The filter to exclude virtual machines by location. Used in the virtual machines discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.SCALESET.NAME.MATCHES} | The filter to include virtual machine scale sets by name. Used in the virtual machine scale sets discovery rule. | .* |
{$AZURE.SCALESET.NAME.NOT.MATCHES} | The filter to exclude virtual machine scale sets by name. Used in the virtual machine scale sets discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.SCALESET.LOCATION.MATCHES} | The filter to include virtual machine scale sets by location. Used in the virtual machine scale sets discovery rule. | .* |
{$AZURE.SCALESET.LOCATION.NOT.MATCHES} | The filter to exclude virtual machine scale sets by location. Used in the virtual machine scale sets discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.SQL.INST.NAME.MATCHES} | The filter to include Azure SQL Managed Instances by name. Used in the Azure SQL Managed Instance discovery rule. | .* |
{$AZURE.SQL.INST.NAME.NOT.MATCHES} | The filter to exclude Azure SQL Managed Instances by name. Used in the Azure SQL Managed Instance discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.SQL.INST.LOCATION.MATCHES} | The filter to include Azure SQL Managed Instances by location. Used in the Azure SQL Managed Instance discovery rule. | .* |
{$AZURE.SQL.INST.LOCATION.NOT.MATCHES} | The filter to exclude Azure SQL Managed Instances by location. Used in the Azure SQL Managed Instance discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.STORAGE.ACC.NAME.MATCHES} | The filter to include storage accounts by name. Used in the storage accounts discovery rule. | .* |
{$AZURE.STORAGE.ACC.NAME.NOT.MATCHES} | The filter to exclude storage accounts by name. Used in the storage accounts discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.STORAGE.ACC.LOCATION.MATCHES} | The filter to include storage accounts by location. Used in the storage accounts discovery rule. | .* |
{$AZURE.STORAGE.ACC.LOCATION.NOT.MATCHES} | The filter to exclude storage accounts by location. Used in the storage accounts discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.STORAGE.ACC.AVAILABILITY} | The warning threshold of the storage account availability, in %. | 70 |
{$AZURE.STORAGE.ACC.BLOB.AVAILABILITY} | The warning threshold of the storage account blob services availability, in %. | 70 |
{$AZURE.STORAGE.ACC.TABLE.AVAILABILITY} | The warning threshold of the storage account table services availability, in %. | 70 |
{$AZURE.RESOURCE.GROUP.MATCHES} | The filter to include resources by resource group. Used in discovery rules. | .* |
{$AZURE.RESOURCE.GROUP.NOT.MATCHES} | The filter to exclude resources by resource group. Used in discovery rules. | CHANGE_IF_NEEDED |
{$AZURE.MYSQL.DB.NAME.MATCHES} | The filter to include MySQL servers by name. Used in the MySQL servers discovery rule. | .* |
{$AZURE.MYSQL.DB.NAME.NOT.MATCHES} | The filter to exclude MySQL servers by name. Used in the MySQL servers discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.MYSQL.DB.LOCATION.MATCHES} | The filter to include MySQL servers by location. Used in the MySQL servers discovery rule. | .* |
{$AZURE.MYSQL.DB.LOCATION.NOT.MATCHES} | The filter to exclude MySQL servers by location. Used in the MySQL servers discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.PGSQL.DB.NAME.MATCHES} | The filter to include PostgreSQL servers by name. Used in the PostgreSQL servers discovery rule. | .* |
{$AZURE.PGSQL.DB.NAME.NOT.MATCHES} | The filter to exclude PostgreSQL servers by name. Used in the PostgreSQL servers discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.PGSQL.DB.LOCATION.MATCHES} | The filter to include PostgreSQL servers by location. Used in the PostgreSQL servers discovery rule. | .* |
{$AZURE.PGSQL.DB.LOCATION.NOT.MATCHES} | The filter to exclude PostgreSQL servers by location. Used in the PostgreSQL servers discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.MSSQL.DB.NAME.MATCHES} | The filter to include Microsoft SQL databases by name. Used in the Microsoft SQL databases discovery rule. | .* |
{$AZURE.MSSQL.DB.NAME.NOT.MATCHES} | The filter to exclude Microsoft SQL databases by name. Used in the Microsoft SQL databases discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.MSSQL.DB.LOCATION.MATCHES} | The filter to include Microsoft SQL databases by location. Used in the Microsoft SQL databases discovery rule. | .* |
{$AZURE.MSSQL.DB.LOCATION.NOT.MATCHES} | The filter to exclude Microsoft SQL databases by location. Used in the Microsoft SQL databases discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.MSSQL.DB.SIZE.NOT.MATCHES} | This macro is used in the Microsoft SQL databases discovery rule. | ^System$ |
{$AZURE.COSMOS.MONGO.DB.NAME.MATCHES} | The filter to include Microsoft Cosmos DB accounts by name. Used in the Microsoft Cosmos DB account discovery rule. | .* |
{$AZURE.COSMOS.MONGO.DB.NAME.NOT.MATCHES} | The filter to exclude Microsoft Cosmos DB accounts by name. Used in the Microsoft Cosmos DB account discovery rule. | CHANGE_IF_NEEDED |
{$AZURE.COSMOS.MONGO.DB.LOCATION.MATCHES} | The filter to include Microsoft Cosmos DB accounts by location. Used in the Microsoft Cosmos DB account discovery rule. | .* |
{$AZURE.COSMOS.MONGO.DB.LOCATION.NOT.MATCHES} | The filter to exclude Microsoft Cosmos DB accounts by location. Used in the Microsoft Cosmos DB account discovery rule. | CHANGE_IF_NEEDED |
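Most discovery filters above come as MATCHES / NOT.MATCHES pairs. With the defaults, .* includes everything and the CHANGE_IF_NEEDED placeholder is not expected to match any real name, so nothing is excluded. The Python sketch below models the virtual machines case, assuming the discovery rule requires all conditions to hold and that each pattern is searched anywhere in the value (Zabbix uses PCRE, which behaves like Python's re module for simple patterns such as these). The entity attribute names, VM_FILTERS, passes, and keep are hypothetical.

```python
import re

# Defaults from the macro table.
DEFAULT_MACROS = {
    "{$AZURE.VM.NAME.MATCHES}": ".*",
    "{$AZURE.VM.NAME.NOT.MATCHES}": "CHANGE_IF_NEEDED",
    "{$AZURE.VM.LOCATION.MATCHES}": ".*",
    "{$AZURE.VM.LOCATION.NOT.MATCHES}": "CHANGE_IF_NEEDED",
    "{$AZURE.RESOURCE.GROUP.MATCHES}": ".*",
    "{$AZURE.RESOURCE.GROUP.NOT.MATCHES}": "CHANGE_IF_NEEDED",
}

# Entity attribute -> macro prefix (hypothetical attribute names).
VM_FILTERS = {
    "name": "AZURE.VM.NAME",
    "location": "AZURE.VM.LOCATION",
    "resource_group": "AZURE.RESOURCE.GROUP",
}


def passes(value: str, include: str, exclude: str) -> bool:
    """True if value matches the include regex and does not match the exclude regex."""
    return re.search(include, value) is not None and re.search(exclude, value) is None


def keep(entity: dict, macros: dict, filters: dict = VM_FILTERS) -> bool:
    """Assumed AND-combination of every include/exclude pair."""
    for attr, prefix in filters.items():
        include = macros["{$" + prefix + ".MATCHES}"]
        exclude = macros["{$" + prefix + ".NOT.MATCHES}"]
        if not passes(entity[attr], include, exclude):
            return False
    return True


if __name__ == "__main__":
    vms = [  # hypothetical virtual machines
        {"name": "web-01", "location": "westeurope", "resource_group": "prod"},
        {"name": "test-01", "location": "westeurope", "resource_group": "dev"},
    ]
    custom = {**DEFAULT_MACROS, "{$AZURE.VM.NAME.NOT.MATCHES}": "^test-"}
    print([vm["name"] for vm in vms if keep(vm, custom)])
```

Overriding only {$AZURE.VM.NAME.NOT.MATCHES} with ^test- drops the test-01 machine while everything else stays discovered.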
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure: Get resources | The result of API requests in JSON format. | Script | azure.get.resources |
Azure: Get errors | A list of errors from API requests. | Dependent item | azure.get.errors |
Azure: Get storage accounts | The result of API requests in JSON format. | Script | azure.get.storage.acc |
Azure: Get storage accounts errors | The errors from API requests. | Dependent item | azure.get.storage.acc.errors |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Azure: There are errors in requests to API | Zabbix has received errors in response to API requests. | length(last(/Azure by HTTP/azure.get.errors))>0 | Average | |
Azure: There are errors in storages requests to API | Zabbix has received errors in response to API requests. | length(last(/Azure by HTTP/azure.get.storage.acc.errors))>0 | Average | Depends on: |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage accounts discovery | The list of all storage accounts available under the subscription. | Dependent item | azure.storage.acc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure: Storage account [{#NAME}]: Get data | The HTTP API endpoint that returns storage metrics with the name |
Script | azure.get.storage.acc[{#NAME}] |
Azure: Storage account [{#NAME}]: Used Capacity | The amount of storage used by the storage account with the name For standard storage accounts, it's the sum of capacity used by blob, table, file, and queue. For premium storage accounts and blob storage accounts, it is the same as BlobCapacity or FileCapacity. |
Dependent item | azure.storage.used.capacity[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Transactions | The number of requests made to the storage service or a specified API operation. This number includes successful and failed requests and also requests that produced errors. Use |
Dependent item | azure.storage.transactions[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Ingress | The amount of ingress data, expressed in bytes. This number includes ingress from an external client into Azure Storage and also ingress within Azure. |
Dependent item | azure.storage.ingress[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Egress | The amount of egress data. This number includes egress to external client from Azure Storage and also egress within Azure. As a result, this number does not reflect billable egress. |
Dependent item | azure.storage.engress[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Success Server Latency | The average time used to process a successful request by Azure Storage. This value does not include the network latency specified in |
Dependent item | azure.storage.success.server.latency[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Success E2E Latency | The average end-to-end latency of successful requests made to a storage service or the specified API operation, expressed in milliseconds. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response. |
Dependent item | azure.storage.success.e2e.latency[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Availability | The percentage of availability for the storage service or a specified API operation. Availability is calculated by taking the All unexpected errors result in reduced availability for the storage service or the specified API operation. |
Dependent item | azure.storage.availability[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Blob Capacity | The amount of storage used by the blob service of the storage account with the name |
Dependent item | azure.storage.blob.capacity[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Blob Count | The number of blob objects stored in the storage account with the name |
Dependent item | azure.storage.blob.count[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Blob Container Count | The number of containers in the storage account with the name |
Dependent item | azure.storage.blob.container.count[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Blob Index Capacity | The amount of storage with the name |
Dependent item | azure.storage.blob.index.capacity[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Blob Transactions | The number of requests made to the storage service or a specified API operation. This number includes successful and failed requests and also requests that produced errors. Use |
Dependent item | azure.storage.blob.transactions[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Blob Ingress | The amount of ingress data, expressed in bytes. This number includes ingress from an external client into Azure Storage and also ingress within Azure. |
Dependent item | azure.storage.blob.ingress[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Blob Egress | The amount of egress data. This number includes egress to external client from Azure Storage and also egress within Azure. As a result, this number does not reflect billable egress. |
Dependent item | azure.storage.blob.engress[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Blob Success Server Latency | The average time used to process a successful request by Azure Storage. This value does not include the network latency specified in |
Dependent item | azure.storage.blob.success.server.latency[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Blob Success E2E Latency | The average end-to-end latency of successful requests made to a storage service or the specified API operation, expressed in milliseconds. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response. |
Dependent item | azure.storage.blob.success.e2e.latency[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Blob Availability | The percentage of availability for the storage service or a specified API operation. Availability is calculated by taking the All unexpected errors result in reduced availability for the storage service or the specified API operation. |
Dependent item | azure.storage.blob.availability[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Table Capacity | The amount of storage used by the table service of the storage account with the name |
Dependent item | azure.storage.table.capacity[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Table Count | The number of tables in the storage account with the name |
Dependent item | azure.storage.table.count[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Table Entity Count | The number of table entities in the storage account with the name |
Dependent item | azure.storage.table.entity.count[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Table Transactions | The number of requests made to the storage service or a specified API operation. This number includes successful and failed requests and also requests that produced errors. Use |
Dependent item | azure.storage.table.transactions[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Table Ingress | The amount of ingress data, expressed in bytes. This number includes ingress from an external client into Azure Storage and also ingress within Azure. |
Dependent item | azure.storage.table.ingress[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Table Egress | The amount of egress data. This number includes egress to external client from Azure Storage and also egress within Azure. As a result, this number does not reflect billable egress. |
Dependent item | azure.storage.table.engress[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Table Success Server Latency | The average time used to process a successful request by Azure Storage. This value does not include the network latency specified in |
Dependent item | azure.storage.table.success.server.latency[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Table Success E2E Latency | The average end-to-end latency of successful requests made to a storage service or the specified API operation, expressed in milliseconds. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response. |
Dependent item | azure.storage.table.success.e2e.latency[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Table Availability | The percentage of availability for the storage service or a specified API operation. Availability is calculated by taking the All unexpected errors result in reduced availability for the storage service or the specified API operation. |
Dependent item | azure.storage.table.availability[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: File Capacity | The amount of file storage used by the storage account with the name |
Dependent item | azure.storage.file.capacity[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: File Count | The number of files in the storage account with the name |
Dependent item | azure.storage.file.count[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: File Share Count | The number of file shares in the storage account. |
Dependent item | azure.storage.file.share.count[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: File Share Snapshot Count | The number of snapshots present on the share in the storage account's Files service. |
Dependent item | azure.storage.file.shares.snapshot.count[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: File Share Snapshot Size | The amount of storage used by the snapshots in the storage account's Files service, in bytes. |
Dependent item | azure.storage.file.share.snapshot.size[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: File Share Capacity Quota | The upper limit on the amount of storage that can be used by Azure Files Service, in bytes. |
Dependent item | azure.storage.file.share.capacity.quota[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: File Transactions | The number of requests made to the storage service or a specified API operation. This number includes successful and failed requests and also requests that produced errors. Use |
Dependent item | azure.storage.file.transactions[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: File Ingress | The amount of ingress data, expressed in bytes. This number includes ingress from an external client into Azure Storage and also ingress within Azure. |
Dependent item | azure.storage.file.ingress[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: File Egress | The amount of egress data. This number includes egress to an external client from Azure Storage and also egress within Azure. As a result, this number does not reflect billable egress. |
Dependent item | azure.storage.file.engress[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: File Success Server Latency | The average time used to process a successful request by Azure Storage. This value does not include the network latency specified in |
Dependent item | azure.storage.file.success.server.latency[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: File Success E2E Latency | The average end-to-end latency of successful requests made to a storage service or the specified API operation, expressed in milliseconds. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response. |
Dependent item | azure.storage.file.success.e2e.latency[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Queue Capacity | The amount of queue storage used by the storage account with the name |
Dependent item | azure.storage.queue.capacity[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Queue Count | The number of queues in the storage account with the name |
Dependent item | azure.storage.queue.count[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Queue Message Count | The number of unexpired queue messages in the storage account with the name |
Dependent item | azure.storage.queue.message.count[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Queue Transactions | The number of requests made to the storage service or a specified API operation. This number includes successful and failed requests and also requests that produced errors. Use |
Dependent item | azure.storage.queue.transactions[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Queue Ingress | The amount of ingress data, expressed in bytes. This number includes ingress from an external client into Azure Storage and also ingress within Azure. |
Dependent item | azure.storage.queue.ingress[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Queue Egress | The amount of egress data. This number includes egress to an external client from Azure Storage and also egress within Azure. As a result, this number does not reflect billable egress. |
Dependent item | azure.storage.queue.engress[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Queue Success Server Latency | The average time used to process a successful request by Azure Storage. This value does not include the network latency specified in |
Dependent item | azure.storage.queue.success.server.latency[{#NAME}] Preprocessing
|
Azure: Storage account [{#NAME}]: Queue Success E2E Latency | The average end-to-end latency of successful requests made to a storage service or the specified API operation, expressed in milliseconds. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response. |
Dependent item | azure.storage.queue.success.e2e.latency[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Azure: Storage account [{#NAME}]: Availability is low | | (min(/Azure by HTTP/azure.storage.availability[{#NAME}],#3))<{$AZURE.STORAGE.ACC.AVAILABILITY:"{#NAME}"} | Warning | |
Azure: Storage account [{#NAME}]: Blob Availability is low | | (min(/Azure by HTTP/azure.storage.blob.availability[{#NAME}],#3))<{$AZURE.STORAGE.ACC.BLOB.AVAILABILITY:"{#NAME}"} | Warning | |
Azure: Storage account [{#NAME}]: Table Availability is low | | (min(/Azure by HTTP/azure.storage.table.availability[{#NAME}],#3))<{$AZURE.STORAGE.ACC.TABLE.AVAILABILITY:"{#NAME}"} | Warning | |
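Each availability trigger above takes the minimum of the last three collected values (#3) and compares it with a user macro. That macro uses the storage account name as its context, so you can override the threshold per account. The Python sketch below models this logic in simplified form. It resolves only exact contexts, although Zabbix also supports regex contexts. When fewer than three values exist, it uses whatever is present, which may not match how Zabbix handles short histories. The macro values in the example are illustrative and are not the template defaults.

```python
from typing import Mapping, Sequence


def resolve_context_macro(macros: Mapping[str, float], name: str, context: str) -> float:
    """Return {$NAME:"context"} if defined, otherwise fall back to {$NAME}.

    Simplified: only exact (static) contexts are matched, no regex contexts.
    """
    with_context = f'{{${name}:"{context}"}}'
    if with_context in macros:
        return macros[with_context]
    return macros[f"{{${name}}}"]


def availability_is_low(history: Sequence[float], threshold: float) -> bool:
    """Approximate min(/Azure by HTTP/<key>,#3) < threshold (history is oldest first)."""
    if not history:
        raise ValueError("no values collected yet")
    return min(history[-3:]) < threshold


# Illustrative values, not the template defaults.
macros = {
    "{$AZURE.STORAGE.ACC.AVAILABILITY}": 70.0,
    '{$AZURE.STORAGE.ACC.AVAILABILITY:"prodstorage"}': 99.0,
}
threshold = resolve_context_macro(macros, "AZURE.STORAGE.ACC.AVAILABILITY", "prodstorage")
print(availability_is_low([100.0, 98.5, 99.9], threshold))  # True: min 98.5 < 99.0
```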
Name | Description | Type | Key and additional info |
---|---|---|---|
Virtual machines discovery | The list of virtual machines provided by the subscription. |
Dependent item | azure.vm.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Virtual machine scale set discovery | The list of virtual machine scale sets provided by the subscription. |
Dependent item | azure.scaleset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure SQL managed instance discovery | The list of Azure SQL managed instances provided by the subscription. |
Dependent item | azure.sql_inst.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MySQL servers discovery | The list of MySQL servers provided by the subscription. |
Dependent item | azure.mysql.servers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
PostgreSQL servers discovery | The list of PostgreSQL servers provided by the subscription. |
Dependent item | azure.pgsql.servers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Microsoft SQL databases discovery | The list of Microsoft SQL databases provided by the subscription. |
Dependent item | azure.mssql.databases.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cosmos DB account discovery | The list of Cosmos databases provided by the subscription. |
Dependent item | azure.cosmos.mongo.db.discovery Preprocessing
|
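Each discovery rule above produces low-level discovery (LLD) data. Zabbix uses this data to create items per resource, substituting macros such as {#NAME}. Since Zabbix 4.2, an LLD rule accepts a plain JSON array of objects. The Python sketch below builds such an array from a resource list in the style of Azure Resource Manager and filters it by resource type. The input layout, the helper name, and the {#RESOURCE.ID} macro are assumptions for illustration.

```python
import json
from typing import Iterable, Mapping


def build_lld(resources: Iterable[Mapping[str, str]], resource_type: str) -> str:
    """Build Zabbix LLD JSON (a plain array of macro objects) for one resource type.

    Assumes each resource looks like an ARM list entry: {"name", "id", "type"}.
    The {#RESOURCE.ID} macro name is hypothetical.
    """
    rows = [
        {"{#NAME}": r["name"], "{#RESOURCE.ID}": r["id"]}
        for r in resources
        # Azure resource types are compared case-insensitively.
        if r.get("type", "").lower() == resource_type.lower()
    ]
    return json.dumps(rows)


resources = [
    {"name": "vm1", "id": "/subscriptions/s/resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/vm1",
     "type": "Microsoft.Compute/virtualMachines"},
    {"name": "ss1", "id": "/subscriptions/s/resourceGroups/rg/providers/Microsoft.Compute/virtualMachineScaleSets/ss1",
     "type": "Microsoft.Compute/virtualMachineScaleSets"},
]
print(build_lld(resources, "Microsoft.Compute/virtualMachines"))
```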
This template is designed to monitor Microsoft Azure virtual machine scale sets by HTTP. It works without any external scripts and uses the script item.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.
az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>
See Azure documentation for more details.
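The command prints a JSON object. Its appId, password, and tenant fields correspond to {$AZURE.APP.ID}, {$AZURE.PASSWORD}, and {$AZURE.TENANT.ID}. The subscription ID is the one you passed in --scope. The Python sketch below shows this mapping. The helper name is hypothetical, and you still need to set {$AZURE.RESOURCE.ID} separately to the resource you want to monitor.

```python
import json
from typing import Dict

# Fields printed by `az ad sp create-for-rbac` and the template macros they feed.
CLI_FIELD_TO_MACRO = {
    "appId": "{$AZURE.APP.ID}",
    "password": "{$AZURE.PASSWORD}",
    "tenant": "{$AZURE.TENANT.ID}",
}


def macros_from_cli_output(cli_json: str, subscription_id: str) -> Dict[str, str]:
    """Map the service principal JSON to macro values (hypothetical helper)."""
    data = json.loads(cli_json)
    macros = {macro: data[field] for field, macro in CLI_FIELD_TO_MACRO.items()}
    # Not part of the CLI output: it is the subscription used in --scope.
    macros["{$AZURE.SUBSCRIPTION.ID}"] = subscription_id
    return macros


sample = '{"appId": "a-1", "displayName": "zabbix", "password": "p-1", "tenant": "t-1"}'
print(macros_from_cli_output(sample, "s-1"))
```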
Set the macros {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.
Name | Description | Default |
---|---|---|
{$AZURE.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. |
|
{$AZURE.APP.ID} | The App ID of Microsoft Azure. |
|
{$AZURE.PASSWORD} | Microsoft Azure password. |
|
{$AZURE.DATA.TIMEOUT} | API response timeout. |
15s |
{$AZURE.TENANT.ID} | Microsoft Azure tenant ID. |
|
{$AZURE.SUBSCRIPTION.ID} | Microsoft Azure subscription ID. |
|
{$AZURE.RESOURCE.ID} | Microsoft Azure virtual machine scale set ID. |
|
{$AZURE.SCALESET.CPU.UTIL.CRIT} | The critical threshold of CPU utilization, expressed in %. |
90 |
{$AZURE.SCALESET.VM.COUNT.CRIT} | The critical number of virtual machines in the scale set. |
100 |
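The macros above supply everything a client-credentials token request needs. The template's own script item code is not shown here, so the following is only a sketch of the standard Microsoft identity platform flow. It assumes the token is requested for Azure Resource Manager (https://management.azure.com/.default). The helper name is hypothetical, and the function only builds the request without sending it.

```python
from typing import Tuple
from urllib.parse import urlencode


def build_token_request(tenant_id: str, app_id: str, password: str) -> Tuple[str, str]:
    """Return (url, form_body) for an OAuth 2.0 client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": app_id,          # {$AZURE.APP.ID}
        "client_secret": password,    # {$AZURE.PASSWORD}
        # Assumed audience: Azure Resource Manager.
        "scope": "https://management.azure.com/.default",
    })
    return url, body


url, body = build_token_request("my-tenant-id", "my-app-id", "my-secret")
print(url)
```

To send the request, you could POST body.encode() with urllib.request, using a timeout that matches {$AZURE.DATA.TIMEOUT} (15s by default). The JSON response carries the token in its access_token field.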
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure VMSS: Get data | Gathers data about the virtual machine scale set. |
Script | azure.scaleset.data.get |
Azure VMSS: Get errors | A list of errors from API requests. |
Dependent item | azure.scaleset.data.errors Preprocessing
|
Azure VMSS: Availability state | The availability status of the resource. 0 - Available - no events detected that affect the health of the resource. 1 - Degraded - your resource detected a loss in performance, although it's still available for use. 2 - Unavailable - the service detected an ongoing platform or non-platform event that affects the health of the resource. 3 - Unknown - Resource Health hasn't received information about the resource for more than 10 minutes. |
Dependent item | azure.scaleset.availability.state Preprocessing
|
Azure VMSS: Availability status detailed | The summary description of availability status. |
Dependent item | azure.scaleset.availability.details Preprocessing
|
Azure VMSS: Virtual machine count | The current number of virtual machines in the scale set. |
Dependent item | azure.scaleset.vm.count Preprocessing
|
Azure VMSS: Available memory | Amount of physical memory, in bytes, immediately available for allocation to a process or for system use in the virtual machine. |
Dependent item | azure.scaleset.vm.memory Preprocessing
|
Azure VMSS: CPU credits consumed | Total number of credits consumed by the virtual machine. Only available on B-series burstable VMs. |
Dependent item | azure.scaleset.cpu.credits.consumed Preprocessing
|
Azure VMSS: CPU credits remaining | Total number of credits available to burst. Only available on B-series burstable VMs. |
Dependent item | azure.scaleset.cpu.credits.remaining Preprocessing
|
Azure VMSS: CPU utilization | The percentage of allocated compute units that are currently in use by the virtual machine(s). |
Dependent item | azure.scaleset.cpu.utilization Preprocessing
|
Azure VMSS: Data disk bandwidth consumed | Percentage of data disk bandwidth consumed per minute. |
Dependent item | azure.scaleset.data.disk.bandwidth.consumed Preprocessing
|
Azure VMSS: Data disk IOPS consumed | Percentage of data disk I/Os consumed per minute. |
Dependent item | azure.scaleset.data.disk.iops.consumed Preprocessing
|
Azure VMSS: Data disk read rate | Bytes/sec read from a single disk during the monitoring period. |
Dependent item | azure.scaleset.data.disk.read.bps Preprocessing
|
Azure VMSS: Data disk IOPS read | Read IOPS from a single disk during the monitoring period. |
Dependent item | azure.scaleset.data.disk.read.ops Preprocessing
|
Azure VMSS: Data disk used burst BPS credits | Percentage of data disk burst bandwidth credits used so far. |
Dependent item | azure.scaleset.data.disk.bandwidth.burst.used Preprocessing
|
Azure VMSS: Data disk used burst IO credits | Percentage of data disk burst I/O credits used so far. |
Dependent item | azure.scaleset.data.disk.iops.burst.used Preprocessing
|
Azure VMSS: Data disk write rate | Bytes/sec written to a single disk during the monitoring period. |
Dependent item | azure.scaleset.data.disk.write.bps Preprocessing
|
Azure VMSS: Data disk IOPS write | Write IOPS from a single disk during the monitoring period. |
Dependent item | azure.scaleset.data.disk.write.ops Preprocessing
|
Azure VMSS: Data disk queue depth | Data disk queue depth (or queue length). |
Dependent item | azure.scaleset.data.disk.queue.depth Preprocessing
|
Azure VMSS: Data disk target bandwidth | Baseline byte-per-second throughput the data disk can achieve without bursting. |
Dependent item | azure.scaleset.data.disk.bandwidth.target Preprocessing
|
Azure VMSS: Data disk target IOPS | Baseline IOPS the data disk can achieve without bursting. |
Dependent item | azure.scaleset.data.disk.iops.target Preprocessing
|
Azure VMSS: Data disk max burst bandwidth | Maximum byte-per-second throughput the data disk can achieve with bursting. |
Dependent item | azure.scaleset.data.disk.bandwidth.burst.max Preprocessing
|
Azure VMSS: Data disk max burst IOPS | Maximum IOPS the data disk can achieve with bursting. |
Dependent item | azure.scaleset.data.disk.iops.burst.max Preprocessing
|
Azure VMSS: Disk read | Bytes read from the disk during the monitoring period. |
Dependent item | azure.scaleset.disk.read Preprocessing
|
Azure VMSS: Disk IOPS read | Disk read IOPS. |
Dependent item | azure.scaleset.disk.read.ops Preprocessing
|
Azure VMSS: Disk write | Bytes written to the disk during the monitoring period. |
Dependent item | azure.scaleset.disk.write Preprocessing
|
Azure VMSS: Disk IOPS write | Write IOPS from a single disk during the monitoring period. |
Dependent item | azure.scaleset.disk.write.ops Preprocessing
|
Azure VMSS: Inbound flows | The number of current flows in the inbound direction (traffic going into the VMs). |
Dependent item | azure.scaleset.flows.inbound Preprocessing
|
Azure VMSS: Outbound flows | The number of current flows in the outbound direction (traffic going out of the VMs). |
Dependent item | azure.scaleset.flows.outbound Preprocessing
|
Azure VMSS: Network in total | The number of bytes received on all network interfaces by the virtual machine(s) (incoming traffic). |
Dependent item | azure.scaleset.network.in.total Preprocessing
|
Azure VMSS: Network out total | The number of bytes sent on all network interfaces by the virtual machine(s) (outgoing traffic). |
Dependent item | azure.scaleset.network.out.total Preprocessing
|
Azure VMSS: Inbound flow maximum creation rate | The maximum creation rate of inbound flows (traffic going into the VM). |
Dependent item | azure.scaleset.flows.inbound.max Preprocessing
|
Azure VMSS: Outbound flow maximum creation rate | The maximum creation rate of outbound flows (traffic going out of the VM). |
Dependent item | azure.scaleset.flows.outbound.max Preprocessing
|
Azure VMSS: OS disk read rate | Bytes/sec read from a single disk during the monitoring period - for an OS disk. |
Dependent item | azure.scaleset.os.disk.read.bps Preprocessing
|
Azure VMSS: OS disk write rate | Bytes/sec written to a single disk during the monitoring period - for an OS disk. |
Dependent item | azure.scaleset.os.disk.write.bps Preprocessing
|
Azure VMSS: OS disk IOPS read | Read IOPS from a single disk during the monitoring period - for an OS disk. |
Dependent item | azure.scaleset.os.disk.read.ops Preprocessing
|
Azure VMSS: OS disk IOPS write | Write IOPS from a single disk during the monitoring period - for an OS disk. |
Dependent item | azure.scaleset.os.disk.write.ops Preprocessing
|
Azure VMSS: OS disk queue depth | OS disk queue depth (or queue length). |
Dependent item | azure.scaleset.os.disk.queue.depth Preprocessing
|
Azure VMSS: OS disk bandwidth consumed | Percentage of operating system disk bandwidth consumed per minute. |
Dependent item | azure.scaleset.os.disk.bandwidth.consumed Preprocessing
|
Azure VMSS: OS disk IOPS consumed | Percentage of operating system disk I/Os consumed per minute. |
Dependent item | azure.scaleset.os.disk.iops.consumed Preprocessing
|
Azure VMSS: OS disk target bandwidth | Baseline byte-per-second throughput the OS disk can achieve without bursting. |
Dependent item | azure.scaleset.os.disk.bandwidth.target Preprocessing
|
Azure VMSS: OS disk target IOPS | Baseline IOPS the OS disk can achieve without bursting. |
Dependent item | azure.scaleset.os.disk.iops.target Preprocessing
|
Azure VMSS: OS disk max burst bandwidth | Maximum byte-per-second throughput the OS disk can achieve with bursting. |
Dependent item | azure.scaleset.os.disk.bandwidth.max Preprocessing
|
Azure VMSS: OS disk max burst IOPS | Maximum IOPS the OS disk can achieve with bursting. |
Dependent item | azure.scaleset.os.disk.iops.max Preprocessing
|
Azure VMSS: OS disk used burst BPS credits | Percentage of OS disk burst bandwidth credits used so far. |
Dependent item | azure.scaleset.os.disk.bandwidth.burst.used Preprocessing
|
Azure VMSS: OS disk used burst IO credits | Percentage of OS disk burst I/O credits used so far. |
Dependent item | azure.scaleset.os.disk.iops.burst.used Preprocessing
|
Azure VMSS: Premium data disk cache read hit in % | Percentage of premium data disk cache read hit. |
Dependent item | azure.scaleset.premium.data.disk.cache.read.hit Preprocessing
|
Azure VMSS: Premium data disk cache read miss in % | Percentage of premium data disk cache read miss. |
Dependent item | azure.scaleset.premium.data.disk.cache.read.miss Preprocessing
|
Azure VMSS: Premium OS disk cache read hit in % | Percentage of premium OS disk cache read hit. |
Dependent item | azure.scaleset.premium.os.disk.cache.read.hit Preprocessing
|
Azure VMSS: Premium OS disk cache read miss in % | Percentage of premium OS disk cache read miss. |
Dependent item | azure.scaleset.premium.os.disk.cache.read.miss Preprocessing
|
Azure VMSS: VM cached bandwidth consumed | Percentage of cached disk bandwidth consumed by the VM. |
Dependent item | azure.scaleset.vm.cached.bandwidth.consumed Preprocessing
|
Azure VMSS: VM cached IOPS consumed | Percentage of cached disk IOPS consumed by the VM. |
Dependent item | azure.scaleset.vm.cached.iops.consumed Preprocessing
|
Azure VMSS: VM uncached bandwidth consumed | Percentage of uncached disk bandwidth consumed by the VM. |
Dependent item | azure.scaleset.vm.uncached.bandwidth.consumed Preprocessing
|
Azure VMSS: VM uncached IOPS consumed | Percentage of uncached disk IOPS consumed by the VM. |
Dependent item | azure.scaleset.vm.uncached.iops.consumed Preprocessing
|
Azure VMSS: VM availability metric | Measure of availability of the virtual machines over time. |
Dependent item | azure.scaleset.availability Preprocessing
|
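Every item above except Get data is a dependent item. The Get data script item (azure.scaleset.data.get) queries the API once per interval, and each dependent item extracts its own value from that result through preprocessing. The Python sketch below imitates a simple JSONPath-style extraction. The JSON layout and field names in MASTER are hypothetical and do not reflect the template's actual output.

```python
import json
from typing import Any, Sequence


def extract(master_value: str, path: Sequence[str]) -> Any:
    """Walk the master item's JSON along `path`; return None if a key is missing.

    A simplified stand-in for a JSONPath step such as $.metrics.cpu_utilization.
    """
    node: Any = json.loads(master_value)
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Hypothetical master item value.
MASTER = '{"health": {"state": 0, "details": "Available"}, "metrics": {"cpu_utilization": 12.5}, "errors": ""}'
print(extract(MASTER, ["metrics", "cpu_utilization"]))
```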
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Azure VMSS: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Azure VM Scale Set by HTTP/azure.scaleset.data.errors))>0 |Average |
||
Azure VMSS: Virtual machine scale set is unavailable | The resource state is unavailable. |
last(/Azure VM Scale Set by HTTP/azure.scaleset.availability.state)=2 |High |
||
Azure VMSS: Virtual machine scale set is degraded | The resource is in a degraded state. |
last(/Azure VM Scale Set by HTTP/azure.scaleset.availability.state)=1 |Average |
||
Azure VMSS: Virtual machine scale set is in unknown state | The resource state is unknown. |
last(/Azure VM Scale Set by HTTP/azure.scaleset.availability.state)=3 |Warning |
||
Azure VMSS: High amount of VMs in the scale set | The number of VMs in the scale set has exceeded {$AZURE.SCALESET.VM.COUNT.CRIT} for 5 minutes. |
min(/Azure VM Scale Set by HTTP/azure.scaleset.vm.count,5m)>{$AZURE.SCALESET.VM.COUNT.CRIT} |High |
||
Azure VMSS: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. |
min(/Azure VM Scale Set by HTTP/azure.scaleset.cpu.utilization,5m)>{$AZURE.SCALESET.CPU.UTIL.CRIT} |High |
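The trigger table maps availability state codes 1 (Degraded), 2 (Unavailable), and 3 (Unknown) to different severities. The VM count and CPU triggers use min(...,5m), so they fire only if every value collected in the last 5 minutes is above the threshold. The Python sketch below reproduces one evaluation pass. Trigger names are shortened, and the default thresholds come from {$AZURE.SCALESET.VM.COUNT.CRIT} (100) and {$AZURE.SCALESET.CPU.UTIL.CRIT} (90).

```python
from typing import List, Sequence, Tuple

# Trigger name and severity per availability state, from the trigger table above.
STATE_TRIGGERS = {
    1: ("Virtual machine scale set is degraded", "Average"),
    2: ("Virtual machine scale set is unavailable", "High"),
    3: ("Virtual machine scale set is in unknown state", "Warning"),
}


def firing_triggers(
    state: int,
    vm_count_5m: Sequence[float],
    cpu_util_5m: Sequence[float],
    vm_count_crit: float = 100,
    cpu_util_crit: float = 90,
) -> List[Tuple[str, str]]:
    """Return (trigger name, severity) pairs that would fire in one evaluation."""
    fired = []
    if state in STATE_TRIGGERS:  # state 0 (Available) fires nothing
        fired.append(STATE_TRIGGERS[state])
    # min(...,5m) > threshold: every value in the 5-minute window exceeds it.
    if vm_count_5m and min(vm_count_5m) > vm_count_crit:
        fired.append(("High amount of VMs in the scale set", "High"))
    if cpu_util_5m and min(cpu_util_5m) > cpu_util_crit:
        fired.append(("High CPU utilization", "High"))
    return fired


print(firing_triggers(2, [101, 120], [91, 99]))
```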
This template is designed to monitor Microsoft Azure virtual machines (VMs) by HTTP. It works without any external scripts and uses the script item.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.
az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>
See Azure documentation for more details.
Set the macros {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.
Name | Description | Default |
---|---|---|
{$AZURE.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. |
|
{$AZURE.APP.ID} | The App ID of Microsoft Azure. |
|
{$AZURE.PASSWORD} | Microsoft Azure password. |
|
{$AZURE.DATA.TIMEOUT} | API response timeout. |
15s |
{$AZURE.TENANT.ID} | Microsoft Azure tenant ID. |
|
{$AZURE.SUBSCRIPTION.ID} | Microsoft Azure subscription ID. |
|
{$AZURE.RESOURCE.ID} | Microsoft Azure virtual machine ID. |
|
{$AZURE.VM.CPU.UTIL.CRIT} | The critical threshold of CPU utilization, expressed in %. |
90 |
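{$AZURE.RESOURCE.ID} holds the full Azure Resource Manager ID of the VM. The template's script is not shown, so the sketch below only assumes one way such an ID could be used: building an Azure Monitor "Metrics - List" request URL for the VM. "Percentage CPU" and "Available Memory Bytes" are Azure platform metric names for virtual machines. The api-version, interval, and aggregation defaults are illustrative choices. The request must also carry an Authorization: Bearer header with a token obtained for the service principal.

```python
from typing import Sequence
from urllib.parse import urlencode


def build_metrics_url(
    resource_id: str,
    metric_names: Sequence[str],
    timespan: str,
    interval: str = "PT1M",
    aggregation: str = "Average",
    api_version: str = "2018-01-01",
) -> str:
    """Build an Azure Monitor metrics URL for one resource (resource_id starts with /subscriptions/)."""
    query = urlencode({
        "api-version": api_version,
        "metricnames": ",".join(metric_names),  # comma-separated list
        "timespan": timespan,                    # ISO 8601 start/end
        "interval": interval,                    # ISO 8601 duration
        "aggregation": aggregation,
    })
    return f"https://management.azure.com{resource_id}/providers/Microsoft.Insights/metrics?{query}"


rid = "/subscriptions/s/resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/vm1"
print(build_metrics_url(rid, ["Percentage CPU", "Available Memory Bytes"],
                        "2024-01-01T00:00:00Z/2024-01-01T00:05:00Z"))
```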
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure: Get data | The result of API requests, expressed in JSON. |
Script | azure.vm.data.get |
Azure: Get errors | A list of errors from API requests. |
Dependent item | azure.vm.data.errors Preprocessing
|
Azure: Availability state | The availability status of the resource. 0 - Available - no events detected that affect the health of the resource. 1 - Degraded - your resource detected a loss in performance, although it's still available for use. 2 - Unavailable - the service detected an ongoing platform or non-platform event that affects the health of the resource. 3 - Unknown - Resource Health hasn't received information about the resource for more than 10 minutes. |
Dependent item | azure.vm.availability.state Preprocessing
|
Azure: Availability status detailed | The summary description of availability status. |
Dependent item | azure.vm.availability.details Preprocessing
|
Azure: CPU utilization | Percentage of allocated compute units that are currently in use by the virtual machine. |
Dependent item | azure.vm.cpu.utilization Preprocessing
|
Azure: Disk read | Bytes read from the disk during the monitoring period. |
Dependent item | azure.vm.disk.read.bytes Preprocessing
|
Azure: Disk write | Bytes written to the disk during the monitoring period. |
Dependent item | azure.vm.disk.write.bytes Preprocessing
|
Azure: Disk IOPS read | The count of read operations from the disk per second. |
Dependent item | azure.vm.disk.read.ops Preprocessing
|
Azure: Disk IOPS write | The count of write operations to the disk per second. |
Dependent item | azure.vm.disk.write.ops Preprocessing
|
Azure: CPU credits remaining | Total number of credits available to burst. Available only on B-series burstable VMs. |
Dependent item | azure.vm.cpu.credits.remaining Preprocessing
|
Azure: CPU credits consumed | Total number of credits consumed by the virtual machine. Only available on B-series burstable VMs. |
Dependent item | azure.vm.cpu.credits.consumed Preprocessing
|
Azure: Data disk read rate | Bytes per second read from a single disk during the monitoring period. |
Dependent item | azure.vm.data.disk.read.bps Preprocessing
|
Azure: Data disk write rate | Bytes per second written to a single disk during the monitoring period. |
Dependent item | azure.vm.data.disk.write.bps Preprocessing
|
Azure: Data disk IOPS read | Read IOPS from a single disk during the monitoring period. |
Dependent item | azure.vm.data.disk.read.ops Preprocessing
|
Azure: Data disk IOPS write | Write IOPS from a single disk during the monitoring period. |
Dependent item | azure.vm.data.disk.write.ops Preprocessing
|
Azure: Data disk queue depth | The number of outstanding IO requests that are waiting to be performed on a disk. |
Dependent item | azure.vm.data.disk.queue.depth Preprocessing
|
Azure: Data disk bandwidth consumed | Percentage of the data disk bandwidth consumed per minute. |
Dependent item | azure.vm.data.disk.bandwidth.consumed Preprocessing
|
Azure: Data disk IOPS consumed | Percentage of the data disk input/output (I/O) consumed per minute. |
Dependent item | azure.vm.data.disk.iops.consumed Preprocessing
|
Azure: Data disk target bandwidth | Baseline byte-per-second throughput that the data disk can achieve without bursting. |
Dependent item | azure.vm.data.disk.bandwidth.target Preprocessing
|
Azure: Data disk target IOPS | Baseline IOPS that the data disk can achieve without bursting. |
Dependent item | azure.vm.data.disk.iops.target Preprocessing
|
Azure: Data disk max burst bandwidth | Maximum byte-per-second throughput that the data disk can achieve with bursting. |
Dependent item | azure.vm.data.disk.bandwidth.max Preprocessing
|
Azure: Data disk max burst IOPS | Maximum IOPS that the data disk can achieve with bursting. |
Dependent item | azure.vm.data.disk.iops.max Preprocessing
|
Azure: Data disk used burst BPS credits | Percentage of the data disk burst bandwidth credits used so far. |
Dependent item | azure.vm.data.disk.bandwidth.burst.used Preprocessing
|
Azure: Data disk used burst IO credits | Percentage of the data disk burst I/O credits used so far. |
Dependent item | azure.vm.data.disk.iops.burst.used Preprocessing
|
Azure: OS disk read rate | Bytes/sec read from a single disk during the monitoring period - for an OS disk. |
Dependent item | azure.vm.os.disk.read.bps Preprocessing
|
Azure: OS disk write rate | Bytes/sec written to a single disk during the monitoring period - for an OS disk. |
Dependent item | azure.vm.os.disk.write.bps Preprocessing
|
Azure: OS disk IOPS read | Read IOPS from a single disk during the monitoring period - for an OS disk. |
Dependent item | azure.vm.os.disk.read.ops Preprocessing
|
Azure: OS disk IOPS write | Write IOPS from a single disk during the monitoring period - for an OS disk. |
Dependent item | azure.vm.os.disk.write.ops Preprocessing
|
Azure: OS disk queue depth | The OS disk queue depth (or queue length). |
Dependent item | azure.vm.os.disk.queue.depth Preprocessing
|
Azure: OS disk bandwidth consumed | Percentage of the operating system disk bandwidth consumed per minute. |
Dependent item | azure.vm.os.disk.bandwidth.consumed Preprocessing
|
Azure: OS disk IOPS consumed | Percentage of the operating system disk I/Os consumed per minute. |
Dependent item | azure.vm.os.disk.iops.consumed Preprocessing
|
Azure: OS disk target bandwidth | Baseline byte-per-second throughput that the OS disk can achieve without bursting. |
Dependent item | azure.vm.os.disk.bandwidth.target Preprocessing
|
Azure: OS disk target IOPS | Baseline IOPS that the OS disk can achieve without bursting. |
Dependent item | azure.vm.os.disk.iops.target Preprocessing
|
Azure: OS disk max burst bandwidth | Maximum byte-per-second throughput that the OS disk can achieve with bursting. |
Dependent item | azure.vm.os.disk.bandwidth.max Preprocessing
|
Azure: OS disk max burst IOPS | Maximum IOPS that the OS disk can achieve with bursting. |
Dependent item | azure.vm.os.disk.iops.max Preprocessing
|
Azure: OS disk used burst BPS credits | Percentage of the OS disk burst bandwidth credits used so far. |
Dependent item | azure.vm.os.disk.bandwidth.burst.used Preprocessing
|
Azure: OS disk used burst IO credits | Percentage of the OS disk burst I/O credits used so far. |
Dependent item | azure.vm.os.disk.iops.burst.used Preprocessing
|
Azure: Inbound flows | The number of current flows in the inbound direction (the traffic going into the VM). |
Dependent item | azure.vm.flows.inbound Preprocessing
|
Azure: Outbound flows | The number of current flows in the outbound direction (the traffic going out of the VM). |
Dependent item | azure.vm.flows.outbound Preprocessing
|
Azure: Inbound flows max creation rate | Maximum creation rate of the inbound flows (the traffic going into the VM). |
Dependent item | azure.vm.flows.inbound.max Preprocessing
|
Azure: Outbound flows max creation rate | Maximum creation rate of the outbound flows (the traffic going out of the VM). |
Dependent item | azure.vm.flows.outbound.max Preprocessing
|
Azure: Premium data disk cache read hit in % | Percentage of premium data disk cache read hit. |
Dependent item | azure.vm.premium.data.disk.cache.read.hit Preprocessing
|
Azure: Premium data disk cache read miss in % | Percentage of premium data disk cache read miss. |
Dependent item | azure.vm.premium.data.disk.cache.read.miss Preprocessing
|
Azure: Premium OS disk cache read hit in % | Percentage of premium OS disk cache read hit. |
Dependent item | azure.vm.premium.os.disk.cache.read.hit Preprocessing
|
Azure: Premium OS disk cache read miss in % | Percentage of premium OS disk cache read miss. |
Dependent item | azure.vm.premium.os.disk.cache.read.miss Preprocessing
|
Azure: VM cached bandwidth consumed | Percentage of the cached disk bandwidth consumed by the VM. |
Dependent item | azure.vm.cached.bandwidth.consumed Preprocessing
|
Azure: VM cached IOPS consumed | Percentage of the cached disk IOPS consumed by the VM. |
Dependent item | azure.vm.cached.iops.consumed Preprocessing
|
Azure: VM uncached bandwidth consumed | Percentage of the uncached disk bandwidth consumed by the VM. |
Dependent item | azure.vm.uncached.bandwidth.consumed Preprocessing
|
Azure: VM uncached IOPS consumed | Percentage of the uncached disk IOPS consumed by the VM. |
Dependent item | azure.vm.uncached.iops.consumed Preprocessing
|
Azure: Network in total | The number of bytes received by the VM via all network interfaces (incoming traffic). |
Dependent item | azure.vm.network.in.total Preprocessing
|
Azure: Network out total | The number of bytes sent by the VM via all network interfaces (outgoing traffic). |
Dependent item | azure.vm.network.out.total Preprocessing
|
Azure: Available memory | Amount of physical memory, in bytes, immediately available for allocation to a process or for system use in the virtual machine. |
Dependent item | azure.vm.memory.available Preprocessing
|
Azure: Data disk latency | Average time to complete each IO during the monitoring period for the data disk. |
Dependent item | azure.vm.disk.latency Preprocessing
|
Azure: OS disk latency | Average time to complete each IO during the monitoring period for the OS disk. |
Dependent item | azure.vm.os.disk.latency Preprocessing
|
Azure: Temp disk latency | Average time to complete each IO during the monitoring period for the temp disk. |
Dependent item | azure.vm.temp.disk.latency Preprocessing
|
Azure: Temp disk read rate | Bytes/sec read from a single disk during the monitoring period for the temp disk. |
Dependent item | azure.vm.temp.disk.read.bps Preprocessing
|
Azure: Temp disk write rate | Bytes/sec written to a single disk during the monitoring period for the temp disk. |
Dependent item | azure.vm.temp.disk.write.bps Preprocessing
|
Azure: Temp disk IOPS read | Read IOPS from a single disk during the monitoring period for the temp disk. |
Dependent item | azure.vm.temp.disk.read.ops Preprocessing
|
Azure: Temp disk IOPS write | Write IOPS from a single disk during the monitoring period for the temp disk. |
Dependent item | azure.vm.temp.disk.write.ops Preprocessing
|
Azure: Temp disk queue depth | Temp disk queue depth (or queue length). |
Dependent item | azure.vm.temp.disk.queue.depth Preprocessing
|
Azure: VM availability metric | Measure of the availability of the virtual machine over time. |
Dependent item | azure.vm.availability Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Azure: There are errors in requests to API | Zabbix has received errors in response to API requests. | length(last(/Azure Virtual Machine by HTTP/azure.vm.data.errors))>0 | Average | |
Azure: Virtual machine is unavailable | The resource state is unavailable. | last(/Azure Virtual Machine by HTTP/azure.vm.availability.state)=2 | High | |
Azure: Virtual machine is degraded | The resource is in a degraded state. | last(/Azure Virtual Machine by HTTP/azure.vm.availability.state)=1 | Average | |
Azure: Virtual machine is in unknown state | The resource state is unknown. | last(/Azure Virtual Machine by HTTP/azure.vm.availability.state)=3 | Warning | |
Azure: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/Azure Virtual Machine by HTTP/azure.vm.cpu.utilization,5m)>{$AZURE.VM.CPU.UTIL.CRIT} | High | |
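The CPU trigger above uses min(...,5m), so it fires only when the lowest value collected in the last five minutes is still above the threshold; a single short spike is not enough. A minimal Python sketch of that comparison, assuming the values in the window are already available as a list and using 90 as a hypothetical value for {$AZURE.VM.CPU.UTIL.CRIT} (its default is not shown in this section):

```python
from typing import Sequence


def min_over_window_exceeds(values: Sequence[float], threshold: float) -> bool:
    """Mimic `min(item,5m)>threshold`: true only if every value in the window is above the threshold."""
    if not values:
        # Sketch-only choice: an empty window is treated as "does not fire".
        return False
    return min(values) > threshold


# Hypothetical azure.vm.cpu.utilization samples from the last five minutes.
sustained_load = [93.0, 95.5, 91.2, 97.8, 92.4]
short_spike = [40.1, 98.7, 42.3, 39.9, 41.0]
CPU_UTIL_CRIT = 90  # assumed value of {$AZURE.VM.CPU.UTIL.CRIT}

print(min_over_window_exceeds(sustained_load, CPU_UTIL_CRIT))  # True
print(min_over_window_exceeds(short_spike, CPU_UTIL_CRIT))  # False
```

Using avg() or max() over the same window would make the trigger react to shorter bursts; min() trades responsiveness for fewer false alarms.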
This template is designed to monitor Microsoft Azure MySQL flexible servers by HTTP. It works without any external scripts and uses the script item.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.
az ad sp create-for-rbac --name zabbix --role reader --scopes /subscriptions/<subscription_id>
See Azure documentation for more details.
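The az ad sp create-for-rbac command prints the new service principal's credentials as JSON, including the appId, password, and tenant fields. A minimal Python sketch of how those fields line up with the template macros, assuming the output has been saved as a string; all values below are placeholders, and the subscription and resource IDs are not part of the command's output:

```python
import json

# Example shape of the `az ad sp create-for-rbac` output; every value is a placeholder.
SAMPLE_OUTPUT = """{
  "appId": "00000000-0000-0000-0000-000000000001",
  "displayName": "zabbix",
  "password": "example-secret",
  "tenant": "00000000-0000-0000-0000-000000000002"
}"""


def to_zabbix_macros(cli_output: str, subscription_id: str, resource_id: str) -> dict:
    """Map the service principal credentials to the macros used by the template."""
    sp = json.loads(cli_output)
    return {
        "{$AZURE.APP.ID}": sp["appId"],
        "{$AZURE.PASSWORD}": sp["password"],
        "{$AZURE.TENANT.ID}": sp["tenant"],
        # Not in the CLI output: the subscription is the one passed in the scope argument,
        # and the resource ID identifies the monitored server or database.
        "{$AZURE.SUBSCRIPTION.ID}": subscription_id,
        "{$AZURE.RESOURCE.ID}": resource_id,
    }


macros = to_zabbix_macros(SAMPLE_OUTPUT, "<subscription_id>", "<resource_id>")
for name, value in macros.items():
    print(f"{name} = {value}")
```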
Set the macros {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.
Name | Description | Default |
---|---|---|
{$AZURE.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. | |
{$AZURE.APP.ID} | The application (client) ID of the Azure service principal. | |
{$AZURE.PASSWORD} | The password (client secret) of the Azure service principal. | |
{$AZURE.DATA.TIMEOUT} | API response timeout. | 15s |
{$AZURE.TENANT.ID} | Microsoft Azure tenant ID. | |
{$AZURE.SUBSCRIPTION.ID} | Microsoft Azure subscription ID. | |
{$AZURE.RESOURCE.ID} | Microsoft Azure MySQL server ID. | |
{$AZURE.DB.CPU.UTIL.CRIT} | The critical threshold of CPU utilization, expressed in %. | 90 |
{$AZURE.DB.STORAGE.PUSED.WARN} | The warning threshold of storage utilization, expressed in %. | 80 |
{$AZURE.DB.STORAGE.PUSED.CRIT} | The critical threshold of storage utilization, expressed in %. | 90 |
{$AZURE.DB.ABORTED.CONN.MAX.WARN} | The threshold for the number of failed attempts to connect to the MySQL server, used in the trigger expression. | 25 |
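{$AZURE.RESOURCE.ID} identifies the monitored resource. Assuming the template expects the full Azure Resource Manager resource ID (the path that starts with /subscriptions/), a small Python helper can assemble it for each resource type covered in this document; the resource group, server, and database names are hypothetical:

```python
# Azure Resource Manager provider paths for the resource types in these templates.
PROVIDERS = {
    "mysql_flexible": "Microsoft.DBforMySQL/flexibleServers",
    "mysql_single": "Microsoft.DBforMySQL/servers",
    "pgsql_flexible": "Microsoft.DBforPostgreSQL/flexibleServers",
    "pgsql_single": "Microsoft.DBforPostgreSQL/servers",
}


def server_resource_id(subscription_id: str, resource_group: str, kind: str, server: str) -> str:
    """Build the full resource ID of a MySQL or PostgreSQL server."""
    provider = PROVIDERS[kind]
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/{provider}/{server}")


def sql_database_resource_id(subscription_id: str, resource_group: str, server: str, database: str) -> str:
    """Build the full resource ID of an Azure SQL database (a child of its logical server)."""
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Sql/servers/{server}/databases/{database}")


print(server_resource_id("<subscription_id>", "my-rg", "mysql_flexible", "my-mysql"))
# /subscriptions/<subscription_id>/resourceGroups/my-rg/providers/Microsoft.DBforMySQL/flexibleServers/my-mysql
```

The same value is typically shown as the resource ID in the resource's properties in the Azure portal, which avoids typing it by hand.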
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure MySQL: Get data | The result of API requests, expressed in JSON. | Script | azure.db.mysql.data.get |
Azure MySQL: Get errors | A list of errors from API requests. | Dependent item | azure.db.mysql.data.errors |
Azure MySQL: Availability state | The availability status of the resource. | Dependent item | azure.db.mysql.availability.state |
Azure MySQL: Availability status detailed | The summary description of the availability status. | Dependent item | azure.db.mysql.availability.details |
Azure MySQL: Percentage CPU | The CPU utilization of the host, expressed in %. | Dependent item | azure.db.mysql.cpu.percentage |
Azure MySQL: Memory utilization | The memory utilization of the host, expressed in %. | Dependent item | azure.db.mysql.memory.percentage |
Azure MySQL: Network out | Network egress of the host, expressed in bytes. | Dependent item | azure.db.mysql.network.egress |
Azure MySQL: Network in | Network ingress of the host, expressed in bytes. | Dependent item | azure.db.mysql.network.ingress |
Azure MySQL: Connections active | The count of active connections. | Dependent item | azure.db.mysql.connections.active |
Azure MySQL: Connections total | The count of total connections. | Dependent item | azure.db.mysql.connections.total |
Azure MySQL: Connections aborted | The count of aborted connections. | Dependent item | azure.db.mysql.connections.aborted |
Azure MySQL: Queries | The count of queries. | Dependent item | azure.db.mysql.queries |
Azure MySQL: IO consumption percent | The I/O consumption, expressed in %. | Dependent item | azure.db.mysql.io.consumption.percent |
Azure MySQL: Storage percent | The storage utilization, expressed in %. | Dependent item | azure.db.mysql.storage.percent |
Azure MySQL: Storage used | Used storage space, expressed in bytes. | Dependent item | azure.db.mysql.storage.used |
Azure MySQL: Storage limit | The storage limit, expressed in bytes. | Dependent item | azure.db.mysql.storage.limit |
Azure MySQL: Backup storage used | Used backup storage, expressed in bytes. | Dependent item | azure.db.mysql.storage.backup.used |
Azure MySQL: Replication lag | The replication lag, expressed in seconds. | Dependent item | azure.db.mysql.replication.lag |
Azure MySQL: CPU credits remaining | The remaining CPU credits. | Dependent item | azure.db.mysql.cpu.credits.remaining |
Azure MySQL: CPU credits consumed | The consumed CPU credits. | Dependent item | azure.db.mysql.cpu.credits.consumed |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Azure MySQL: There are errors in requests to API | Zabbix has received errors in response to API requests. | length(last(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.data.errors))>0 | Average | |
Azure MySQL: MySQL server is unavailable | The resource state is unavailable. | last(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.availability.state)=2 | High | |
Azure MySQL: MySQL server is degraded | The resource is in a degraded state. | last(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.availability.state)=1 | Average | |
Azure MySQL: MySQL server is in unknown state | The resource state is unknown. | last(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.availability.state)=3 | Warning | |
Azure MySQL: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.cpu.percentage,5m)>{$AZURE.DB.CPU.UTIL.CRIT} | High | |
Azure MySQL: Server has aborted connections | The number of failed attempts to connect to the MySQL server exceeds the configured threshold. | min(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.connections.aborted,5m)>{$AZURE.DB.ABORTED.CONN.MAX.WARN} | Average | |
Azure MySQL: Storage space is critically low | Critical utilization of the storage space. | last(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.CRIT} | Average | |
Azure MySQL: Storage space is low | High utilization of the storage space. | last(/Azure MySQL Flexible Server by HTTP/azure.db.mysql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.WARN} | Warning | |
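The three availability triggers compare azure.db.mysql.availability.state with 1, 2, and 3. Azure Resource Health reports an availability state of Available, Degraded, Unavailable, or Unknown; a minimal Python sketch of how those states line up with the triggers, assuming the numeric codes below (only 1, 2, and 3 appear in the expressions, and their meaning is inferred from the trigger names):

```python
from typing import Optional

# Numeric codes inferred from the trigger expressions (=1, =2, =3).
# Mapping "Available" to 0 is an assumption: no trigger references it.
STATE_CODES = {"Available": 0, "Degraded": 1, "Unavailable": 2, "Unknown": 3}

# Severity of the trigger that matches each code, taken from the trigger table.
TRIGGER_SEVERITY = {1: "Average", 2: "High", 3: "Warning"}


def availability_severity(state: str) -> Optional[str]:
    """Return the severity of the availability trigger that would fire, or None."""
    # Sketch-only choice: unrecognized states are treated as Unknown.
    code = STATE_CODES.get(state, STATE_CODES["Unknown"])
    return TRIGGER_SEVERITY.get(code)


for state in ("Available", "Degraded", "Unavailable", "Unknown"):
    print(state, availability_severity(state))
```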
This template is designed to monitor Microsoft Azure MySQL single servers by HTTP. It works without any external scripts and uses the script item.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.
az ad sp create-for-rbac --name zabbix --role reader --scopes /subscriptions/<subscription_id>
See Azure documentation for more details.
Set the macros {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.
Name | Description | Default |
---|---|---|
{$AZURE.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. | |
{$AZURE.APP.ID} | The application (client) ID of the Azure service principal. | |
{$AZURE.PASSWORD} | The password (client secret) of the Azure service principal. | |
{$AZURE.DATA.TIMEOUT} | API response timeout. | 15s |
{$AZURE.TENANT.ID} | Microsoft Azure tenant ID. | |
{$AZURE.SUBSCRIPTION.ID} | Microsoft Azure subscription ID. | |
{$AZURE.RESOURCE.ID} | Microsoft Azure MySQL server ID. | |
{$AZURE.DB.CPU.UTIL.CRIT} | The critical threshold of CPU utilization, expressed in %. | 90 |
{$AZURE.DB.MEMORY.UTIL.CRIT} | The critical threshold of memory utilization, expressed in %. | 90 |
{$AZURE.DB.STORAGE.PUSED.WARN} | The warning threshold of storage utilization, expressed in %. | 80 |
{$AZURE.DB.STORAGE.PUSED.CRIT} | The critical threshold of storage utilization, expressed in %. | 90 |
{$AZURE.DB.FAILED.CONN.MAX.WARN} | The threshold for the number of failed attempts to connect to the MySQL server, used in the trigger expression. | 25 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure MySQL: Get data | The result of API requests, expressed in JSON. | Script | azure.db.mysql.data.get |
Azure MySQL: Get errors | A list of errors from API requests. | Dependent item | azure.db.mysql.data.errors |
Azure MySQL: Availability state | The availability status of the resource. | Dependent item | azure.db.mysql.availability.state |
Azure MySQL: Availability status detailed | The summary description of the availability status. | Dependent item | azure.db.mysql.availability.details |
Azure MySQL: Percentage CPU | The CPU utilization of the host, expressed in %. | Dependent item | azure.db.mysql.cpu.percentage |
Azure MySQL: Memory utilization | The memory utilization of the host, expressed in %. | Dependent item | azure.db.mysql.memory.percentage |
Azure MySQL: Network out | The network outbound traffic across the active connections. | Dependent item | azure.db.mysql.network.egress |
Azure MySQL: Network in | The network inbound traffic across the active connections. | Dependent item | azure.db.mysql.network.ingress |
Azure MySQL: Connections active | The count of active connections. | Dependent item | azure.db.mysql.connections.active |
Azure MySQL: Connections failed | The count of failed connections. | Dependent item | azure.db.mysql.connections.failed |
Azure MySQL: IO consumption percent | The I/O consumption, expressed in %. | Dependent item | azure.db.mysql.io.consumption.percent |
Azure MySQL: Storage percent | The storage utilization, expressed in %. | Dependent item | azure.db.mysql.storage.percent |
Azure MySQL: Storage used | Used storage space, expressed in bytes. | Dependent item | azure.db.mysql.storage.used |
Azure MySQL: Storage limit | The storage limit, expressed in bytes. | Dependent item | azure.db.mysql.storage.limit |
Azure MySQL: Backup storage used | Used backup storage, expressed in bytes. | Dependent item | azure.db.mysql.storage.backup.used |
Azure MySQL: Replication lag | The replication lag, expressed in seconds. | Dependent item | azure.db.mysql.replication.lag |
Azure MySQL: Server log storage percent | The storage utilization of the server log, expressed in %. | Dependent item | azure.db.mysql.storage.server.log.percent |
Azure MySQL: Server log storage used | The storage space used by the server log, expressed in bytes. | Dependent item | azure.db.mysql.storage.server.log.used |
Azure MySQL: Server log storage limit | The storage limit of the server log, expressed in bytes. | Dependent item | azure.db.mysql.storage.server.log.limit |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Azure MySQL: There are errors in requests to API | Zabbix has received errors in response to API requests. | length(last(/Azure MySQL Single Server by HTTP/azure.db.mysql.data.errors))>0 | Average | |
Azure MySQL: MySQL server is unavailable | The resource state is unavailable. | last(/Azure MySQL Single Server by HTTP/azure.db.mysql.availability.state)=2 | High | |
Azure MySQL: MySQL server is degraded | The resource is in a degraded state. | last(/Azure MySQL Single Server by HTTP/azure.db.mysql.availability.state)=1 | Average | |
Azure MySQL: MySQL server is in unknown state | The resource state is unknown. | last(/Azure MySQL Single Server by HTTP/azure.db.mysql.availability.state)=3 | Warning | |
Azure MySQL: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/Azure MySQL Single Server by HTTP/azure.db.mysql.cpu.percentage,5m)>{$AZURE.DB.CPU.UTIL.CRIT} | High | |
Azure MySQL: High memory utilization | The system is running out of free memory. | min(/Azure MySQL Single Server by HTTP/azure.db.mysql.memory.percentage,5m)>{$AZURE.DB.MEMORY.UTIL.CRIT} | Average | |
Azure MySQL: Server has failed connections | The number of failed attempts to connect to the MySQL server exceeds the configured threshold. | min(/Azure MySQL Single Server by HTTP/azure.db.mysql.connections.failed,5m)>{$AZURE.DB.FAILED.CONN.MAX.WARN} | Average | |
Azure MySQL: Storage space is critically low | Critical utilization of the storage space. | last(/Azure MySQL Single Server by HTTP/azure.db.mysql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.CRIT} | Average | |
Azure MySQL: Storage space is low | High utilization of the storage space. | last(/Azure MySQL Single Server by HTTP/azure.db.mysql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.WARN} | Warning | |
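The two storage triggers compare the same item against different macros, and both use a strict > comparison. A minimal Python sketch, using the default thresholds from the macro table (80 and 90), of which trigger expressions evaluate to true for a given storage percentage:

```python
from typing import List


def storage_triggers(storage_percent: float, warn: float = 80, crit: float = 90) -> List[str]:
    """Names of the storage triggers whose expressions evaluate to true."""
    fired = []
    if storage_percent > crit:  # last(...storage.percent)>{$AZURE.DB.STORAGE.PUSED.CRIT}
        fired.append("Storage space is critically low")
    if storage_percent > warn:  # last(...storage.percent)>{$AZURE.DB.STORAGE.PUSED.WARN}
        fired.append("Storage space is low")
    return fired


for value in (75, 80, 85, 95):
    print(value, storage_triggers(value))
```

Above the critical threshold both expressions are true; whether the warning is suppressed in that case depends on trigger dependencies, which the table above does not list.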
This template is designed to monitor Microsoft Azure PostgreSQL flexible servers by HTTP. It works without any external scripts and uses the script item.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.
az ad sp create-for-rbac --name zabbix --role reader --scopes /subscriptions/<subscription_id>
See Azure documentation for more details.
Set the macros {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.
Name | Description | Default |
---|---|---|
{$AZURE.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. | |
{$AZURE.APP.ID} | The application (client) ID of the Azure service principal. | |
{$AZURE.PASSWORD} | The password (client secret) of the Azure service principal. | |
{$AZURE.DATA.TIMEOUT} | API response timeout. | 15s |
{$AZURE.TENANT.ID} | Microsoft Azure tenant ID. | |
{$AZURE.SUBSCRIPTION.ID} | Microsoft Azure subscription ID. | |
{$AZURE.RESOURCE.ID} | Microsoft Azure PostgreSQL server ID. | |
{$AZURE.DB.CPU.UTIL.CRIT} | The critical threshold of CPU utilization, expressed in %. | 90 |
{$AZURE.DB.MEMORY.UTIL.CRIT} | The critical threshold of memory utilization, expressed in %. | 90 |
{$AZURE.DB.STORAGE.PUSED.WARN} | The warning threshold of storage utilization, expressed in %. | 80 |
{$AZURE.DB.STORAGE.PUSED.CRIT} | The critical threshold of storage utilization, expressed in %. | 90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure PostgreSQL: Get data | The result of API requests, expressed in JSON. | Script | azure.db.pgsql.data.get |
Azure PostgreSQL: Get errors | A list of errors from API requests. | Dependent item | azure.db.pgsql.data.errors |
Azure PostgreSQL: Availability state | The availability status of the resource. | Dependent item | azure.db.pgsql.availability.state |
Azure PostgreSQL: Availability status detailed | The summary description of the availability status. | Dependent item | azure.db.pgsql.availability.details |
Azure PostgreSQL: Percentage CPU | The CPU utilization of the host, expressed in %. | Dependent item | azure.db.pgsql.cpu.percentage |
Azure PostgreSQL: Memory utilization | The memory utilization of the host, expressed in %. | Dependent item | azure.db.pgsql.memory.percentage |
Azure PostgreSQL: Network out | The network outbound traffic across the active connections. | Dependent item | azure.db.pgsql.network.egress |
Azure PostgreSQL: Network in | The network inbound traffic across the active connections. | Dependent item | azure.db.pgsql.network.ingress |
Azure PostgreSQL: Connections active | The count of active connections. | Dependent item | azure.db.pgsql.connections.active |
Azure PostgreSQL: Connections succeeded | The count of successful connections. | Dependent item | azure.db.pgsql.connections.succeeded |
Azure PostgreSQL: Connections failed | The count of failed connections. | Dependent item | azure.db.pgsql.connections.failed |
Azure PostgreSQL: Storage percent | The storage utilization, expressed in %. | Dependent item | azure.db.pgsql.storage.percent |
Azure PostgreSQL: Storage used | Used storage space, expressed in bytes. | Dependent item | azure.db.pgsql.storage.used |
Azure PostgreSQL: Storage free | Free storage space, expressed in bytes. | Dependent item | azure.db.pgsql.storage.free |
Azure PostgreSQL: Backup storage used | Used backup storage, expressed in bytes. | Dependent item | azure.db.pgsql.storage.backup.used |
Azure PostgreSQL: CPU credits remaining | The total number of credits available to burst. | Dependent item | azure.db.pgsql.cpu.credits.remaining |
Azure PostgreSQL: CPU credits consumed | The total number of credits consumed by the database server. | Dependent item | azure.db.pgsql.cpu.credits.consumed |
Azure PostgreSQL: Data disk queue depth | The number of outstanding I/O operations to the data disk. | Dependent item | azure.db.pgsql.disk.queue.depth |
Azure PostgreSQL: Data disk IOPS | The number of data disk I/O operations per second. | Dependent item | azure.db.pgsql.iops |
Azure PostgreSQL: Data disk read IOPS | The number of data disk I/O read operations per second. | Dependent item | azure.db.pgsql.iops.read |
Azure PostgreSQL: Data disk write IOPS | The number of data disk I/O write operations per second. | Dependent item | azure.db.pgsql.iops.write |
Azure PostgreSQL: Data disk read Bps | Bytes read per second from the data disk during the monitoring period. | Dependent item | azure.db.pgsql.disk.bps.read |
Azure PostgreSQL: Data disk write Bps | Bytes written per second to the data disk during the monitoring period. | Dependent item | azure.db.pgsql.disk.bps.write |
Azure PostgreSQL: Transaction log storage used | The storage space used by the transaction log, expressed in bytes. | Dependent item | azure.db.pgsql.storage.txlogs.used |
Azure PostgreSQL: Maximum used transaction IDs | The maximum number of used transaction IDs. | Dependent item | azure.db.pgsql.txid.used.max |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Azure PostgreSQL: There are errors in requests to API | Zabbix has received errors in response to API requests. | length(last(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.data.errors))>0 | Average | |
Azure PostgreSQL: PostgreSQL server is unavailable | The resource state is unavailable. | last(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.availability.state)=2 | High | |
Azure PostgreSQL: PostgreSQL server is degraded | The resource is in a degraded state. | last(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.availability.state)=1 | Average | |
Azure PostgreSQL: PostgreSQL server is in unknown state | The resource state is unknown. | last(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.availability.state)=3 | Warning | |
Azure PostgreSQL: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.cpu.percentage,5m)>{$AZURE.DB.CPU.UTIL.CRIT} | High | |
Azure PostgreSQL: High memory utilization | The system is running out of free memory. | min(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.memory.percentage,5m)>{$AZURE.DB.MEMORY.UTIL.CRIT} | Average | |
Azure PostgreSQL: Storage space is critically low | Critical utilization of the storage space. | last(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.CRIT} | Average | |
Azure PostgreSQL: Storage space is low | High utilization of the storage space. | last(/Azure PostgreSQL Flexible Server by HTTP/azure.db.pgsql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.WARN} | Warning | |
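Each template has a Get errors dependent item and a trigger that fires when length(last(...))>0, that is, when the latest value is a non-empty string. The script behind Get data is not shown here, so the following Python sketch is only an assumption about the general pattern: run every API request, keep the results that succeed, and join the error messages into one string for the errors item. The request names and functions are hypothetical.

```python
from typing import Callable, Dict


def collect(requests: Dict[str, Callable[[], dict]]) -> dict:
    """Hypothetical master-item logic: gather results plus one joined error string."""
    results: Dict[str, dict] = {}
    errors = []
    for name, call in requests.items():
        try:
            results[name] = call()
        except Exception as exc:  # sketch: any failure becomes an error message
            errors.append(f"{name}: {exc}")
    return {"results": results, "errors": "; ".join(errors)}


def errors_trigger_fires(errors_value: str) -> bool:
    """Mirror length(last(.../azure.db.pgsql.data.errors))>0."""
    return len(errors_value) > 0


def failing_request() -> dict:
    raise RuntimeError("HTTP 403: authorization failed")


data = collect({"metrics": lambda: {"cpu_percent": 12.5}, "health": failing_request})
print(data["errors"])  # health: HTTP 403: authorization failed
print(errors_trigger_fires(data["errors"]))  # True
```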
This template is designed to monitor Microsoft Azure PostgreSQL servers by HTTP. It works without any external scripts and uses the script item.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.
az ad sp create-for-rbac --name zabbix --role reader --scopes /subscriptions/<subscription_id>
See Azure documentation for more details.
Set the macros {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.
Name | Description | Default |
---|---|---|
{$AZURE.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. | |
{$AZURE.APP.ID} | The application (client) ID of the Azure service principal. | |
{$AZURE.PASSWORD} | The password (client secret) of the Azure service principal. | |
{$AZURE.DATA.TIMEOUT} | API response timeout. | 15s |
{$AZURE.TENANT.ID} | Microsoft Azure tenant ID. | |
{$AZURE.SUBSCRIPTION.ID} | Microsoft Azure subscription ID. | |
{$AZURE.RESOURCE.ID} | Microsoft Azure PostgreSQL server ID. | |
{$AZURE.DB.CPU.UTIL.CRIT} | The critical threshold of CPU utilization, expressed in %. | 90 |
{$AZURE.DB.MEMORY.UTIL.CRIT} | The critical threshold of memory utilization, expressed in %. | 90 |
{$AZURE.DB.STORAGE.PUSED.WARN} | The warning threshold of storage utilization, expressed in %. | 80 |
{$AZURE.DB.STORAGE.PUSED.CRIT} | The critical threshold of storage utilization, expressed in %. | 90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure PostgreSQL: Get data | The result of API requests, expressed in JSON. | Script | azure.db.pgsql.data.get |
Azure PostgreSQL: Get errors | A list of errors from API requests. | Dependent item | azure.db.pgsql.data.errors |
Azure PostgreSQL: Availability state | The availability status of the resource. | Dependent item | azure.db.pgsql.availability.state |
Azure PostgreSQL: Availability status detailed | The summary description of the availability status. | Dependent item | azure.db.pgsql.availability.details |
Azure PostgreSQL: Percentage CPU | The CPU utilization of the host, expressed in %. | Dependent item | azure.db.pgsql.cpu.percentage |
Azure PostgreSQL: Memory utilization | The memory utilization of the host, expressed in %. | Dependent item | azure.db.pgsql.memory.percentage |
Azure PostgreSQL: Network out | The network outbound traffic across the active connections. | Dependent item | azure.db.pgsql.network.egress |
Azure PostgreSQL: Network in | The network inbound traffic across the active connections. | Dependent item | azure.db.pgsql.network.ingress |
Azure PostgreSQL: Connections active | The count of active connections. | Dependent item | azure.db.pgsql.connections.active |
Azure PostgreSQL: Connections failed | The count of failed connections. | Dependent item | azure.db.pgsql.connections.failed |
Azure PostgreSQL: IO consumption percent | The I/O consumption, expressed in %. | Dependent item | azure.db.pgsql.io.consumption.percent |
Azure PostgreSQL: Storage percent | The storage utilization, expressed in %. | Dependent item | azure.db.pgsql.storage.percent |
Azure PostgreSQL: Storage used | Used storage space, expressed in bytes. | Dependent item | azure.db.pgsql.storage.used |
Azure PostgreSQL: Storage limit | The storage limit, expressed in bytes. | Dependent item | azure.db.pgsql.storage.limit |
Azure PostgreSQL: Backup storage used | Used backup storage, expressed in bytes. | Dependent item | azure.db.pgsql.storage.backup.used |
Azure PostgreSQL: Replication lag | The replication lag, expressed in seconds. | Dependent item | azure.db.pgsql.replica.log.delay |
Azure PostgreSQL: Max lag across replicas in bytes | The lag of the most lagging replica, expressed in bytes. | Dependent item | azure.db.pgsql.replica.log.delay.bytes |
Azure PostgreSQL: Server log storage percent | The storage utilization of the server log, expressed in %. | Dependent item | azure.db.pgsql.storage.server.log.percent |
Azure PostgreSQL: Server log storage used | The storage space used by the server log, expressed in bytes. | Dependent item | azure.db.pgsql.storage.server.log.used |
Azure PostgreSQL: Server log storage limit | The storage limit of the server log, expressed in bytes. | Dependent item | azure.db.pgsql.storage.server.log.limit |
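In these tables, Get data is the only Script item; every other item is a Dependent item that takes its value from that master item, with Zabbix preprocessing (for example, a JSONPath step) extracting one field per item. The structure of the master JSON is not documented in this section, so the field names below are hypothetical; the Python sketch only illustrates how one master value fans out into several dependent values:

```python
import json
from typing import Any, Sequence

# Hypothetical value of the "Get data" master item.
MASTER_VALUE = json.dumps({
    "metrics": {"cpu_percent": 12.5, "storage_percent": 41.0},
    "health": {"availabilityState": "Available"},
    "errors": "",
})

# Hypothetical key paths, roughly like JSONPath expressions such as $.metrics.cpu_percent.
DEPENDENT_ITEMS = {
    "azure.db.pgsql.cpu.percentage": ["metrics", "cpu_percent"],
    "azure.db.pgsql.storage.percent": ["metrics", "storage_percent"],
    "azure.db.pgsql.data.errors": ["errors"],
}


def extract(master: str, path: Sequence[str]) -> Any:
    """Walk a key path through the master JSON, like a single JSONPath preprocessing step."""
    node = json.loads(master)
    for key in path:
        node = node[key]
    return node


for key, path in DEPENDENT_ITEMS.items():
    print(key, repr(extract(MASTER_VALUE, path)))
```

Because all dependent items share one master value, the API requests run once per update of Get data rather than once per item.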
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Azure PostgreSQL: There are errors in requests to API | Zabbix has received errors in response to API requests. | length(last(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.data.errors))>0 | Average | |
Azure PostgreSQL: PostgreSQL server is unavailable | The resource state is unavailable. | last(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.availability.state)=2 | High | |
Azure PostgreSQL: PostgreSQL server is degraded | The resource is in a degraded state. | last(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.availability.state)=1 | Average | |
Azure PostgreSQL: PostgreSQL server is in unknown state | The resource state is unknown. | last(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.availability.state)=3 | Warning | |
Azure PostgreSQL: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.cpu.percentage,5m)>{$AZURE.DB.CPU.UTIL.CRIT} | High | |
Azure PostgreSQL: High memory utilization | The system is running out of free memory. | min(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.memory.percentage,5m)>{$AZURE.DB.MEMORY.UTIL.CRIT} | Average | |
Azure PostgreSQL: Storage space is critically low | Critical utilization of the storage space. | last(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.CRIT} | Average | |
Azure PostgreSQL: Storage space is low | High utilization of the storage space. | last(/Azure PostgreSQL Single Server by HTTP/azure.db.pgsql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.WARN} | Warning | |
This template is designed to monitor Microsoft SQL serverless databases by HTTP. It works without any external scripts and uses the script item.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.
az ad sp create-for-rbac --name zabbix --role reader --scopes /subscriptions/<subscription_id>
See Azure documentation for more details.
Set the macros {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.
Name | Description | Default |
---|---|---|
{$AZURE.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. | |
{$AZURE.APP.ID} | The application (client) ID of the Azure service principal. | |
{$AZURE.PASSWORD} | The password (client secret) of the Azure service principal. | |
{$AZURE.DATA.TIMEOUT} | API response timeout. | 15s |
{$AZURE.TENANT.ID} | Microsoft Azure tenant ID. | |
{$AZURE.SUBSCRIPTION.ID} | Microsoft Azure subscription ID. | |
{$AZURE.RESOURCE.ID} | Microsoft Azure SQL database ID. | |
{$AZURE.DB.CPU.UTIL.CRIT} | The critical threshold of CPU utilization, expressed in %. | 90 |
{$AZURE.DB.MEMORY.UTIL.CRIT} | The critical threshold of memory utilization, expressed in %. | 90 |
{$AZURE.DB.STORAGE.PUSED.WARN} | The warning threshold of storage utilization, expressed in %. | 80 |
{$AZURE.DB.STORAGE.PUSED.CRIT} | The critical threshold of storage utilization, expressed in %. | 90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure Microsoft SQL: Get data | The result of API requests, expressed in JSON. | Script | azure.db.mssql.data.get |
Azure Microsoft SQL: Get errors | A list of errors from API requests. | Dependent item | azure.db.mssql.data.errors |
Azure Microsoft SQL: Availability state | The availability status of the resource. | Dependent item | azure.db.mssql.availability.state |
Azure Microsoft SQL: Availability status detailed | The summary description of the availability status. | Dependent item | azure.db.mssql.availability.details |
Azure Microsoft SQL: Percentage CPU | The CPU utilization of the host, expressed in %. | Dependent item | azure.db.mssql.cpu.percentage |
Azure Microsoft SQL: Data IO percentage | The physical data read percentage. | Dependent item | azure.db.mssql.data.read.percentage |
Azure Microsoft SQL: Log IO percentage | The log I/O percentage. Not applicable to data warehouses. | Dependent item | azure.db.mssql.log.write.percentage |
Azure Microsoft SQL: Data space used | The data space used. Not applicable to data warehouses. | Dependent item | azure.db.mssql.storage.used |
Azure Microsoft SQL: Connections successful | The count of successful connections. | Dependent item | azure.db.mssql.connections.successful |
Azure Microsoft SQL: Connections failed: System errors | The count of failed connections with system errors. | Dependent item | azure.db.mssql.connections.failed.system |
Azure Microsoft SQL: Connections blocked by firewall | The count of connections blocked by the firewall. | Dependent item | azure.db.mssql.firewall.blocked |
Azure Microsoft SQL: Deadlocks | The count of deadlocks. Not applicable to data warehouses. | Dependent item | azure.db.mssql.deadlocks |
Azure Microsoft SQL: Data space used percent | The percentage of used data space. Not applicable to data warehouses or Hyperscale databases. | Dependent item | azure.db.mssql.storage.percent |
Azure Microsoft SQL: In-Memory OLTP storage percent | In-Memory OLTP storage percent. Not applicable to data warehouses. | Dependent item | azure.db.mssql.storage.xtp.percent |
Azure Microsoft SQL: Workers percentage | The percentage of workers. Not applicable to data warehouses. | Dependent item | azure.db.mssql.workers.percent |
Azure Microsoft SQL: Sessions percentage | The percentage of sessions. Not applicable to data warehouses. | Dependent item | azure.db.mssql.sessions.percent |
Azure Microsoft SQL: CPU limit | The CPU limit. Applies to vCore-based databases. | Dependent item | azure.db.mssql.cpu.limit |
Azure Microsoft SQL: CPU used | The CPU used. Applies to vCore-based databases. | Dependent item | azure.db.mssql.cpu.used |
Azure Microsoft SQL: SQL Server process core percent | The CPU usage as a percentage of the SQL DB process. Not applicable to data warehouses. | Dependent item | azure.db.mssql.server.cpu.percent |
Azure Microsoft SQL: SQL Server process memory percent | The memory usage as a percentage of the SQL DB process. Not applicable to data warehouses. | Dependent item | azure.db.mssql.server.memory.percent |
Azure Microsoft SQL: Tempdb data file size | Space used in the tempdb data files. | Dependent item | azure.db.mssql.tempdb.data.size |
Azure Microsoft SQL: Tempdb log file size | Space used in the tempdb log file. | Dependent item | azure.db.mssql.tempdb.log.size |
Azure Microsoft SQL: Tempdb log used percent | The percentage of space used in the tempdb log file. | Dependent item | azure.db.mssql.tempdb.log.percent |
Azure Microsoft SQL: App CPU billed | App CPU billed. Applies to serverless databases. | Dependent item | azure.db.mssql.app.cpu.billed |
Azure Microsoft SQL: App CPU percentage | App CPU percentage. Applies to serverless databases. | Dependent item | azure.db.mssql.app.cpu.percent |
Azure Microsoft SQL: App memory percentage | App memory percentage. Applies to serverless databases. | Dependent item | azure.db.mssql.app.memory.percent |
Azure Microsoft SQL: Data space allocated | The allocated data storage. Not applicable to data warehouses. | Dependent item | azure.db.mssql.storage.allocated |
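For vCore-based databases, the CPU used and CPU limit items can be combined into a utilization ratio. This is a hypothetical derived value, not an item the template provides, and the Python sketch assumes both items are reported in the same unit:

```python
from typing import Optional


def cpu_used_percent(cpu_used: float, cpu_limit: float) -> Optional[float]:
    """Hypothetical derived metric: azure.db.mssql.cpu.used as a share of azure.db.mssql.cpu.limit."""
    if cpu_limit <= 0:
        # No meaningful ratio without a positive limit (for example, outside vCore-based databases).
        return None
    return cpu_used / cpu_limit * 100


print(cpu_used_percent(1.5, 4))  # 37.5
print(cpu_used_percent(1.0, 0))  # None
```

Inside Zabbix, a similar value could be produced by a calculated item with a formula such as last(//azure.db.mssql.cpu.used)/last(//azure.db.mssql.cpu.limit)*100; check how it behaves when the limit is zero before relying on it.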
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Azure Microsoft SQL: There are errors in requests to API | Zabbix has received errors in response to API requests. | length(last(/Azure Microsoft SQL Serverless Database by HTTP/azure.db.mssql.data.errors))>0 | Average | |
Azure Microsoft SQL: Microsoft SQL database is unavailable | The resource state is unavailable. | last(/Azure Microsoft SQL Serverless Database by HTTP/azure.db.mssql.availability.state)=2 | High | |
Azure Microsoft SQL: Microsoft SQL database is degraded | The resource is in a degraded state. | last(/Azure Microsoft SQL Serverless Database by HTTP/azure.db.mssql.availability.state)=1 | Average | |
Azure Microsoft SQL: Microsoft SQL database is in unknown state | The resource state is unknown. | last(/Azure Microsoft SQL Serverless Database by HTTP/azure.db.mssql.availability.state)=3 | Warning | |
Azure Microsoft SQL: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/Azure Microsoft SQL Serverless Database by HTTP/azure.db.mssql.cpu.percentage,5m)>{$AZURE.DB.CPU.UTIL.CRIT} | High | |
Azure Microsoft SQL: Storage space is critically low | Critical utilization of the storage space. | last(/Azure Microsoft SQL Serverless Database by HTTP/azure.db.mssql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.CRIT} | Average | |
Azure Microsoft SQL: Storage space is low | High utilization of the storage space. | last(/Azure Microsoft SQL Serverless Database by HTTP/azure.db.mssql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.WARN} | Warning | |
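The three availability triggers above each compare the last value of the availability state item with a fixed number. A minimal sketch of that mapping, assuming the state codes described for the Availability state item of the Azure SQL Managed Instance template in this document (0 Available, 1 Degraded, 2 Unavailable, 3 Unknown); the function name is hypothetical:

```python
from typing import Optional

# Availability state codes, as described for the "Availability state" item.
STATE_NAMES = {0: "Available", 1: "Degraded", 2: "Unavailable", 3: "Unknown"}

# Severity of the trigger that fires for each state (from the trigger table).
TRIGGER_SEVERITY = {2: "High", 1: "Average", 3: "Warning"}


def availability_problem(last_state: int) -> Optional[str]:
    """Return the severity of the trigger fired by `last_state`, or None."""
    if last_state not in STATE_NAMES:
        raise ValueError(f"unexpected availability state: {last_state}")
    # "Available" (0) has no trigger, so .get() returns None for it.
    return TRIGGER_SEVERITY.get(last_state)


if __name__ == "__main__":
    for code, name in STATE_NAMES.items():
        print(code, name, availability_problem(code))
```

Because each trigger uses last(), only the most recent state matters; a resource that recovers to 0 resolves the problem on the next collected value.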
This template is designed to monitor Microsoft SQL databases by HTTP. It works without any external scripts and uses the script item.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.
az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>
See Azure documentation for more details.
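The command above prints a JSON object describing the new service principal; per the Azure CLI documentation it includes the appId, password, and tenant fields. A minimal sketch of how those fields could be copied into the template macros, assuming that output shape; the sample values and the azure_macros helper are hypothetical placeholders, not real credentials:

```python
import json
from typing import Dict, Optional

# Hypothetical sample of the JSON printed by `az ad sp create-for-rbac`.
SAMPLE_OUTPUT = """
{
  "appId": "00000000-0000-0000-0000-000000000001",
  "displayName": "zabbix",
  "password": "example-secret",
  "tenant": "00000000-0000-0000-0000-000000000002"
}
"""


def azure_macros(cli_output: str, subscription_id: str,
                 resource_id: Optional[str] = None) -> Dict[str, str]:
    """Map the service principal fields onto the template's user macros."""
    sp = json.loads(cli_output)
    macros = {
        "{$AZURE.APP.ID}": sp["appId"],
        "{$AZURE.PASSWORD}": sp["password"],
        "{$AZURE.TENANT.ID}": sp["tenant"],
        # Not part of the CLI output: it is the ID used in the --scope argument.
        "{$AZURE.SUBSCRIPTION.ID}": subscription_id,
    }
    if resource_id is not None:
        # Resource-level templates need it; the Cost Management one does not.
        macros["{$AZURE.RESOURCE.ID}"] = resource_id
    return macros


if __name__ == "__main__":
    for name, value in azure_macros(SAMPLE_OUTPUT, "<subscription_id>").items():
        print(f"{name} = {value}")
```

Treat the password like any other secret; storing it in a secret-type macro is one option.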
Set the following macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.
Name | Description | Default |
---|---|---|
{$AZURE.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. | |
{$AZURE.APP.ID} | The App ID of Microsoft Azure. | |
{$AZURE.PASSWORD} | Microsoft Azure password. | |
{$AZURE.DATA.TIMEOUT} | API response timeout. | 15s |
{$AZURE.TENANT.ID} | Microsoft Azure tenant ID. | |
{$AZURE.SUBSCRIPTION.ID} | Microsoft Azure subscription ID. | |
{$AZURE.RESOURCE.ID} | The ID of the Microsoft Azure SQL database. | |
{$AZURE.DB.CPU.UTIL.CRIT} | The critical threshold of CPU utilization, expressed in %. | 90 |
{$AZURE.DB.MEMORY.UTIL.CRIT} | The critical threshold of memory utilization, expressed in %. | 90 |
{$AZURE.DB.STORAGE.PUSED.WARN} | The warning threshold of storage utilization, expressed in %. | 80 |
{$AZURE.DB.STORAGE.PUSED.CRIT} | The critical threshold of storage utilization, expressed in %. | 90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure Microsoft SQL: Get data | The result of API requests, expressed in JSON. | Script | azure.db.mssql.data.get |
Azure Microsoft SQL: Get errors | A list of errors from API requests. | Dependent item | azure.db.mssql.data.errors |
Azure Microsoft SQL: Availability state | The availability status of the resource. | Dependent item | azure.db.mssql.availability.state |
Azure Microsoft SQL: Availability status detailed | The summary description of the availability status. | Dependent item | azure.db.mssql.availability.details |
Azure Microsoft SQL: Percentage CPU | The CPU percent of a host. | Dependent item | azure.db.mssql.cpu.percentage |
Azure Microsoft SQL: Data IO percentage | The percentage of physical data read. | Dependent item | azure.db.mssql.data.read.percentage |
Azure Microsoft SQL: Log IO percentage | The percentage of log I/O. Not applicable to the data warehouses. | Dependent item | azure.db.mssql.log.write.percentage |
Azure Microsoft SQL: Data space used | Data space used. Not applicable to the data warehouses. | Dependent item | azure.db.mssql.storage.used |
Azure Microsoft SQL: Connections successful | The count of successful connections. | Dependent item | azure.db.mssql.connections.successful |
Azure Microsoft SQL: Connections failed: System errors | The count of failed connections with system errors. | Dependent item | azure.db.mssql.connections.failed.system |
Azure Microsoft SQL: Connections blocked by firewall | The count of connections blocked by the firewall. | Dependent item | azure.db.mssql.firewall.blocked |
Azure Microsoft SQL: Deadlocks | The count of deadlocks. Not applicable to the data warehouses. | Dependent item | azure.db.mssql.deadlocks |
Azure Microsoft SQL: Data space used percent | Data space used percent. Not applicable to the data warehouses or Hyperscale databases. | Dependent item | azure.db.mssql.storage.percent |
Azure Microsoft SQL: In-Memory OLTP storage percent | In-Memory OLTP storage percent. Not applicable to the data warehouses. | Dependent item | azure.db.mssql.storage.xtp.percent |
Azure Microsoft SQL: Workers percentage | The percentage of workers. Not applicable to the data warehouses. | Dependent item | azure.db.mssql.workers.percent |
Azure Microsoft SQL: Sessions percentage | The percentage of sessions. Not applicable to the data warehouses. | Dependent item | azure.db.mssql.sessions.percent |
Azure Microsoft SQL: Sessions count | The number of active sessions. Not applicable to Synapse DW Analytics. | Dependent item | azure.db.mssql.sessions.count |
Azure Microsoft SQL: CPU limit | The CPU limit. Applies to the vCore-based databases. | Dependent item | azure.db.mssql.cpu.limit |
Azure Microsoft SQL: CPU used | The CPU used. Applies to the vCore-based databases. | Dependent item | azure.db.mssql.cpu.used |
Azure Microsoft SQL: SQL Server process core percent | The CPU usage as a percentage of the SQL DB process. Not applicable to the data warehouses. | Dependent item | azure.db.mssql.server.cpu.percent |
Azure Microsoft SQL: SQL Server process memory percent | Memory usage as a percentage of the SQL DB process. Not applicable to the data warehouses. | Dependent item | azure.db.mssql.server.memory.percent |
Azure Microsoft SQL: Tempdb data file size | The space used in the tempdb data files. | Dependent item | azure.db.mssql.tempdb.data.size |
Azure Microsoft SQL: Tempdb log file size | The space used in the tempdb transaction log file. | Dependent item | azure.db.mssql.tempdb.log.size |
Azure Microsoft SQL: Tempdb log used percent | The percentage of space used in the tempdb transaction log file. | Dependent item | azure.db.mssql.tempdb.log.percent |
Azure Microsoft SQL: Data space allocated | The allocated data storage. Not applicable to the data warehouses. | Dependent item | azure.db.mssql.storage.allocated |
Azure Microsoft SQL: Full backup storage size | Cumulative full backup storage size. Applies to the vCore-based databases. Not applicable to the Hyperscale databases. | Dependent item | azure.db.mssql.storage.backup.size |
Azure Microsoft SQL: Differential backup storage size | Cumulative differential backup storage size. Applies to the vCore-based databases. Not applicable to the Hyperscale databases. | Dependent item | azure.db.mssql.storage.backup.diff.size |
Azure Microsoft SQL: Log backup storage size | Cumulative log backup storage size. Applies to the vCore-based and Hyperscale databases. | Dependent item | azure.db.mssql.storage.backup.log.size |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Azure Microsoft SQL: There are errors in requests to API | Zabbix has received errors in response to API requests. | length(last(/Azure Microsoft SQL Database by HTTP/azure.db.mssql.data.errors))>0 | Average | |
Azure Microsoft SQL: Microsoft SQL database is unavailable | The resource state is unavailable. | last(/Azure Microsoft SQL Database by HTTP/azure.db.mssql.availability.state)=2 | High | |
Azure Microsoft SQL: Microsoft SQL database is degraded | The resource is in a degraded state. | last(/Azure Microsoft SQL Database by HTTP/azure.db.mssql.availability.state)=1 | Average | |
Azure Microsoft SQL: Microsoft SQL database is in unknown state | The resource state is unknown. | last(/Azure Microsoft SQL Database by HTTP/azure.db.mssql.availability.state)=3 | Warning | |
Azure Microsoft SQL: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/Azure Microsoft SQL Database by HTTP/azure.db.mssql.cpu.percentage,5m)>{$AZURE.DB.CPU.UTIL.CRIT} | High | |
Azure Microsoft SQL: Storage space is critically low | Critical utilization of the storage space. | last(/Azure Microsoft SQL Database by HTTP/azure.db.mssql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.CRIT} | Average | |
Azure Microsoft SQL: Storage space is low | High utilization of the storage space. | last(/Azure Microsoft SQL Database by HTTP/azure.db.mssql.storage.percent)>{$AZURE.DB.STORAGE.PUSED.WARN} | Warning | |
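The High CPU utilization trigger uses min(...,5m) rather than last(), so it fires only when every value collected during the last 5 minutes exceeds {$AZURE.DB.CPU.UTIL.CRIT}; a single dip below the threshold keeps it quiet. A minimal sketch of that window logic, assuming history is available as hypothetical (timestamp, value) pairs; Zabbix's exact window boundaries and no-data handling are not modelled:

```python
from typing import Sequence, Tuple

# Hypothetical history: (unix_timestamp_seconds, value) pairs.
History = Sequence[Tuple[float, float]]


def min_over_window(history: History, now: float, window_s: float) -> float:
    """Return the smallest value collected in the last `window_s` seconds."""
    recent = [value for ts, value in history if now - window_s < ts <= now]
    if not recent:
        # Zabbix handles a window without data differently; this just fails loudly.
        raise ValueError("no values in the evaluation window")
    return min(recent)


def high_cpu_utilization(history: History, now: float,
                         threshold: float = 90.0, window_s: float = 300.0) -> bool:
    """Mirror min(...,5m)>{$AZURE.DB.CPU.UTIL.CRIT} with the default of 90."""
    return min_over_window(history, now, window_s) > threshold


if __name__ == "__main__":
    steady = [(0, 95.0), (60, 97.0), (120, 92.0), (180, 99.0), (240, 93.0)]
    print(high_cpu_utilization(steady, now=240))
    print(high_cpu_utilization(steady + [(200, 85.0)], now=240))
```

The storage triggers in the same table use last() instead, so they react to a single value above the threshold.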
This template is designed for the effortless deployment of Azure Cosmos DB for MongoDB monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.
az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>
See Azure documentation for more details.
Set the following macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.
Name | Description | Default |
---|---|---|
{$AZURE.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. | |
{$AZURE.APP.ID} | The App ID of Microsoft Azure. | |
{$AZURE.PASSWORD} | Microsoft Azure password. | |
{$AZURE.DATA.TIMEOUT} | API response timeout. | 15s |
{$AZURE.TENANT.ID} | Microsoft Azure tenant ID. | |
{$AZURE.SUBSCRIPTION.ID} | Microsoft Azure subscription ID. | |
{$AZURE.RESOURCE.ID} | Microsoft Azure Cosmos DB ID. | |
{$AZURE.DB.COSMOS.MONGO.AVAILABILITY} | The warning threshold of the Cosmos DB for MongoDB service availability. | 70 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure MongoDB: Get data | The result of API requests, expressed in JSON. | Script | azure.cosmosdb.data.get |
Azure MongoDB: Get errors | A list of errors from API requests. | Dependent item | azure.cosmosdb.data.errors |
Azure MongoDB: Total requests | The number of requests per minute. | Dependent item | azure.cosmosdb.total.requests |
Azure MongoDB: Total request units | The request units consumed per minute. | Dependent item | azure.cosmosdb.total.request.units |
Azure MongoDB: Metadata requests | The count of metadata requests. Cosmos DB maintains a system metadata collection for each account, which allows you to enumerate collections, databases, etc., and their configurations, free of charge. | Dependent item | azure.cosmosdb.metadata.requests |
Azure MongoDB: Mongo requests | The number of Mongo requests made. | Dependent item | azure.cosmosdb.mongo.requests |
Azure MongoDB: Mongo request charge | The Mongo request units consumed. | Dependent item | azure.cosmosdb.mongo.requests.charge |
Azure MongoDB: Server side latency | The server-side latency. | Dependent item | azure.cosmosdb.server.side.latency |
Azure MongoDB: Server side latency, gateway | The server-side latency in gateway connection mode. | Dependent item | azure.cosmosdb.server.side.latency.gateway |
Azure MongoDB: Server side latency, direct | The server-side latency in direct connection mode. | Dependent item | azure.cosmosdb.server.side.latency.direct |
Azure MongoDB: Replication latency, P99 | The P99 replication latency across source and target regions for a geo-enabled account. | Dependent item | azure.cosmosdb.replication.latency |
Azure MongoDB: Service availability | The account requests availability at one-hour granularity. | Dependent item | azure.cosmosdb.service.availability |
Azure MongoDB: Data usage | The total data usage. | Dependent item | azure.cosmosdb.data.usage |
Azure MongoDB: Index usage | The total index usage. | Dependent item | azure.cosmosdb.index.usage |
Azure MongoDB: Document quota | The total storage quota. | Dependent item | azure.cosmosdb.document.quota |
Azure MongoDB: Document count | The total document count. | Dependent item | azure.cosmosdb.document.count |
Azure MongoDB: Normalized RU consumption | The max RU consumption percentage per minute. | Dependent item | azure.cosmosdb.normalized.ru.consumption |
Azure MongoDB: Physical partition throughput | The physical partition throughput. | Dependent item | azure.cosmosdb.physical.partition.throughput |
Azure MongoDB: Autoscale max throughput | The autoscale max throughput. | Dependent item | azure.cosmosdb.autoscale.max.throughput |
Azure MongoDB: Provisioned throughput | The provisioned throughput. | Dependent item | azure.cosmosdb.provisioned.throughput |
Azure MongoDB: Physical partition size | The physical partition size in bytes. | Dependent item | azure.cosmosdb.physical.partition.size |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Azure MongoDB: There are errors in requests to API | Zabbix has received errors in response to API requests. | length(last(/Azure Cosmos DB for MongoDB by HTTP/azure.cosmosdb.data.errors))>0 | Average | |
Azure MongoDB: Cosmos DB for MongoDB account: Availability is low | | (min(/Azure Cosmos DB for MongoDB by HTTP/azure.cosmosdb.service.availability,#3))<{$AZURE.DB.COSMOS.MONGO.AVAILABILITY} | Warning | |
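The availability trigger evaluates min() over the last three values (#3) instead of a time window. A minimal sketch of that condition, assuming the values are given oldest to newest and the default threshold of 70; the function name is hypothetical, and how Zabbix treats a history shorter than three values is not modelled:

```python
from typing import Sequence


def availability_is_low(values: Sequence[float], threshold: float = 70.0,
                        count: int = 3) -> bool:
    """Mirror min(...,#3)<{$AZURE.DB.COSMOS.MONGO.AVAILABILITY}.

    The condition holds as soon as any one of the last `count` values is
    below `threshold`, because their minimum is then below it as well.
    """
    if len(values) < count:
        raise ValueError(f"need at least {count} values, got {len(values)}")
    return min(values[-count:]) < threshold


if __name__ == "__main__":
    print(availability_is_low([100.0, 100.0, 65.0]))
    print(availability_is_low([50.0, 100.0, 100.0, 100.0]))
```

Note the asymmetry with the SQL CPU triggers in this document: min(...)<T fires on a single low sample, whereas min(...)>T requires every sample in the window to exceed T.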
This template is designed to monitor Microsoft Cost Management by HTTP. It works without any external scripts and uses the script item.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.
az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>
See Azure documentation for more details.
Set the following macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, and {$AZURE.SUBSCRIPTION.ID}.
Name | Description | Default |
---|---|---|
{$AZURE.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. | |
{$AZURE.APP.ID} | The App ID of Microsoft Azure. | |
{$AZURE.PASSWORD} | Microsoft Azure password. | |
{$AZURE.DATA.TIMEOUT} | API response timeout. | 60s |
{$AZURE.TENANT.ID} | Microsoft Azure tenant ID. | |
{$AZURE.BILLING.MONTH} | The number of months of historical data to get from the Azure Cost Management API; no more than 11 (plus the current month). The time period for pulling the data cannot exceed 1 year. | 11 |
{$AZURE.LLD.FILTER.SERVICE.MATCHES} | Filter of discoverable services by name. | .* |
{$AZURE.LLD.FILTER.SERVICE.NOT_MATCHES} | Filter to exclude discovered services by name. | CHANGE_IF_NEEDED |
{$AZURE.LLD.FILTER.RESOURCE.LOCATION.MATCHES} | Filter of discoverable locations by name. | .* |
{$AZURE.LLD.FILTER.RESOURCE.LOCATION.NOT_MATCHES} | Filter to exclude discovered locations by name. | CHANGE_IF_NEEDED |
{$AZURE.LLD.FILTER.RESOURCE.GROUP.MATCHES} | Filter of discoverable resource groups by name. | .* |
{$AZURE.LLD.FILTER.RESOURCE.GROUP.NOT_MATCHES} | Filter to exclude discovered resource groups by name. | CHANGE_IF_NEEDED |
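{$AZURE.BILLING.MONTH} counts previous months on top of the current one, so the default of 11 covers 12 calendar months, the 1-year maximum. A minimal sketch of that window, assuming calendar months; the billing_months helper and the YYYY-MM labels are hypothetical, and the template's own {#AZURE.BILLING.MONTH} format may differ. Allowing 0 (current month only) is also an assumption:

```python
from datetime import date
from typing import List


def billing_months(today: date, months_back: int = 11) -> List[str]:
    """List the months covered: `months_back` previous months plus the current one."""
    if not 0 <= months_back <= 11:
        raise ValueError("{$AZURE.BILLING.MONTH} must be between 0 and 11")
    result = []
    year, month = today.year, today.month
    for _ in range(months_back + 1):
        result.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:  # step back across a year boundary
            year, month = year - 1, 12
    return list(reversed(result))  # oldest first


if __name__ == "__main__":
    print(billing_months(date(2024, 3, 15)))
```

Called on 15 March 2024 with the default, the window starts at 2023-04 and ends at 2024-03.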
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure Cost: Get monthly costs | The result of API requests, expressed in JSON. | Script | azure.get.monthly.costs |
Azure Cost: Get daily costs | The result of API requests, expressed in JSON. | Script | azure.get.daily.costs |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure daily costs by services discovery | Discovery of daily costs by services. | Dependent item | azure.daily.services.costs.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure Cost: Service ["{#AZURE.SERVICE.NAME}"]: Meter ["{#AZURE.BILLING.METER}"]: Subcategory ["{#AZURE.BILLING.METER.SUBCATEGORY}"] daily cost | The daily cost by service {#AZURE.SERVICE.NAME}, meter {#AZURE.BILLING.METER}, subcategory {#AZURE.BILLING.METER.SUBCATEGORY}. | Dependent item | azure.daily.cost["{#AZURE.SERVICE.NAME}", "{#AZURE.BILLING.METER}", "{#AZURE.BILLING.METER.SUBCATEGORY}","{#AZURE.RESOURCE.GROUP}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure monthly costs by services discovery | Discovery of monthly costs by services. | Dependent item | azure.monthly.services.costs.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure Cost: Service ["{#AZURE.SERVICE.NAME}"]: Month ["{#AZURE.BILLING.MONTH}"] cost | The monthly cost by service {#AZURE.SERVICE.NAME}. | Dependent item | azure.monthly.service.cost["{#AZURE.SERVICE.NAME}", "{#AZURE.BILLING.MONTH}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure monthly costs by location discovery | Discovery of monthly costs by location. | Dependent item | azure.monthly.location.costs.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure Cost: Location: ["{#AZURE.RESOURCE.LOCATION}"]: Month ["{#AZURE.BILLING.MONTH}"] cost | The monthly cost by location {#AZURE.RESOURCE.LOCATION}. | Dependent item | azure.monthly.location.cost["{#AZURE.RESOURCE.LOCATION}", "{#AZURE.BILLING.MONTH}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure monthly costs by resource group discovery | Discovery of monthly costs by resource group. | Dependent item | azure.monthly.resource.group.costs.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure Cost: Resource group: ["{#AZURE.RESOURCE.GROUP}"]: Month ["{#AZURE.BILLING.MONTH}"] cost | The monthly cost by resource group {#AZURE.RESOURCE.GROUP}. | Dependent item | azure.monthly.resource.group.cost["{#AZURE.RESOURCE.GROUP}", "{#AZURE.BILLING.MONTH}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure monthly costs discovery | Discovery of monthly costs. | Dependent item | azure.monthly.costs.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Azure Cost: Month ["{#AZURE.BILLING.MONTH}"] cost | The monthly cost. | Dependent item | azure.monthly.cost["{#AZURE.BILLING.MONTH}"] |
This template is designed to monitor Microsoft Azure SQL Managed Instance by HTTP. It works without any external scripts and uses the script item.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create an Azure service principal via the Azure command-line interface (Azure CLI) for your subscription.
az ad sp create-for-rbac --name zabbix --role reader --scope /subscriptions/<subscription_id>
See Azure documentation for more details.
Set the following macros: {$AZURE.APP.ID}, {$AZURE.PASSWORD}, {$AZURE.TENANT.ID}, {$AZURE.SUBSCRIPTION.ID}, and {$AZURE.RESOURCE.ID}.
Name | Description | Default |
---|---|---|
{$AZURE.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. | |
{$AZURE.APP.ID} | The App ID of Microsoft Azure. | |
{$AZURE.PASSWORD} | Microsoft Azure password. | |
{$AZURE.DATA.TIMEOUT} | API response timeout. | 15s |
{$AZURE.TENANT.ID} | Microsoft Azure tenant ID. | |
{$AZURE.SUBSCRIPTION.ID} | Microsoft Azure subscription ID. | |
{$AZURE.RESOURCE.ID} | Microsoft Azure SQL managed instance ID. | |
{$AZURE.SQL.INST.SPACE.CRIT} | Storage space critical threshold, expressed in %. | 90 |
{$AZURE.SQL.INST.SPACE.WARN} | Storage space warning threshold, expressed in %. | 80 |
{$AZURE.SQL.INST.CPU.WARN} | CPU utilization warning threshold, expressed in %. | 80 |
{$AZURE.SQL.INST.CPU.CRIT} | CPU utilization critical threshold, expressed in %. | 90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get data | Gathers data about the Azure SQL managed instance. | Script | azure.sql_inst.data.get |
Get errors | A list of errors from API requests. | Dependent item | azure.sql_inst.data.errors |
Availability state | The availability status of the resource. 0 - Available - no events detected that affect the health of the resource. 1 - Degraded - your resource detected a loss in performance, although it's still available for use. 2 - Unavailable - the service detected an ongoing platform or non-platform event that affects the health of the resource. 3 - Unknown - Resource Health hasn't received information about the resource for more than 10 minutes. | Dependent item | azure.sql_inst.availability.state |
Availability status detailed | The summary description of the availability status. | Dependent item | azure.sql_inst.availability.details |
Average CPU utilization | Average CPU utilization of the instance. | Dependent item | azure.sql_inst.cpu |
IO bytes read | Bytes read by the managed instance. | Dependent item | azure.sql_inst.bytes.read |
IO bytes write | Bytes written by the managed instance. | Dependent item | azure.sql_inst.bytes.write |
IO request count | The I/O request count of the managed instance. | Dependent item | azure.sql_inst.requests |
Storage space reserved | Storage space reserved by the managed instance. | Dependent item | azure.sql_inst.storage.reserved |
Storage space used | Storage space used by the managed instance. | Dependent item | azure.sql_inst.storage.used |
Storage space utilization | Managed instance storage space utilization, in percent. | Calculated | azure.sql_inst.storage.utilization |
Virtual core count | Virtual core count available to the managed instance. | Dependent item | azure.sql_inst.core.count |
Instance state | State of the managed instance. | Dependent item | azure.sql_inst.state |
Instance collation | Collation of the managed instance. | Dependent item | azure.sql_inst.collation |
Instance provisioning state | Provisioning state of the managed instance. | Dependent item | azure.sql_inst.provision |
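Storage space utilization is a Calculated item, but the table does not show its formula. A minimal sketch, assuming it is Storage space used divided by Storage space reserved, times 100, and that a sustained value is compared against the default thresholds with a strict >, as the storage triggers below do; both helper names are hypothetical:

```python
from typing import Optional


def storage_utilization(used: float, reserved: float) -> float:
    """Assumed formula: percentage of the reserved storage that is in use."""
    if reserved <= 0:
        raise ValueError("reserved storage must be positive")
    return used * 100.0 / reserved


def storage_trigger(utilization: float, warn: float = 80.0,
                    crit: float = 90.0) -> Optional[str]:
    """Severity of the more severe storage trigger for a sustained value."""
    if utilization > crit:  # {$AZURE.SQL.INST.SPACE.CRIT}, severity Average
        return "Average"
    if utilization > warn:  # {$AZURE.SQL.INST.SPACE.WARN}, severity Warning
        return "Warning"
    return None


if __name__ == "__main__":
    pct = storage_utilization(used=475, reserved=500)
    print(pct, storage_trigger(pct))
```

This sketch reports only the more severe of the two problems; in Zabbix, both triggers are evaluated independently.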
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
There are errors in requests to API | Zabbix has received errors in response to API requests. | length(last(/Azure SQL Managed Instance by HTTP/azure.sql_inst.data.errors))>0 | Average | |
Azure SQL managed instance is unavailable | The resource state is unavailable. | last(/Azure SQL Managed Instance by HTTP/azure.sql_inst.availability.state)=2 | High | |
Azure SQL managed instance is degraded | The resource is in a degraded state. | last(/Azure SQL Managed Instance by HTTP/azure.sql_inst.availability.state)=1 | Average | |
Azure SQL managed instance is in unknown state | The resource state is unknown. | last(/Azure SQL Managed Instance by HTTP/azure.sql_inst.availability.state)=3 | Warning | |
Critically high CPU utilization | CPU utilization is critically high. | min(/Azure SQL Managed Instance by HTTP/azure.sql_inst.cpu, 10m)>={$AZURE.SQL.INST.CPU.CRIT} | Average | Depends on: |
High CPU utilization | CPU utilization is too high. | min(/Azure SQL Managed Instance by HTTP/azure.sql_inst.cpu, 10m)>={$AZURE.SQL.INST.CPU.WARN} | Warning | |
Storage free space is critically low | Storage space utilization has been above {$AZURE.SQL.INST.SPACE.CRIT}% for 5 minutes. | min(/Azure SQL Managed Instance by HTTP/azure.sql_inst.storage.utilization,5m)>{$AZURE.SQL.INST.SPACE.CRIT} | Average | Manual close: Yes. Depends on: |
Storage free space is low | Storage space utilization has been above {$AZURE.SQL.INST.SPACE.WARN}% for 5 minutes. | min(/Azure SQL Managed Instance by HTTP/azure.sql_inst.storage.utilization,5m)>{$AZURE.SQL.INST.SPACE.WARN} | Warning | Manual close: Yes |
Instance state has changed | Azure SQL managed instance state has changed. | change(/Azure SQL Managed Instance by HTTP/azure.sql_inst.state)=1 | Warning | |
Instance collation has changed | Azure SQL managed instance collation has changed. | change(/Azure SQL Managed Instance by HTTP/azure.sql_inst.collation)=1 | Average | |
Instance provisioning state has changed | Azure SQL managed instance provisioning state has changed. | change(/Azure SQL Managed Instance by HTTP/azure.sql_inst.provision)=1 | Warning | |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of AWS monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.
Add the following required permissions to your Zabbix IAM policy in order to collect metrics.
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"ec2:DescribeInstances",
"ec2:DescribeVolumes",
"ec2:DescribeRegions",
"rds:DescribeEvents",
"rds:DescribeDBInstances",
"ecs:DescribeClusters",
"ecs:ListServices",
"ecs:ListTasks",
"ecs:ListClusters",
"s3:ListAllMyBuckets",
"s3:GetBucketLocation",
"s3:GetMetricsConfiguration",
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeTargetGroups",
"ec2:DescribeSecurityGroups",
"lambda:ListFunctions"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
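The policies in this section share the same read-only actions; the assume role and role-based variants also list ec2:AssociateIamInstanceProfile and ec2:ReplaceIamInstanceProfileAssociation. A minimal sketch that generates the statement above from one action list, assuming you prefer to keep the policy in code; zabbix_policy is a hypothetical helper, action order inside an IAM Action list does not matter, and the extra sts:AssumeRole and iam:PassRole statements of the other variants are not generated:

```python
import json

# Read-only actions from the access key policy above.
READ_ACTIONS = [
    "cloudwatch:DescribeAlarms", "cloudwatch:GetMetricData",
    "ec2:DescribeInstances", "ec2:DescribeVolumes", "ec2:DescribeRegions",
    "rds:DescribeEvents", "rds:DescribeDBInstances",
    "ecs:DescribeClusters", "ecs:ListServices", "ecs:ListTasks", "ecs:ListClusters",
    "s3:ListAllMyBuckets", "s3:GetBucketLocation", "s3:GetMetricsConfiguration",
    "elasticloadbalancing:DescribeLoadBalancers",
    "elasticloadbalancing:DescribeTargetGroups",
    "ec2:DescribeSecurityGroups", "lambda:ListFunctions",
]

# Extra actions listed in the assume role and role-based policies.
INSTANCE_PROFILE_ACTIONS = [
    "ec2:AssociateIamInstanceProfile",
    "ec2:ReplaceIamInstanceProfileAssociation",
]


def zabbix_policy(include_instance_profile_actions: bool = False) -> dict:
    """Build the Allow statement used by the Zabbix IAM policy."""
    actions = list(READ_ACTIONS)  # copy so the module-level list is untouched
    if include_instance_profile_actions:
        actions += INSTANCE_PROFILE_ACTIONS
    return {
        "Version": "2012-10-17",
        "Statement": [{"Action": actions, "Effect": "Allow", "Resource": "*"}],
    }


if __name__ == "__main__":
    print(json.dumps(zabbix_policy(), indent=4))
```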
To use assume role authorization, add the appropriate permissions to the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": "arn:aws:iam::{Account}:user/{UserName}"
},
{
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"ec2:DescribeInstances",
"ec2:DescribeVolumes",
"ec2:DescribeRegions",
"rds:DescribeEvents",
"rds:DescribeDBInstances",
"ecs:DescribeClusters",
"ecs:ListServices",
"ecs:ListTasks",
"ecs:ListClusters",
"s3:ListAllMyBuckets",
"s3:GetBucketLocation",
"s3:GetMetricsConfiguration",
"ec2:AssociateIamInstanceProfile",
"ec2:ReplaceIamInstanceProfileAssociation",
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeTargetGroups",
"ec2:DescribeSecurityGroups",
"lambda:ListFunctions"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::{Account}:user/{UserName}"
},
"Action": "sts:AssumeRole"
}
]
}
If you are using role-based authorization, add the appropriate permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
},
{
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"ec2:DescribeInstances",
"ec2:DescribeVolumes",
"ec2:DescribeRegions",
"rds:DescribeEvents",
"rds:DescribeDBInstances",
"ecs:DescribeClusters",
"ecs:ListServices",
"ecs:ListTasks",
"ecs:ListClusters",
"s3:ListAllMyBuckets",
"s3:GetBucketLocation",
"s3:GetMetricsConfiguration",
"ec2:AssociateIamInstanceProfile",
"ec2:ReplaceIamInstanceProfileAssociation",
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeTargetGroups",
"ec2:DescribeSecurityGroups",
"lambda:ListFunctions"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"ec2.amazonaws.com"
]
},
"Action": [
"sts:AssumeRole"
]
}
]
}
Note: role-based authorization is only possible when the Zabbix server or proxy runs inside AWS.
To gather request metrics, enable Request metrics on your Amazon S3 buckets in the AWS console.
Set the {$AWS.AUTH_TYPE} macro. Possible values: access_key, assume_role, role_base.
If you are using access key-based authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}.
If you are using assume role authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.
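The macros required depend on {$AWS.AUTH_TYPE}. A minimal pre-flight check, assuming the macro values are collected into a plain dictionary (missing_macros is a hypothetical helper, not part of the template); role_base needs no credential macros because it relies on the role attached to the AWS instance running Zabbix:

```python
from typing import Dict, List

# Macros that must be non-empty for each {$AWS.AUTH_TYPE} value.
REQUIRED_MACROS = {
    "access_key": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}"],
    "assume_role": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}",
                    "{$AWS.STS.REGION}", "{$AWS.ASSUME.ROLE.ARN}"],
    "role_base": [],
}


def missing_macros(macros: Dict[str, str]) -> List[str]:
    """Return the required macros that are unset for the chosen auth type."""
    # access_key is the template's default authorization method.
    auth_type = macros.get("{$AWS.AUTH_TYPE}", "access_key")
    if auth_type not in REQUIRED_MACROS:
        raise ValueError(f"unknown {{$AWS.AUTH_TYPE}} value: {auth_type}")
    return [name for name in REQUIRED_MACROS[auth_type] if not macros.get(name)]


if __name__ == "__main__":
    print(missing_macros({"{$AWS.AUTH_TYPE}": "assume_role",
                          "{$AWS.ACCESS.KEY.ID}": "AKIA...",
                          "{$AWS.SECRET.ACCESS.KEY}": "secret",
                          "{$AWS.STS.REGION}": "us-east-1"}))
```

The example reports that {$AWS.ASSUME.ROLE.ARN} is still missing.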
For more information about managing access keys, see the official AWS documentation.
Refer to the Macros section for a list of macros used for LLD filters.
Additional information about the metrics and used API methods:
Name | Description | Default |
---|---|---|
{$AWS.DATA.TIMEOUT} | A response timeout for an API. |
60s |
{$AWS.PROXY} | Sets HTTP proxy value. If this macro is empty then no proxy is used. |
|
{$AWS.ACCESS.KEY.ID} | Access key ID. |
|
{$AWS.SECRET.ACCESS.KEY} | Secret access key. |
|
{$AWS.AUTH_TYPE} | Authorization method. Possible values: |
access_key |
{$AWS.REQUEST.REGION} | Region used in GET request. | us-east-1 |
{$AWS.DESCRIBE.REGION} | Region used in POST request. | us-east-1 |
{$AWS.STS.REGION} | Region used in assume role request. | us-east-1 |
{$AWS.ASSUME.ROLE.ARN} | ARN of the role to assume; set when using assume role authorization. | |
{$AWS.EC2.LLD.FILTER.NAME.MATCHES} | Filter of discoverable EC2 instances by name. | .* |
{$AWS.EC2.LLD.FILTER.NAME.NOT_MATCHES} | Filter to exclude discovered EC2 instances by name. | CHANGE_IF_NEEDED |
{$AWS.EC2.LLD.FILTER.REGION.MATCHES} | Filter of discoverable EC2 instances by region. | .* |
{$AWS.EC2.LLD.FILTER.REGION.NOT_MATCHES} | Filter to exclude discovered EC2 instances by region. | CHANGE_IF_NEEDED |
{$AWS.ECS.LLD.FILTER.NAME.MATCHES} | Filter of discoverable ECS clusters by name. | .* |
{$AWS.ECS.LLD.FILTER.NAME.NOT_MATCHES} | Filter to exclude discovered ECS clusters by name. | CHANGE_IF_NEEDED |
{$AWS.ECS.LLD.FILTER.STATUS.MATCHES} | Filter of discoverable ECS clusters by status. | ACTIVE |
{$AWS.ECS.LLD.FILTER.STATUS.NOT_MATCHES} | Filter to exclude discovered ECS clusters by status. | CHANGE_IF_NEEDED |
{$AWS.S3.LLD.FILTER.NAME.MATCHES} | Filter of discoverable S3 buckets by name. | .* |
{$AWS.S3.LLD.FILTER.NAME.NOT_MATCHES} | Filter to exclude discovered S3 buckets by name. | CHANGE_IF_NEEDED |
{$AWS.RDS.LLD.FILTER.NAME.MATCHES} | Filter of discoverable RDS instances by name. | .* |
{$AWS.RDS.LLD.FILTER.NAME.NOT_MATCHES} | Filter to exclude discovered RDS instances by name. | CHANGE_IF_NEEDED |
{$AWS.RDS.LLD.FILTER.REGION.MATCHES} | Filter of discoverable RDS instances by region. | .* |
{$AWS.RDS.LLD.FILTER.REGION.NOT_MATCHES} | Filter to exclude discovered RDS instances by region. | CHANGE_IF_NEEDED |
{$AWS.ECS.LLD.FILTER.REGION.MATCHES} | Filter of discoverable ECS clusters by region. | .* |
{$AWS.ECS.LLD.FILTER.REGION.NOT_MATCHES} | Filter to exclude discovered ECS clusters by region. | CHANGE_IF_NEEDED |
{$AWS.ELB.LLD.FILTER.NAME.MATCHES} | Filter of discoverable ELB load balancers by name. | .* |
{$AWS.ELB.LLD.FILTER.NAME.NOT_MATCHES} | Filter to exclude discovered ELB load balancers by name. | CHANGE_IF_NEEDED |
{$AWS.ELB.LLD.FILTER.REGION.MATCHES} | Filter of discoverable ELB load balancers by region. | .* |
{$AWS.ELB.LLD.FILTER.REGION.NOT_MATCHES} | Filter to exclude discovered ELB load balancers by region. | CHANGE_IF_NEEDED |
{$AWS.ELB.LLD.FILTER.STATE.MATCHES} | Filter of discoverable ELB load balancers by status. | active |
{$AWS.ELB.LLD.FILTER.STATE.NOT_MATCHES} | Filter to exclude discovered ELB load balancers by status. | CHANGE_IF_NEEDED |
{$AWS.LAMBDA.LLD.FILTER.REGION.MATCHES} | Filter of discoverable Lambda functions by region. | .* |
{$AWS.LAMBDA.LLD.FILTER.REGION.NOT_MATCHES} | Filter to exclude discovered Lambda functions by region. | CHANGE_IF_NEEDED |
{$AWS.LAMBDA.LLD.FILTER.RUNTIME.MATCHES} | Filter of discoverable Lambda functions by runtime. | .* |
{$AWS.LAMBDA.LLD.FILTER.RUNTIME.NOT_MATCHES} | Filter to exclude discovered Lambda functions by runtime. | CHANGE_IF_NEEDED |
{$AWS.LAMBDA.LLD.FILTER.NAME.MATCHES} | Filter of discoverable Lambda functions by name. | .* |
{$AWS.LAMBDA.LLD.FILTER.NAME.NOT_MATCHES} | Filter to exclude discovered Lambda functions by name. | CHANGE_IF_NEEDED |
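Each resource type uses a pair of filter macros: one regex that a discovered value must match and one regex that it must not match. The sketch below illustrates how such a pair narrows discovery; it assumes unanchored, search-style regex matching (as Python's `re.search` does), and the helper name `passes_lld_filter` and the cluster names are hypothetical, not part of the template.

```python
import re

def passes_lld_filter(value: str, matches: str = ".*",
                      not_matches: str = "CHANGE_IF_NEEDED") -> bool:
    """Keep value if the MATCHES regex is found in it and the NOT_MATCHES regex is not.

    Assumes unanchored (search-style) regex matching; the defaults mirror the
    macro defaults above, so with them almost every name is kept.
    """
    return (re.search(matches, value) is not None
            and re.search(not_matches, value) is None)

# Hypothetical ECS cluster names, filtered with custom NAME.MATCHES / NAME.NOT_MATCHES values.
clusters = ["prod-api", "prod-batch", "test-api"]
kept = [c for c in clusters if passes_lld_filter(c, "^prod-", "batch")]
print(kept)  # ['prod-api']
```

With the default values (`.*` and `CHANGE_IF_NEEDED`), nothing is excluded unless a resource name happens to contain the literal placeholder text.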
Name | Description | Type | Key and additional info |
---|---|---|---|
S3 buckets discovery | Get S3 buckets. | Script | aws.s3.discovery |

Name | Description | Type | Key and additional info |
---|---|---|---|
EC2 instances discovery | Get EC2 instances. | Script | aws.ec2.discovery |

Name | Description | Type | Key and additional info |
---|---|---|---|
RDS instances discovery | Get RDS instances. | Script | aws.rds.discovery |

Name | Description | Type | Key and additional info |
---|---|---|---|
ECS clusters discovery | Get ECS clusters. | Script | aws.ecs.discovery |

Name | Description | Type | Key and additional info |
---|---|---|---|
ELB load balancers discovery | Get ELB load balancers. | Script | aws.elb.discovery |

Name | Description | Type | Key and additional info |
---|---|---|---|
Lambda discovery | Get Lambda functions. | Script | aws.lambda.discovery |
This template monitors AWS EC2 and attached AWS EBS volumes by HTTP via Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. NOTE: This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.
Additional information about metrics and used API methods:
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
The template gets metrics for AWS EC2 and attached AWS EBS volumes and uses script items to make HTTP requests to the CloudWatch API.
Before using the template, you need to create an IAM policy with the necessary permissions for the Zabbix role in your AWS account.
Add the following required permissions to your Zabbix IAM policy in order to collect Amazon EC2 metrics.
{
"Version":"2012-10-17",
"Statement":[
{
"Action":[
"ec2:DescribeVolumes",
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData"
],
"Effect":"Allow",
"Resource":"*"
}
]
}
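Hand-edited policy JSON is easy to break with a misplaced quote or a missing comma. A minimal sketch, assuming you generate the document with Python's standard `json` module (the helper `build_policy` is hypothetical), builds the EC2 policy above from a list of actions and round-trips it through the parser before you paste it into the IAM console.

```python
import json

# Actions from the EC2 policy above; each action is one plain string.
EC2_ACTIONS = [
    "ec2:DescribeVolumes",
    "cloudwatch:DescribeAlarms",
    "cloudwatch:GetMetricData",
]

def build_policy(actions):
    """Return an IAM policy document (as a JSON string) allowing the given actions."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Action": actions, "Effect": "Allow", "Resource": "*"}
        ],
    }
    return json.dumps(policy, indent=2)

text = build_policy(EC2_ACTIONS)
# Parse the output again to confirm it is valid JSON with the expected actions.
assert json.loads(text)["Statement"][0]["Action"] == EC2_ACTIONS
print(text)
```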
If you are using assume role authorization, add the appropriate permissions to the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": "arn:aws:iam::{Account}:user/{UserName}"
},
{
"Effect": "Allow",
"Action": [
"ec2:DescribeVolumes",
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::{Account}:user/{UserName}"
},
"Action": "sts:AssumeRole"
}
]
}
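Before relying on assume role authorization, it can help to confirm that the trust relationship really names the user whose keys Zabbix holds. The sketch below is a simplified, hypothetical check (`trust_allows` is not an AWS API): it only looks for an `Allow` statement with `sts:AssumeRole` and an exact principal ARN, and it ignores `Deny` statements, wildcards and conditions that real IAM evaluation honors. The account ID and user name are placeholders.

```python
import json

# Trust policy in the shape shown above; account ID and user name are placeholders.
TRUST_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::123456789012:user/zabbix"},
      "Action": "sts:AssumeRole"
    }
  ]
}
"""

def _as_list(value):
    # IAM accepts either a single string or a list of strings in these fields.
    return value if isinstance(value, list) else [value]

def trust_allows(policy_json: str, principal_arn: str) -> bool:
    """Simplified check: does an Allow statement grant sts:AssumeRole to principal_arn?"""
    policy = json.loads(policy_json)
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(stmt.get("Action", [])):
            continue
        principal = stmt.get("Principal", {})
        aws = principal.get("AWS", []) if isinstance(principal, dict) else principal
        if principal_arn in _as_list(aws):
            return True
    return False

print(trust_allows(TRUST_POLICY, "arn:aws:iam::123456789012:user/zabbix"))  # True
```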
If you are using role-based authorization, set the appropriate permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"ec2:DescribeVolumes",
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"ec2:AssociateIamInstanceProfile",
"ec2:ReplaceIamInstanceProfileAssociation"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"ec2.amazonaws.com"
]
},
"Action": [
"sts:AssumeRole"
]
}
]
}
Note: role-based authorization is only possible when the Zabbix server or proxy runs inside AWS.
For more information, see the EC2 policies on the AWS website.
Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, {$AWS.EC2.INSTANCE.ID}.
If you are using access key-based authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}.
If you are using assume role authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.
For more information about managing access keys, see the official documentation.
Also, see the Macros section for a list of macros used for LLD filters.
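The macro requirements above depend on the authorization method. As a sketch only, the mapping can be written as data and used to spot unset macros on a host. The method labels ("access key", "assume role", "role-based") are hypothetical descriptive names, not necessarily the literal {$AWS.AUTH_TYPE} values (this document only shows the default `access_key`), and `missing_macros` is an illustrative helper.

```python
# Macros every EC2 host needs, per the setup steps above.
BASE_MACROS = ["{$AWS.AUTH_TYPE}", "{$AWS.REGION}", "{$AWS.EC2.INSTANCE.ID}"]

# Extra macros per authorization method (hypothetical descriptive labels).
EXTRA_MACROS = {
    "access key": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}"],
    "assume role": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}",
                    "{$AWS.STS.REGION}", "{$AWS.ASSUME.ROLE.ARN}"],
    "role-based": [],
}

def missing_macros(method: str, host_macros: dict) -> list:
    """Return the required macros that are unset or empty for the given method."""
    required = BASE_MACROS + EXTRA_MACROS[method]
    return [m for m in required if not host_macros.get(m)]

# Hypothetical host configuration with the secret key left empty.
host = {
    "{$AWS.AUTH_TYPE}": "access_key",
    "{$AWS.REGION}": "us-west-1",
    "{$AWS.EC2.INSTANCE.ID}": "i-0123456789abcdef0",
    "{$AWS.ACCESS.KEY.ID}": "EXAMPLEKEYID",
}
print(missing_macros("access key", host))  # ['{$AWS.SECRET.ACCESS.KEY}']
```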
Name | Description | Default |
---|---|---|
{$AWS.PROXY} | Sets HTTP proxy value. If this macro is empty, no proxy is used. | |
{$AWS.ACCESS.KEY.ID} | Access key ID. | |
{$AWS.SECRET.ACCESS.KEY} | Secret access key. | |
{$AWS.REGION} | Amazon EC2 Region code. | us-west-1 |
{$AWS.AUTH_TYPE} | Authorization method. Possible values: | access_key |
{$AWS.STS.REGION} | Region used in assume role request. | us-east-1 |
{$AWS.ASSUME.ROLE.ARN} | ARN of the role to assume; set when using assume role authorization. | |
{$AWS.EC2.INSTANCE.ID} | EC2 instance ID. | |
{$AWS.EC2.LLD.FILTER.VOLUME_TYPE.MATCHES} | Filter of discoverable volumes by type. | .* |
{$AWS.EC2.LLD.FILTER.VOLUME_TYPE.NOT_MATCHES} | Filter to exclude discovered volumes by type. | CHANGE_IF_NEEDED |
{$AWS.EC2.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES} | Filter of discoverable alarms by namespace. | .* |
{$AWS.EC2.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered alarms by namespace. | CHANGE_IF_NEEDED |
{$AWS.EC2.LLD.FILTER.ALARM_NAME.MATCHES} | Filter of discoverable alarms by name. | .* |
{$AWS.EC2.LLD.FILTER.ALARM_NAME.NOT_MATCHES} | Filter to exclude discovered alarms by name. | CHANGE_IF_NEEDED |
{$AWS.EC2.CPU.UTIL.WARN.MAX} | The warning threshold of the CPU utilization, expressed in %. | 85 |
{$AWS.EC2.CPU.CREDIT.BALANCE.MIN.WARN} | Minimum number of free earned CPU credits for trigger expression. | 50 |
{$AWS.EC2.CPU.CREDIT.SURPLUS.BALANCE.MAX.WARN} | Maximum number of spent CPU surplus credits for trigger expression. | 100 |
{$AWS.EBS.IO.CREDIT.BALANCE.MIN.WARN} | Minimum percentage of I/O credits remaining for trigger expression. | 20 |
{$AWS.EBS.BYTE.CREDIT.BALANCE.MIN.WARN} | Minimum percentage of Byte credits remaining for trigger expression. | 20 |
{$AWS.EBS.BURST.CREDIT.BALANCE.MIN.WARN} | Minimum percentage of burst balance credits remaining for trigger expression. | 20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS EC2: Get metrics data | Get instance metrics. Full metrics list related to EC2: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html | Script | aws.ec2.get_metrics |
AWS CloudWatch: Get instance alarms data | DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html | Script | aws.ec2.get_alarms |
AWS EBS: Get volumes data | Get volumes attached to the instance. DescribeVolumes API method: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeVolumes.html | Script | aws.ec2.get_volumes |
AWS EC2: Get metrics check | Checks whether the instance metric data was retrieved correctly. | Dependent item | aws.ec2.metrics.check |
AWS EC2: Get alarms check | Checks whether the alarm data was retrieved correctly. | Dependent item | aws.ec2.alarms.check |
AWS EC2: Get volumes info check | Checks whether the volume information was retrieved correctly. | Dependent item | aws.ec2.volumes.check |
AWS EC2: Credit CPU: Balance | The number of earned CPU credits that an instance has accrued since it was launched or started. For T2 Standard, the CPUCreditBalance also includes the number of launch credits that have been accrued. Credits are accrued in the credit balance after they are earned, and removed from the credit balance when they are spent. The credit balance has a maximum limit, determined by the instance size. After the limit is reached, any new credits that are earned are discarded. For T2 Standard, launch credits do not count towards the limit. The credits in the CPUCreditBalance are available for the instance to spend to burst beyond its baseline CPU utilization. When an instance is running, credits in the CPUCreditBalance do not expire. When a T3 or T3a instance stops, the CPUCreditBalance value persists for seven days. Thereafter, all accrued credits are lost. When a T2 instance stops, the CPUCreditBalance value does not persist, and all accrued credits are lost. | Dependent item | aws.ec2.cpu.credit_balance |
AWS EC2: Credit CPU: Usage | The number of CPU credits spent by the instance for CPU utilization. One CPU credit equals one vCPU running at 100% utilization for one minute or an equivalent combination of vCPUs, utilization, and time (for example, one vCPU running at 50% utilization for two minutes or two vCPUs running at 25% utilization for two minutes). | Dependent item | aws.ec2.cpu.credit_usage |
AWS EC2: Credit CPU: Surplus balance | The number of surplus credits that have been spent by an unlimited instance when its CPUCreditBalance value is zero. The CPUSurplusCreditBalance value is paid down by earned CPU credits. If the number of surplus credits exceeds the maximum number of credits that the instance can earn in a 24-hour period, the spent surplus credits above the maximum incur an additional charge. | Dependent item | aws.ec2.cpu.surplus_credit_balance |
AWS EC2: Credit CPU: Surplus charged | The number of spent surplus credits that are not paid down by earned CPU credits, and which thus incur an additional charge. Spent surplus credits are charged when any of the following occurs: - The spent surplus credits exceed the maximum number of credits that the instance can earn in a 24-hour period. Spent surplus credits above the maximum are charged at the end of the hour; - The instance is stopped or terminated; - The instance is switched from unlimited to standard. | Dependent item | aws.ec2.cpu.surplus_credit_charged |
AWS EC2: CPU: Utilization | The percentage of allocated EC2 compute units that are currently in use on the instance. This metric identifies the processing power required to run an application on a selected instance. Depending on the instance type, tools in your operating system can show a lower percentage than CloudWatch when the instance is not allocated a full processor core. | Dependent item | aws.ec2.cpu_utilization |
AWS EC2: Disk: Read bytes, rate | Bytes read from all instance store volumes available to the instance. This metric is used to determine the volume of the data the application reads from the hard disk of the instance. This can be used to determine the speed of the application. If there are no instance store volumes, either the value is 0 or the metric is not reported. | Dependent item | aws.ec2.disk.read_bytes.rate |
AWS EC2: Disk: Read, rate | Completed read operations from all instance store volumes available to the instance in a specified period of time. If there are no instance store volumes, either the value is 0 or the metric is not reported. | Dependent item | aws.ec2.disk.read_ops.rate |
AWS EC2: Disk: Write bytes, rate | Bytes written to all instance store volumes available to the instance. This metric is used to determine the volume of the data the application writes onto the hard disk of the instance. This can be used to determine the speed of the application. If there are no instance store volumes, either the value is 0 or the metric is not reported. | Dependent item | aws.ec2.disk_write_bytes.rate |
AWS EC2: Disk: Write ops, rate | Completed write operations to all instance store volumes available to the instance in a specified period of time. If there are no instance store volumes, either the value is 0 or the metric is not reported. | Dependent item | aws.ec2.disk_write_ops.rate |
AWS EC2: EBS: Byte balance | Percentage of throughput credits remaining in the burst bucket for Nitro-based instances. | Dependent item | aws.ec2.ebs.byte_balance |
AWS EC2: EBS: IO balance | Percentage of I/O credits remaining in the burst bucket for Nitro-based instances. | Dependent item | aws.ec2.ebs.io_balance |
AWS EC2: EBS: Read bytes, rate | Bytes read from all EBS volumes attached to the instance for Nitro-based instances. | Dependent item | aws.ec2.ebs.read_bytes.rate |
AWS EC2: EBS: Read, rate | Completed read operations from all Amazon EBS volumes attached to the instance for Nitro-based instances. | Dependent item | aws.ec2.ebs.read_ops.rate |
AWS EC2: EBS: Write bytes, rate | Bytes written to all EBS volumes attached to the instance for Nitro-based instances. | Dependent item | aws.ec2.ebs.write_bytes.rate |
AWS EC2: EBS: Write, rate | Completed write operations to all EBS volumes attached to the instance in a specified period of time. | Dependent item | aws.ec2.ebs.write_ops.rate |
AWS EC2: Metadata: No token | The number of times the instance metadata service was successfully accessed using a method that does not use a token. This metric is used to determine if there are any processes accessing instance metadata that are using Instance Metadata Service Version 1, which does not use a token. If all requests use token-backed sessions, i.e., Instance Metadata Service Version 2, the value is 0. | Dependent item | aws.ec2.metadata.no_token |
AWS EC2: Network: Bytes in, rate | The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to a single instance. | Dependent item | aws.ec2.network_in.rate |
AWS EC2: Network: Bytes out, rate | The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic from a single instance. | Dependent item | aws.ec2.network_out.rate |
AWS EC2: Network: Packets in, rate | The number of packets received on all network interfaces by the instance. This metric identifies the volume of incoming traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only. | Dependent item | aws.ec2.packets_in.rate |
AWS EC2: Network: Packets out, rate | The number of packets sent out on all network interfaces by the instance. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only. | Dependent item | aws.ec2.packets_out.rate |
AWS EC2: Status: Check failed | Reports whether the instance has passed both the instance status check and the system status check in the last minute. This metric can be either 0 (passed) or 1 (failed). | Dependent item | aws.ec2.status_check_failed |
AWS EC2: Status: Check failed, instance | Reports whether the instance has passed the instance status check in the last minute. This metric can be either 0 (passed) or 1 (failed). | Dependent item | aws.ec2.status_check_failed_instance |
AWS EC2: Status: Check failed, system | Reports whether the instance has passed the system status check in the last minute. This metric can be either 0 (passed) or 1 (failed). | Dependent item | aws.ec2.status_check_failed_system |
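The CPU credit usage description defines one credit as one vCPU at 100% utilization for one minute. A small worked sketch of that arithmetic (the function name `cpu_credits_spent` is illustrative) reproduces the three equivalent combinations quoted in the item description.

```python
def cpu_credits_spent(vcpus: float, utilization_pct: float, minutes: float) -> float:
    """CPU credits spent: one credit = one vCPU at 100% utilization for one minute."""
    return vcpus * (utilization_pct / 100.0) * minutes

print(cpu_credits_spent(1, 100, 1))  # 1.0: one vCPU, 100%, one minute
print(cpu_credits_spent(1, 50, 2))   # 1.0: one vCPU, 50%, two minutes
print(cpu_credits_spent(2, 25, 2))   # 1.0: two vCPUs, 25%, two minutes
```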
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS EC2: Failed to get metrics data | Failed to get CloudWatch metrics for EC2. | length(last(/AWS EC2 by HTTP/aws.ec2.metrics.check))>0 | Warning | |
AWS EC2: Failed to get alarms data | Failed to get CloudWatch alarms for EC2. | length(last(/AWS EC2 by HTTP/aws.ec2.alarms.check))>0 | Warning | |
AWS EC2: Failed to get volumes info | Failed to get volume information for EC2. | length(last(/AWS EC2 by HTTP/aws.ec2.volumes.check))>0 | Warning | |
AWS EC2: Instance CPU Credit balance is too low | The number of earned CPU credits has been less than {$AWS.EC2.CPU.CREDIT.BALANCE.MIN.WARN} in the last 5 minutes. | max(/AWS EC2 by HTTP/aws.ec2.cpu.credit_balance,5m)<{$AWS.EC2.CPU.CREDIT.BALANCE.MIN.WARN} | Warning | |
AWS EC2: Instance has spent too many CPU surplus credits | The number of spent surplus credits that are not paid down, and which thus incur an additional charge, is over {$AWS.EC2.CPU.CREDIT.SURPLUS.BALANCE.MAX.WARN}. | last(/AWS EC2 by HTTP/aws.ec2.cpu.surplus_credit_charged)>{$AWS.EC2.CPU.CREDIT.SURPLUS.BALANCE.MAX.WARN} | Warning | |
AWS EC2: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/AWS EC2 by HTTP/aws.ec2.cpu_utilization,15m)>{$AWS.EC2.CPU.UTIL.WARN.MAX} | Warning | |
AWS EC2: Byte Credit balance is too low | | max(/AWS EC2 by HTTP/aws.ec2.ebs.byte_balance,5m)<{$AWS.EBS.BYTE.CREDIT.BALANCE.MIN.WARN} | Warning | |
AWS EC2: I/O Credit balance is too low | | max(/AWS EC2 by HTTP/aws.ec2.ebs.io_balance,5m)<{$AWS.EBS.IO.CREDIT.BALANCE.MIN.WARN} | Warning | |
AWS EC2: Instance status check failed | These checks detect problems that require your involvement to repair. | last(/AWS EC2 by HTTP/aws.ec2.status_check_failed_instance)=1 | Average | |
AWS EC2: System status check failed | These checks detect underlying problems with your instance that require AWS involvement to repair. | last(/AWS EC2 by HTTP/aws.ec2.status_check_failed_system)=1 | Average | |
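Several triggers above use `max(...,5m)<threshold` or `min(...,15m)>threshold`. These read as "every value in the window" conditions: the maximum stays below the threshold only if all samples do. The sketch below shows that logic on plain lists. The sample values are hypothetical, and it does not model how Zabbix handles an empty window or its collection intervals.

```python
def max_below(values, threshold) -> bool:
    """max(window) < threshold: true only if every value in the window is below threshold."""
    return max(values) < threshold

def min_above(values, threshold) -> bool:
    """min(window) > threshold: true only if every value in the window is above threshold."""
    return min(values) > threshold

# Hypothetical CPU credit balance samples over 5 minutes, default threshold 50.
print(max_below([42, 38, 45, 49, 40], 50))  # True: the balance stayed under 50
print(max_below([42, 38, 61, 49, 40], 50))  # False: one sample reached 61
# Hypothetical CPU utilization samples over 15 minutes, default threshold 85.
print(min_above([91, 88, 97], 85))          # True
```

Using `max`/`min` over a window rather than `last` keeps a single spike or dip from firing or clearing the trigger.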
Name | Description | Type | Key and additional info |
---|---|---|---|
Instance Alarms discovery | Discovers alarms for the instance and its attached EBS volumes. | Dependent item | aws.ec2.alarms.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS EC2 Alarms: [{#ALARM_NAME}]: Get metrics | Get alarm metrics about the state and its reason. | Dependent item | aws.ec2.alarm.get_metrics["{#ALARM_NAME}"] |
AWS EC2 Alarms: [{#ALARM_NAME}]: State reason | An explanation for the alarm state, in text format. Alarm description: {#ALARM_DESCRIPTION} | Dependent item | aws.ec2.alarm.state_reason["{#ALARM_NAME}"] |
AWS EC2 Alarms: [{#ALARM_NAME}]: State | The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM). Alarm description: {#ALARM_DESCRIPTION} | Dependent item | aws.ec2.alarm.state["{#ALARM_NAME}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS EC2 Alarms: [{#ALARM_NAME}] has 'Alarm' state | Alarm "{#ALARM_NAME}" has 'Alarm' state. | last(/AWS EC2 by HTTP/aws.ec2.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS EC2 by HTTP/aws.ec2.alarm.state_reason["{#ALARM_NAME}"]))>0 | Average | |
AWS EC2 Alarms: [{#ALARM_NAME}] has 'Insufficient data' state | Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state. | last(/AWS EC2 by HTTP/aws.ec2.alarm.state["{#ALARM_NAME}"])=1 | Info | |
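The alarm State item maps CloudWatch state strings to numbers, and the two triggers test those numbers. The sketch below restates that logic in Python for illustration only. The value map comes from the item description, while `triggered_severity` is a hypothetical helper, not part of the template.

```python
from typing import Optional

# Numeric values used by the "State" item, as listed in its description.
ALARM_STATE = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}

def triggered_severity(state_value: str, state_reason: str) -> Optional[str]:
    """Mirror the two alarm triggers: Average for ALARM with a reason, Info for INSUFFICIENT_DATA."""
    state = ALARM_STATE[state_value]
    if state == 2 and len(state_reason) > 0:
        return "Average"
    if state == 1:
        return "Info"
    return None

print(triggered_severity("ALARM", "Threshold crossed"))  # Average
print(triggered_severity("OK", ""))                      # None
```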
Name | Description | Type | Key and additional info |
---|---|---|---|
Instance Volumes discovery | Discovers EBS volumes attached to the instance. | Dependent item | aws.ec2.volumes.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS EBS: [{#VOLUME_ID}]: Get volume data | Get data of the "{#VOLUME_ID}" volume. | Dependent item | aws.ec2.ebs.get_volume["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Create time | The time stamp when volume creation was initiated. | Dependent item | aws.ec2.ebs.create_time["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Status | The state of the volume. Possible values: 0 (creating), 1 (available), 2 (in-use), 3 (deleting), 4 (deleted), 5 (error). | Dependent item | aws.ec2.ebs.status["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Attachment state | The attachment state of the volume. Possible values: 0 (attaching), 1 (attached), 2 (detaching). | Dependent item | aws.ec2.ebs.attachment_status["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Attachment time | The time stamp when the attachment was initiated. | Dependent item | aws.ec2.ebs.attachment_time["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Device | The device name specified in the block device mapping (for example, /dev/sda1). | Dependent item | aws.ec2.ebs.device["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Get metrics | Get metrics of the EBS volume. Full metrics list related to EBS: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using_cloudwatch_ebs.html | Script | aws.ec2.get_ebs_metrics["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Read, bytes | Provides information on the read operations in a specified period of time. The Average statistic reports the average size of each read operation during the period, except on volumes attached to a Nitro-based instance, where the average represents the average over the specified period. For Xen instances, data is reported only when there is read activity on the volume. | Dependent item | aws.ec2.ebs.volume.read_bytes["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Write, bytes | Provides information on the write operations in a specified period of time. The Average statistic reports the average size of each write operation during the period, except on volumes attached to a Nitro-based instance, where the average represents the average over the specified period. For Xen instances, data is reported only when there is write activity on the volume. | Dependent item | aws.ec2.ebs.volume.write_bytes["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Write, ops | The total number of write operations in a specified period of time. Note: write operations are counted on completion. | Dependent item | aws.ec2.ebs.volume.write_ops["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Read, ops | The total number of read operations in a specified period of time. Note: read operations are counted on completion. | Dependent item | aws.ec2.ebs.volume.read_ops["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Read time, total | This metric is not supported with Multi-Attach enabled volumes. The total number of seconds spent by all read operations that completed in a specified period of time. If multiple requests are submitted at the same time, this total could be greater than the length of the period. For example, for a period of 1 minute (60 seconds): if 150 operations completed during that period, and each operation took 1 second, the value would be 150 seconds. For Xen instances, data is reported only when there is read activity on the volume. | Dependent item | aws.ec2.ebs.volume.total_read_time["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Write time, total | This metric is not supported with Multi-Attach enabled volumes. The total number of seconds spent by all write operations that completed in a specified period of time. If multiple requests are submitted at the same time, this total could be greater than the length of the period. For example, for a period of 1 minute (60 seconds): if 150 operations completed during that period, and each operation took 1 second, the value would be 150 seconds. For Xen instances, data is reported only when there is write activity on the volume. | Dependent item | aws.ec2.ebs.volume.total_write_time["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Idle time | This metric is not supported with Multi-Attach enabled volumes. The total number of seconds in a specified period of time when no read or write operations were submitted. | Dependent item | aws.ec2.ebs.volume.idle_time["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Queue length | The number of read and write operation requests waiting to be completed in a specified period of time. | Dependent item | aws.ec2.ebs.volume.queue_length["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Throughput, pct | This metric is not supported with Multi-Attach enabled volumes. Used with Provisioned IOPS SSD volumes only. The percentage of I/O operations per second (IOPS) delivered of the total IOPS provisioned for an Amazon EBS volume. Provisioned IOPS SSD volumes deliver their provisioned performance 99.9 percent of the time. During a write, if there are no other pending I/O requests in a minute, the metric value will be 100 percent. Also, a volume's I/O performance may become degraded temporarily due to an action you have taken (for example, creating a snapshot of a volume during peak usage, running the volume on a non-EBS-optimized instance, or accessing data on the volume for the first time). | Dependent item | aws.ec2.ebs.volume.throughput_percentage["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Consumed Read/Write, ops | Used with Provisioned IOPS SSD volumes only. The total amount of read and write operations (normalized to 256K capacity units) consumed in a specified period of time. I/O operations that are smaller than 256K each count as 1 consumed IOPS. I/O operations that are larger than 256K are counted in 256K capacity units. For example, a 1024K I/O would count as 4 consumed IOPS. | Dependent item | aws.ec2.ebs.volume.consumed_read_write_ops["{#VOLUME_ID}"] |
AWS EBS: [{#VOLUME_ID}]: Burst balance | Used with General Purpose SSD (gp2), Throughput Optimized HDD (st1), and Cold HDD (sc1) volumes only. Provides information about the percentage of I/O credits (for gp2) or throughput credits (for st1 and sc1) remaining in the burst bucket. Data is reported to CloudWatch only when the volume is active. If the volume is not attached, no data is reported. | Dependent item | aws.ec2.ebs.volume.burst_balance["{#VOLUME_ID}"] |
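Two of the EBS descriptions above contain arithmetic worth checking by hand: the 256K normalization behind consumed read/write ops, and the total read/write time example. The sketch below reproduces the 1024K = 4 case and derives an average latency (total time divided by completed operations). Rounding a partial 256K unit up (for example, 300K as 2) is an assumption of this sketch, and the helper names are hypothetical.

```python
import math

CAPACITY_UNIT_KIB = 256  # consumed ops are normalized to 256K capacity units

def consumed_ops(io_size_kib: float) -> int:
    """Consumed read/write ops for a single I/O of the given size.

    I/O up to 256K counts as 1; larger I/O counts in 256K units. Rounding a
    partial unit up (e.g. 300K -> 2) is an assumption of this sketch.
    """
    return max(1, math.ceil(io_size_kib / CAPACITY_UNIT_KIB))

def avg_latency_ms(total_time_s: float, ops: int) -> float:
    """Average time per completed operation, e.g. total read time / read ops."""
    return 1000.0 * total_time_s / ops if ops else 0.0

print(consumed_ops(16))          # 1
print(consumed_ops(1024))        # 4
print(avg_latency_ms(150, 150))  # 1000.0: the 150 operations x 1 second example
```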
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS EBS: Volume [{#VOLUME_ID}] has 'error' state | | last(/AWS EC2 by HTTP/aws.ec2.ebs.status["{#VOLUME_ID}"])=5 | Warning | |
AWS EBS: Burst balance is too low | | max(/AWS EC2 by HTTP/aws.ec2.ebs.volume.burst_balance["{#VOLUME_ID}"],5m)<{$AWS.EBS.BURST.CREDIT.BALANCE.MIN.WARN} | Warning | |
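The volume Status and Attachment state items store numbers rather than the AWS strings, so the 'error' trigger compares against 5. A short sketch of those value maps follows, taken from the item descriptions. The helper `volume_in_error` is hypothetical, and the dictionaries cover only the values listed in this document (any other string raises `KeyError`).

```python
# Value maps from the item descriptions: AWS state strings to stored numbers.
VOLUME_STATUS = {"creating": 0, "available": 1, "in-use": 2,
                 "deleting": 3, "deleted": 4, "error": 5}
ATTACHMENT_STATE = {"attaching": 0, "attached": 1, "detaching": 2}

def volume_in_error(status: str) -> bool:
    """Mirror of the 'error' state trigger: last status value equals 5."""
    return VOLUME_STATUS[status] == 5

print(volume_in_error("error"))   # True
print(volume_in_error("in-use"))  # False
```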
This template monitors an AWS RDS instance by HTTP via Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. NOTE: This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.
Additional information about metrics and used API methods:
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
The template gets AWS RDS instance metrics and uses script items to make HTTP requests to the CloudWatch API.
Before using the template, you need to create an IAM policy with the necessary permissions for the Zabbix role in your AWS account.
Add the following required permissions to your Zabbix IAM policy in order to collect Amazon RDS metrics.
{
"Version":"2012-10-17",
"Statement":[
{
"Action":[
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"rds:DescribeEvents",
"rds:DescribeDBInstances"
],
"Effect":"Allow",
"Resource":"*"
}
]
}
If you are using assume role authorization, add the appropriate permissions to the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": "arn:aws:iam::{Account}:user/{UserName}"
},
{
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"rds:DescribeEvents",
"rds:DescribeDBInstances"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::{Account}:user/{UserName}"
},
"Action": "sts:AssumeRole"
}
]
}
If you are using role-based authorization, set the appropriate permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"rds:DescribeEvents",
"rds:DescribeDBInstances",
"ec2:AssociateIamInstanceProfile",
"ec2:ReplaceIamInstanceProfileAssociation"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"ec2.amazonaws.com"
]
},
"Action": [
"sts:AssumeRole"
]
}
]
}
Note: role-based authorization is only possible when the Zabbix server or proxy runs inside AWS.
Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, {$AWS.RDS.INSTANCE.ID}.
If you are using access key-based authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}.
If you are using assume role authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.
For more information about managing access keys, see the official documentation.
Also, see the Macros section for a list of macros used for LLD filters.
Name | Description | Default |
---|---|---|
{$AWS.PROXY} | Sets HTTP proxy value. If this macro is empty, no proxy is used. | |
{$AWS.ACCESS.KEY.ID} | Access key ID. | |
{$AWS.SECRET.ACCESS.KEY} | Secret access key. | |
{$AWS.REGION} | Amazon RDS Region code. | us-west-1 |
{$AWS.AUTH_TYPE} | Authorization method. Possible values: | access_key |
{$AWS.STS.REGION} | Region used in assume role request. | us-east-1 |
{$AWS.ASSUME.ROLE.ARN} | ARN of the role to assume; set when using assume role authorization. | |
{$AWS.RDS.INSTANCE.ID} | RDS DB instance identifier. | |
{$AWS.RDS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES} | Filter of discoverable alarms by namespace. | .* |
{$AWS.RDS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered alarms by namespace. | CHANGE_IF_NEEDED |
{$AWS.RDS.LLD.FILTER.ALARM_NAME.MATCHES} | Filter of discoverable alarms by name. | .* |
{$AWS.RDS.LLD.FILTER.ALARM_NAME.NOT_MATCHES} | Filter to exclude discovered alarms by name. | CHANGE_IF_NEEDED |
{$AWS.RDS.LLD.FILTER.EVENT_CATEGORY.MATCHES} | Filter of discoverable events by category. | .* |
{$AWS.RDS.LLD.FILTER.EVENT_CATEGORY.NOT_MATCHES} | Filter to exclude discovered events by category. | CHANGE_IF_NEEDED |
{$AWS.RDS.LLD.FILTER.EVENT_SOURCE_TYPE.MATCHES} | Filter of discoverable events by source type. | .* |
{$AWS.RDS.LLD.FILTER.EVENT_SOURCE_TYPE.NOT_MATCHES} | Filter to exclude discovered events by source type. | CHANGE_IF_NEEDED |
{$AWS.RDS.CPU.UTIL.WARN.MAX} | The warning threshold of the CPU utilization, expressed in %. | 85 |
{$AWS.RDS.CPU.CREDIT.BALANCE.MIN.WARN} | Minimum number of free earned CPU credits for trigger expression. | 50 |
{$AWS.EBS.IO.CREDIT.BALANCE.MIN.WARN} | Minimum percentage of I/O credits remaining for trigger expression. | 20 |
{$AWS.EBS.BYTE.CREDIT.BALANCE.MIN.WARN} | Minimum percentage of Byte credits remaining for trigger expression. | 20 |
{$AWS.RDS.BURST.CREDIT.BALANCE.MIN.WARN} | Minimum percentage of burst balance credits remaining for trigger expression. | 20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS RDS: Get metrics data | Get instance metrics. Full metrics list related to RDS: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-metrics.html. Full metrics list related to Amazon Aurora: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraMySQL.Monitoring.Metrics.html#Aurora.AuroraMySQL.Monitoring.Metrics.instances |
Script | aws.rds.get_metrics Preprocessing
|
AWS RDS: Get instance info | Get instance info. DescribeDBInstances API method: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_DescribeDBInstances.html |
Script | aws.rds.get_instance_info Preprocessing
|
AWS CloudWatch: Get instance alarms data | DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html |
Script | aws.rds.get_alarms Preprocessing
|
AWS RDS: Get instance events data | DescribeEvents API method: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_DescribeEvents.html |
Script | aws.rds.get_events Preprocessing
|
AWS RDS: Get metrics check | Data collection check. |
Dependent item | aws.rds.metrics.check Preprocessing
|
AWS RDS: Get instance info check | Data collection check. |
Dependent item | aws.rds.instance_info.check Preprocessing
|
AWS RDS: Get alarms check | Data collection check. |
Dependent item | aws.rds.alarms.check Preprocessing
|
AWS RDS: Get events check | Data collection check. |
Dependent item | aws.rds.events.check Preprocessing
|
AWS RDS: Class | Contains the name of the compute and memory capacity class of the DB instance. |
Dependent item | aws.rds.class Preprocessing
|
AWS RDS: Engine | Database engine. |
Dependent item | aws.rds.engine Preprocessing
|
AWS RDS: Engine version | Indicates the database engine version. |
Dependent item | aws.rds.engine.version Preprocessing
|
AWS RDS: Status | Specifies the current state of this database. All possible status values and their description: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/accessing-monitoring.html#Overview.DBInstance.Status |
Dependent item | aws.rds.status Preprocessing
|
AWS RDS: Storage type | Specifies the storage type associated with DB instance. |
Dependent item | aws.rds.storage_type Preprocessing
|
AWS RDS: Create time | Provides the date and time the DB instance was created. |
Dependent item | aws.rds.create_time Preprocessing
|
AWS RDS: Storage: Allocated | Specifies the allocated storage size specified in gibibytes (GiB). |
Dependent item | aws.rds.storage.allocated Preprocessing
|
AWS RDS: Storage: Max allocated | The upper limit in gibibytes (GiB) to which Amazon RDS can automatically scale the storage of the DB instance. If no limit is specified, returns -1. |
Dependent item | aws.rds.storage.max_allocated Preprocessing
|
AWS RDS: Read replica: State | The state of a read replica: true if the replica is operating normally, false if it is in an error state. If the instance isn't a read replica, this value is blank. |
Dependent item | aws.rds.read_replica_state Preprocessing
|
AWS RDS: Read replica: Status | The status of a read replica. If the instance isn't a read replica, this is blank. Status of the DB instance. For a StatusType of read replica, the values can be replicating, replication stop point set, replication stop point reached, error, stopped, or terminated. |
Dependent item | aws.rds.read_replica_status Preprocessing
|
AWS RDS: Swap usage | The amount of swap space used. This metric is available for the Aurora PostgreSQL DB instance classes db.t3.medium, db.t3.large, db.r4.large, db.r4.xlarge, db.r5.large, db.r5.xlarge, db.r6g.large, and db.r6g.xlarge. For Aurora MySQL, this metric applies only to db.t* DB instance classes. This metric is not available for SQL Server. |
Dependent item | aws.rds.swap_usage Preprocessing
|
AWS RDS: Disk: Write IOPS | The number of write records generated per second. This is roughly the number of log records generated by the database. These do not correspond to 8K page writes, and do not correspond to network packets sent. |
Dependent item | aws.rds.write_iops.rate Preprocessing
|
AWS RDS: Disk: Write latency | The average amount of time taken per disk I/O operation. |
Dependent item | aws.rds.write_latency Preprocessing
|
AWS RDS: Disk: Write throughput | The average number of bytes written to persistent storage every second. |
Dependent item | aws.rds.write_throughput.rate Preprocessing
|
AWS RDS: Network: Receive throughput | The incoming (receive) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication. |
Dependent item | aws.rds.network_receive_throughput.rate Preprocessing
|
AWS RDS: Burst balance | The percent of General Purpose SSD (gp2) burst-bucket I/O credits available. |
Dependent item | aws.rds.burst_balance Preprocessing
|
AWS RDS: CPU: Utilization | The percentage of CPU utilization. |
Dependent item | aws.rds.cpu.utilization Preprocessing
|
AWS RDS: Credit CPU: Balance | The number of CPU credits that an instance has accumulated, reported at 5-minute intervals. You can use this metric to determine how long a DB instance can burst beyond its baseline performance level at a given rate. When an instance is running, credits in the CPUCreditBalance don't expire. When the instance stops, the CPUCreditBalance does not persist, and all accrued credits are lost. This metric applies only to db.t2.small and db.t2.medium instances for Aurora MySQL, and to db.t3 instances for Aurora PostgreSQL. |
Dependent item | aws.rds.cpu.credit_balance Preprocessing
|
AWS RDS: Credit CPU: Usage | The number of CPU credits consumed during the specified period, reported at 5-minute intervals. This metric measures the amount of time during which physical CPUs have been used for processing instructions by virtual CPUs allocated to the DB instance. This metric applies only to db.t2.small and db.t2.medium instances for Aurora MySQL, and to db.t3 instances for Aurora PostgreSQL. |
Dependent item | aws.rds.cpu.credit_usage Preprocessing
|
AWS RDS: Connections | The number of client network connections to the database instance. The number of database sessions can be higher than the metric value because the metric value doesn't include the following: sessions that no longer have a network connection but which the database hasn't cleaned up; sessions created by the database engine for its own purposes; sessions created by the database engine's parallel execution capabilities; sessions created by the database engine job scheduler; Amazon Aurora/RDS connections. |
Dependent item | aws.rds.database_connections Preprocessing
|
AWS RDS: Disk: Queue depth | The number of outstanding read/write requests waiting to access the disk. |
Dependent item | aws.rds.disk_queue_depth Preprocessing
|
AWS RDS: EBS: Byte balance | The percentage of throughput credits remaining in the burst bucket of your RDS database. This metric is available for basic monitoring only. To find the instance sizes that support this metric, see the instance sizes with an asterisk (*) in the EBS optimized by default table (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html#current) in the Amazon EC2 User Guide for Linux Instances. |
Dependent item | aws.rds.ebs_byte_balance Preprocessing
|
AWS RDS: EBS: IO balance | The percentage of I/O credits remaining in the burst bucket of your RDS database. This metric is available for basic monitoring only. To find the instance sizes that support this metric, see the instance sizes with an asterisk (*) in the EBS optimized by default table (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html#current) in the Amazon EC2 User Guide for Linux Instances. |
Dependent item | aws.rds.ebs_io_balance Preprocessing
|
AWS RDS: Memory, freeable | The amount of available random access memory. For MariaDB, MySQL, Oracle, and PostgreSQL DB instances, this metric reports the value of the MemAvailable field of /proc/meminfo. |
Dependent item | aws.rds.freeable_memory Preprocessing
|
AWS RDS: Storage: Local free | The amount of local storage available, in bytes. Unlike for other DB engines, for Aurora DB instances this metric reports the amount of storage available to each DB instance. This value depends on the DB instance class. You can increase the amount of free storage space for an instance by choosing a larger DB instance class for your instance. (This doesn't apply to Aurora Serverless v2.) |
Dependent item | aws.rds.free_local_storage Preprocessing
|
AWS RDS: Network: Receive throughput | The incoming (receive) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication. For Amazon Aurora: The amount of network throughput received from the Aurora storage subsystem by each instance in the DB cluster. |
Dependent item | aws.rds.storage_network_receive_throughput Preprocessing
|
AWS RDS: Network: Transmit throughput | The outgoing (transmit) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication. For Amazon Aurora: The amount of network throughput sent to the Aurora storage subsystem by each instance in the Aurora MySQL DB cluster. |
Dependent item | aws.rds.storage_network_transmit_throughput Preprocessing
|
AWS RDS: Disk: Read IOPS | The average number of disk read I/O operations per second. Aurora PostgreSQL-Compatible Edition reports read and write IOPS separately, in 1-minute intervals. |
Dependent item | aws.rds.read_iops.rate Preprocessing
|
AWS RDS: Disk: Read latency | The average amount of time taken per disk I/O operation. |
Dependent item | aws.rds.read_latency Preprocessing
|
AWS RDS: Disk: Read throughput | The average number of bytes read from disk per second. |
Dependent item | aws.rds.read_throughput.rate Preprocessing
|
AWS RDS: Network: Transmit throughput | The outgoing (transmit) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication. |
Dependent item | aws.rds.network_transmit_throughput.rate Preprocessing
|
AWS RDS: Network: Throughput | The amount of network throughput both received from and transmitted to clients by each instance in the Aurora MySQL DB cluster, in bytes per second. This throughput doesn't include network traffic between instances in the DB cluster and the cluster volume. |
Dependent item | aws.rds.network_throughput.rate Preprocessing
|
AWS RDS: Storage: Space free | The amount of available storage space. |
Dependent item | aws.rds.free_storage_space Preprocessing
|
AWS RDS: Disk: Read IOPS, local storage | The average number of disk read I/O operations to local storage per second. Only applies to Multi-AZ DB clusters. |
Dependent item | aws.rds.read_iops_local_storage.rate Preprocessing
|
AWS RDS: Disk: Read latency, local storage | The average amount of time taken per disk I/O operation for local storage. Only applies to Multi-AZ DB clusters. |
Dependent item | aws.rds.read_latency_local_storage Preprocessing
|
AWS RDS: Disk: Read throughput, local storage | The average number of bytes read from disk per second for local storage. Only applies to Multi-AZ DB clusters. |
Dependent item | aws.rds.read_throughput_local_storage.rate Preprocessing
|
AWS RDS: Replication: Lag | The amount of time a read replica DB instance lags behind the source DB instance. Applies to MySQL, MariaDB, Oracle, PostgreSQL, and SQL Server read replicas. |
Dependent item | aws.rds.replica_lag Preprocessing
|
AWS RDS: Disk: Write IOPS, local storage | The average number of disk write I/O operations per second on local storage in a Multi-AZ DB cluster. |
Dependent item | aws.rds.write_iops_local_storage.rate Preprocessing
|
AWS RDS: Disk: Write latency, local storage | The average amount of time taken per disk I/O operation on local storage in a Multi-AZ DB cluster. |
Dependent item | aws.rds.write_latency_local_storage Preprocessing
|
AWS RDS: Disk: Write throughput, local storage | The average number of bytes written to disk per second for local storage. |
Dependent item | aws.rds.write_throughput_local_storage.rate Preprocessing
|
AWS RDS: SQLServer: Failed agent jobs | The number of failed Microsoft SQL Server Agent jobs during the last minute. |
Dependent item | aws.rds.failed_sql_server_agent_jobs_count Preprocessing
|
AWS RDS: Disk: Binlog Usage | The amount of disk space occupied by binary logs on the master. Applies to MySQL read replicas. |
Dependent item | aws.rds.bin_log_disk_usage Preprocessing
|
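The "AWS RDS: Get metrics data" script item collects the dependent metrics above in bulk. As a rough illustration of what a CloudWatch GetMetricData request for one DB instance can look like, the sketch below builds Python keyword arguments in the shape that boto3's `get_metric_data()` accepts, for example `boto3.client("cloudwatch").get_metric_data(**request)`. This is not the template's script. The instance ID, 30-minute window, 300-second period and `Average` statistic are illustrative assumptions, and the sketch only builds the request without sending it.

```python
from datetime import datetime, timedelta, timezone


def build_rds_metric_queries(instance_id, metric_names, period=300, stat="Average"):
    """Build GetMetricData MetricDataQueries for one RDS DB instance."""
    queries = []
    for name in metric_names:
        queries.append({
            # Query Ids must start with a lowercase letter.
            "Id": "m_" + name.lower(),
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/RDS",
                    "MetricName": name,
                    "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
                },
                "Period": period,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


def build_request(instance_id, metric_names, window_minutes=30, now=None):
    """Return keyword arguments shaped like boto3's CloudWatch get_metric_data()."""
    end = now or datetime.now(timezone.utc)
    return {
        "MetricDataQueries": build_rds_metric_queries(instance_id, metric_names),
        "StartTime": end - timedelta(minutes=window_minutes),
        "EndTime": end,
    }


# "my-db-instance" is a hypothetical {$AWS.RDS.INSTANCE.ID} value.
request = build_request("my-db-instance", ["CPUUtilization", "FreeStorageSpace"])
for query in request["MetricDataQueries"]:
    print(query["Id"], query["MetricStat"]["Metric"]["MetricName"])
```

Batching many metrics into one request is what lets a single script item feed dozens of dependent items.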
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS RDS: Failed to get metrics data | Failed to get CloudWatch metrics for RDS. |
length(last(/AWS RDS instance by HTTP/aws.rds.metrics.check))>0 |Warning |
||
AWS RDS: Failed to get instance data | Failed to get RDS instance info. |
length(last(/AWS RDS instance by HTTP/aws.rds.instance_info.check))>0 |Warning |
||
AWS RDS: Failed to get alarms data | Failed to get CloudWatch alarms for RDS. |
length(last(/AWS RDS instance by HTTP/aws.rds.alarms.check))>0 |Warning |
||
AWS RDS: Failed to get events data | Failed to get RDS events. |
length(last(/AWS RDS instance by HTTP/aws.rds.events.check))>0 |Warning |
||
AWS RDS: Read replica in error state | The read replica is in an error state. |
last(/AWS RDS instance by HTTP/aws.rds.read_replica_state)=0 |Average |
||
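AWS RDS: Burst balance is too low | The percentage of burst-bucket I/O credits available has been less than {$AWS.RDS.BURST.CREDIT.BALANCE.MIN.WARN} in the last 5 minutes. |
max(/AWS RDS instance by HTTP/aws.rds.burst_balance,5m)<{$AWS.RDS.BURST.CREDIT.BALANCE.MIN.WARN} |Warning |
||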
AWS RDS: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. |
min(/AWS RDS instance by HTTP/aws.rds.cpu.utilization,15m)>{$AWS.RDS.CPU.UTIL.WARN.MAX} |Warning |
||
AWS RDS: Instance CPU Credit balance is too low | The number of earned CPU credits has been less than {$AWS.RDS.CPU.CREDIT.BALANCE.MIN.WARN} in the last 5 minutes. |
max(/AWS RDS instance by HTTP/aws.rds.cpu.credit_balance,5m)<{$AWS.RDS.CPU.CREDIT.BALANCE.MIN.WARN} |Warning |
||
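AWS RDS: Byte Credit balance is too low | The percentage of throughput (byte) credits remaining has been less than {$AWS.EBS.BYTE.CREDIT.BALANCE.MIN.WARN} in the last 5 minutes. |
max(/AWS RDS instance by HTTP/aws.rds.ebs_byte_balance,5m)<{$AWS.EBS.BYTE.CREDIT.BALANCE.MIN.WARN} |Warning |
||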
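AWS RDS: I/O Credit balance is too low | The percentage of I/O credits remaining has been less than {$AWS.EBS.IO.CREDIT.BALANCE.MIN.WARN} in the last 5 minutes. |
max(/AWS RDS instance by HTTP/aws.rds.ebs_io_balance,5m)<{$AWS.EBS.IO.CREDIT.BALANCE.MIN.WARN} |Warning |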
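The trigger expressions above combine a few Zabbix history functions. `last()` takes the most recent value and `length()` its string length. `min()` and `max()` over a period scan every value collected in that window. The following Python sketch mimics how the "High CPU utilization", credit-balance and "Failed to get ..." triggers evaluate, with some simplifications. Samples are hypothetical `(unix_time, value)` pairs, the window is treated as `(now - period, now]`, and an empty window returns `None`, standing in for the case where Zabbix would typically mark the trigger as unknown. It is an illustration, not Zabbix's evaluator.

```python
def window(samples, now, seconds):
    """Values whose timestamps fall in (now - seconds, now]; samples are (unix_time, value) pairs."""
    return [value for ts, value in samples if now - seconds < ts <= now]


def high_cpu(samples, now, threshold=85):
    """min(...,15m) > threshold: every value in the last 15 minutes is above the threshold."""
    values = window(samples, now, 15 * 60)
    if not values:
        return None  # no data in the window
    return min(values) > threshold


def credit_too_low(samples, now, threshold):
    """max(...,5m) < threshold: every value in the last 5 minutes is below the threshold."""
    values = window(samples, now, 5 * 60)
    if not values:
        return None
    return max(values) < threshold


def collection_failed(last_check_value):
    """length(last(...check)) > 0: the check item holds a non-empty value, such as an error text."""
    return len(last_check_value) > 0


cpu = [(0, 90), (300, 92), (600, 88), (900, 95)]
print(high_cpu(cpu, 900))                    # True: 92, 88 and 95 are all above 85
print(high_cpu(cpu + [(1000, 80)], 1000))    # False: one sample dropped to 80
```

Using `min()` for a "too high" condition and `max()` for a "too low" condition means a single spike or dip inside the window does not fire the trigger.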
Name | Description | Type | Key and additional info |
---|---|---|---|
Instance Alarms discovery | Discovery of instance alarms. |
Dependent item | aws.rds.alarms.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS RDS Alarms: [{#ALARM_NAME}]: State reason | An explanation for the alarm state, in text format. Alarm description: {#ALARM_DESCRIPTION} |
Dependent item | aws.rds.alarm.state_reason["{#ALARM_NAME}"] Preprocessing
|
AWS RDS Alarms: [{#ALARM_NAME}]: State | The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM). Alarm description: {#ALARM_DESCRIPTION} |
Dependent item | aws.rds.alarm.state["{#ALARM_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS RDS Alarms: [{#ALARM_NAME}] has 'Alarm' state | Alarm "{#ALARM_NAME}" has 'Alarm' state. |
last(/AWS RDS instance by HTTP/aws.rds.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS RDS instance by HTTP/aws.rds.alarm.state_reason["{#ALARM_NAME}"]))>0 |Average |
||
AWS RDS Alarms: [{#ALARM_NAME}] has 'Insufficient data' state | Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state. |
last(/AWS RDS instance by HTTP/aws.rds.alarm.state["{#ALARM_NAME}"])=1 |Info |
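The two alarm trigger prototypes depend on the numeric state codes documented for the State item. The sketch below maps the `StateValue` strings returned by CloudWatch DescribeAlarms (`OK`, `INSUFFICIENT_DATA`, `ALARM`) to those codes and reproduces the trigger conditions. Assumptions: the code mapping follows the item description above, and the helper names are hypothetical rather than the template's preprocessing.

```python
# Numeric codes used by the "State" item, per its description.
ALARM_STATE_CODES = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}


def alarm_item_values(alarm):
    """Return (state code, state reason) for one DescribeAlarms MetricAlarms entry."""
    return ALARM_STATE_CODES[alarm["StateValue"]], alarm.get("StateReason", "")


def alarm_trigger_severity(alarm):
    """Mirror the two alarm trigger prototypes; None means no problem event."""
    state, reason = alarm_item_values(alarm)
    if state == 2 and len(reason) > 0:
        return "Average"  # has 'Alarm' state
    if state == 1:
        return "Info"     # has 'Insufficient data' state
    return None


print(alarm_trigger_severity({"StateValue": "ALARM", "StateReason": "Threshold Crossed"}))  # Average
```

Requiring a non-empty state reason alongside state 2 keeps the problem event tied to an explanation that can be shown in the event details.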
Name | Description | Type | Key and additional info |
---|---|---|---|
Aurora metrics discovery | Discovery of Amazon Aurora metrics. Full metrics list: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraMySQL.Monitoring.Metrics.html#Aurora.AuroraMySQL.Monitoring.Metrics.instances |
Dependent item | aws.rds.aurora.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS RDS: Row lock time | The total time spent acquiring row locks for InnoDB tables. |
Dependent item | aws.rds.row_locktime[{#SINGLETON}] Preprocessing
|
AWS RDS: Operations: Select throughput | The average number of select queries per second. |
Dependent item | aws.rds.select_throughput.rate[{#SINGLETON}] Preprocessing
|
AWS RDS: Operations: Select latency | The amount of latency for select queries. |
Dependent item | aws.rds.select_latency[{#SINGLETON}] Preprocessing
|
AWS RDS: Replication: Lag, max | The maximum amount of lag between the primary instance and each Aurora DB instance in the DB cluster. |
Dependent item | aws.rds.aurora_replica_lag.max[{#SINGLETON}] Preprocessing
|
AWS RDS: Replication: Lag, min | The minimum amount of lag between the primary instance and each Aurora DB instance in the DB cluster. |
Dependent item | aws.rds.aurora_replica_lag.min[{#SINGLETON}] Preprocessing
|
AWS RDS: Replication: Lag | For an Aurora replica, the amount of lag when replicating updates from the primary instance. |
Dependent item | aws.rds.aurora_replica_lag[{#SINGLETON}] Preprocessing
|
AWS RDS: Buffer Cache hit ratio | The percentage of requests that are served by the buffer cache. |
Dependent item | aws.rds.buffer_cache_hit_ratio[{#SINGLETON}] Preprocessing
|
AWS RDS: Operations: Commit latency | The amount of latency for commit operations. |
Dependent item | aws.rds.commit_latency[{#SINGLETON}] Preprocessing
|
AWS RDS: Operations: Commit throughput | The average number of commit operations per second. |
Dependent item | aws.rds.commit_throughput.rate[{#SINGLETON}] Preprocessing
|
AWS RDS: Deadlocks, rate | The average number of deadlocks in the database per second. |
Dependent item | aws.rds.deadlocks.rate[{#SINGLETON}] Preprocessing
|
AWS RDS: Engine uptime | The amount of time that the instance has been running. |
Dependent item | aws.rds.engine_uptime[{#SINGLETON}] Preprocessing
|
AWS RDS: Rollback segment history list length | The undo logs that record committed transactions with delete-marked records. These records are scheduled to be processed by the InnoDB purge operation. |
Dependent item | aws.rds.rollback_segment_history_list_length[{#SINGLETON}] Preprocessing
|
AWS RDS: Network: Throughput | The amount of network throughput received from and sent to the Aurora storage subsystem by each instance in the Aurora MySQL DB cluster. |
Dependent item | aws.rds.storage_network_throughput[{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Aurora MySQL metrics discovery | Discovery of Aurora MySQL metrics. Storage types: aurora (for MySQL 5.6-compatible Aurora), aurora-mysql (for MySQL 5.7-compatible and MySQL 8.0-compatible Aurora). |
Dependent item | aws.rds.postgresql.discovery Preprocessing
|
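The Aurora item prototypes are keyed with `{#SINGLETON}`, which suggests the common Zabbix singleton-discovery pattern. In that pattern, the discovery returns either exactly one row with an empty `{#SINGLETON}` value, so each prototype yields one item with a key such as `aws.rds.dml_latency[]`, or no rows at all. The sketch below assumes that pattern and assumes the decision is a simple membership check against the `aurora` / `aurora-mysql` values listed above. The function names are hypothetical, and this is not the template's preprocessing code.

```python
def singleton_discovery(value, accepted):
    """Return one LLD row when `value` is accepted, otherwise no rows."""
    return [{"{#SINGLETON}": ""}] if value in accepted else []


def expand_key(prototype, row):
    """Substitute LLD macros into an item prototype key."""
    for macro, value in row.items():
        prototype = prototype.replace(macro, value)
    return prototype


AURORA_MYSQL_TYPES = {"aurora", "aurora-mysql"}  # values listed in the discovery description

rows = singleton_discovery("aurora-mysql", AURORA_MYSQL_TYPES)
print([expand_key("aws.rds.dml_latency[{#SINGLETON}]", row) for row in rows])  # ['aws.rds.dml_latency[]']
print(singleton_discovery("postgres", AURORA_MYSQL_TYPES))                     # []
```

Returning an empty list for non-Aurora instances means the Aurora-specific items are simply never created, rather than created and left unsupported.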
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS RDS: Operations: Delete latency | The amount of latency for delete queries. |
Dependent item | aws.rds.delete_latency[{#SINGLETON}] Preprocessing
|
AWS RDS: Operations: Delete throughput | The average number of delete queries per second. |
Dependent item | aws.rds.delete_throughput.rate[{#SINGLETON}] Preprocessing
|
AWS RDS: DML: Latency | The amount of latency for inserts, updates, and deletes. |
Dependent item | aws.rds.dml_latency[{#SINGLETON}] Preprocessing
|
AWS RDS: DML: Throughput | The average number of inserts, updates, and deletes per second. |
Dependent item | aws.rds.dml_throughput.rate[{#SINGLETON}] Preprocessing
|
AWS RDS: DDL: Latency | The amount of latency for data definition language (DDL) requests - for example, create, alter, and drop requests. |
Dependent item | aws.rds.ddl_latency[{#SINGLETON}] Preprocessing
|
AWS RDS: DDL: Throughput | The average number of DDL requests per second. |
Dependent item | aws.rds.ddl_throughput.rate[{#SINGLETON}] Preprocessing
|
AWS RDS: Backtrack: Window, actual | The difference between the target backtrack window and the actual backtrack window. |
Dependent item | aws.rds.backtrack_window_actual[{#SINGLETON}] Preprocessing
|
AWS RDS: Backtrack: Window, alert | The number of times that the actual backtrack window is smaller than the target backtrack window for a given period of time. |
Dependent item | aws.rds.backtrack_window_alert[{#SINGLETON}] Preprocessing
|
AWS RDS: Transactions: Blocked, rate | The average number of transactions in the database that are blocked per second. |
Dependent item | aws.rds.blocked_transactions.rate[{#SINGLETON}] Preprocessing
|
AWS RDS: Replication: Binlog lag | The amount of time that a binary log replica DB cluster running on Aurora MySQL-Compatible Edition lags behind the binary log replication source. A lag means that the source is generating records faster than the replica can apply them. The metric value indicates the following: a high value means the replica is lagging the replication source; 0 or a value close to 0 means the replica process is active and current; -1 means Aurora can't determine the lag, which can happen during replica setup or when the replica is in an error state. |
Dependent item | aws.rds.aurora_replication_binlog_lag[{#SINGLETON}] Preprocessing
|
AWS RDS: Transactions: Active, rate | The average number of current transactions executing on an Aurora database instance per second. By default, Aurora doesn't enable this metric. To begin measuring this value, set innodb_monitor_enable='all' in the DB parameter group for a specific DB instance. |
Dependent item | aws.rds.aurora_transactions_active.rate[{#SINGLETON}] Preprocessing
|
AWS RDS: Connections: Aborted | The number of client connections that have not been closed properly. |
Dependent item | aws.rds.aurora_clients_aborted[{#SINGLETON}] Preprocessing
|
AWS RDS: Operations: Insert latency | The amount of latency for insert queries, in milliseconds. |
Dependent item | aws.rds.insert_latency[{#SINGLETON}] Preprocessing
|
AWS RDS: Operations: Insert throughput | The average number of insert queries per second. |
Dependent item | aws.rds.insert_throughput.rate[{#SINGLETON}] Preprocessing
|
AWS RDS: Login failures, rate | The average number of failed login attempts per second. |
Dependent item | aws.rds.login_failures.rate[{#SINGLETON}] Preprocessing
|
AWS RDS: Queries, rate | The average number of queries executed per second. |
Dependent item | aws.rds.queries.rate[{#SINGLETON}] Preprocessing
|
AWS RDS: Resultset cache hit ratio | The percentage of requests that are served by the Resultset cache. |
Dependent item | aws.rds.result_set_cache_hit_ratio[{#SINGLETON}] Preprocessing
|
AWS RDS: Binary log files, number | The number of binlog files generated. |
Dependent item | aws.rds.num_binary_log_files[{#SINGLETON}] Preprocessing
|
AWS RDS: Binary log files, size | The total size of the binlog files. |
Dependent item | aws.rds.sum_binary_log_files[{#SINGLETON}] Preprocessing
|
AWS RDS: Operations: Update latency | The amount of latency for update queries. |
Dependent item | aws.rds.update_latency[{#SINGLETON}] Preprocessing
|
AWS RDS: Operations: Update throughput | The average number of update queries per second. |
Dependent item | aws.rds.update_throughput.rate[{#SINGLETON}] Preprocessing
|
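The "Replication: Binlog lag" description defines three cases for the value. The sketch below encodes them so the value can be interpreted in scripts or dashboards. It assumes the value is expressed in seconds, and the `high_threshold` separating "close to 0" from "high" is a hypothetical choice, not something the template defines.

```python
def describe_binlog_lag(lag_seconds, high_threshold=60.0):
    """Interpret the binlog replica lag value as described above (threshold is hypothetical)."""
    if lag_seconds == -1:
        return "undetermined"  # replica setup in progress or replica in an error state
    if lag_seconds < high_threshold:
        return "current"       # 0 or close to 0: replica process is active and current
    return "lagging"           # source generates records faster than the replica applies them


for value in (-1, 0, 3600):
    print(value, describe_binlog_lag(value))
```

Treating -1 as a separate case matters because a naive "lag > N" check would read it as perfectly healthy.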
Name | Description | Type | Key and additional info |
---|---|---|---|
Instance Events discovery | Discovery instance events. |
Dependent item | aws.rds.events.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS RDS Events: [{#EVENT_CATEGORY}]: {#EVENT_SOURCE_TYPE}/{#EVENT_SOURCE_ID}: Message | Provides the text of this event. |
Dependent item | aws.rds.event_message["{#EVENT_CATEGORY}/{#EVENT_SOURCE_TYPE}/{#EVENT_SOURCE_ID}"] Preprocessing
|
AWS RDS Events: [{#EVENT_CATEGORY}]: {#EVENT_SOURCE_TYPE}/{#EVENT_SOURCE_ID}: Date | Provides the date and time of this event. |
Dependent item | aws.rds.event_date["{#EVENT_CATEGORY}/{#EVENT_SOURCE_TYPE}/{#EVENT_SOURCE_ID}"] Preprocessing
|
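Each discovered event is identified by its category, source type and source ID, joined with `/` inside the item key. The sketch below turns RDS DescribeEvents entries (fields `SourceType`, `SourceIdentifier`, `EventCategories`) into LLD rows and applies the category and source-type filter macros. Assumptions: one row is emitted per category because `EventCategories` is a list, and the helper names and sample event are hypothetical. This is not the template's preprocessing.

```python
import re


def passes_filter(value, matches=".*", not_matches="CHANGE_IF_NEEDED"):
    return re.search(matches, value) is not None and re.search(not_matches, value) is None


def event_lld_rows(events, category_matches=".*", category_not_matches="CHANGE_IF_NEEDED",
                   source_type_matches=".*", source_type_not_matches="CHANGE_IF_NEEDED"):
    """Turn DescribeEvents entries into LLD rows, one per event category (an assumption)."""
    rows = []
    for event in events:
        source_type = event["SourceType"]
        if not passes_filter(source_type, source_type_matches, source_type_not_matches):
            continue
        for category in event.get("EventCategories", []):
            if passes_filter(category, category_matches, category_not_matches):
                rows.append({
                    "{#EVENT_CATEGORY}": category,
                    "{#EVENT_SOURCE_TYPE}": source_type,
                    "{#EVENT_SOURCE_ID}": event["SourceIdentifier"],
                })
    return rows


def event_key(item, row):
    """Build a key such as aws.rds.event_message["category/type/id"]."""
    path = "/".join([row["{#EVENT_CATEGORY}"], row["{#EVENT_SOURCE_TYPE}"], row["{#EVENT_SOURCE_ID}"]])
    return f'aws.rds.event_{item}["{path}"]'


events = [{"SourceType": "db-instance", "SourceIdentifier": "my-db-instance",
           "EventCategories": ["backup"], "Message": "Backing up DB instance"}]
for row in event_lld_rows(events):
    print(event_key("message", row))  # aws.rds.event_message["backup/db-instance/my-db-instance"]
```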
This template monitors an AWS S3 bucket by HTTP via Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. NOTE: This template uses GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.
Additional information about metrics and used API methods:
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
The template gets AWS S3 metrics and uses the script item to make HTTP requests to the CloudWatch API.
Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.
Add the following required permissions to your Zabbix IAM policy in order to collect Amazon S3 metrics.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "cloudwatch:DescribeAlarms",
        "cloudwatch:GetMetricData",
        "s3:GetMetricsConfiguration"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
```
To use assume role authorization, add the appropriate permissions to the role you are using:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::{Account}:role/{RoleName}"
    },
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:DescribeAlarms",
        "cloudwatch:GetMetricData",
        "s3:GetMetricsConfiguration"
      ],
      "Resource": "*"
    }
  ]
}
```
Next, add a principal to the trust relationships of the role you are using:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
If you are using role-based authorization, set the appropriate permissions:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:DescribeAlarms",
        "cloudwatch:GetMetricData",
        "s3:GetMetricsConfiguration",
        "ec2:AssociateIamInstanceProfile",
        "ec2:ReplaceIamInstanceProfileAssociation"
      ],
      "Resource": "*"
    }
  ]
}
```
Next, add a principal to the trust relationships of the role you are using:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "ec2.amazonaws.com"
        ]
      },
      "Action": [
        "sts:AssumeRole"
      ]
    }
  ]
}
```
Note: role-based authorization is only possible when the Zabbix server or proxy runs inside AWS.
To gather request metrics, enable request metrics on your Amazon S3 buckets from the AWS console.
You can also define a filter for the Request metrics using a shared prefix, object tag, or access point.
Set the macros {$AWS.AUTH_TYPE} and {$AWS.S3.BUCKET.NAME}.
If you are using access key-based authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}.
If you are using assume role authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.
For more information about managing access keys, see the official documentation.
Also, see the Macros section for a list of macros used for LLD filters.
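The setup steps above boil down to "which macros must be non-empty for the chosen authorization method". The sketch below checks that, using the defaults from the Macros table (`access_key` for `{$AWS.AUTH_TYPE}`, `us-east-1` for `{$AWS.STS.REGION}`). Assumptions: the three auth type values `access_key`, `assume_role` and `role_base` are taken to be the accepted values, role-based authorization is taken to need no key macros, and the function name is hypothetical.

```python
# Defaults taken from the S3 template's Macros table.
DEFAULTS = {"{$AWS.AUTH_TYPE}": "access_key", "{$AWS.STS.REGION}": "us-east-1"}

COMMON_MACROS = ["{$AWS.AUTH_TYPE}", "{$AWS.S3.BUCKET.NAME}"]

# Assumed auth type values and the extra macros each one needs, per the steps above.
AUTH_MACROS = {
    "access_key": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}"],
    "assume_role": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}",
                    "{$AWS.STS.REGION}", "{$AWS.ASSUME.ROLE.ARN}"],
    "role_base": [],  # credentials come from the instance role; Zabbix must run inside AWS
}


def missing_macros(host_macros):
    """Return required macros that are empty after applying template defaults."""
    effective = {**DEFAULTS, **host_macros}
    auth_type = effective["{$AWS.AUTH_TYPE}"]
    if auth_type not in AUTH_MACROS:
        raise ValueError(f"unknown {{$AWS.AUTH_TYPE}} value: {auth_type!r}")
    required = COMMON_MACROS + AUTH_MACROS[auth_type]
    return [name for name in required if not effective.get(name)]


print(missing_macros({"{$AWS.S3.BUCKET.NAME}": "example-bucket"}))
```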
Name | Description | Default |
---|---|---|
{$AWS.PROXY} | Sets HTTP proxy value. If this macro is empty then no proxy is used. |
|
{$AWS.ACCESS.KEY.ID} | Access key ID. |
|
{$AWS.SECRET.ACCESS.KEY} | Secret access key. |
|
{$AWS.AUTH_TYPE} | Authorization method. Possible values: access_key, assume_role, role_base. |
access_key |
{$AWS.REQUEST.REGION} | Region used in GET request. |
us-east-1 |
{$AWS.STS.REGION} | Region used in assume role request. |
us-east-1 |
{$AWS.ASSUME.ROLE.ARN} | ARN of the role to assume; set it when using the assume_role authorization method. |
|
{$AWS.S3.BUCKET.NAME} | S3 bucket name. |
|
{$AWS.S3.LLD.FILTER.ALARM_NAME.MATCHES} | Filter of discoverable alarms by name. |
.* |
{$AWS.S3.LLD.FILTER.ALARM_NAME.NOT_MATCHES} | Filter to exclude discovered alarms by name. |
CHANGE_IF_NEEDED |
{$AWS.S3.LLD.FILTER.ID.NAME.MATCHES} | Filter of discoverable request metrics by filter ID name. |
.* |
{$AWS.S3.LLD.FILTER.ID.NAME.NOT_MATCHES} | Filter to exclude discovered request metrics by filter ID name. |
CHANGE_IF_NEEDED |
{$AWS.S3.UPDATE.INTERVAL} | Interval in seconds for getting request metrics. Used in the metric configuration and in the JavaScript API query. Must be between 1 and 86400 seconds. |
1800 |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS S3: Get metrics data | Get bucket metrics. Full metrics list related to S3: https://docs.aws.amazon.com/AmazonS3/latest/userguide/metrics-dimensions.html |
Script | aws.s3.get_metrics Preprocessing
|
AWS S3: Get alarms data | Get alarms data. DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html |
Script | aws.s3.get_alarms Preprocessing
|
AWS S3: Get metrics check | Data collection check. |
Dependent item | aws.s3.metrics.check Preprocessing
|
AWS S3: Get alarms check | Data collection check. |
Dependent item | aws.s3.alarms.check Preprocessing
|
AWS S3: Bucket Size | This is a daily metric for the bucket. The amount of data in bytes stored in a bucket in the STANDARD storage class, INTELLIGENT_TIERING storage class, Standard-Infrequent Access (STANDARD_IA) storage class, OneZone-Infrequent Access (ONEZONE_IA), Reduced Redundancy Storage (RRS) class, S3 Glacier Instant Retrieval storage class, Deep Archive Storage (S3 Glacier Deep Archive) class, or S3 Glacier Flexible Retrieval (GLACIER) storage class. This value is calculated by summing the size of all objects and metadata in the bucket (both current and noncurrent objects), including the size of all parts for all incomplete multipart uploads to the bucket. |
Dependent item | aws.s3.bucket_size_bytes Preprocessing
|
AWS S3: Number of objects | This is a daily metric for the bucket. The total number of objects stored in a bucket for all storage classes. This value is calculated by counting all objects in the bucket (both current and noncurrent objects) and the total number of parts for all incomplete multipart uploads to the bucket. |
Dependent item | aws.s3.number_of_objects Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS S3: Failed to get metrics data | Failed to get CloudWatch metrics for S3 bucket. |
length(last(/AWS S3 bucket by HTTP/aws.s3.metrics.check))>0 |Warning |
||
AWS S3: Failed to get alarms data | Failed to get CloudWatch alarms for S3 bucket. |
length(last(/AWS S3 bucket by HTTP/aws.s3.alarms.check))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Bucket Alarms discovery | Discovery of bucket alarms. |
Dependent item | aws.s3.alarms.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS S3 Alarms: [{#ALARM_NAME}]: State reason | An explanation for the alarm state, in text format. Alarm description: {#ALARM_DESCRIPTION} |
Dependent item | aws.s3.alarm.state_reason["{#ALARM_NAME}"] Preprocessing
|
AWS S3 Alarms: [{#ALARM_NAME}]: State | The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM). Alarm description: {#ALARM_DESCRIPTION} |
Dependent item | aws.s3.alarm.state["{#ALARM_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS S3 Alarms: [{#ALARM_NAME}] has 'Alarm' state | Alarm "{#ALARM_NAME}" has 'Alarm' state. |
last(/AWS S3 bucket by HTTP/aws.s3.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS S3 bucket by HTTP/aws.s3.alarm.state_reason["{#ALARM_NAME}"]))>0 |Average |
||
AWS S3 Alarms: [{#ALARM_NAME}] has 'Insufficient data' state | Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state. |
last(/AWS S3 bucket by HTTP/aws.s3.alarm.state["{#ALARM_NAME}"])=1 |Info |
Name | Description | Type | Key and additional info |
---|---|---|---|
Request Metrics discovery | Discovery of request metrics. |
Dependent item | aws.s3.configuration.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Get request metrics | Get bucket request metrics filter: '{#AWS.S3.FILTER.ID.NAME}'. Full metrics list related to S3: https://docs.aws.amazon.com/AmazonS3/latest/userguide/metrics-dimensions.html |
Script | aws.s3.get_metrics["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: All | The total number of HTTP requests made to an Amazon S3 bucket, regardless of type. If you're using a metrics configuration with a filter, then this metric only returns the HTTP requests that meet the filter's requirements. |
Dependent item | aws.s3.all_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Get | The number of HTTP GET requests made for objects in an Amazon S3 bucket. This doesn't include list operations. Paginated list-oriented requests, like List Multipart Uploads, List Parts, Get Bucket Object versions, and others, are not included in this metric. |
Dependent item | aws.s3.get_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Put | The number of HTTP PUT requests made for objects in an Amazon S3 bucket. |
Dependent item | aws.s3.put_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Delete | The number of HTTP DELETE requests made for objects in an Amazon S3 bucket. This also includes Delete Multiple Objects requests. This metric shows the number of requests, not the number of objects deleted. |
Dependent item | aws.s3.delete_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Head | The number of HTTP HEAD requests made to an Amazon S3 bucket. |
Dependent item | aws.s3.head_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Post | The number of HTTP POST requests made to an Amazon S3 bucket. Delete Multiple Objects and SELECT Object Content requests are not included in this metric. |
Dependent item | aws.s3.post_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Select | The number of Amazon S3 SELECT Object Content requests made for objects in an Amazon S3 bucket. |
Dependent item | aws.s3.select_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Select, bytes scanned | The number of bytes of data scanned with Amazon S3 SELECT Object Content requests in an Amazon S3 bucket. Statistic: Average (bytes per request). |
Dependent item | aws.s3.select_bytes_scanned["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Select, bytes returned | The number of bytes of data returned with Amazon S3 SELECT Object Content requests in an Amazon S3 bucket. Statistic: Average (bytes per request). |
Dependent item | aws.s3.select_bytes_returned["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: List | The number of HTTP requests that list the contents of a bucket. |
Dependent item | aws.s3.list_requests["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Bytes downloaded | The number of bytes downloaded for requests made to an Amazon S3 bucket, where the response includes a body. Statistic: Average (bytes per request). |
Dependent item | aws.s3.bytes_downloaded["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Bytes uploaded | The number of bytes uploaded that contain a request body, made to an Amazon S3 bucket. Statistic: Average (bytes per request). |
Dependent item | aws.s3.bytes_uploaded["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Errors, 4xx | The number of HTTP 4xx client error status code requests made to an Amazon S3 bucket with a value of either 0 or 1. The average statistic shows the error rate, and the sum statistic shows the count of that type of error, during each period. Statistic: Average (reports per request). |
Dependent item | aws.s3.4xx_errors["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Errors, 5xx | The number of HTTP 5xx server error status code requests made to an Amazon S3 bucket with a value of either 0 or 1. The average statistic shows the error rate, and the sum statistic shows the count of that type of error, during each period. Statistic: Average (reports per request). |
Dependent item | aws.s3.5xx_errors["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: First byte latency, avg | The per-request time from the complete request being received by an Amazon S3 bucket to when the response starts to be returned. Statistic: Average. |
Dependent item | aws.s3.first_byte_latency.avg["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: First byte latency, p90 | The per-request time from the complete request being received by an Amazon S3 bucket to when the response starts to be returned. Statistic: 90th percentile. |
Dependent item | aws.s3.first_byte_latency.p90["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Total request latency, avg | The elapsed per-request time from the first byte received to the last byte sent to an Amazon S3 bucket. This includes the time taken to receive the request body and send the response body, which is not included in FirstByteLatency. Statistic: Average. |
Dependent item | aws.s3.total_request_latency.avg["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Total request latency, p90 | The elapsed per-request time from the first byte received to the last byte sent to an Amazon S3 bucket. This includes the time taken to receive the request body and send the response body, which is not included in FirstByteLatency. Statistic: 90th percentile. |
Dependent item | aws.s3.total_request_latency.p90["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Replication: Latency | The maximum number of seconds by which the replication destination Region is behind the source Region for a given replication rule. |
Dependent item | aws.s3.replication_latency["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Replication: Bytes pending | The total number of bytes of objects pending replication for a given replication rule. |
Dependent item | aws.s3.bytes_pending_replication["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
AWS S3: Filter [{#AWS.S3.FILTER.ID.NAME}]: Replication: Operations pending | The number of operations pending replication for a given replication rule. |
Dependent item | aws.s3.operations_pending_replication["{#AWS.S3.FILTER.ID.NAME}"] Preprocessing
|
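The 4xx and 5xx error descriptions rely on each request being reported as 0 or 1. The worked example below uses hypothetical per-request flags for one period. It shows why the Sum statistic yields an error count and the Average statistic (the one the template collects) yields an error rate.

```python
from __future__ import annotations


def error_stats(flags: list[int]) -> tuple[int, float]:
    """Return (Sum, Average) for per-request 0/1 error flags in one period."""
    if not flags:
        raise ValueError("no requests in this period")
    count = sum(flags)          # Sum statistic: number of error responses
    rate = count / len(flags)   # Average statistic: fraction of requests that errored
    return count, rate


# Hypothetical period with 10 requests, 2 of which returned a 4xx status.
print(error_stats([0, 1, 0, 0, 1, 0, 0, 0, 0, 0]))  # (2, 0.2)
```

An Average of 0.2 therefore means that 20% of the requests in that period failed.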
This template monitors AWS ECS Serverless Cluster by HTTP via Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. NOTE: This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.
Additional information about the metrics and used API methods:
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
The template gets AWS ECS metrics and uses the script item to make HTTP requests to the CloudWatch API.
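Bulk collection relies on GetMetricData accepting many metric queries in a single call. The Python sketch below builds a request body with one query per metric. Several parts of it are assumptions:

- The example metrics use the AWS/ECS namespace with a ClusterName dimension.
- The function and its defaults are hypothetical.
- Timestamps are given as epoch seconds, which is an assumption about serialization.
- SigV4 request signing is left out.

Some cluster-level metrics, such as task and service counts, are published by Container Insights under ECS/ContainerInsights, so the namespace is a parameter.

```python
import json
import time


def build_get_metric_data_body(cluster_name, metric_names, namespace="AWS/ECS",
                               period=300, stat="Average", lookback_s=1800, now=None):
    """Build a GetMetricData request body with one MetricDataQuery per metric."""
    end = int(time.time() if now is None else now)
    start = end - lookback_s
    queries = []
    for i, metric_name in enumerate(metric_names):
        queries.append({
            "Id": f"m{i}",  # query Ids must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
                },
                "Period": period,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return {"MetricDataQueries": queries, "StartTime": start, "EndTime": end}


if __name__ == "__main__":
    # "my-cluster" is a hypothetical cluster name.
    body = build_get_metric_data_body("my-cluster", ["CPUUtilization", "MemoryUtilization"])
    print(json.dumps(body, indent=2))
```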
Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.
Add the following required permissions to your Zabbix IAM policy in order to collect Amazon ECS metrics.
{
"Version":"2012-10-17",
"Statement":[
{
"Action":[
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"ecs:ListServices"
],
"Effect":"Allow",
"Resource":"*"
}
]
}
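Before importing the template, it can help to confirm that a policy document grants all three actions. The following is a minimal Python sketch that takes the policy as JSON text and collects literal Action values from Allow statements. It deliberately ignores wildcards such as cloudwatch:*, as well as NotAction, Deny statements, and resource scoping, so it is a quick sanity check rather than an IAM evaluator. The helper names are hypothetical.

```python
import json

# Actions listed in the policy above.
REQUIRED_ACTIONS = {"cloudwatch:DescribeAlarms", "cloudwatch:GetMetricData", "ecs:ListServices"}


def allowed_actions(policy: dict) -> set:
    """Collect literal action names from Allow statements."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    actions = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        actions.update([action] if isinstance(action, str) else action)
    return actions


def missing_actions(policy_json: str) -> set:
    """Return the required actions that the policy does not literally allow."""
    return REQUIRED_ACTIONS - allowed_actions(json.loads(policy_json))


if __name__ == "__main__":
    policy_text = """{"Version": "2012-10-17", "Statement": [{"Action": [
        "cloudwatch:DescribeAlarms", "cloudwatch:GetMetricData"],
        "Effect": "Allow", "Resource": "*"}]}"""
    print(missing_actions(policy_text))  # {'ecs:ListServices'}
```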
To use assume role authorization, add the appropriate permissions to the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": "arn:aws:iam::{Account}:user/{UserName}"
},
{
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"ecs:ListServices"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::{Account}:user/{UserName}"
},
"Action": "sts:AssumeRole"
}
]
}
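The trust relationship has the same shape for every principal, so it can be generated rather than edited by hand. In the Python sketch below, the account ID 123456789012 and the user name zabbix are hypothetical placeholders for {Account} and {UserName}, and the function name is also hypothetical. Passing a Service principal such as ec2.amazonaws.com yields the trust policy used for role-based authorization.

```python
import json


def trust_policy(principal: dict) -> dict:
    """Build a trust policy that lets the given principal call sts:AssumeRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical account ID and user name.
    user_trust = trust_policy({"AWS": "arn:aws:iam::123456789012:user/zabbix"})
    ec2_trust = trust_policy({"Service": ["ec2.amazonaws.com"]})
    print(json.dumps(user_trust, indent=2))
    print(json.dumps(ec2_trust, indent=2))
```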
If you are using role-based authorization, set the appropriate permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"ecs:ListServices",
"ec2:AssociateIamInstanceProfile",
"ec2:ReplaceIamInstanceProfileAssociation"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"ec2.amazonaws.com"
]
},
"Action": [
"sts:AssumeRole"
]
}
]
}
Note: Role-based authorization is only possible when the Zabbix server or proxy runs inside AWS.
Set the following macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, {$AWS.ECS.CLUSTER.NAME}.
If you are using access key-based authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}.
If you are using assume role authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.
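The macro requirements above can be expressed as a small lookup table and checked against a host's macro values. This is a Python sketch with hypothetical names. Only the value access_key is confirmed by this page, as the default of {$AWS.AUTH_TYPE}. The keys assume_role and role_base are assumed placeholders, so check the {$AWS.AUTH_TYPE} description in the template for the exact values.

```python
# Macros required for every authorization method.
COMMON_MACROS = ["{$AWS.AUTH_TYPE}", "{$AWS.REGION}", "{$AWS.ECS.CLUSTER.NAME}"]

# Extra macros per method. "access_key" is the documented default;
# "assume_role" and "role_base" are ASSUMED key names for illustration.
EXTRA_MACROS = {
    "access_key": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}"],
    "assume_role": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}",
                    "{$AWS.STS.REGION}", "{$AWS.ASSUME.ROLE.ARN}"],
    "role_base": [],  # role-based: no key macros are listed for this method
}


def missing_macros(host_macros: dict) -> list:
    """Return required macros that are absent or empty for the chosen auth type."""
    auth_type = host_macros.get("{$AWS.AUTH_TYPE}", "")
    if auth_type not in EXTRA_MACROS:
        raise ValueError(f"unsupported auth type: {auth_type!r}")
    required = COMMON_MACROS + EXTRA_MACROS[auth_type]
    return [name for name in required if not host_macros.get(name)]
```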
For more information about managing access keys, see the official AWS documentation.
Refer to the Macros section for a list of macros used for LLD filters.
Name | Description | Default |
---|---|---|
{$AWS.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. |
|
{$AWS.ACCESS.KEY.ID} | Access key ID. |
|
{$AWS.SECRET.ACCESS.KEY} | Secret access key. |
|
{$AWS.REGION} | Amazon ECS Region code. |
us-west-1 |
{$AWS.AUTH_TYPE} | Authorization method. Possible values: |
access_key |
{$AWS.STS.REGION} | Region used in assume role request. |
us-east-1 |
{$AWS.ASSUME.ROLE.ARN} | ARN assume role; add when using the |
|
{$AWS.ECS.CLUSTER.NAME} | ECS cluster name. |
|
{$AWS.ECS.LLD.FILTER.ALARM_NAME.MATCHES} | Filter of discoverable alarms by name. |
.* |
{$AWS.ECS.LLD.FILTER.ALARM_NAME.NOT_MATCHES} | Filter to exclude discovered alarms by name. |
CHANGE_IF_NEEDED |
{$AWS.ECS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES} | Filter of discoverable alarms by namespace. |
.* |
{$AWS.ECS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered alarms by namespace. |
CHANGE_IF_NEEDED |
{$AWS.ECS.LLD.FILTER.SERVICE.MATCHES} | Filter of discoverable services by name. |
.* |
{$AWS.ECS.LLD.FILTER.SERVICE.NOT_MATCHES} | Filter to exclude discovered services by name. |
CHANGE_IF_NEEDED |
{$AWS.ECS.CLUSTER.CPU.UTIL.WARN} | The warning threshold of the cluster CPU utilization expressed in %. |
70 |
{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN} | The warning threshold of the cluster memory utilization expressed in %. |
70 |
{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN} | The warning threshold of the cluster service CPU utilization expressed in %. |
80 |
{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN} | The warning threshold of the cluster service memory utilization expressed in %. |
80 |
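The *.MATCHES and *.NOT_MATCHES macros act as regular-expression conditions on discovered names. The Python sketch below approximates how a discovered value is kept or dropped. It assumes the two conditions are combined with AND, and it uses re.search (match anywhere) in place of Zabbix's own regex engine. The function name is hypothetical. With the defaults, .* keeps everything, and CHANGE_IF_NEEDED only excludes names that literally contain that text.

```python
import re


def passes_lld_filter(value: str, matches: str = ".*",
                      not_matches: str = "CHANGE_IF_NEEDED") -> bool:
    """Keep a discovered value if it matches `matches` and does not match `not_matches`."""
    return re.search(matches, value) is not None and re.search(not_matches, value) is None


if __name__ == "__main__":
    # Hypothetical service names; exclude anything containing "canary".
    services = ["web", "worker", "web-canary"]
    print([s for s in services if passes_lld_filter(s, not_matches="canary")])  # ['web', 'worker']
```

For alarms, the name filter and the namespace filter would both have to pass.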
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ECS Cluster: Get cluster metrics | Get cluster metrics. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html |
Script | aws.ecs.get_metrics Preprocessing
|
AWS ECS Cluster: Get cluster services | Get cluster services. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html |
Script | aws.ecs.get_cluster_services Preprocessing
|
AWS ECS Cluster: Get alarms data | Get alarms data. DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html |
Script | aws.ecs.get_alarms Preprocessing
|
AWS ECS Cluster: Get metrics check | Data collection check. |
Dependent item | aws.ecs.metrics.check Preprocessing
|
AWS ECS Cluster: Get alarms check | Data collection check. |
Dependent item | aws.ecs.alarms.check Preprocessing
|
AWS ECS Cluster: Container Instance Count | The number of EC2 instances running the Amazon ECS agent that are registered with a cluster. |
Dependent item | aws.ecs.container_instance_count Preprocessing
|
AWS ECS Cluster: Task Count | The number of tasks running in the cluster. |
Dependent item | aws.ecs.task_count Preprocessing
|
AWS ECS Cluster: Service Count | The number of services in the cluster. |
Dependent item | aws.ecs.service_count Preprocessing
|
AWS ECS Cluster: CPU Utilization | Cluster CPU utilization. |
Dependent item | aws.ecs.cpu_utilization Preprocessing
|
AWS ECS Cluster: Memory Utilization | The percentage of memory used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. |
Dependent item | aws.ecs.memory_utilization Preprocessing
|
AWS ECS Cluster: Network rx bytes | The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. |
Dependent item | aws.ecs.network.rx Preprocessing
|
AWS ECS Cluster: Network tx bytes | The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. |
Dependent item | aws.ecs.network.tx Preprocessing
|
AWS ECS Cluster: Ephemeral Storage Reserved | The number of bytes reserved from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later. |
Dependent item | aws.ecs.ephemeral.storage.reserved Preprocessing
|
AWS ECS Cluster: Ephemeral Storage Utilized | The number of bytes used from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later. |
Dependent item | aws.ecs.ephemeral.storage.utilized Preprocessing
|
AWS ECS Cluster: Ephemeral Storage Utilization | The calculated ephemeral storage (disk) utilization. |
Dependent item | aws.ecs.disk.utilization Preprocessing
|
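The page does not spell out how Ephemeral Storage Utilization is calculated. Assuming it is the share of reserved ephemeral storage that is in use, expressed in percent, a Python sketch (hypothetical function name) would be:

```python
def ephemeral_storage_utilization(utilized: float, reserved: float):
    """ASSUMED formula: utilized / reserved * 100, or None if nothing is reserved."""
    if reserved <= 0:
        return None  # avoid division by zero when no storage is reserved
    return utilized / reserved * 100


if __name__ == "__main__":
    # Hypothetical values in the same unit for both inputs.
    print(ephemeral_storage_utilization(5, 20))  # 25.0
```

The ratio does not depend on the unit, as long as both inputs use the same one.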
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS ECS Cluster: Failed to get metrics data | Failed to get CloudWatch metrics for ECS Cluster. |
length(last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.metrics.check))>0 |Warning |
||
AWS ECS Cluster: Failed to get alarms data | Failed to get CloudWatch alarms for ECS Cluster. |
length(last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.alarms.check))>0 |Warning |
||
AWS ECS Cluster: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. |
min(/AWS ECS Serverless Cluster by HTTP/aws.ecs.cpu_utilization,15m)>{$AWS.ECS.CLUSTER.CPU.UTIL.WARN} |Warning |
||
AWS ECS Cluster: High memory utilization | The system is running out of free memory. |
min(/AWS ECS Serverless Cluster by HTTP/aws.ecs.memory_utilization,15m)>{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN} |Warning |
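Because these triggers use min() over 15 minutes, they fire only when every collected value in that window is above the threshold. A single dip below it makes the expression false. The Python approximation below uses hypothetical samples taken every 5 minutes and the default threshold of 70 from {$AWS.ECS.CLUSTER.CPU.UTIL.WARN}. The helper name and the window boundary handling are assumptions.

```python
def min_over_window_exceeds(samples, threshold, window_s=900, now=None):
    """Approximate min(/host/key,15m)>threshold over (timestamp, value) samples.

    Returns False when the window holds no samples (Zabbix would instead
    be unable to evaluate the function).
    """
    if not samples:
        return False
    if now is None:
        now = max(ts for ts, _ in samples)
    window = [value for ts, value in samples if now - window_s < ts <= now]
    return bool(window) and min(window) > threshold


if __name__ == "__main__":
    high = [(0, 75), (300, 82), (600, 71), (900, 90)]
    dip = [(0, 75), (300, 82), (600, 65), (900, 90)]
    print(min_over_window_exceeds(high, 70))  # True
    print(min_over_window_exceeds(dip, 70))   # False
```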
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster Alarms discovery | Discovery of cluster alarms. |
Dependent item | aws.ecs.alarms.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ECS Cluster Alarms: [{#ALARM_NAME}]: Get metrics | Get alarm metrics about the state and its reason. |
Dependent item | aws.ecs.alarm.get_metrics["{#ALARM_NAME}"] Preprocessing
|
AWS ECS Cluster Alarms: [{#ALARM_NAME}]: State reason | An explanation for the alarm state, in text format. Alarm description: {#ALARM_DESCRIPTION} |
Dependent item | aws.ecs.alarm.state_reason["{#ALARM_NAME}"] Preprocessing
|
AWS ECS Cluster Alarms: [{#ALARM_NAME}]: State | The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM). Alarm description: {#ALARM_DESCRIPTION} |
Dependent item | aws.ecs.alarm.state["{#ALARM_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS ECS Cluster Alarms: [{#ALARM_NAME}] has 'Alarm' state | Alarm "{#ALARM_NAME}" has 'Alarm' state. |
last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.alarm.state_reason["{#ALARM_NAME}"]))>0 |Average |
||
AWS ECS Cluster Alarms: [{#ALARM_NAME}] has 'Insufficient data' state | Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state. |
last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=1 |Info |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster Services discovery | Discovery of {$AWS.ECS.CLUSTER.NAME} services. |
Dependent item | aws.ecs.services.discovery Preprocessing
|
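Service discovery builds on the ecs:ListServices permission granted earlier, whose response contains a serviceArns list. The Python sketch below turns those ARNs into low-level discovery rows. It rests on two assumptions:

- The service name is the last path segment of the ARN. This holds for both the cluster/service form and the older service/name form.
- The name is exposed as {#AWS.ECS.SERVICE.NAME}.

The helper name and example ARNs are hypothetical, and pagination via nextToken is left out.

```python
import json


def services_to_lld(service_arns):
    """Turn ListServices serviceArns into LLD rows keyed by {#AWS.ECS.SERVICE.NAME}."""
    rows = []
    for arn in service_arns:
        resource = arn.split(":", 5)[5]           # e.g. "service/my-cluster/web"
        service_name = resource.rsplit("/", 1)[-1]
        rows.append({"{#AWS.ECS.SERVICE.NAME}": service_name})
    return rows


if __name__ == "__main__":
    arns = [
        "arn:aws:ecs:us-west-1:123456789012:service/my-cluster/web",
        "arn:aws:ecs:us-west-1:123456789012:service/worker",
    ]
    print(json.dumps(services_to_lld(arns)))
```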
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Running Task | The number of tasks currently in the RUNNING state. |
Dependent item | aws.ecs.services.running.task["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Pending Task | The number of tasks currently in the PENDING state. |
Dependent item | aws.ecs.services.pending.task["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Desired Task | The desired number of tasks for the {#AWS.ECS.SERVICE.NAME} service. |
Dependent item | aws.ecs.services.desired.task["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Task Set | The number of task sets in the {#AWS.ECS.SERVICE.NAME} service. |
Dependent item | aws.ecs.services.task.set["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: CPU Reserved | The number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition. |
Dependent item | aws.ecs.services.cpu_reserved["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: CPU Utilization | The percentage of CPU units used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition. |
Dependent item | aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Memory utilized | The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. |
Dependent item | aws.ecs.services.memory_utilized["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Memory utilization | The percentage of memory used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. |
Dependent item | aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Memory reserved | The memory that is reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. |
Dependent item | aws.ecs.services.memory_reserved["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Network rx bytes | The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. |
Dependent item | aws.ecs.services.network.rx["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Network tx bytes | The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. |
Dependent item | aws.ecs.services.network.tx["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Ephemeral storage reserved | The number of bytes reserved from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later. |
Dependent item | aws.ecs.services.ephemeral.storage.reserved["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Ephemeral storage utilized | The number of bytes used from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later. |
Dependent item | aws.ecs.services.ephemeral.storage.utilized["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Storage read bytes | The number of bytes read from storage in the resource that is specified by the dimensions that you're using. |
Dependent item | aws.ecs.services.storage.read.bytes["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Storage write bytes | The number of bytes written to storage in the resource that is specified by the dimensions that you're using. |
Dependent item | aws.ecs.services.storage.write.bytes["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Get metrics | Get metrics of ECS services. Full metrics list related to ECS: https://docs.aws.amazon.com/ecs/index.html |
Script | aws.ecs.services.get_metrics["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. |
min(/AWS ECS Serverless Cluster by HTTP/aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN} |Warning |
||
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: High memory utilization | The system is running out of free memory. |
min(/AWS ECS Serverless Cluster by HTTP/aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN} |Warning |
This template monitors AWS ECS Cluster by HTTP via Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. NOTE: This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.
Additional information about the metrics and used API methods:
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
The template gets AWS ECS metrics and uses the script item to make HTTP requests to the CloudWatch API.
Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.
Add the following required permissions to your Zabbix IAM policy in order to collect Amazon ECS metrics.
{
"Version":"2012-10-17",
"Statement":[
{
"Action":[
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"ecs:ListServices"
],
"Effect":"Allow",
"Resource":"*"
}
]
}
To use assume role authorization, add the appropriate permissions to the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": "arn:aws:iam::{Account}:user/{UserName}"
},
{
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"ecs:ListServices"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::{Account}:user/{UserName}"
},
"Action": "sts:AssumeRole"
}
]
}
If you are using role-based authorization, set the appropriate permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"ecs:ListServices",
"ec2:AssociateIamInstanceProfile",
"ec2:ReplaceIamInstanceProfileAssociation"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"ec2.amazonaws.com"
]
},
"Action": [
"sts:AssumeRole"
]
}
]
}
Note: Role-based authorization is only possible when the Zabbix server or proxy runs inside AWS.
Set the following macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, {$AWS.ECS.CLUSTER.NAME}.
If you are using access key-based authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}.
If you are using assume role authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.
For more information about managing access keys, see the official AWS documentation.
Refer to the Macros section for a list of macros used for LLD filters.
Name | Description | Default |
---|---|---|
{$AWS.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. |
|
{$AWS.ACCESS.KEY.ID} | Access key ID. |
|
{$AWS.SECRET.ACCESS.KEY} | Secret access key. |
|
{$AWS.REGION} | Amazon ECS Region code. |
us-west-1 |
{$AWS.AUTH_TYPE} | Authorization method. Possible values: |
access_key |
{$AWS.STS.REGION} | Region used in assume role request. |
us-east-1 |
{$AWS.ASSUME.ROLE.ARN} | ARN assume role; add when using the |
|
{$AWS.ECS.CLUSTER.NAME} | ECS cluster name. |
|
{$AWS.ECS.LLD.FILTER.ALARM_NAME.MATCHES} | Filter of discoverable alarms by name. |
.* |
{$AWS.ECS.LLD.FILTER.ALARM_NAME.NOT_MATCHES} | Filter to exclude discovered alarms by name. |
CHANGE_IF_NEEDED |
{$AWS.ECS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES} | Filter of discoverable alarms by namespace. |
.* |
{$AWS.ECS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered alarms by namespace. |
CHANGE_IF_NEEDED |
{$AWS.ECS.LLD.FILTER.SERVICE.MATCHES} | Filter of discoverable services by name. |
.* |
{$AWS.ECS.LLD.FILTER.SERVICE.NOT_MATCHES} | Filter to exclude discovered services by name. |
CHANGE_IF_NEEDED |
{$AWS.ECS.CLUSTER.CPU.UTIL.WARN} | The warning threshold of the cluster CPU utilization expressed in %. |
70 |
{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN} | The warning threshold of the cluster memory utilization expressed in %. |
70 |
{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN} | The warning threshold of the cluster service CPU utilization expressed in %. |
80 |
{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN} | The warning threshold of the cluster service memory utilization expressed in %. |
80 |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ECS Cluster: Get cluster metrics | Get cluster metrics. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html |
Script | aws.ecs.get_metrics Preprocessing
|
AWS ECS Cluster: Get cluster services | Get cluster services. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html |
Script | aws.ecs.get_cluster_services Preprocessing
|
AWS ECS Cluster: Get alarms data | Get alarms data. DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html |
Script | aws.ecs.get_alarms Preprocessing
|
AWS ECS Cluster: Get metrics check | Data collection check. |
Dependent item | aws.ecs.metrics.check Preprocessing
|
AWS ECS Cluster: Get alarms check | Data collection check. |
Dependent item | aws.ecs.alarms.check Preprocessing
|
AWS ECS Cluster: Container Instance Count | The number of EC2 instances running the Amazon ECS agent that are registered with a cluster. |
Dependent item | aws.ecs.container_instance_count Preprocessing
|
AWS ECS Cluster: Task Count | The number of tasks running in the cluster. |
Dependent item | aws.ecs.task_count Preprocessing
|
AWS ECS Cluster: Service Count | The number of services in the cluster. |
Dependent item | aws.ecs.service_count Preprocessing
|
AWS ECS Cluster: CPU Reserved | The number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition. |
Dependent item | aws.ecs.cpu_reserved Preprocessing
|
AWS ECS Cluster: CPU Utilization | Cluster CPU utilization. |
Dependent item | aws.ecs.cpu_utilization Preprocessing
|
AWS ECS Cluster: Memory Utilization | The percentage of memory used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. |
Dependent item | aws.ecs.memory_utilization Preprocessing
|
AWS ECS Cluster: Network rx bytes | The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. |
Dependent item | aws.ecs.network.rx Preprocessing
|
AWS ECS Cluster: Network tx bytes | The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. |
Dependent item | aws.ecs.network.tx Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS ECS Cluster: Failed to get metrics data | Failed to get CloudWatch metrics for ECS Cluster. |
length(last(/AWS ECS Cluster by HTTP/aws.ecs.metrics.check))>0 |Warning |
||
AWS ECS Cluster: Failed to get alarms data | Failed to get CloudWatch alarms for ECS Cluster. |
length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarms.check))>0 |Warning |
||
AWS ECS Cluster: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. |
min(/AWS ECS Cluster by HTTP/aws.ecs.cpu_utilization,15m)>{$AWS.ECS.CLUSTER.CPU.UTIL.WARN} |Warning |
||
AWS ECS Cluster: High memory utilization | The system is running out of free memory. |
min(/AWS ECS Cluster by HTTP/aws.ecs.memory_utilization,15m)>{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN} |Warning |
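Both cluster triggers above use `min(<item>,15m)>{$MACRO}`. They fire only when every value collected during the last 15 minutes is above the threshold, so a single sample below it keeps the trigger in the OK state. The following Python sketch approximates that evaluation outside Zabbix. The function name, the sample data, and the threshold of 80 are hypothetical, and the window boundary handling is a simplification.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp, item value)


def min_over_window_exceeds(
    samples: Sequence[Sample], now: float, window_s: float, threshold: float
) -> Optional[bool]:
    """Approximate `min(/host/key,<window>)>threshold`.

    Returns None when no sample falls into (now - window_s, now], because
    min() has no value to work with in that case; otherwise returns True
    only if the smallest value in the window is above the threshold.
    """
    in_window = [value for ts, value in samples if now - window_s < ts <= now]
    if not in_window:
        return None
    return min(in_window) > threshold


# Hypothetical CPU utilization samples (percent), one every 5 minutes.
now = 1_700_000_000.0
samples = [(now - 900, 95.0), (now - 600, 88.5), (now - 300, 93.2), (now, 90.1)]
print(min_over_window_exceeds(samples, now, 15 * 60, 80.0))  # True

# One sample below the threshold is enough to keep the trigger in the OK state.
samples_with_dip = [(now - 600, 88.5), (now - 300, 72.0), (now, 90.1)]
print(min_over_window_exceeds(samples_with_dip, now, 15 * 60, 80.0))  # False
```

Because `min` requires the whole window to stay above the threshold, short spikes do not fire these triggers. A shorter window reacts faster but is more prone to false positives.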
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster Alarms discovery | Discovers the cluster alarms. |
Dependent item | aws.ecs.alarms.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ECS Cluster Alarms: [{#ALARM_NAME}]: Get metrics | Get alarm metrics about the state and its reason. |
Dependent item | aws.ecs.alarm.get_metrics["{#ALARM_NAME}"] Preprocessing
|
AWS ECS Cluster Alarms: [{#ALARM_NAME}]: State reason | An explanation for the alarm state, in text format. Alarm description: {#ALARM_DESCRIPTION} |
Dependent item | aws.ecs.alarm.state_reason["{#ALARM_NAME}"] Preprocessing
|
AWS ECS Cluster Alarms: [{#ALARM_NAME}]: State | The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM). Alarm description: {#ALARM_DESCRIPTION} |
Dependent item | aws.ecs.alarm.state["{#ALARM_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS ECS Cluster Alarms: [{#ALARM_NAME}] has 'Alarm' state | Alarm "{#ALARM_NAME}" has 'Alarm' state. |
last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state_reason["{#ALARM_NAME}"]))>0 |Average |
||
AWS ECS Cluster Alarms: [{#ALARM_NAME}] has 'Insufficient data' state | Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state. |
last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=1 |Info |
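The alarm items above expose the CloudWatch alarm state as a number: 0 for OK, 1 for INSUFFICIENT_DATA, 2 for ALARM. The 'Alarm' trigger additionally requires a non-empty state reason. Below is a minimal Python sketch of that mapping and of both trigger conditions. It assumes each alarm is a dict carrying the `StateValue` and `StateReason` fields returned by the CloudWatch `DescribeAlarms` API. The helper names and the sample alarm are hypothetical and not taken from the template's script.

```python
from typing import Any, Dict

# Numeric values documented for the "State" item.
STATE_CODES = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}


def alarm_state_code(alarm: Dict[str, Any]) -> int:
    """Map the CloudWatch StateValue string to the numeric item value."""
    return STATE_CODES[alarm["StateValue"]]


def has_alarm_state(alarm: Dict[str, Any]) -> bool:
    """Mirror of: last(state)=2 and length(last(state_reason))>0."""
    return alarm_state_code(alarm) == 2 and len(alarm.get("StateReason", "")) > 0


def has_insufficient_data_state(alarm: Dict[str, Any]) -> bool:
    """Mirror of: last(state)=1."""
    return alarm_state_code(alarm) == 1


# Hypothetical alarm, shaped like one entry of the MetricAlarms list.
alarm = {
    "AlarmName": "ecs-cluster-high-cpu",
    "StateValue": "ALARM",
    "StateReason": "Threshold crossed (example text)",
}
print(alarm_state_code(alarm), has_alarm_state(alarm))  # 2 True
```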
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster Services discovery | Discovers the services of the {$AWS.ECS.CLUSTER.NAME} cluster. |
Dependent item | aws.ecs.services.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Running Task | The number of tasks currently in the RUNNING state. |
Dependent item | aws.ecs.services.running.task["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Pending Task | The number of tasks currently in the PENDING state. |
Dependent item | aws.ecs.services.pending.task["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Desired Task | The desired number of tasks for the {#AWS.ECS.SERVICE.NAME} service. |
Dependent item | aws.ecs.services.desired.task["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Task Set | The number of task sets in the {#AWS.ECS.SERVICE.NAME} service. |
Dependent item | aws.ecs.services.task.set["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: CPU Reserved | The number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition. |
Dependent item | aws.ecs.services.cpu_reserved["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: CPU Utilization | The number of CPU units used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition. |
Dependent item | aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Memory utilized | The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. |
Dependent item | aws.ecs.services.memory_utilized["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Memory utilization | The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. |
Dependent item | aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Memory reserved | The memory that is reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. |
Dependent item | aws.ecs.services.memory_reserved["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Network rx bytes | The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. |
Dependent item | aws.ecs.services.network.rx["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Network tx bytes | The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. |
Dependent item | aws.ecs.services.network.tx["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Get metrics | Get metrics of ECS services. Full list of ECS metrics: https://docs.aws.amazon.com/ecs/index.html |
Script | aws.ecs.services.get_metrics["{#AWS.ECS.SERVICE.NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. |
min(/AWS ECS Cluster by HTTP/aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN} |Warning |
||
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: High memory utilization | The system is running out of free memory. |
min(/AWS ECS Cluster by HTTP/aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN} |Warning |
Please scroll down for AWS ELB Network Load Balancer by HTTP.
The template is designed to monitor AWS ELB Application Load Balancer by HTTP via Zabbix, and it works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.
Additional information about metrics and API methods used in the template:
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
The template gets AWS ELB Application Load Balancer metrics and uses the script item to make HTTP requests to the CloudWatch API.
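The script item has to turn `{$AWS.ELB.ARN}` into the `LoadBalancer` dimension that CloudWatch expects. AWS documents this dimension as the final portion of the load balancer ARN, for example `app/my-alb/50dc6c495c0c9188`. The Python sketch below builds the parameters of a `GetMetricData` request from an ARN. It is not the template's own script: request signing (SigV4) and sending are left out. The example ARN, metric list, period, and time window are hypothetical choices. The sketch also accepts `net/` ARNs, which map to the `AWS/NetworkELB` namespace used by Network Load Balancers.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple


def lb_dimension_from_arn(arn: str) -> str:
    """Return the LoadBalancer dimension value, e.g. 'app/my-alb/50dc6c495c0c9188'."""
    parts = arn.split(":", 5)  # arn:partition:service:region:account:resource
    if len(parts) != 6 or not parts[5].startswith("loadbalancer/"):
        raise ValueError(f"not a load balancer ARN: {arn}")
    return parts[5][len("loadbalancer/"):]


def namespace_for(dimension: str) -> str:
    """'app/...' is an Application Load Balancer, 'net/...' a Network Load Balancer."""
    kind = dimension.split("/", 1)[0]
    namespaces = {"app": "AWS/ApplicationELB", "net": "AWS/NetworkELB"}
    if kind not in namespaces:
        raise ValueError(f"unsupported load balancer type: {kind}")
    return namespaces[kind]


def build_get_metric_data_params(
    arn: str,
    metrics: List[Tuple[str, str]],  # (MetricName, statistic)
    period: int = 300,
    window: timedelta = timedelta(minutes=10),
    now: Optional[datetime] = None,
) -> Dict:
    dimension = lb_dimension_from_arn(arn)
    namespace = namespace_for(dimension)
    end = now or datetime.now(timezone.utc)
    start = end - window
    queries = [
        {
            "Id": f"m{i}",  # query Ids must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": name,
                    "Dimensions": [{"Name": "LoadBalancer", "Value": dimension}],
                },
                "Period": period,
                "Stat": stat,
            },
            "ReturnData": True,
        }
        for i, (name, stat) in enumerate(metrics)
    ]
    return {"MetricDataQueries": queries, "StartTime": start.isoformat(), "EndTime": end.isoformat()}


# Hypothetical ARN and metric selection.
ARN = "arn:aws:elasticloadbalancing:us-west-1:123456789012:loadbalancer/app/my-alb/50dc6c495c0c9188"
params = build_get_metric_data_params(
    ARN,
    [("RequestCount", "Sum"), ("TargetResponseTime", "Average"), ("HTTPCode_ELB_5XX_Count", "Sum")],
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(params["MetricDataQueries"][0]["MetricStat"]["Metric"]["Dimensions"])
# [{'Name': 'LoadBalancer', 'Value': 'app/my-alb/50dc6c495c0c9188'}]
```

Batching several metrics into one request is what allows most values to be collected in one go.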
Before using the template, you need to create an IAM policy with the necessary permissions for the Zabbix role in your AWS account. For more information, visit the ELB policies page on the AWS website.
Add the following required permissions to your Zabbix IAM policy in order to collect AWS ELB Application Load Balancer metrics.
{
"Version":"2012-10-17",
"Statement":[
{
"Action":[
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"elasticloadbalancing:DescribeTargetGroups"
],
"Effect":"Allow",
"Resource":"*"
}
]
}
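Before attaching the policy, it can help to check that a pasted policy document really grants the three actions listed above. Below is a minimal Python sketch using only the standard library. The helper names are hypothetical. It performs a plain string comparison and is not an IAM policy evaluator: wildcards, Deny statements, and conditions are not handled.

```python
import json
from typing import Set

REQUIRED_ACTIONS = {
    "cloudwatch:DescribeAlarms",
    "cloudwatch:GetMetricData",
    "elasticloadbalancing:DescribeTargetGroups",
}


def allowed_actions(policy_json: str) -> Set[str]:
    """Collect the actions named in Allow statements of an IAM policy document.

    Simplifications: wildcards such as "cloudwatch:*" are not expanded, and
    Deny, NotAction, Resource and Condition elements are ignored.
    """
    policy = json.loads(policy_json)
    statements = policy["Statement"]
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    granted: Set[str] = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        granted.update([actions] if isinstance(actions, str) else actions)
    return granted


def missing_actions(policy_json: str) -> Set[str]:
    """Return the required actions that the policy does not grant."""
    return REQUIRED_ACTIONS - allowed_actions(policy_json)


POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "cloudwatch:DescribeAlarms",
        "cloudwatch:GetMetricData",
        "elasticloadbalancing:DescribeTargetGroups"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
"""
print(missing_actions(POLICY))  # set()
```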
To use assume role authorization, add the appropriate permissions to the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": "arn:aws:iam::{Account}:user/{UserName}"
},
{
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"elasticloadbalancing:DescribeTargetGroups"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::{Account}:user/{UserName}"
},
"Action": "sts:AssumeRole"
}
]
}
If you are using role-based authorization, set the appropriate permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"elasticloadbalancing:DescribeTargetGroups",
"ec2:AssociateIamInstanceProfile",
"ec2:ReplaceIamInstanceProfileAssociation"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"ec2.amazonaws.com"
]
},
"Action": [
"sts:AssumeRole"
]
}
]
}
Note: role-based authorization is only possible when the Zabbix server or proxy runs inside AWS.
Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, and {$AWS.ELB.ARN}.
If you are using access key-based authorization, set the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.
If you are using assume role authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, and {$AWS.ASSUME.ROLE.ARN}.
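Which macros must be filled in depends on the authorization method. A Python sketch that lists the macros still empty for a host, after applying the template defaults from the macro table of this template. Assumption: only `access_key` (the default) appears as a `{$AWS.AUTH_TYPE}` value in this document. The strings `assume_role` and `role_base` used below are illustrative; check the macro description in your template before relying on them. The function name and host values are hypothetical.

```python
from typing import Dict, List

# Defaults taken from the macro table of this template.
TEMPLATE_DEFAULTS = {
    "{$AWS.AUTH_TYPE}": "access_key",
    "{$AWS.REGION}": "us-west-1",
    "{$AWS.STS.REGION}": "us-east-1",
}

COMMON_MACROS = ["{$AWS.AUTH_TYPE}", "{$AWS.REGION}", "{$AWS.ELB.ARN}"]

# ASSUMPTION: "assume_role" and "role_base" are illustrative method values.
AUTH_MACROS: Dict[str, List[str]] = {
    "access_key": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}"],
    "assume_role": [
        "{$AWS.ACCESS.KEY.ID}",
        "{$AWS.SECRET.ACCESS.KEY}",
        "{$AWS.STS.REGION}",
        "{$AWS.ASSUME.ROLE.ARN}",
    ],
    "role_base": [],  # role-based authorization: no key macros are listed above
}


def missing_macros(host_macros: Dict[str, str]) -> List[str]:
    """List the macros that are still empty for the selected authorization method."""
    effective = {**TEMPLATE_DEFAULTS, **host_macros}  # host values override defaults
    auth_type = effective["{$AWS.AUTH_TYPE}"]
    if auth_type not in AUTH_MACROS:
        raise ValueError(f"unknown authorization method: {auth_type}")
    required = COMMON_MACROS + AUTH_MACROS[auth_type]
    return [name for name in required if not effective.get(name)]


# Hypothetical host configuration that forgot the secret access key.
host = {
    "{$AWS.ELB.ARN}": "arn:aws:elasticloadbalancing:us-west-1:123456789012:loadbalancer/app/my-alb/50dc6c495c0c9188",
    "{$AWS.ACCESS.KEY.ID}": "EXAMPLEKEYID",
}
print(missing_macros(host))  # ['{$AWS.SECRET.ACCESS.KEY}']
```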
For more information about managing access keys, see official AWS documentation.
See the section below for a list of macros used for LLD filters.
Name | Description | Default |
---|---|---|
{$AWS.DATA.TIMEOUT} | API response timeout. | 60s |
{$AWS.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. | |
{$AWS.ACCESS.KEY.ID} | Access key ID. | |
{$AWS.SECRET.ACCESS.KEY} | Secret access key. | |
{$AWS.REGION} | AWS Application Load Balancer region code. | us-west-1 |
{$AWS.AUTH_TYPE} | Authorization method: access key-based, assume role, or role-based authorization. | access_key |
{$AWS.STS.REGION} | Region used in the assume role request. | us-east-1 |
{$AWS.ASSUME.ROLE.ARN} | ARN of the role to assume; set it when using assume role authorization. | |
{$AWS.ELB.ARN} | Amazon Resource Name (ARN) of the load balancer. | |
{$AWS.HTTP.4XX.FAIL.MAX.WARN} | Maximum number of HTTP 4XX request failures for a trigger expression. | 5 |
{$AWS.HTTP.5XX.FAIL.MAX.WARN} | Maximum number of HTTP 5XX request failures for a trigger expression. | 5 |
{$AWS.ELB.LLD.FILTER.TARGET.GROUP.MATCHES} | Filter of discoverable target groups by name. | .* |
{$AWS.ELB.LLD.FILTER.TARGET.GROUP.NOT_MATCHES} | Filter to exclude discovered target groups by name. | CHANGE_IF_NEEDED |
{$AWS.ELB.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES} | Filter of discoverable alarms by namespace. | .* |
{$AWS.ELB.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered alarms by namespace. | CHANGE_IF_NEEDED |
{$AWS.ELB.LLD.FILTER.ALARM_NAME.MATCHES} | Filter of discoverable alarms by name. | .* |
{$AWS.ELB.LLD.FILTER.ALARM_NAME.NOT_MATCHES} | Filter to exclude discovered alarms by name. | CHANGE_IF_NEEDED |
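Each pair of LLD filter macros works as an include/exclude rule. An entity is discovered only if its name matches the `MATCHES` regular expression and does not match the `NOT_MATCHES` one. With the defaults (`.*` and `CHANGE_IF_NEEDED`), everything is discovered unless a name happens to contain the literal text CHANGE_IF_NEEDED. Below is a Python sketch of that logic. It assumes Python's `re` behaves like the Zabbix regular expression engine for these simple patterns. The function name and target group names are hypothetical.

```python
import re
from typing import Iterable, List


def lld_filter(names: Iterable[str], matches: str = ".*", not_matches: str = "CHANGE_IF_NEEDED") -> List[str]:
    """Keep names that match `matches` and do not match `not_matches`.

    re.search is used, so a pattern matches anywhere in the name unless it is
    anchored with ^ or $.
    """
    include = re.compile(matches)
    exclude = re.compile(not_matches)
    return [name for name in names if include.search(name) and not exclude.search(name)]


# Hypothetical target group names.
groups = ["web-prod", "web-staging", "api-prod"]
print(lld_filter(groups))                         # ['web-prod', 'web-staging', 'api-prod']
print(lld_filter(groups, not_matches="staging"))  # ['web-prod', 'api-prod']
print(lld_filter(groups, matches="^web-"))        # ['web-prod', 'web-staging']
```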
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ELB ALB: Get metrics data | Get ELB Application Load Balancer metrics. Full metrics list related to Application Load Balancer: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-cloudwatch-metrics.html |
Script | aws.elb.alb.get_metrics Preprocessing
|
AWS ELB ALB: Get target groups | Get ELB target groups. |
Script | aws.elb.alb.get_target_groups Preprocessing
|
AWS CloudWatch: Get ELB ALB alarms data | Get CloudWatch alarms data for the Application Load Balancer. |
Script | aws.elb.alb.get_alarms Preprocessing
|
AWS ELB ALB: Get metrics check | Check that the Application Load Balancer metrics data has been received correctly. |
Dependent item | aws.elb.alb.metrics.check Preprocessing
|
AWS ELB ALB: Get alarms check | Check that the alarm data has been received correctly. |
Dependent item | aws.elb.alb.alarms.check Preprocessing
|
AWS ELB ALB: Active Connection Count | The total number of active concurrent TCP connections from clients to the load balancer and from the load balancer to targets. |
Dependent item | aws.elb.alb.active_connection_count Preprocessing
|
AWS ELB ALB: New Connection Count | The total number of new TCP connections established from clients to the load balancer and from the load balancer to targets. |
Dependent item | aws.elb.alb.new_connection_count Preprocessing
|
AWS ELB ALB: Rejected Connection Count | The number of connections that were rejected because the load balancer had reached its maximum number of connections. |
Dependent item | aws.elb.alb.rejected_connection_count Preprocessing
|
AWS ELB ALB: Requests Count | The number of requests processed over IPv4 and IPv6. This metric is only incremented for requests where the load balancer node was able to choose a target. Requests that are rejected before a target is chosen are not reflected in this metric. |
Dependent item | aws.elb.alb.requests_count Preprocessing
|
AWS ELB ALB: Target Response Time | The time elapsed, in seconds, after the request leaves the load balancer until a response from the target is received. This is equivalent to the |
Dependent item | aws.elb.alb.target_response_time Preprocessing
|
AWS ELB ALB: HTTP Fixed Response Count | The number of fixed-response actions that were successful. |
Dependent item | aws.elb.alb.http_fixed_response_count Preprocessing
|
AWS ELB ALB: Rule Evaluations | The number of rules processed by the load balancer given a request rate averaged over an hour. |
Dependent item | aws.elb.alb.rule_evaluations Preprocessing
|
AWS ELB ALB: Client TLS Negotiation Error Count | The number of TLS connections initiated by the client that did not establish a session with the load balancer due to a TLS error. Possible causes include a mismatch of ciphers or protocols or the client failing to verify the server certificate and closing the connection. |
Dependent item | aws.elb.alb.client_tls_negotiation_error_count Preprocessing
|
AWS ELB ALB: Target TLS Negotiation Error Count | The number of TLS connections initiated by the load balancer that did not establish a session with the target. Possible causes include a mismatch of ciphers or protocols. This metric does not apply if the target is a Lambda function. |
Dependent item | aws.elb.alb.target_tls_negotiation_error_count Preprocessing
|
AWS ELB ALB: Target Connection Error Count | The number of connections that were not successfully established between the load balancer and target. This metric does not apply if the target is a Lambda function. |
Dependent item | aws.elb.alb.target_connection_error_count Preprocessing
|
AWS ELB ALB: Consumed LCUs | The number of load balancer capacity units (LCU) used by your load balancer. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/ |
Dependent item | aws.elb.alb.capacity_units Preprocessing
|
AWS ELB ALB: Processed Bytes | The total number of bytes processed by the load balancer over IPv4 and IPv6 (HTTP header and HTTP payload). This count includes traffic to and from clients and Lambda functions, and traffic from an Identity Provider (IdP) if user authentication is enabled. |
Dependent item | aws.elb.alb.processed_bytes Preprocessing
|
AWS ELB ALB: Desync Mitigation Mode Non Compliant Request Count | The number of requests that fail to comply with HTTP protocols. |
Dependent item | aws.elb.alb.non_compliant_request_count Preprocessing
|
AWS ELB ALB: HTTP Redirect Count | The number of redirect actions that were successful. |
Dependent item | aws.elb.alb.http_redirect_count Preprocessing
|
AWS ELB ALB: HTTP Redirect Url Limit Exceeded Count | The number of redirect actions that could not be completed because the URL in the response location header is larger than 8K bytes. |
Dependent item | aws.elb.alb.http_redirect_url_limit_exceeded_count Preprocessing
|
AWS ELB ALB: ELB HTTP 3XX Count | The number of HTTP 3XX redirection codes that originate from the load balancer. This count does not include response codes generated by targets. |
Dependent item | aws.elb.alb.http_3xx_count Preprocessing
|
AWS ELB ALB: ELB HTTP 4XX Count | The number of HTTP 4XX client error codes that originate from the load balancer. Client errors are generated when requests are malformed or incomplete. These requests were not received by the target, other than in the case where the load balancer returns an HTTP 460 error code. This count does not include any response codes generated by the targets. |
Dependent item | aws.elb.alb.http_4xx_count Preprocessing
|
AWS ELB ALB: ELB HTTP 5XX Count | The number of HTTP 5XX server error codes that originate from the load balancer. This count does not include any response codes generated by the targets. |
Dependent item | aws.elb.alb.http_5xx_count Preprocessing
|
AWS ELB ALB: ELB HTTP 500 Count | The number of HTTP 500 error codes that originate from the load balancer. |
Dependent item | aws.elb.alb.http_500_count Preprocessing
|
AWS ELB ALB: ELB HTTP 502 Count | The number of HTTP 502 error codes that originate from the load balancer. |
Dependent item | aws.elb.alb.http_502_count Preprocessing
|
AWS ELB ALB: ELB HTTP 503 Count | The number of HTTP 503 error codes that originate from the load balancer. |
Dependent item | aws.elb.alb.http_503_count Preprocessing
|
AWS ELB ALB: ELB HTTP 504 Count | The number of HTTP 504 error codes that originate from the load balancer. |
Dependent item | aws.elb.alb.http_504_count Preprocessing
|
AWS ELB ALB: ELB Auth Error | The number of user authentications that could not be completed because an authenticate action was misconfigured, the load balancer could not establish a connection with the IdP, or the load balancer could not complete the authentication flow due to an internal error. |
Dependent item | aws.elb.alb.auth_error Preprocessing
|
AWS ELB ALB: ELB Auth Failure | The number of user authentications that could not be completed because the IdP denied access to the user or an authorization code was used more than once. |
Dependent item | aws.elb.alb.auth_failure Preprocessing
|
AWS ELB ALB: ELB Auth User Claims Size Exceeded | The number of times that a configured IdP returned user claims that exceeded 11K bytes in size. |
Dependent item | aws.elb.alb.auth_user_claims_size_exceeded Preprocessing
|
AWS ELB ALB: ELB Auth Latency | The time elapsed, in milliseconds, to query the IdP for the ID token and user info. If one or more of these operations fail, this is the time to failure. |
Dependent item | aws.elb.alb.auth_latency Preprocessing
|
AWS ELB ALB: ELB Auth Success | The number of authenticate actions that were successful. This metric is incremented at the end of the authentication workflow, after the load balancer has retrieved the user claims from the IdP. |
Dependent item | aws.elb.alb.auth_success Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS ELB ALB: Failed to get metrics data | Failed to get CloudWatch metrics for Application Load Balancer. |
length(last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.metrics.check))>0 |Warning |
||
AWS ELB ALB: Failed to get alarms data | Failed to get CloudWatch alarms for Application Load Balancer. |
length(last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.alarms.check))>0 |Warning |
||
AWS ELB ALB: Too many HTTP 4XX error codes | Too many requests failed with HTTP 4XX code. |
min(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.http_4xx_count,5m)>{$AWS.HTTP.4XX.FAIL.MAX.WARN} |Warning |
||
AWS ELB ALB: Too many HTTP 5XX error codes | Too many requests failed with HTTP 5XX code. |
min(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.http_5xx_count,5m)>{$AWS.HTTP.5XX.FAIL.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Load Balancer alarm discovery | Used for the discovery of load balancer alarms. |
Dependent item | aws.elb.alb.alarms.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ELB ALB Alarms: [{#ALARM_NAME}]: Get metrics | Get metrics about the alarm state and its reason. |
Dependent item | aws.elb.alb.alarm.get_metrics["{#ALARM_NAME}"] Preprocessing
|
AWS ELB ALB Alarms: [{#ALARM_NAME}]: State reason | An explanation for the alarm state reason in text format. Alarm description: {#ALARM_DESCRIPTION} |
Dependent item | aws.elb.alb.alarm.state_reason["{#ALARM_NAME}"] Preprocessing
|
AWS ELB ALB Alarms: [{#ALARM_NAME}]: State | The value of the alarm state. Possible values: 0 - OK; 1 - INSUFFICIENT_DATA; 2 - ALARM. Alarm description: {#ALARM_DESCRIPTION} |
Dependent item | aws.elb.alb.alarm.state["{#ALARM_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS ELB ALB Alarms: [{#ALARM_NAME}] has 'Alarm' state | The alarm {#ALARM_NAME} is in the ALARM state. |
last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.alarm.state_reason["{#ALARM_NAME}"]))>0 |Average |
||
AWS ELB ALB Alarms: [{#ALARM_NAME}] has 'Insufficient data' state | Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state. |
last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.alarm.state["{#ALARM_NAME}"])=1 |Info |
Name | Description | Type | Key and additional info |
---|---|---|---|
Target groups discovery | Used for the discovery of target groups. |
Dependent item | aws.elb.alb.target_groups.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Get metrics | Get the metrics of the ELB target group. Full list of metrics related to AWS ELB: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-cloudwatch-metrics.html#user-authentication-metric-table |
Script | aws.elb.alb.target_groups.get_metrics["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: HTTP Code Target 2XX Count | The number of HTTP response 2XX codes generated by the targets. This does not include any response codes generated by the load balancer. |
Dependent item | aws.elb.alb.target_groups.http_2xx_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: HTTP Code Target 3XX Count | The number of HTTP response 3XX codes generated by the targets. This does not include any response codes generated by the load balancer. |
Dependent item | aws.elb.alb.target_groups.http_3xx_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: HTTP Code Target 4XX Count | The number of HTTP response 4XX codes generated by the targets. This does not include any response codes generated by the load balancer. |
Dependent item | aws.elb.alb.target_groups.http_4xx_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: HTTP Code Target 5XX Count | The number of HTTP response 5XX codes generated by the targets. This does not include any response codes generated by the load balancer. |
Dependent item | aws.elb.alb.target_groups.http_5xx_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Healthy Host Count | The number of targets that are considered healthy. |
Dependent item | aws.elb.alb.target_groups.healthy_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy Host Count | The number of targets that are considered unhealthy. |
Dependent item | aws.elb.alb.target_groups.unhealthy_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Healthy State Routing | The number of zones that meet the routing healthy state requirements. |
Dependent item | aws.elb.alb.target_groups.healthy_state_routing["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy State Routing | The number of zones that do not meet the routing healthy state requirements, and therefore the load balancer distributes traffic to all targets in the zone, including the unhealthy targets. |
Dependent item | aws.elb.alb.target_groups.unhealthy_state_routing["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Request Count Per Target | The average request count per target, in a target group. You must specify the target group using the TargetGroup dimension. |
Dependent item | aws.elb.alb.target_groups.request["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy Routing Request Count | The number of requests that are routed using the routing failover action (fail open). |
Dependent item | aws.elb.alb.target_groups.unhealthy_routing_request_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Mitigated Host Count | The number of targets under mitigation. |
Dependent item | aws.elb.alb.target_groups.mitigated_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Anomalous Host Count | The number of hosts detected with anomalies. |
Dependent item | aws.elb.alb.target_groups.anomalous_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Healthy State DNS | The number of zones that meet the DNS healthy state requirements. |
Dependent item | aws.elb.alb.target_groups.healthy_state_dns["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
AWS ELB ALB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy State DNS | The number of zones that do not meet the DNS healthy state requirements and therefore were marked unhealthy in DNS. |
Dependent item | aws.elb.alb.target_groups.unhealthy_state_dns["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing
|
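Target group metrics such as HealthyHostCount or HTTPCode_Target_2XX_Count are typically queried with two dimensions:

- `LoadBalancer`: the final portion of the load balancer ARN.
- `TargetGroup`: the final portion of the target group ARN, which keeps the `targetgroup/` prefix.

Below is a Python sketch that derives both dimensions from ARNs, for use in the `Dimensions` list of a `GetMetricData` query. The function names and ARNs are hypothetical, and this is not the template's own script.

```python
from typing import Dict, List


def arn_resource(arn: str) -> str:
    """Return the resource part of an ARN (everything after the fifth colon)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"malformed ARN: {arn}")
    return parts[5]


def target_group_dimensions(lb_arn: str, tg_arn: str) -> List[Dict[str, str]]:
    """Build the LoadBalancer and TargetGroup dimensions for target group metrics."""
    lb_resource = arn_resource(lb_arn)
    tg_resource = arn_resource(tg_arn)
    if not lb_resource.startswith("loadbalancer/"):
        raise ValueError(f"not a load balancer ARN: {lb_arn}")
    if not tg_resource.startswith("targetgroup/"):
        raise ValueError(f"not a target group ARN: {tg_arn}")
    return [
        # LoadBalancer drops the 'loadbalancer/' prefix ...
        {"Name": "LoadBalancer", "Value": lb_resource[len("loadbalancer/"):]},
        # ... while TargetGroup keeps 'targetgroup/'.
        {"Name": "TargetGroup", "Value": tg_resource},
    ]


# Hypothetical ARNs.
LB_ARN = "arn:aws:elasticloadbalancing:us-west-1:123456789012:loadbalancer/app/my-alb/50dc6c495c0c9188"
TG_ARN = "arn:aws:elasticloadbalancing:us-west-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067"
for dim in target_group_dimensions(LB_ARN, TG_ARN):
    print(dim["Name"], dim["Value"])
# LoadBalancer app/my-alb/50dc6c495c0c9188
# TargetGroup targetgroup/my-targets/73e2d6bc24d8a067
```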
The template is designed to monitor AWS ELB Network Load Balancer by HTTP via Zabbix, and it works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.
Additional information about metrics and API methods used in the template:
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
The template gets AWS ELB Network Load Balancer metrics and uses the script item to make HTTP requests to the CloudWatch API.
Before using the template, you need to create an IAM policy with the necessary permissions for the Zabbix role in your AWS account. For more information, visit the ELB policies page on the AWS website.
Add the following required permissions to your Zabbix IAM policy in order to collect AWS ELB Network Load Balancer metrics.
{
"Version":"2012-10-17",
"Statement":[
{
"Action":[
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"elasticloadbalancing:DescribeTargetGroups"
],
"Effect":"Allow",
"Resource":"*"
}
]
}
To use assume role authorization, add the appropriate permissions to the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": "arn:aws:iam::{Account}:user/{UserName}"
},
{
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"elasticloadbalancing:DescribeTargetGroups"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::{Account}:user/{UserName}"
},
"Action": "sts:AssumeRole"
}
]
}
If you are using role-based authorization, set the appropriate permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"elasticloadbalancing:DescribeTargetGroups",
"ec2:AssociateIamInstanceProfile",
"ec2:ReplaceIamInstanceProfileAssociation"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"ec2.amazonaws.com"
]
},
"Action": [
"sts:AssumeRole"
]
}
]
}
Note: role-based authorization is only possible when the Zabbix server or proxy runs inside AWS.
Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, and {$AWS.ELB.ARN}.
If you are using access key-based authorization, set the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.
If you are using assume role authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, and {$AWS.ASSUME.ROLE.ARN}.
For more information about managing access keys, see official AWS documentation.
See the section below for a list of macros used for LLD filters.
Name | Description | Default |
---|---|---|
{$AWS.DATA.TIMEOUT} | API response timeout. | 60s |
{$AWS.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. | |
{$AWS.ACCESS.KEY.ID} | Access key ID. | |
{$AWS.SECRET.ACCESS.KEY} | Secret access key. | |
{$AWS.REGION} | AWS Network Load Balancer region code. | us-west-1 |
{$AWS.AUTH_TYPE} | Authorization method: access key-based, assume role, or role-based authorization. | access_key |
{$AWS.STS.REGION} | Region used in the assume role request. | us-east-1 |
{$AWS.ASSUME.ROLE.ARN} | ARN of the role to assume; set it when using assume role authorization. | |
{$AWS.ELB.ARN} | Amazon Resource Name (ARN) of the load balancer. | |
{$AWS.ELB.LLD.FILTER.TARGET.GROUP.MATCHES} | Filter of discoverable target groups by name. | .* |
{$AWS.ELB.LLD.FILTER.TARGET.GROUP.NOT_MATCHES} | Filter to exclude discovered target groups by name. | CHANGE_IF_NEEDED |
{$AWS.ELB.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES} | Filter of discoverable alarms by namespace. | .* |
{$AWS.ELB.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered alarms by namespace. | CHANGE_IF_NEEDED |
{$AWS.ELB.LLD.FILTER.ALARM_NAME.MATCHES} | Filter of discoverable alarms by name. | .* |
{$AWS.ELB.LLD.FILTER.ALARM_NAME.NOT_MATCHES} | Filter to exclude discovered alarms by name. | CHANGE_IF_NEEDED |
{$AWS.ELB.UNHEALTHY.HOST.MAX} | Maximum number of unhealthy hosts for a trigger expression. | 0 |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ELB NLB: Get metrics data | Get ELB Network Load Balancer metrics. Full metrics list related to Network Load Balancer: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-cloudwatch-metrics.html |
Script | aws.elb.nlb.get_metrics Preprocessing
|
AWS ELB NLB: Get target groups | Get ELB target groups. |
Script | aws.elb.nlb.get_target_groups Preprocessing
|
AWS CloudWatch: Get ELB NLB alarms data | Get CloudWatch alarms data for the Network Load Balancer. |
Script | aws.elb.nlb.get_alarms Preprocessing
|
AWS ELB NLB: Get metrics check | Check that the Network Load Balancer metrics data has been received correctly. |
Dependent item | aws.elb.nlb.metrics.check Preprocessing
|
AWS ELB NLB: Get alarms check | Check that the alarm data has been received correctly. |
Dependent item | aws.elb.nlb.alarms.check Preprocessing
|
AWS ELB NLB: Active Flow Count | The total number of concurrent flows (or connections) from clients to targets. This metric includes connections in the TCP connections are not terminated at the load balancer, so a client opening a TCP connection to a target counts as a single flow. |
Dependent item | aws.elb.nlb.activeflowcount Preprocessing
|
AWS ELB NLB: Active Flow Count TCP | The total number of concurrent TCP flows (or connections) from clients to targets. This metric includes connections in the TCP connections are not terminated at the load balancer, so a client opening a TCP connection to a target counts as a single flow. |
Dependent item | aws.elb.nlb.activeflowcount_tcp Preprocessing
|
AWS ELB NLB: Active Flow Count TLS | The total number of concurrent TLS flows (or connections) from clients to targets. This metric includes connections in the |
Dependent item | aws.elb.nlb.activeflowcount_tls Preprocessing
|
AWS ELB NLB: Active Flow Count UDP | The total number of concurrent UDP flows (or connections) from clients to targets. |
Dependent item | aws.elb.nlb.activeflowcount_udp Preprocessing
|
AWS ELB NLB: Client TLS Negotiation Error Count | The total number of TLS handshakes that failed during negotiation between a client and a TLS listener. |
Dependent item | aws.elb.nlb.clienttlsnegotiationerrorcount Preprocessing
|
AWS ELB NLB: Consumed LCUs | The number of load balancer capacity units (LCU) used by your load balancer. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/ |
Dependent item | aws.elb.nlb.capacity_units Preprocessing
|
AWS ELB NLB: Consumed LCUs TCP | The number of load balancer capacity units (LCU) used by your load balancer for TCP. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/ |
Dependent item | aws.elb.nlb.capacityunitstcp Preprocessing
|
AWS ELB NLB: Consumed LCUs TLS | The number of load balancer capacity units (LCU) used by your load balancer for TLS. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/ |
Dependent item | aws.elb.nlb.capacityunitstls Preprocessing
|
AWS ELB NLB: Consumed LCUs UDP | The number of load balancer capacity units (LCU) used by your load balancer for UDP. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/ |
Dependent item | aws.elb.nlb.capacityunitsudp Preprocessing
|
AWS ELB NLB: New Flow Count | The total number of new flows (or connections) established from clients to targets in the specified time period. |
Dependent item | aws.elb.nlb.newflowcount Preprocessing
|
AWS ELB NLB: New Flow Count TCP | The total number of new TCP flows (or connections) established from clients to targets in the specified time period. |
Dependent item | aws.elb.nlb.newflowcount_tcp Preprocessing
|
AWS ELB NLB: New Flow Count TLS | The total number of new TLS flows (or connections) established from clients to targets in the specified time period. |
Dependent item | aws.elb.nlb.newflowcount_tls Preprocessing
|
AWS ELB NLB: New Flow Count UDP | The total number of new UDP flows (or connections) established from clients to targets in the specified time period. |
Dependent item | aws.elb.nlb.newflowcount_udp Preprocessing
|
AWS ELB NLB: Peak Packets per second | Highest average packet rate (packets processed per second), calculated every 10 seconds during the sampling window. This metric includes health check traffic. |
Dependent item | aws.elb.nlb.peak_packets.rate Preprocessing
|
AWS ELB NLB: Port Allocation Error Count | The total number of ephemeral port allocation errors during a client IP translation operation. A non-zero value indicates dropped client connections. Note: Network Load Balancers support 55,000 simultaneous connections or about 55,000 connections per minute to each unique target (IP address and port) when performing client address translation. To fix port allocation errors, add more targets to the target group. |
Dependent item | aws.elb.nlb.portallocationerror_count Preprocessing
|
AWS ELB NLB: Processed Bytes | The total number of bytes processed by the load balancer, including TCP/IP headers. This count includes traffic to and from targets, minus health check traffic. |
Dependent item | aws.elb.nlb.processed_bytes Preprocessing
|
AWS ELB NLB: Processed Bytes TCP | The total number of bytes processed by TCP listeners. |
Dependent item | aws.elb.nlb.processedbytestcp Preprocessing
|
AWS ELB NLB: Processed Bytes TLS | The total number of bytes processed by TLS listeners. |
Dependent item | aws.elb.nlb.processedbytestls Preprocessing
|
AWS ELB NLB: Processed Bytes UDP | The total number of bytes processed by UDP listeners. |
Dependent item | aws.elb.nlb.processedbytesudp Preprocessing
|
AWS ELB NLB: Processed Packets | The total number of packets processed by the load balancer. This count includes traffic to and from targets, including health check traffic. |
Dependent item | aws.elb.nlb.processed_packets Preprocessing
|
AWS ELB NLB: Security Group Blocked Flow Count Inbound ICMP | The number of new ICMP messages rejected by the inbound rules of the load balancer security groups. |
Dependent item | aws.elb.nlb.sgblockedinbound_icmp Preprocessing
|
AWS ELB NLB: Security Group Blocked Flow Count Inbound TCP | The number of new TCP flows rejected by the inbound rules of the load balancer security groups. |
Dependent item | aws.elb.nlb.sgblockedinbound_tcp Preprocessing
|
AWS ELB NLB: Security Group Blocked Flow Count Inbound UDP | The number of new UDP flows rejected by the inbound rules of the load balancer security groups. |
Dependent item | aws.elb.nlb.sgblockedinbound_udp Preprocessing
|
AWS ELB NLB: Security Group Blocked Flow Count Outbound ICMP | The number of new ICMP messages rejected by the outbound rules of the load balancer security groups. |
Dependent item | aws.elb.nlb.sgblockedoutbound_icmp Preprocessing
|
AWS ELB NLB: Security Group Blocked Flow Count Outbound TCP | The number of new TCP flows rejected by the outbound rules of the load balancer security groups. |
Dependent item | aws.elb.nlb.sgblockedoutbound_tcp Preprocessing
|
AWS ELB NLB: Security Group Blocked Flow Count Outbound UDP | The number of new UDP flows rejected by the outbound rules of the load balancer security groups. |
Dependent item | aws.elb.nlb.sgblockedoutbound_udp Preprocessing
|
AWS ELB NLB: Target TLS Negotiation Error Count | The total number of TLS handshakes that failed during negotiation between a TLS listener and a target. |
Dependent item | aws.elb.nlb.targettlsnegotiationerrorcount Preprocessing
|
AWS ELB NLB: TCP Client Reset Count | The total number of reset (RST) packets sent from a client to a target. These resets are generated by the client and forwarded by the load balancer. |
Dependent item | aws.elb.nlb.tcpclientreset_count Preprocessing
|
AWS ELB NLB: TCP ELB Reset Count | The total number of reset (RST) packets generated by the load balancer. For more information, see: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-troubleshooting.html#elb-reset-count-metric |
Dependent item | aws.elb.nlb.tcpelbreset_count Preprocessing
|
AWS ELB NLB: TCP Target Reset Count | The total number of reset (RST) packets sent from a target to a client. These resets are generated by the target and forwarded by the load balancer. |
Dependent item | aws.elb.nlb.tcptargetreset_count Preprocessing
|
AWS ELB NLB: Unhealthy Routing Flow Count | The number of flows (or connections) that are routed using the routing failover action (fail open). |
Dependent item | aws.elb.nlb.unhealthyroutingflow_count Preprocessing
|
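The Port Allocation Error Count note gives a rough sizing rule. When the load balancer performs client address translation, each unique target (IP address and port) handles about 55,000 connections per minute.

The sketch below estimates how many targets a given peak connection rate needs. It assumes connections are spread evenly across targets. The function name is hypothetical, and the AWS documentation remains the authority on the exact limits.

```python
import math

# Approximate per-target limit quoted in the metric description:
# connections per minute to one unique target (IP address and port).
CONNECTIONS_PER_TARGET_PER_MINUTE = 55_000


def targets_needed(peak_connections_per_minute: int) -> int:
    """Smallest target count that keeps every target under the limit,
    assuming connections are distributed evenly (hypothetical helper)."""
    if peak_connections_per_minute <= 0:
        return 1
    return math.ceil(peak_connections_per_minute / CONNECTIONS_PER_TARGET_PER_MINUTE)


print(targets_needed(120_000))  # 3
```

If port allocation errors persist with this many targets, traffic is probably not evenly distributed, and the per-target rate is what matters.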
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS ELB NLB: Failed to get metrics data | Failed to get CloudWatch metrics for the Network Load Balancer. | length(last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.metrics.check))>0 | Warning | |
AWS ELB NLB: Failed to get alarms data | Failed to get CloudWatch alarms for the Network Load Balancer. | length(last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.alarms.check))>0 | Warning | |
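Both triggers fire when the corresponding check item holds a non-empty string. This suggests that each check item stores an empty value when the data arrived correctly and an error message otherwise. That is an assumption, because the preprocessing steps are not listed here.

The sketch below reproduces the trigger condition length(last(...))>0 in Python. The function names and the sample error text are hypothetical.

```python
from typing import Optional


def check_value(error: Optional[str]) -> str:
    """Hypothetical check-item value: empty on success, error text on failure."""
    return error or ""


def failed_to_get_data(last_value: str) -> bool:
    """Python equivalent of the trigger expression length(last(...))>0."""
    return len(last_value) > 0


print(failed_to_get_data(check_value(None)))            # False
print(failed_to_get_data(check_value("AccessDenied")))  # True
```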
Name | Description | Type | Key and additional info |
---|---|---|---|
Load Balancer alarm discovery | Used for the discovery of load balancer alarms. | Dependent item | aws.elb.nlb.alarms.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ELB NLB Alarms: [{#ALARM_NAME}]: Get metrics | Get metrics about the alarm state and its reason. | Dependent item | aws.elb.nlb.alarm.get_metrics["{#ALARM_NAME}"] |
AWS ELB NLB Alarms: [{#ALARM_NAME}]: State reason | An explanation for the alarm state reason in text format. Alarm description: | Dependent item | aws.elb.nlb.alarm.state_reason["{#ALARM_NAME}"] |
AWS ELB NLB Alarms: [{#ALARM_NAME}]: State | The value of the alarm state. Possible values: 0 - OK; 1 - INSUFFICIENT_DATA; 2 - ALARM. Alarm description: | Dependent item | aws.elb.nlb.alarm.state["{#ALARM_NAME}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS ELB NLB Alarms: [{#ALARM_NAME}] has 'Alarm' state | The alarm {#ALARM_NAME} is in the ALARM state. | last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.alarm.state_reason["{#ALARM_NAME}"]))>0 | Average | |
AWS ELB NLB Alarms: [{#ALARM_NAME}] has 'Insufficient data' state | Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state. | last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.alarm.state["{#ALARM_NAME}"])=1 | Info | |
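The State item prototype maps a CloudWatch alarm state to a number: 0 - OK, 1 - INSUFFICIENT_DATA, 2 - ALARM. The 'Alarm' trigger also requires a non-empty state reason.

The sketch below shows that mapping and both trigger conditions. It assumes the alarm data carries CloudWatch's StateValue strings (OK, INSUFFICIENT_DATA, ALARM). The dictionary and function names are hypothetical, not taken from the template.

```python
# Numeric codes used by the State item prototype.
ALARM_STATE_CODES = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}


def alarm_triggers(state_value: str, state_reason: str) -> list:
    """Return the trigger prototypes that would fire for one alarm."""
    state = ALARM_STATE_CODES[state_value]
    fired = []
    # last(state)=2 and length(last(state_reason))>0
    if state == 2 and len(state_reason) > 0:
        fired.append("has 'Alarm' state")
    # last(state)=1
    if state == 1:
        fired.append("has 'Insufficient data' state")
    return fired


print(alarm_triggers("ALARM", "Threshold Crossed: 1 datapoint was greater than the threshold."))
print(alarm_triggers("OK", ""))
```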
Name | Description | Type | Key and additional info |
---|---|---|---|
Target groups discovery | Used for the discovery of ELB target groups. | Dependent item | aws.elb.nlb.target_groups.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ELB NLB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Get metrics | Get the metrics of the ELB target group. Full list of metrics related to AWS ELB: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-cloudwatch-metrics.html#user-authentication-metric-table | Script | aws.elb.nlb.target_groups.get_metrics["{#AWS.ELB.TARGET.GROUP.NAME}"] |
AWS ELB NLB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Healthy Host Count | The number of targets that are considered healthy. | Dependent item | aws.elb.nlb.target_groups.healthy_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"] |
AWS ELB NLB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy Host Count | The number of targets that are considered unhealthy. | Dependent item | aws.elb.nlb.target_groups.unhealthy_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS ELB NLB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Target have become unhealthy | This trigger helps in identifying when your targets have become unhealthy. | last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.target_groups.healthy_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"]) = 0 | Average | |
AWS ELB NLB Target Groups: [{#AWS.ELB.TARGET.GROUP.NAME}]: Target have unhealthy host | This trigger allows you to become aware when the number of unhealthy targets exceeds {$AWS.ELB.UNHEALTHY.HOST.MAX}. | last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.target_groups.unhealthy_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"]) > {$AWS.ELB.UNHEALTHY.HOST.MAX} | Warning | Depends on: |
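The two target group trigger prototypes check different things. One fires when no target is healthy. The other fires when the unhealthy count exceeds {$AWS.ELB.UNHEALTHY.HOST.MAX}. Because that macro defaults to 0, a single unhealthy target is enough to raise the Warning.

The sketch below evaluates both expressions. The function and its parameters are hypothetical; unhealthy_max stands in for the macro.

```python
def target_group_problems(healthy: int, unhealthy: int, unhealthy_max: int = 0) -> list:
    """Evaluate the target group trigger prototypes for one target group.

    unhealthy_max plays the role of {$AWS.ELB.UNHEALTHY.HOST.MAX} (default 0).
    Returns (trigger name, severity) pairs for the triggers that would fire.
    """
    problems = []
    # last(healthy_host_count) = 0
    if healthy == 0:
        problems.append(("Target have become unhealthy", "Average"))
    # last(unhealthy_host_count) > {$AWS.ELB.UNHEALTHY.HOST.MAX}
    if unhealthy > unhealthy_max:
        problems.append(("Target have unhealthy host", "Warning"))
    return problems


print(target_group_problems(healthy=2, unhealthy=1))
print(target_group_problems(healthy=2, unhealthy=1, unhealthy_max=1))
```

Raising the macro on a host tolerates a few unhealthy targets without hiding the Average-severity case where no healthy target is left.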
This template uses CloudWatch GetMetricData API calls to list and retrieve metrics. For more information, see the CloudWatch pricing page.
Additional information about metrics and API methods used in the template:
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
The template gets AWS Lambda metrics and uses the script item to make HTTP requests to the CloudWatch API.
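The parameters of a GetMetricData call are a list of MetricDataQueries (one MetricStat per metric) plus StartTime and EndTime. The sketch below assembles such a body for a few of the Lambda metrics listed further down. It derives the FunctionName dimension from a {$AWS.LAMBDA.ARN}-style value.

This is an illustration under assumptions. The template's own script, the metric selection, the 5-minute period, the query Ids, and the ISO 8601 timestamp serialization are not taken from the template. Request signing (AWS Signature Version 4) is omitted.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def function_name_from_arn(arn: str) -> str:
    """Extract the name from arn:aws:lambda:<region>:<account>:function:<name>."""
    parts = arn.split(":")
    if len(parts) < 7 or parts[2] != "lambda" or parts[5] != "function":
        raise ValueError(f"not a Lambda function ARN: {arn}")
    return parts[6]


def build_get_metric_data(arn: str, period: int = 300, now: Optional[datetime] = None) -> dict:
    """Assemble GetMetricData parameters for a few AWS/Lambda metrics (sketch)."""
    now = now or datetime.now(timezone.utc)
    dimensions = [{"Name": "FunctionName", "Value": function_name_from_arn(arn)}]
    wanted = [("Invocations", "Sum"), ("Errors", "Sum"), ("Duration", "Average")]
    queries = [
        {
            # Query Ids must start with a lowercase letter.
            "Id": f"{metric.lower()}_{stat.lower()}",
            "MetricStat": {
                "Metric": {"Namespace": "AWS/Lambda", "MetricName": metric, "Dimensions": dimensions},
                "Period": period,
                "Stat": stat,
            },
        }
        for metric, stat in wanted
    ]
    return {
        "MetricDataQueries": queries,
        "StartTime": (now - timedelta(seconds=period)).isoformat(),
        "EndTime": now.isoformat(),
    }


arn = "arn:aws:lambda:us-west-1:123456789012:function:my-function"
print(json.dumps(build_get_metric_data(arn), indent=2))
```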
Before using the template, you need to create an IAM policy with the necessary permissions for the Zabbix role in your AWS account. For more information, visit the Lambda permissions page on the AWS website.
Add the following required permissions to your Zabbix IAM policy in order to collect AWS Lambda metrics.
{
"Version":"2012-10-17",
"Statement":[
{
"Action":[
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData"
],
"Effect":"Allow",
"Resource":"*"
}
]
}
If you are using assume role authorization, add the appropriate permissions to the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": "arn:aws:iam::{Account}:user/{UserName}"
},
{
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::{Account}:user/{UserName}"
},
"Action": "sts:AssumeRole"
}
]
}
If you are using role-based authorization, set the appropriate permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarms",
"cloudwatch:GetMetricData",
"ec2:AssociateIamInstanceProfile",
"ec2:ReplaceIamInstanceProfileAssociation"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"ec2.amazonaws.com"
]
},
"Action": [
"sts:AssumeRole"
]
}
]
}
Note: Role-based authorization is only possible when the Zabbix server or proxy runs inside AWS.
Set the macros {$AWS.AUTH_TYPE}, {$AWS.REGION}, and {$AWS.LAMBDA.ARN}.
If you are using access key-based authorization, set the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.
If you are using assume role authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, and {$AWS.ASSUME.ROLE.ARN}.
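The setup steps amount to a per-authorization-type checklist of macros. {$AWS.AUTH_TYPE} accepts access_key, assume_role, or role_base.

The hypothetical helper below reports which required macros are still empty on a host. The macro names come from this section; the helper itself is illustrative and not part of the template.

```python
# Macros that must be non-empty for each {$AWS.AUTH_TYPE} value,
# taken from the setup steps for the AWS Lambda template.
COMMON_MACROS = ["{$AWS.AUTH_TYPE}", "{$AWS.REGION}", "{$AWS.LAMBDA.ARN}"]
MACROS_BY_AUTH_TYPE = {
    "access_key": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}"],
    "assume_role": [
        "{$AWS.ACCESS.KEY.ID}",
        "{$AWS.SECRET.ACCESS.KEY}",
        "{$AWS.STS.REGION}",
        "{$AWS.ASSUME.ROLE.ARN}",
    ],
    # Role-based authorization relies on the role attached to the EC2 instance
    # running the Zabbix server or proxy (see the trust policy above),
    # so no key macros are listed for it.
    "role_base": [],
}


def missing_macros(macros: dict) -> list:
    """Return the required macros that are absent or empty (hypothetical helper)."""
    auth_type = macros.get("{$AWS.AUTH_TYPE}", "")
    if auth_type not in MACROS_BY_AUTH_TYPE:
        raise ValueError(f"unsupported {{$AWS.AUTH_TYPE}} value: {auth_type!r}")
    required = COMMON_MACROS + MACROS_BY_AUTH_TYPE[auth_type]
    return [name for name in required if not macros.get(name)]


host_macros = {
    "{$AWS.AUTH_TYPE}": "assume_role",
    "{$AWS.REGION}": "us-west-1",
    "{$AWS.LAMBDA.ARN}": "arn:aws:lambda:us-west-1:123456789012:function:my-function",
    "{$AWS.ACCESS.KEY.ID}": "AKIAEXAMPLE",
    "{$AWS.STS.REGION}": "us-east-1",
}
print(missing_macros(host_macros))
```

Macros with a template default, such as {$AWS.STS.REGION} (us-east-1), already have a value unless you clear it on the host.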
For more information about managing access keys, see the official AWS documentation.
See the section below for a list of macros used for LLD filters.
Name | Description | Default |
---|---|---|
{$AWS.DATA.TIMEOUT} | API response timeout. | 60s |
{$AWS.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. | |
{$AWS.ACCESS.KEY.ID} | Access key ID. | |
{$AWS.SECRET.ACCESS.KEY} | Secret access key. | |
{$AWS.REGION} | AWS Lambda function region code. | us-west-1 |
{$AWS.AUTH_TYPE} | Authorization method. Possible values: access_key, assume_role, role_base. | access_key |
{$AWS.STS.REGION} | Region used in assume role request. | us-east-1 |
{$AWS.ASSUME.ROLE.ARN} | ARN assume role; add when using the assume_role authorization method. | |
{$AWS.LAMBDA.ARN} | The Amazon Resource Names (ARN) of the Lambda function. | |
{$AWS.LAMBDA.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES} | Filter of discoverable alarms by namespace. | .* |
{$AWS.LAMBDA.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered alarms by namespace. | CHANGE_IF_NEEDED |
{$AWS.LAMBDA.LLD.FILTER.ALARM_NAME.MATCHES} | Filter of discoverable alarms by name. | .* |
{$AWS.LAMBDA.LLD.FILTER.ALARM_NAME.NOT_MATCHES} | Filter to exclude discovered alarms by name. | CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS Lambda: Get metrics data | Get Lambda function metrics. Full metrics list related to the Lambda function: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-metrics.html | Script | aws.lambda.get_metrics |
AWS CloudWatch: Get Lambda alarms data | | Script | aws.lambda.get_alarms |
AWS Lambda: Get metrics check | Check that the Lambda function metrics data has been received correctly. | Dependent item | aws.lambda.metrics.check |
AWS Lambda: Get alarms check | Check that the alarm data has been received correctly. | Dependent item | aws.lambda.alarms.check |
AWS Lambda: Async events received sum | The number of events that Lambda successfully queues for processing. This metric provides insight into the number of events that a Lambda function receives. | Dependent item | aws.lambda.async_events_received.sum |
AWS Lambda: Async event age average | The time between when Lambda successfully queues the event and when the function is invoked. The value of this metric increases when events are being retried due to invocation failures or throttling. | Dependent item | aws.lambda.async_event_age.avg |
AWS Lambda: Async events dropped sum | The number of events that are dropped without successfully executing the function. If you configure a dead-letter queue (DLQ) or an OnFailure destination, events are sent there before they are dropped. | Dependent item | aws.lambda.async_events_dropped.sum |
AWS Lambda: Total concurrent executions | The number of function instances that are processing events. If this number reaches your concurrent executions quota for the Region or the reserved concurrency limit on the function, Lambda throttles additional invocation requests. | Dependent item | aws.lambda.concurrent_executions.max |
AWS Lambda: Unreserved concurrent executions maximum | For a Region, the number of events that functions without reserved concurrency are processing. | Dependent item | aws.lambda.unreserved_concurrent_executions.max |
AWS Lambda: Invocations sum | The number of times that your function code is invoked, including successful invocations and invocations that result in a function error. Invocations aren't recorded if the invocation request is throttled or otherwise results in an invocation error. The value of Invocations equals the number of requests billed. | Dependent item | aws.lambda.invocations.sum |
AWS Lambda: Errors sum | The number of invocations that result in a function error. Function errors include exceptions that your code throws and exceptions that the Lambda runtime throws. The runtime returns errors for issues such as timeouts and configuration errors. | Dependent item | aws.lambda.errors.sum |
AWS Lambda: Dead letter errors sum | For asynchronous invocation, the number of times that Lambda attempts to send an event to a dead-letter queue (DLQ) but fails. Dead-letter errors can occur due to misconfigured resources or size limits. | Dependent item | aws.lambda.dead_letter_errors.sum |
AWS Lambda: Throttles sum | The number of invocation requests that are throttled. When all function instances are processing requests and no concurrency is available to scale up, Lambda rejects additional requests with a TooManyRequestsException error. | Dependent item | aws.lambda.throttles.sum |
AWS Lambda: Duration average | The amount of time that your function code spends processing an event. The billed duration for an invocation is the value of Duration rounded up to the nearest millisecond. | Dependent item | aws.lambda.duration.avg |
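AWS documents the billed duration of an invocation as its Duration rounded up to the nearest millisecond. The Zabbix item stores the average Duration over the collection period, so the rounding applies to individual invocations, not to the averaged item value.

The sketch below shows that arithmetic. The function is hypothetical and ignores everything else that affects Lambda pricing (memory size, architecture, free tier).

```python
import math


def billed_duration_ms(duration_ms: float) -> int:
    """Round one invocation's Duration up to the nearest whole millisecond."""
    return math.ceil(duration_ms)


durations = [12.3, 7.0, 250.01]
print([billed_duration_ms(d) for d in durations])     # [13, 7, 251]
print(sum(billed_duration_ms(d) for d in durations))  # 271
```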
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS Lambda: Failed to get metrics data | Failed to get CloudWatch metrics for the Lambda function. | length(last(/AWS Lambda by HTTP/aws.lambda.metrics.check))>0 | Warning | |
AWS Lambda: Failed to get alarms data | Failed to get CloudWatch alarms for the Lambda function. | length(last(/AWS Lambda by HTTP/aws.lambda.alarms.check))>0 | Warning | |
Name | Description | Type | Key and additional info |
---|---|---|---|
Lambda alarm discovery | Used for the discovery of Lambda function alarms. | Dependent item | aws.lambda.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS Lambda Alarms: [{#ALARM_NAME}]: Get metrics | Get metrics about the alarm state and its reason. | Dependent item | aws.lambda.alarm.get_metrics["{#ALARM_NAME}"] |
AWS Lambda Alarms: [{#ALARM_NAME}]: State reason | An explanation for the alarm state reason in text format. Alarm description: | Dependent item | aws.lambda.alarm.state_reason["{#ALARM_NAME}"] |
AWS Lambda Alarms: [{#ALARM_NAME}]: State | The value of the alarm state. Possible values: 0 - OK; 1 - INSUFFICIENT_DATA; 2 - ALARM. Alarm description: | Dependent item | aws.lambda.alarm.state["{#ALARM_NAME}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS Lambda Alarms: [{#ALARM_NAME}] has 'Alarm' state | The alarm {#ALARM_NAME} is in the ALARM state. | last(/AWS Lambda by HTTP/aws.lambda.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS Lambda by HTTP/aws.lambda.alarm.state_reason["{#ALARM_NAME}"]))>0 | Average | |
AWS Lambda Alarms: [{#ALARM_NAME}] has 'Insufficient data' state | Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state. | last(/AWS Lambda by HTTP/aws.lambda.alarm.state["{#ALARM_NAME}"])=1 | Info | |
This template monitors AWS Cost Explorer by HTTP via Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. NOTE: This template uses Cost Explorer API calls to list and retrieve metrics. For more information, see the Cost Explorer pricing page.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.
Add the following required permissions to your Zabbix IAM policy in order to collect metrics.
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"ce:GetDimensionValues",
"ce:GetCostAndUsage"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
If you are using assume role authorization, add the appropriate permissions to the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": "arn:aws:iam::{Account}:user/{UserName}"
},
{
"Effect": "Allow",
"Action": [
"ce:GetDimensionValues",
"ce:GetCostAndUsage"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::{Account}:user/{UserName}"
},
"Action": "sts:AssumeRole"
}
]
}
If you are using role-based authorization, add the appropriate permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
},
{
"Effect": "Allow",
"Action": [
"ce:GetDimensionValues",
"ce:GetCostAndUsage",
"ec2:AssociateIamInstanceProfile",
"ec2:ReplaceIamInstanceProfileAssociation"
],
"Resource": "*"
}
]
}
Next, add a principal to the trust relationships of the role you are using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"ec2.amazonaws.com"
]
},
"Action": [
"sts:AssumeRole"
]
}
]
}
Note: Role-based authorization is only possible when the Zabbix server or proxy runs inside AWS.
Set the macro {$AWS.AUTH_TYPE}. Possible values: access_key, assume_role, role_base.
If you are using access key-based authorization, set the following macros: {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.
If you are using assume role authorization, set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, and {$AWS.ASSUME.ROLE.ARN}.
For more information about managing access keys, see the official AWS documentation.
Also, see the Macros section for a list of macros used in LLD filters.
Additional information about metrics and used API methods:
Name | Description | Default |
---|---|---|
{$AWS.PROXY} | Sets the HTTP proxy value. If this macro is empty, no proxy is used. | |
{$AWS.ACCESS.KEY.ID} | Access key ID. | |
{$AWS.SECRET.ACCESS.KEY} | Secret access key. | |
{$AWS.BILLING.REGION} | Amazon Billing region code. | us-east-1 |
{$AWS.AUTH_TYPE} | Authorization method. Possible values: access_key, assume_role, role_base. | access_key |
{$AWS.STS.REGION} | Region used in assume role request. | us-east-1 |
{$AWS.ASSUME.ROLE.ARN} | ARN assume role; add when using the assume_role authorization method. | |
{$AWS.BILLING.MONTH} | Number of months of historical data to get from the AWS Cost Explorer API; no more than 12. | 11 |
{$AWS.BILLING.LLD.FILTER.SERVICE.MATCHES} | Filter of discoverable billing services by name. | .* |
{$AWS.BILLING.LLD.FILTER.SERVICE.NOT_MATCHES} | Filter to exclude discovered billing services by name. | CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS Cost: Get monthly costs | Get raw data on the monthly costs by service. | Script | aws.get.monthly.costs |
AWS Cost: Get daily costs | Get raw data on the daily costs by service. | Script | aws.get.daily.costs |
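The IAM policy grants ce:GetCostAndUsage, so these script items presumably call the Cost Explorer GetCostAndUsage API. Its request names:

- a TimePeriod, where Start is inclusive and End is exclusive
- a Granularity
- the Metrics to return
- an optional GroupBy

The sketch below builds request bodies for blended cost grouped by service. It assumes the monthly request reaches back {$AWS.BILLING.MONTH} months and the daily request covers only the previous day, as the item descriptions suggest. The exact periods the template uses are not stated here. The helper names and the 1 to 12 range check are hypothetical, and request signing is omitted.

```python
from datetime import date, timedelta


def months_back(day: date, n: int) -> date:
    """First day of the month that lies n months before day's month."""
    index = day.year * 12 + (day.month - 1) - n
    return date(index // 12, index % 12 + 1, 1)


def cost_and_usage_request(granularity: str, today: date, months: int = 11) -> dict:
    """Sketch of GetCostAndUsage parameters; months mirrors {$AWS.BILLING.MONTH}."""
    if not 1 <= months <= 12:
        raise ValueError("{$AWS.BILLING.MONTH} must be between 1 and 12")
    if granularity == "MONTHLY":
        start = months_back(today, months)
    elif granularity == "DAILY":
        start = today - timedelta(days=1)  # previous day only
    else:
        raise ValueError(f"unsupported granularity: {granularity}")
    return {
        # End is exclusive, so today's incomplete data is left out.
        "TimePeriod": {"Start": start.isoformat(), "End": today.isoformat()},
        "Granularity": granularity,
        "Metrics": ["BlendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }


print(cost_and_usage_request("MONTHLY", date(2024, 3, 15))["TimePeriod"])
print(cost_and_usage_request("DAILY", date(2024, 3, 15))["TimePeriod"])
```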
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS daily costs by services discovery | Discovery of daily blended costs by services. | Dependent item | aws.daily.services.costs.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS Cost: Service [{#AWS.BILLING.SERVICE.NAME}]: Blended daily cost | The daily blended cost of the {#AWS.BILLING.SERVICE.NAME} service for the previous day. | Dependent item | aws.daily.service.cost["{#AWS.BILLING.SERVICE.NAME}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS monthly costs by services discovery | Discovery of monthly costs by services. | Dependent item | aws.cost.service.monthly.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS Cost: [{#AWS.BILLING.SERVICE.NAME}]: Month [{#AWS.BILLING.MONTH}] Blended cost | The monthly cost by service {#AWS.BILLING.SERVICE.NAME}. | Dependent item | aws.monthly.service.cost["{#AWS.BILLING.SERVICE.NAME}", "{#AWS.BILLING.MONTH}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS monthly costs discovery | Discovery of monthly costs. | Dependent item | aws.monthly.cost.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS Cost: [{#AWS.BILLING.MONTH}]: Blended cost per month | The blended cost by month {#AWS.BILLING.MONTH}. | Dependent item | aws.monthly.cost["{#AWS.BILLING.MONTH}"] |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at the Zabbix forums.