Monitoring NetApp Volumes over HTTP
Hello everyone. Continuing my previous article about workarounds and SSH for monitoring the space usage and performance metrics of the volumes available to us on NetApp, I want to share and describe a more correct way to monitor them: through the ONTAP REST API using the Zabbix HTTP agent… Since we only rent space, the only useful things we can get are performance metrics, space utilization and various volume status indicators.
General information and creation of master items
We need to create several master items that we will parse and use for low-level discovery… There are two master items from which we get the information we need:
NetApp: Get cluster information – data is collected through the API at {$ONTAP.SCHEME}://{HOST.CONN}/api/cluster
where:
{$ONTAP.SCHEME} – HTTPS;
{HOST.CONN} – NetApp IP address;
{$ONTAP.USERNAME} – NetApp user;
{$ONTAP.PASSWORD} – NetApp password.
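Conceptually, an HTTP agent item substitutes these macros into the URL and sends a GET request. The Python sketch below shows the equivalent request, assuming HTTP Basic authentication (which the username and password macros suggest); the helper name `build_ontap_request` and the sample host and credentials are hypothetical.

```python
import base64
import urllib.request


def build_ontap_request(scheme: str, host: str, path: str,
                        username: str, password: str) -> urllib.request.Request:
    """Build a GET request equivalent to the HTTP agent item (hypothetical helper).

    scheme   -> {$ONTAP.SCHEME}, e.g. "https"
    host     -> {HOST.CONN}, the NetApp IP address
    username -> {$ONTAP.USERNAME}
    password -> {$ONTAP.PASSWORD}
    """
    url = f"{scheme}://{host}{path}"
    # HTTP Basic authentication: "Basic " + base64("user:password")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


# Usage (not executed here; it needs network access and real credentials):
# req = build_ontap_request("https", "192.0.2.10", "/api/cluster", "monitor", "secret")
# with urllib.request.urlopen(req) as resp:
#     body = resp.read()
```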
{
"name": "NETAPP_NAME",
"uuid": "e58448eb-39ac-11e8-ba6a-00a098d57984",
"version": {
"full": "NetApp Release 9.6P10: Thu Aug 20 19:45:05 UTC 2020",
"generation": 9,
"major": 6,
"minor": 0
},
"_links": {
"self": {
"href": "/api/cluster"
}
}
}
NetApp: Get volumes – data is collected through the API at {$ONTAP.SCHEME}://{HOST.CONN}/api/storage/volumes. There are a lot of volumes (124), so I trimmed the output to four for clarity.
{
"records": [
{
"uuid": "04dd9e6a-d04f-49d8-8999-97572ed9183c",
"name": "data_1",
"_links": {
"self": {
"href": "/api/storage/volumes/04dd9e6a-d04f-49d8-8999-97572ed9183c"
}
}
},
{
"uuid": "0638aa9e-d683-43a7-bb75-7f056876a6cb",
"name": "data_2",
"_links": {
"self": {
"href": "/api/storage/volumes/0638aa9e-d683-43a7-bb75-7f056876a6cb"
}
}
},
{
"uuid": "0672d0de-e3b0-47e5-9d4a-4e3ae1d34e51",
"name": "data_3",
"_links": {
"self": {
"href": "/api/storage/volumes/0672d0de-e3b0-47e5-9d4a-4e3ae1d34e51"
}
}
},
{
"uuid": "06b8a873-6d85-49f2-bf83-55daf31b26e7",
"name": "data_4",
"_links": {
"self": {
"href": "/api/storage/volumes/06b8a873-6d85-49f2-bf83-55daf31b26e7"
}
}
}
],
"num_records": 124,
"_links": {
"self": {
"href": "/api/storage/volumes"
}
}
}
Items and low-level discovery
NetApp: Get cluster information – from this item we only need the version, to keep track of firmware updates.
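A minimal Python sketch of what such a version item and a "version was changed" check amount to. It assumes the item stores `version.full` (the article does not show the exact JSONPath, so `$.version.full` is an assumption) and that the check fires whenever the new value differs from the previous one; the function names are hypothetical.

```python
import json
from typing import Optional


def cluster_version(raw: str) -> str:
    """Return version.full from an /api/cluster response body."""
    return json.loads(raw)["version"]["full"]


def version_changed(previous: Optional[str], current: str) -> bool:
    """Fire only when a previous value exists and differs from the current one."""
    return previous is not None and previous != current
```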
NetApp: Get volumes – based on this data we set up low-level discovery and create two macros:
{#VOLUME_NAME} – volume name;
{#VOLUME_UUID} – volume UUID, used to collect statistics for each volume.
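The mapping from the volumes response to discovery rows can be sketched in Python as below. This only illustrates the data transformation; in Zabbix the same result is usually achieved with a JSONPath preprocessing step and LLD macro paths, and the article does not show which approach the template uses. The function name is hypothetical, and depending on the Zabbix version the rows may need to be wrapped in {"data": [...]}.

```python
import json


def volumes_to_lld(raw: str) -> list:
    """Turn an /api/storage/volumes response into LLD rows
    carrying the {#VOLUME_NAME} and {#VOLUME_UUID} macros."""
    records = json.loads(raw).get("records", [])
    return [
        {"{#VOLUME_NAME}": rec["name"], "{#VOLUME_UUID}": rec["uuid"]}
        for rec in records
    ]


# Usage: print(json.dumps(volumes_to_lld(response_body), indent=2))
```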
To create the data items automatically, we need two master item prototypes…
NetApp: Get volume {#VOLUME_NAME} information – we get the volume information we need through the API at {$ONTAP.SCHEME}://{HOST.CONN}/api/storage/volumes/{#VOLUME_UUID}
…
NetApp: Get volume {#VOLUME_NAME} information
{
"uuid": "a1ef1dd3-bf62-470c-87d0-39d0c4366913",
"comment": "",
"create_time": "2018-10-10T15:44:42+03:00",
"language": "c.utf_8",
"name": "data_1",
"size": 265106042880,
"state": "online",
"style": "flexvol",
"tiering": {
"policy": "none"
},
"type": "rw",
"aggregates": [
{
"name": "DC_AFF300_03",
"uuid": "a9e28005-e099-4fcc-bcaa-6781c0086e0b"
}
],
"clone": {
"is_flexclone": false
},
"nas": {
"export_policy": {
"name": "default"
}
},
"snapshot_policy": {
"name": "DOMCLICK_daily"
},
"svm": {
"name": "DOMCLIC_SVM",
"uuid": "46a00e5d-c22d-11e8-b6ed-00a098d48e6d"
},
"space": {
"size": 265106042880,
"available": 62382796800,
"used": 189467947008
},
"metric": {
"timestamp": "2021-02-16T12:42:00Z",
"duration": "PT15S",
"status": "ok",
"latency": {
"other": 175,
"total": 183,
"read": 515,
"write": 323
},
"iops": {
"read": 41,
"write": 15,
"other": 2006,
"total": 2063
},
"throughput": {
"read": 374272,
"write": 78948,
"other": 0,
"total": 453220
}
},
"_links": {
"self": {
"href": "/api/storage/volumes/a1ef1dd3-bf62-470c-87d0-39d0c4366913"
}
}
}
NetApp: Get volume {#VOLUME_NAME} inode information – we get the inode usage of each volume through the API at {$ONTAP.SCHEME}://{HOST.CONN}/api/storage/volumes/{#VOLUME_UUID}?fields=files
…
{
"uuid": "a1ef1dd3-bf62-470c-87d0-39d0c4366913",
"name": "data_1",
"files": {
"maximum": 7685871,
"used": 4108317
},
"_links": {
"self": {
"href": "/api/storage/volumes/a1ef1dd3-bf62-470c-87d0-39d0c4366913"
}
}
}
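For a discovered volume, the two prototype URLs differ only in the query string. As the sample response shows, `fields=files` trims the reply to the `files` object plus the identifying `uuid`, `name` and `_links`. A small Python sketch of the substitution, with a hypothetical helper name:

```python
from urllib.parse import urlencode


def volume_urls(scheme: str, host: str, volume_uuid: str) -> dict:
    """Expand the two per-volume item prototype URLs for one {#VOLUME_UUID}."""
    base = f"{scheme}://{host}/api/storage/volumes/{volume_uuid}"
    return {
        "info": base,                                         # full volume information
        "inodes": f"{base}?{urlencode({'fields': 'files'})}",  # only the files (inode) block
    }
```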
Dependent items
Above we obtained several master items containing JSON. Now they can easily be parsed with JSONPath to pull out the metrics we need. I will walk through one example and then simply list the metrics we collect.
NetApp: Volume state – the volume state, parsed from the master item NetApp: Get volume {#VOLUME_NAME} information…
All the other metrics are extracted in the same way.
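Zabbix's JSONPath preprocessing does this extraction for us. Purely to illustrate what a step such as `$.metric.iops.total` selects, here is a tiny Python resolver for the simple dotted paths used in this article. It is not a full JSONPath implementation (no wildcards, filters or array indexes), and `json_path` is a hypothetical helper name.

```python
def json_path(doc, path: str):
    """Resolve a simple dotted JSONPath such as '$.metric.iops.total'.

    Only the '$.a.b.c' form is supported; anything richer needs a real
    JSONPath library (or Zabbix's own preprocessing).
    """
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path}")
    node = doc
    for key in path.lstrip("$").strip(".").split("."):
        if key:  # "$" alone yields an empty key and returns the whole document
            node = node[key]
    return node
```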
Volume states
Volume state – volume state. Pre-processing steps: JSONPath: $.state
Volume state. A volume can only be brought online if it is offline. The ‘mixed’ state applies to FlexGroup volumes only and cannot be specified as a target state. An ‘error’ state implies that the volume is not in a state to serve data.
Type of the volume – volume type. Pre-processing steps: JSONPath: $.type
Type of the volume.
rw – read-write volume.
dp – data-protection volume.
ls – load-sharing dp volume. Valid in GET.
Style of the volume – volume style. Pre-processing steps: JSONPath: $.style
The style of the volume. If “style” is not specified, the volume type is determined based on the specified aggregates. Specifying a single aggregate, without “constituents_per_aggregate” creates a flexible volume. Specifying multiple aggregates, or a single aggregate with “constituents_per_aggregate” creates a FlexGroup. If “style” is specified, a volume of that type is created. That is, if style is “flexvol”, a single aggregate must be specified. If style is “flexgroup”, the system either uses the specified aggregates, or automatically provisions if no aggregates are specified.
flexvol – flexible volumes and FlexClone volumes
flexgroup – FlexGroups.
Volume metrics status – the status of the volume metrics. Pre-processing steps: JSONPath: $.metric.status
Any errors associated with the sample. For example, if the aggregation of data over multiple nodes fails then any of the partial errors might be returned, “ok” on success, or “error” on any internal uncategorized failure. Whenever a sample collection is missed but done at a later time, it is back filled to the previous 15 second timestamp and tagged with “backfilled_data”. “Inconsistent_delta_time” is encountered when the time between two collections is not the same for all nodes. Therefore, the aggregated value might be over or under inflated. “Negative_delta” is returned when an expected monotonically increasing value has decreased in value. “Inconsistent_old_data” is returned when one or more nodes does not have the latest data.
Comment on the volume – the volume comment. Pre-processing steps: JSONPath: $.comment
A comment for the volume. Valid in POST or PATCH.
Disk space usage
Volume space used – used space. Pre-processing steps: JSONPath: $.space.used
The virtual space used (includes volume reserves) before storage efficiency, in bytes.
Volume space size – space allocated. Pre-processing steps: JSONPath: $.space.size
Total provisioned size. The default size is equal to the minimum size of 20MB, in bytes.
Volume space available – available space. Pre-processing steps: JSONPath: $.space.available
The available space, in bytes.
Volume space used in percentage – used space as a percentage, used in triggers and for dashboards. Calculated as (last(netapp.get.volume.space.used[{#VOLUME_NAME}])*100)/last(netapp.get.volume.space.size[{#VOLUME_NAME}])
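The same arithmetic in Python, applied to the sample response for data_1. The function names are hypothetical. Note that in this sample `used + available` is about 5% smaller than `size`, so 100 minus the used percentage (about 28.5%) is not the same as `available / size` (about 23.5%). It is worth keeping this in mind when choosing which value a free-space trigger compares against.

```python
def used_percent(used: int, size: int) -> float:
    """Same formula as the calculated item: (space.used * 100) / space.size."""
    return used * 100 / size


def available_percent(available: int, size: int) -> float:
    """Share of the provisioned size reported as available by the API."""
    return available * 100 / size


# Sample values from the data_1 response above (bytes)
size, available, used = 265106042880, 62382796800, 189467947008
print(round(used_percent(used, size), 2), round(available_percent(available, size), 2))
```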
Volume performance metrics
Throughput
Storage throughput, measured in bytes per second.
Volume throughput write – write throughput. Pre-processing steps: JSONPath: $.metric.throughput.write
Performance metric for write I/O operations.
Volume throughput total – total throughput. Pre-processing steps: JSONPath: $.metric.throughput.total
Performance metric aggregated over all types of I/O operations.
Volume throughput read – read throughput. Pre-processing steps: JSONPath: $.metric.throughput.read
Performance metric for read I/O operations.
Volume throughput other – throughput of other operations. Pre-processing steps: JSONPath: $.metric.throughput.other
Performance metric for other I/O operations. Other I/O operations can be metadata operations, such as directory lookups and so on.
Latency
The round-trip latency in the storage object, measured in microseconds.
Volume latency write in ms – write latency. Pre-processing steps: JSONPath: $.metric.latency.write
Performance metric for write I/O operations.
Volume latency total in ms – total latency. Pre-processing steps: JSONPath: $.metric.latency.total
Performance metric aggregated over all types of I/O operations.
Volume latency read in ms – read latency. Pre-processing steps: JSONPath: $.metric.latency.read
Performance metric for read I/O operations.
Volume latency other in ms – latency of other operations. Pre-processing steps: JSONPath: $.metric.latency.other
Performance metric for other I/O operations. Other I/O operations can be metadata operations, such as directory lookups and so on.
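The API reports latency in microseconds, while these item names say "in ms". With only a JSONPath step, the stored value stays in microseconds. If you want the items to actually hold milliseconds, one option is to add a Custom multiplier step of 0.001 after JSONPath; the article does not show such a step, so this is an assumption. A Python sketch of the conversion on the sample values, with a hypothetical helper name:

```python
def us_to_ms(latency_us: float) -> float:
    """Convert an ONTAP latency value from microseconds to milliseconds
    (the same effect as a Zabbix custom multiplier of 0.001)."""
    return latency_us / 1000


# metric.latency block from the data_1 sample, in microseconds
sample_latency_us = {"other": 175, "total": 183, "read": 515, "write": 323}
latency_ms = {op: us_to_ms(value) for op, value in sample_latency_us.items()}
print(latency_ms)
```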
IOPS
The I/O rate observed at the storage object.
Volume iops write – write I/O operations per second. Pre-processing steps: JSONPath: $.metric.iops.write
Performance metric for write I/O operations.
Volume iops total – total I/O operations per second. Pre-processing steps: JSONPath: $.metric.iops.total
Performance metric aggregated over all types of I/O operations.
Volume iops read – read I/O operations per second. Pre-processing steps: JSONPath: $.metric.iops.read
Performance metric for read I/O operations.
Volume iops other – other I/O operations per second. Pre-processing steps: JSONPath: $.metric.iops.other
Performance metric for other I/O operations. Other I/O operations can be metadata operations, such as directory lookups and so on.
Inode usage
{
"uuid": "a1ef1dd3-bf62-470c-87d0-39d0c4366913",
"name": "data_1",
"files": {
"maximum": 7685871,
"used": 4110457
},
"_links": {
"self": {
"href": "/api/storage/volumes/a1ef1dd3-bf62-470c-87d0-39d0c4366913"
}
}
}
Inode used on the volume – the number of inodes used. Pre-processing steps: JSONPath: $.files.used
Number of files (inodes) used for user-visible data permitted on the volume. This field is valid only when the volume is online.
Inode maximum on the volume – the maximum number of inodes. Pre-processing steps: JSONPath: $.files.maximum
The maximum number of files (inodes) for user-visible data allowed on the volume. This value can be increased or decreased. Increasing the maximum number of files does not immediately cause additional disk space to be used to track files. Instead, as more files are created on the volume, the system dynamically increases the number of disk blocks that are used to track files. The space assigned to track files is never freed, and this value cannot be decreased below the current number of files that can be tracked within the assigned space for the volume. Valid in PATCH.
Inode available on the volume – free inodes. Calculated as:
last(netapp.get.volume.files.maximum[{#VOLUME_NAME}])-last(netapp.get.volume.files.used[{#VOLUME_NAME}])
Inode used in percentage on the volume – the percentage of inodes used, for triggers and dashboards. Calculated as: (last(netapp.get.volume.files.used[{#VOLUME_NAME}])*100)/last(netapp.get.volume.files.maximum[{#VOLUME_NAME}])
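Both calculated inode items can be reproduced in a few lines of Python. This sketch uses the values from the inode sample above; `inode_stats` is a hypothetical helper name.

```python
def inode_stats(maximum: int, used: int) -> dict:
    """Mirror the two calculated items:
    available = files.maximum - files.used
    used %    = files.used * 100 / files.maximum
    """
    used_pct = used * 100 / maximum
    return {
        "available": maximum - used,
        "used_percent": used_pct,
        "free_percent": 100 - used_pct,
    }


# Sample from the data_1 inode response
print(inode_stats(maximum=7685871, used=4110457))
```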
Alerts
Free disk space less than 1% on the {#VOLUME_NAME}
Free disk space less than 5% on the {#VOLUME_NAME}
Free disk space less than 10% on the {#VOLUME_NAME}
Free inodes less than 1% on the {#VOLUME_NAME}
Free inodes less than 5% on the {#VOLUME_NAME}
Free inodes less than 10% on the {#VOLUME_NAME}
Volume metrics status is not OK on the {#VOLUME_NAME}
Volume state is not ONLINE on the {#VOLUME_NAME}
NetApp cluster version was changed
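The article does not give the trigger expressions, so the following Python sketch only illustrates the logic the trigger names suggest. It assumes "free" means 100 minus the used percentage, that each threshold trigger is evaluated independently (no trigger dependencies), and it leaves out the version trigger. `volume_alerts` is a hypothetical name.

```python
THRESHOLDS = (1, 5, 10)  # percent, as in the trigger names above


def volume_alerts(name: str, free_space_pct: float, free_inode_pct: float,
                  state: str, metric_status: str) -> list:
    """Return the names of the triggers that would fire for one volume."""
    fired = []
    for t in THRESHOLDS:
        if free_space_pct < t:
            fired.append(f"Free disk space less than {t}% on the {name}")
    for t in THRESHOLDS:
        if free_inode_pct < t:
            fired.append(f"Free inodes less than {t}% on the {name}")
    if metric_status != "ok":  # e.g. "backfilled_data", "error"
        fired.append(f"Volume metrics status is not OK on the {name}")
    if state != "online":
        fired.append(f"Volume state is not ONLINE on the {name}")
    return fired
```

Without dependencies, a volume with 4% free space fires both the 5% and the 10% triggers. Adding trigger dependencies in Zabbix is the usual way to show only the most severe one.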
Visualization
The dashboard looks almost the same as before, but it is now more compact and informative.
A Show in Zabbix button in the upper right corner lets you drill down into Zabbix and see all the metrics for the selected volume.
Outcome
Volumes are still added to and removed from monitoring automatically.
We got rid of scripts and stopped bothering colleagues from the DC.
There are slightly more metrics, and they are more informative.