Monitoring NetApp Volumes over HTTP

Hello everyone. Following up on the previous article, where we monitored the placement and performance metrics of our NetApp volumes over SSH with a pile of workarounds, I want to describe a cleaner approach: monitoring through the ONTAP REST API using the Zabbix HTTP agent. Since we only rent space, the useful information we can collect comes down to performance metrics, space utilization, and volume statuses.

General information and creation of master elements

We need to create several master items that we will parse and use for low-level discovery. There are two master items from which we will receive the information we need:

NetApp: Get cluster information – data is collected through the API at {$ONTAP.SCHEME}://{HOST.CONN}/api/cluster, where:

  • {$ONTAP.SCHEME} – HTTPS;

  • {HOST.CONN} – the NetApp IP address;

  • {$ONTAP.USERNAME} – the NetApp username;

  • {$ONTAP.PASSWORD} – the NetApp password.
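For reference, the HTTP agent item boils down to an authenticated GET against that URL. A minimal Python sketch of the same request (the address and credentials below are placeholders for the macro values; the request itself is left commented out):

```python
import base64
import urllib.request

# Values that Zabbix substitutes from macros (placeholders here)
scheme = "https"          # {$ONTAP.SCHEME}
host = "192.0.2.10"       # {HOST.CONN} - example address
username = "zabbix"       # {$ONTAP.USERNAME}
password = "secret"       # {$ONTAP.PASSWORD}

url = f"{scheme}://{host}/api/cluster"

# The ONTAP REST API uses HTTP basic authentication
token = base64.b64encode(f"{username}:{password}".encode()).decode()
req = urllib.request.Request(url, headers={
    "Authorization": f"Basic {token}",
    "Accept": "application/json",
})

# On a real cluster, sending the request returns the JSON shown below:
# with urllib.request.urlopen(req) as resp:
#     cluster = json.load(resp)

print(req.full_url)  # https://192.0.2.10/api/cluster
```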

{
  "name": "NETAPP_NAME",
  "uuid": "e58448eb-39ac-11e8-ba6a-00a098d57984",
  "version": {
    "full": "NetApp Release 9.6P10: Thu Aug 20 19:45:05 UTC 2020",
    "generation": 9,
    "major": 6,
    "minor": 0
  },
  "_links": {
    "self": {
      "href": "/api/cluster"
    }
  }
}

NetApp: Get volumes – data is collected through the API at {$ONTAP.SCHEME}://{HOST.CONN}/api/storage/volumes. There are a lot of volumes (124), so I have cut the output down to four for clarity.

{
  "records": [
    {
      "uuid": "04dd9e6a-d04f-49d8-8999-97572ed9183c",
      "name": "data_1",
      "_links": {
        "self": {
          "href": "/api/storage/volumes/04dd9e6a-d04f-49d8-8999-97572ed9183c"
        }
      }
    },
    {
      "uuid": "0638aa9e-d683-43a7-bb75-7f056876a6cb",
      "name": "data_2",
      "_links": {
        "self": {
          "href": "/api/storage/volumes/0638aa9e-d683-43a7-bb75-7f056876a6cb"
        }
      }
    },
    {
      "uuid": "0672d0de-e3b0-47e5-9d4a-4e3ae1d34e51",
      "name": "data_3",
      "_links": {
        "self": {
          "href": "/api/storage/volumes/0672d0de-e3b0-47e5-9d4a-4e3ae1d34e51"
        }
      }
    },
    {
      "uuid": "06b8a873-6d85-49f2-bf83-55daf31b26e7",
      "name": "data_4",
      "_links": {
        "self": {
          "href": "/api/storage/volumes/06b8a873-6d85-49f2-bf83-55daf31b26e7"
        }
      }
    }
  ],
  "num_records": 124,
  "_links": {
    "self": {
      "href": "/api/storage/volumes"
    }
  }
}

Items and low-level discovery

NetApp: Get cluster information – from here we only need the version, to keep track of firmware updates.

We use JSONPath preprocessing.
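The preprocessing here is a single JSONPath expression over the cluster response shown earlier. A sketch of what it extracts (assuming the path is $.version.full; plain dictionary access stands in for a JSONPath engine):

```python
import json

# Trimmed sample of the /api/cluster response from the article
cluster_json = """
{
  "name": "NETAPP_NAME",
  "version": {
    "full": "NetApp Release 9.6P10: Thu Aug 20 19:45:05 UTC 2020",
    "generation": 9,
    "major": 6,
    "minor": 0
  }
}
"""

cluster = json.loads(cluster_json)

# Equivalent of the JSONPath $.version.full preprocessing step
version = cluster["version"]["full"]
print(version)  # NetApp Release 9.6P10: Thu Aug 20 19:45:05 UTC 2020
```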

NetApp: Get volumes – based on this information, we need to set up low-level discovery and create two macros:

  • {#VOLUME_NAME} – volume name;

  • {#VOLUME_UUID} – volume UUID, which will be used to collect statistics for each volume.

We use JSONPath preprocessing.

To create the data items automatically, you need to create two master item prototypes.

NetApp: Get volume {#VOLUME_NAME} information – we get the information we need about the volume through the API at {$ONTAP.SCHEME}://{HOST.CONN}/api/storage/volumes/{#VOLUME_UUID}

{
  "uuid": "a1ef1dd3-bf62-470c-87d0-39d0c4366913",
  "comment": "",
  "create_time": "2018-10-10T15:44:42+03:00",
  "language": "c.utf_8",
  "name": "data_1",
  "size": 265106042880,
  "state": "online",
  "style": "flexvol",
  "tiering": {
    "policy": "none"
  },
  "type": "rw",
  "aggregates": [
    {
      "name": "DC_AFF300_03",
      "uuid": "a9e28005-e099-4fcc-bcaa-6781c0086e0b"
    }
  ],
  "clone": {
    "is_flexclone": false
  },
  "nas": {
    "export_policy": {
      "name": "default"
    }
  },
  "snapshot_policy": {
    "name": "DOMCLICK_daily"
  },
  "svm": {
    "name": "DOMCLIC_SVM",
    "uuid": "46a00e5d-c22d-11e8-b6ed-00a098d48e6d"
  },
  "space": {
    "size": 265106042880,
    "available": 62382796800,
    "used": 189467947008
  },
  "metric": {
    "timestamp": "2021-02-16T12:42:00Z",
    "duration": "PT15S",
    "status": "ok",
    "latency": {
      "other": 175,
      "total": 183,
      "read": 515,
      "write": 323
    },
    "iops": {
      "read": 41,
      "write": 15,
      "other": 2006,
      "total": 2063
    },
    "throughput": {
      "read": 374272,
      "write": 78948,
      "other": 0,
      "total": 453220
    }
  },
  "_links": {
    "self": {
      "href": "/api/storage/volumes/a1ef1dd3-bf62-470c-87d0-39d0c4366913"
    }
  }
}

NetApp: Get volume {#VOLUME_NAME} inode information – we get information about inode usage in each volume through the API at {$ONTAP.SCHEME}://{HOST.CONN}/api/storage/volumes/{#VOLUME_UUID}?fields=files

{
  "uuid": "a1ef1dd3-bf62-470c-87d0-39d0c4366913",
  "name": "data_1",
  "files": {
    "maximum": 7685871,
    "used": 4108317
  },
  "_links": {
    "self": {
      "href": "/api/storage/volumes/a1ef1dd3-bf62-470c-87d0-39d0c4366913"
    }
  }
}

Dependent items

Above, we created several master items with JSON inside; now they can easily be parsed with JSONPath to pull out the metrics we need. I will walk through one example and then simply list the metrics we collect.

NetApp: Volume state – the volume state; we parse the master item NetApp: Get volume {#VOLUME_NAME} information (its sample response is shown above).

We use JSONPath preprocessing.

Pulling out the rest of the metrics looks much the same.
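Each dependent item applies one JSONPath expression to the master item's JSON. In Python terms, the paths used below map to plain key lookups over the sample volume response:

```python
import json

# Trimmed sample of the "Get volume ... information" master item
volume_json = """
{
  "name": "data_1",
  "state": "online",
  "space": {"size": 265106042880, "available": 62382796800, "used": 189467947008},
  "metric": {
    "status": "ok",
    "latency": {"read": 515, "write": 323, "other": 175, "total": 183},
    "iops": {"read": 41, "write": 15, "other": 2006, "total": 2063}
  }
}
"""

vol = json.loads(volume_json)

# JSONPath $.state
state = vol["state"]
# JSONPath $.metric.status
status = vol["metric"]["status"]
# JSONPath $.metric.iops.total
iops_total = vol["metric"]["iops"]["total"]

print(state, status, iops_total)  # online ok 2063
```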

Volume states

Volume state – volume state. Pre-processing steps: JSONPath: $.state

Volume state. A volume can only be brought online if it is offline. The ‘mixed’ state applies to FlexGroup volumes only and cannot be specified as a target state. An ‘error’ state implies that the volume is not in a state to serve data.

Type of the volume – volume type. Pre-processing steps: JSONPath: $.type

Type of the volume.
rw – read-write volume.
dp – data-protection volume.
ls – load-sharing dp volume. Valid in GET.

Style of the volume – volume style. Pre-processing steps: JSONPath: $.style

The style of the volume. If “style” is not specified, the volume type is determined based on the specified aggregates. Specifying a single aggregate, without “constituents_per_aggregate” creates a flexible volume. Specifying multiple aggregates, or a single aggregate with “constituents_per_aggregate” creates a FlexGroup. If “style” is specified, a volume of that type is created. That is, if style is “flexvol”, a single aggregate must be specified. If style is “flexgroup”, the system either uses the specified aggregates, or automatically provisions if no aggregates are specified.
flexvol – flexible volumes and FlexClone volumes
flexgroup – FlexGroups.

Volume metrics status – the status of the volume indicators. Pre-processing steps: JSONPath: $.metric.status

Any errors associated with the sample. For example, if the aggregation of data over multiple nodes fails then any of the partial errors might be returned, “ok” on success, or “error” on any internal uncategorized failure. Whenever a sample collection is missed but done at a later time, it is back filled to the previous 15 second timestamp and tagged with “backfilled_data”. “Inconsistent_delta_time” is encountered when the time between two collections is not the same for all nodes. Therefore, the aggregated value might be over or under inflated. “Negative_delta” is returned when an expected monotonically increasing value has decreased in value. “Inconsistent_old_data” is returned when one or more nodes does not have the latest data.

Comment on the volume – a comment for the volume. Pre-processing steps: JSONPath: $.comment

A comment for the volume. Valid in POST or PATCH.

Disk space usage

Volume space used – used space. Pre-processing steps: JSONPath: $.space.used

The virtual space used (includes volume reserves) before storage efficiency, in bytes.

Volume space size – space allocated. Pre-processing steps: JSONPath: $.space.size

Total provisioned size. The default size is equal to the minimum size of 20MB, in bytes.

Volume space available – available space. Pre-processing steps: JSONPath: $.space.available

The available space, in bytes.

Volume space used in percentage – used space as a percentage, used in triggers and for dashboards. Calculated as (last(netapp.get.volume.space.used[{#VOLUME_NAME}])*100)/last(netapp.get.volume.space.size[{#VOLUME_NAME}])
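With the sample figures above (space.used = 189467947008, space.size = 189467947008's counterpart space.size = 265106042880), the calculated item works out to roughly 71.5% used. A quick check of the same formula:

```python
# Figures from the sample volume response
used = 189_467_947_008   # space.used, bytes
size = 265_106_042_880   # space.size, bytes

# Same formula as the calculated item:
# (last(...space.used...) * 100) / last(...space.size...)
used_pct = used * 100 / size
print(round(used_pct, 2))  # 71.47
```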

Volume performance metrics

Bandwidth

Storage throughput, measured in bytes per second.

Volume throughput write – write bandwidth. Pre-processing steps: JSONPath: $.metric.throughput.write

Performance metric for write I/O operations.

Volume throughput total – total bandwidth. Pre-processing steps: JSONPath: $.metric.throughput.total

Performance metric aggregated over all types of I/O operations.

Volume throughput read – read bandwidth. Pre-processing steps: JSONPath: $.metric.throughput.read

Performance metric for read I/O operations.

Volume throughput other – bandwidth for other operations. Pre-processing steps: JSONPath: $.metric.throughput.other

Performance metric for other I/O operations. Other I/O operations can be metadata operations, such as directory lookups and so on.

Operation latency

The round-trip latency observed at the storage object, measured in microseconds.

Volume latency write in ms – write latency in ms. Pre-processing steps: JSONPath: $.metric.latency.write

Performance metric for write I/O operations.

Volume latency total in ms – total latency in ms. Pre-processing steps: JSONPath: $.metric.latency.total

Performance metric aggregated over all types of I/O operations.

Volume latency read in ms – read latency in ms. Pre-processing steps: JSONPath: $.metric.latency.read

Performance metric for read I/O operations.

Volume latency other in ms – latency of other operations in ms. Pre-processing steps: JSONPath: $.metric.latency.other

Performance metric for other I/O operations. Other I/O operations can be metadata operations, such as directory lookups and so on.

I/O rate

The I/O rate observed at the storage object.

Volume iops write – write I/O operations per second. Pre-processing steps: JSONPath: $.metric.iops.write

Performance metric for write I/O operations.

Volume iops total – total I/O operations per second. Pre-processing steps: JSONPath: $.metric.iops.total

Performance metric aggregated over all types of I/O operations.

Volume iops read – read I/O operations per second. Pre-processing steps: JSONPath: $.metric.iops.read

Performance metric for read I/O operations.

Volume iops other – other I/O operations per second. Pre-processing steps: JSONPath: $.metric.iops.other

Performance metric for other I/O operations. Other I/O operations can be metadata operations, such as directory lookups and so on.

Inode usage

{
  "uuid": "a1ef1dd3-bf62-470c-87d0-39d0c4366913",
  "name": "data_1",
  "files": {
    "maximum": 7685871,
    "used": 4110457
  },
  "_links": {
    "self": {
      "href": "/api/storage/volumes/a1ef1dd3-bf62-470c-87d0-39d0c4366913"
    }
  }
}

Inode used on the volume – inodes used. Pre-processing steps: JSONPath: $.files.used

Number of files (inodes) used for user-visible data permitted on the volume. This field is valid only when the volume is online.

Inode maximum on the volume – the maximum number of inodes. Pre-processing steps: JSONPath: $.files.maximum

The maximum number of files (inodes) for user-visible data allowed on the volume. This value can be increased or decreased. Increasing the maximum number of files does not immediately cause additional disk space to be used to track files. Instead, as more files are created on the volume, the system dynamically increases the number of disk blocks that are used to track files. The space assigned to track files is never freed, and this value cannot be decreased below the current number of files that can be tracked within the assigned space for the volume. Valid in PATCH.

Inode available on the volume – free inodes. Calculated as:

last(netapp.get.volume.files.maximum[{#VOLUME_NAME}])-last(netapp.get.volume.files.used[{#VOLUME_NAME}])

Inode used in percentage on the volume – percentage of inodes used, used in triggers and dashboards. Calculated as: (last(netapp.get.volume.files.used[{#VOLUME_NAME}])*100)/last(netapp.get.volume.files.maximum[{#VOLUME_NAME}])
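Checking both calculated items against the sample inode response above (files.maximum = 7685871, files.used = 4108317):

```python
# Figures from the sample ?fields=files response
maximum = 7_685_871   # files.maximum
used = 4_108_317      # files.used

# Inode available on the volume: maximum - used
available = maximum - used
# Inode used in percentage on the volume: (used * 100) / maximum
used_pct = used * 100 / maximum

print(available, round(used_pct, 2))  # 3577554 53.45
```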

Alerts

  • Free disk space less than 1% on the {#VOLUME_NAME}

  • Free disk space less than 5% on the {#VOLUME_NAME}

  • Free disk space less than 10% on the {#VOLUME_NAME}

  • Free inodes less than 1% on the {#VOLUME_NAME}

  • Free inodes less than 5% on the {#VOLUME_NAME}

  • Free inodes less than 10% on the {#VOLUME_NAME}

  • Volume metrics status is not OK on the {#VOLUME_NAME}

  • Volume state is not ONLINE on the {#VOLUME_NAME}

  • NetApp cluster version was changed
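As an illustration, the free-space triggers can be derived from the items defined earlier. A sketch in the same style as the document's calculated-item formulas (the item keys are the ones used above; exact trigger syntax depends on your Zabbix version and is not taken from the original template):

```
100 - (last(netapp.get.volume.space.used[{#VOLUME_NAME}]) * 100)
    / last(netapp.get.volume.space.size[{#VOLUME_NAME}]) < 5
```

The 10% and 1% variants differ only in the threshold, and the inode triggers use the files.used/files.maximum keys in the same way.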

Visualization

The dashboard looks almost the same, but it has become more compact and informative.

There is a Show in Zabbix button in the upper right corner, which lets you drill down into Zabbix and see all the metrics for the selected volume.

Outcome

  • Automatic addition and removal of volumes to and from monitoring still works.

  • We got rid of the scripts and stopped bothering our colleagues from the DC.

  • There are a few more metrics now, and they have become more informative.

Template and dashboard

Useful links
