This article explains why such a transition was justified and briefly describes how the new storage system was set up. We also list the pros and cons of switching to the selected system.
The infrastructure of one of our customers consisted of many heterogeneous storage systems of different classes: from SOHO QNAP and Synology devices for user data to entry-level and mid-range Eternus DX90 and DX600 arrays serving iSCSI and FC for service data and virtualization.
These systems differed in generation and in the disks they used, and some of them were legacy equipment no longer supported by the vendor.
A separate issue was managing free space, since the available disk space was highly fragmented across many systems. The result was inconvenient administration and a high cost of maintaining the fleet of systems.
We faced the task of optimizing the storage infrastructure, with two goals: reducing the cost of ownership and unifying the infrastructure.
Our experts analyzed the task comprehensively, taking into account the customer's requirements for data availability, IOPS, and RPO/RTO, as well as the possibility of upgrading the existing infrastructure.
The main players in the mid-range (and higher) storage market are IBM with its Storwize family, Fujitsu with the Eternus line, and NetApp with the FAS series. The candidates that met the specified requirements were the IBM Storwize V7000U, the Fujitsu Eternus DX100, and the NetApp FAS2620. All three are unified storage systems, meaning they provide both block and file access, and they offer comparable performance.
However, the Storwize V7000U provides file access through a separate controller, a file module connected to the main block controller, which adds another point of failure. The system is also relatively difficult to manage and does not isolate services adequately.
The Eternus DX100, although also a unified storage system, has serious limits on the number of file systems that can be created and does not provide the necessary isolation. In addition, creating a new file system takes a long time (up to half an hour). Neither of these two systems allows the CIFS/NFS servers in use to be separated at the network level.
Taking all parameters into account, including the total cost of ownership, we chose the NetApp FAS2620. It consists of a pair of controllers operating in Active-Active mode, which allows the load to be distributed between them. Combined with the built-in inline deduplication and compression mechanisms, this significantly reduces the disk space occupied by data. These mechanisms become much more effective once data is consolidated on one system: in the initial situation, potentially identical data resided on different storage systems and could not be deduplicated across them.
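Why consolidation helps deduplication can be shown with a minimal sketch. It assumes fixed 4 KiB blocks identified by SHA-256 fingerprints and is not NetApp's actual implementation; the function names and the two "VM images" are hypothetical. When datasets sit on separate systems, duplicates are found only within each one; on a single system, identical blocks are shared across all of them.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed 4 KiB blocks, for illustration only


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def count_unique_blocks(datasets: list[bytes]) -> int:
    """Count distinct blocks across all datasets by their SHA-256 fingerprints."""
    fingerprints: set[str] = set()
    for data in datasets:
        for block in split_blocks(data):
            fingerprints.add(hashlib.sha256(block).hexdigest())
    return len(fingerprints)


def blocks_stored_separately(datasets: list[bytes]) -> int:
    """Each dataset on its own system: duplicates are found only within it."""
    return sum(count_unique_blocks([data]) for data in datasets)


def blocks_stored_consolidated(datasets: list[bytes]) -> int:
    """All datasets on one system: duplicates are found across all of them."""
    return count_unique_blocks(datasets)


if __name__ == "__main__":
    # Two hypothetical VM images: 8 identical "OS" blocks plus 2 unique blocks each.
    os_image = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
    vm_a = os_image + bytes([100]) * BLOCK_SIZE + bytes([101]) * BLOCK_SIZE
    vm_b = os_image + bytes([200]) * BLOCK_SIZE + bytes([201]) * BLOCK_SIZE
    print(blocks_stored_separately([vm_a, vm_b]))    # 20
    print(blocks_stored_consolidated([vm_a, vm_b]))  # 12
```

In this toy case, the two images need 20 physical blocks when kept apart but only 12 when stored together, because the 8 shared blocks are kept once.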
Such a system made it possible to place all types of services under the control of a single fault-tolerant cluster: SAN in the form of block devices for virtualization, and NAS in the form of CIFS and NFS shares for user data on Windows and *nix systems. At the same time, the services could still be safely separated at the logical level thanks to SVM (Storage Virtual Machine) technology: services responsible for different components do not affect their “neighbors” and cannot access them.
Services can also be isolated at the disk level, so that heavy load from “neighbors” does not degrade their performance.
For services that require fast reads and writes, you can use a hybrid aggregate by adding several SSDs to an HDD aggregate. The system itself places “hot” data on the SSDs, reducing read latency for frequently used data. This complements the NVRAM cache, which, besides providing high write speed, guarantees the atomicity and integrity of writes in case of a sudden power failure: data is kept in battery-backed NVRAM until the file system confirms that it has been completely written.
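The caching policy NetApp actually uses is not described here. As a hedged illustration of the general idea only, the toy model below promotes a block to a small SSD tier after it has been read a given number of times, and evicts the least recently used block when the SSD tier is full. The class name, the promotion threshold, and the LRU eviction are all assumptions.

```python
from collections import Counter, OrderedDict


class HybridAggregate:
    """Hypothetical toy model of an HDD aggregate with a small SSD read cache.

    Assumptions: a block is promoted to SSD after `promote_after` reads from HDD,
    and the SSD tier evicts the least recently used block when it is full.
    """

    def __init__(self, ssd_capacity: int, promote_after: int = 2) -> None:
        self.ssd_capacity = ssd_capacity
        self.promote_after = promote_after
        self.read_counts = Counter()  # HDD reads per block
        self.ssd = OrderedDict()      # block id -> None, ordered by recency
        self.ssd_hits = 0
        self.hdd_reads = 0

    def read(self, block_id: int) -> str:
        """Return which tier served the read: 'ssd' or 'hdd'."""
        if block_id in self.ssd:
            self.ssd.move_to_end(block_id)  # mark as most recently used
            self.ssd_hits += 1
            return "ssd"
        self.hdd_reads += 1
        self.read_counts[block_id] += 1
        if self.read_counts[block_id] >= self.promote_after:
            self._promote(block_id)
        return "hdd"

    def _promote(self, block_id: int) -> None:
        """Copy a hot block to SSD, evicting the least recently used one if full."""
        if len(self.ssd) >= self.ssd_capacity:
            self.ssd.popitem(last=False)
        self.ssd[block_id] = None


if __name__ == "__main__":
    agg = HybridAggregate(ssd_capacity=2)
    print([agg.read(b) for b in [1, 1, 1, 2, 3]])
```

Here block 1 becomes "hot" after two HDD reads, so its third read is served from SSD, while blocks read only once stay on HDD.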
After the data is migrated to the new storage system, both the cache and the disk space can be used more efficiently.
As mentioned above, this system allowed us to achieve both goals at once, reducing the cost of ownership and unifying the infrastructure. Its advantages are:
- A single two-controller cluster in one chassis that covers the full range of tasks facing the company.
- A single point of management for all storage services. You no longer need to track which storage system a LUN is presented from, where data can be migrated when space runs out, and so on.
- A single point of service. All disks are now of the same type and sit in a common disk shelf. The system is mounted in one rack, which reduces the number of Ethernet and Fibre Channel cables and switches required.
- Since the new cluster has access to all stored data, it can deduplicate data efficiently by finding identical blocks across all of it. This works best for virtual machines and backups.
- The SVM (Storage Virtual Machine) technology used in NetApp, as mentioned above, allows services to be separated while preserving the advantages of unification. Each task can now have its own SVM, which serves data over a single protocol and only to explicitly specified users or services.
- Network isolation.
Each SVM has its own virtual network interface bound to a strictly defined group of physical ports or VLAN interfaces. Even if traffic from different SVMs passes through the same physical port, it travels in different VLANs; in other words, the storage network port is configured as a trunk port.
Dedicated port groups are allocated for iSCSI traffic to separate the heavy SAN load from user traffic; individual systems can even reserve a physical port of their own without sharing it with anyone.
- Isolation at the disk subsystem level.
In a typical implementation, as few RAID groups as possible are created (putting more disks into a single RAID group increases array performance), and separate file systems, called volumes, are then created on top of them. Each volume is assigned to an SVM, so the data of one SVM remains inaccessible from another even if that SVM is compromised. If one service fills up its volume, the volumes of other SVMs are not affected.
In some cases, for security reasons, dedicated RAID groups are created for especially critical data to guarantee isolation even at the physical level.
- As data grows, disk shelves can be added as needed, without complex manipulation and without interrupting service. New drives become available to both controllers immediately, either to expand existing RAID groups or to create new ones.
- A growing number of services can exhaust the system's computing resources (CPU, RAM). In that case, you can add more storage nodes to the existing cluster, increasing the number of I/O interfaces, the amount of memory, and the fault tolerance of the system.
- For cold data and archives, NetApp supports S3-compatible object storage, both third-party services and its own products for building on-premises object storage.
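The network and disk isolation rules described above can be made concrete with a toy model. It is not the ONTAP object model: the classes, fields, SVM names, client names, and VLAN IDs are hypothetical. Each SVM answers only on its own VLAN, over its own protocol, and only to its listed clients, and filling one SVM's volume leaves the others unaffected.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of the isolation rules; not the ONTAP object model.


@dataclass
class Volume:
    name: str
    capacity_gb: int
    used_gb: int = 0

    def write(self, size_gb: int) -> None:
        """Consume space in this volume only; other volumes are not touched."""
        if self.used_gb + size_gb > self.capacity_gb:
            raise OSError(f"volume {self.name} is full")
        self.used_gb += size_gb


@dataclass
class Svm:
    name: str
    protocol: str              # one protocol per SVM in this sketch
    vlan_id: int               # VLAN tag of the SVM's interface on the trunk port
    allowed_clients: set[str]  # users/services permitted to reach this SVM
    volumes: dict[str, Volume] = field(default_factory=dict)

    def can_access(self, client: str, protocol: str, vlan_id: int) -> bool:
        """Serve a request only if its VLAN, protocol and client all match."""
        return (
            vlan_id == self.vlan_id
            and protocol == self.protocol
            and client in self.allowed_clients
        )


if __name__ == "__main__":
    cifs = Svm("svm_cifs", "cifs", 10, {"user-pc"})
    iscsi = Svm("svm_iscsi", "iscsi", 20, {"hyperv01"})
    cifs.volumes["share"] = Volume("share", capacity_gb=100)
    iscsi.volumes["vms"] = Volume("vms", capacity_gb=50)
    cifs.volumes["share"].write(100)  # the CIFS volume is now full...
    iscsi.volumes["vms"].write(10)    # ...but the iSCSI volume still accepts data
    print(cifs.can_access("hyperv01", "cifs", 10))  # False: client not allowed
```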
The disadvantages are:
- Consolidating all services on one system means that the failure of a single component has a greater impact (1 of 2 controllers versus 1 of 10+ systems in the old infrastructure).
- The storage infrastructure is less geographically distributed. Previously, storage systems could be located on different floors or in different buildings; now everything is concentrated in one rack. This can be offset by buying an additional, less powerful system and using synchronous or asynchronous replication for force majeure situations.
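The larger impact of losing one controller can be quantified with simple arithmetic. A minimal sketch, assuming two identical controllers and a survivor that takes over its partner's entire workload: to ride out a controller failure without overload, the combined utilization of both controllers must fit into one controller, i.e. roughly 50% each. The function names, the 100% limit, and the example loads are illustrative assumptions.

```python
def survivor_load(loads_percent: list[float]) -> float:
    """Utilization of the surviving controller after its HA partner fails.

    Assumption: both controllers are identical and the survivor takes over
    the partner's entire workload, so the utilizations simply add up.
    """
    if len(loads_percent) != 2:
        raise ValueError("an HA pair has exactly two controllers")
    return sum(loads_percent)


def survives_failover(loads_percent: list[float], limit_percent: float = 100) -> bool:
    """True if one controller alone could carry the whole pair's workload."""
    return survivor_load(loads_percent) <= limit_percent


def share_affected(total_components: int) -> float:
    """Fraction of the infrastructure that depends on any single component."""
    return 1 / total_components


if __name__ == "__main__":
    print(survives_failover([45, 40]))  # True: 85% on the survivor
    print(survives_failover([60, 55]))  # False: 115% would overload it
    print(share_affected(2), share_affected(10))  # 0.5 0.1
```

This is why, in an Active-Active pair, it is worth keeping each controller's load low enough that its partner can absorb it.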
Step-by-step setup
Because the information is confidential, we cannot show screenshots from the customer's real environment, so the configuration is demonstrated in a test environment; the steps exactly repeat those performed in the customer's production environment.
The initial state of the cluster: two aggregates holding the root volumes of the cluster nodes clus01_01 and clus01_02.
Creating aggregates for data. Each node has its own aggregate consisting of one RAID-DP array.
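The capacity side of the advice to create as few RAID groups as possible follows from RAID-DP's layout: every RAID-DP group dedicates two disks to parity, so the fewer groups a set of disks is split into, the more of them hold data (and serve I/O). A minimal sketch, assuming equal-size disks and ignoring hot spares and system reserves; the function name is hypothetical.

```python
import math

PARITY_DISKS_PER_GROUP = 2  # RAID-DP: one row-parity and one diagonal-parity disk


def raid_dp_layout(total_disks: int, max_group_size: int) -> tuple[int, int]:
    """Return (RAID groups, data disks) for the fewest groups within max_group_size.

    Simplifying assumptions: equal-size disks, no hot spares, disks spread
    as evenly as possible over the groups.
    """
    if total_disks < 3 or max_group_size < 3:
        raise ValueError("a RAID-DP group needs at least 3 disks")
    groups = math.ceil(total_disks / max_group_size)
    if total_disks // groups < 3:
        raise ValueError("each RAID-DP group needs at least 3 disks")
    return groups, total_disks - groups * PARITY_DISKS_PER_GROUP


if __name__ == "__main__":
    for max_size in (24, 12, 8):
        groups, data = raid_dp_layout(24, max_size)
        print(f"max group {max_size}: {groups} group(s), {data} data disks "
              f"({data / 24:.0%} of raw capacity)")
```

With 24 disks, one group leaves 22 data disks, while three groups of 8 leave only 18.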
Result: two aggregates were created: rg0_node02, rg1_node01. There is no data on them yet.
Creating an SVM that acts as a CIFS server. Every SVM requires a root volume; here it is placed on the aggregate rg1_node01. This volume stores the SVM's individual settings.
Configuring the CIFS protocol for this SVM. Here you specify the server's IP address and the physical interface its traffic should use. The port can be either a VLAN port or an aggregated LACP port. In the same step, a volume is created for storing data, along with a shared folder that users will access over the network.
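The wizard collects only a handful of parameters for the CIFS server. As a hypothetical sketch (not an ONTAP API), they can be modeled and sanity-checked before being entered. The field names, example addresses, and especially the port-naming pattern (physical "e0c", LACP group "a0a", VLAN port "e0c-100") are assumptions.

```python
import ipaddress
import re
from dataclasses import dataclass

# Assumed port naming: physical port "e0c", LACP interface group "a0a",
# VLAN port "<base port>-<VLAN ID>", for example "e0c-100" or "a0a-100".
PORT_RE = re.compile(r"^(?P<base>[ea]\d+[a-z])(?:-(?P<vlan>\d{1,4}))?$")


@dataclass
class CifsServerSpec:
    """Hypothetical container for the wizard's CIFS server parameters."""
    svm_name: str
    ip: str
    netmask: str
    home_port: str
    volume_name: str
    share_name: str

    def validate(self) -> None:
        """Raise ValueError if the address or the port name is malformed."""
        iface = ipaddress.IPv4Interface(f"{self.ip}/{self.netmask}")
        network = iface.network
        if iface.ip in (network.network_address, network.broadcast_address):
            raise ValueError("IP address must be a host address in its subnet")
        match = PORT_RE.match(self.home_port)
        if match is None:
            raise ValueError(f"unrecognized port name: {self.home_port}")
        vlan = match.group("vlan")
        if vlan is not None and not 1 <= int(vlan) <= 4094:
            raise ValueError("VLAN ID must be between 1 and 4094")
```

For example, `CifsServerSpec("svm_cifs", "192.168.10.20", "255.255.255.0", "a0a-100", "vol_share", "users").validate()` passes, while a port named `eth0` or a VLAN ID of 5000 is rejected.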
After user data was added to the shared folder, the automatic compression and deduplication mechanisms showed the following efficiency: the space actually occupied on the storage system was 4.9 times smaller than the total size of the files. The actual ratio depends on the type of data written.
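The reported factor of 4.9 is the ratio of the logical size of the files to the physical space they occupy. A small helper (the function names are hypothetical) converts such a ratio into the share of space saved: a factor of 4.9 means roughly 80% less disk space, while a factor of 2 means half.

```python
def efficiency_ratio(logical_bytes: int, physical_bytes: int) -> float:
    """Logical data size divided by the physical space it occupies."""
    if physical_bytes <= 0:
        raise ValueError("physical size must be positive")
    return logical_bytes / physical_bytes


def space_saved_percent(ratio: float) -> float:
    """Share of the logical size that did not have to be stored on disk."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1 - 1 / ratio) * 100


if __name__ == "__main__":
    print(f"{space_saved_percent(4.9):.1f}% saved")
```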
Creating an SVM that acts as an iSCSI target. As before, we select the aggregate that will hold this SVM's root volume. In the second step of the wizard, as with the CIFS server, we set the IP address of the iSCSI server's virtual interface, its physical port, and the block device (LUN) that will be presented to the initiator.
The resulting LUN is 10 GB in size. Next, we specify the initiator group to which it should be available.
The initiator group consists of a single Hyper-V Server, identified by its iqn.
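The relationship between the LUN, the initiator group, and the Hyper-V host's iqn amounts to LUN masking: the LUN is visible only to initiators listed in a group it is mapped to. A minimal sketch with hypothetical names (ig_hyperv, hyperv01, the LUN path) models this; the iqn check follows the general `iqn.<yyyy-mm>.<reversed domain>[:<identifier>]` format from RFC 3720 in simplified form. The prefix `iqn.1991-05.com.microsoft`, used by the Microsoft iSCSI initiator, serves only as an example; the real host's iqn is not reproduced here.

```python
import re
from dataclasses import dataclass, field

# Simplified iSCSI qualified name check: iqn.<yyyy-mm>.<reversed domain>[:<identifier>]
IQN_RE = re.compile(r"^iqn\.\d{4}-(0[1-9]|1[0-2])\.[a-z0-9][a-z0-9.-]*(:\S+)?$")


@dataclass
class InitiatorGroup:
    name: str
    protocol: str = "iscsi"
    initiators: set[str] = field(default_factory=set)

    def add(self, iqn: str) -> None:
        """Add an initiator after a basic format check (names compared in lowercase)."""
        iqn = iqn.lower()
        if not IQN_RE.match(iqn):
            raise ValueError(f"not a valid iqn: {iqn}")
        self.initiators.add(iqn)


@dataclass
class Lun:
    path: str
    size_gb: int
    mapped_igroups: list[InitiatorGroup] = field(default_factory=list)

    def map(self, igroup: InitiatorGroup) -> None:
        self.mapped_igroups.append(igroup)

    def visible_to(self, iqn: str) -> bool:
        """LUN masking: only initiators in a mapped group can see the LUN."""
        return any(iqn.lower() in g.initiators for g in self.mapped_igroups)


if __name__ == "__main__":
    hv = InitiatorGroup("ig_hyperv")
    hv.add("iqn.1991-05.com.microsoft:hyperv01")
    lun = Lun("/vol/vol_iscsi/lun01", size_gb=10)
    lun.map(hv)
    print(lun.visible_to("iqn.1991-05.com.microsoft:hyperv01"))  # True
```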
On the LUN mounted on the Hyper-V Server, a virtual hard disk file was created for a Linux virtual machine. After the regular optimization ran, the data in the volume on the storage system was compressed by more than a factor of two. With more virtual machines of the same type in this LUN, the overall savings would be even higher.