which devices to choose for backup and how to calculate the cost of storage

  • Sequential read/write only. Data is written and read by winding the tape, and new data can only be appended after the previous recording, never written to an arbitrary spot on the tape.

  • No support for modern optimizations. Synthetic full backups, instant recovery, and deduplication are simply unavailable on tape. You also can’t make small backups frequently, because the tapes have to be constantly mounted, unmounted, and rewound. Tapes are therefore unsuitable, for example, for backing up transaction logs from a large number of clients every 15–30 minutes.

  • Lower reliability. Over time, tapes can become demagnetized or physically degrade. Older generations are especially prone to read problems. It’s a real pain when customers store backups for years and forget about them, then remember after five years that they need a restore. Of course, by that time the tape has fallen apart.

I remember well a case when a customer had cassettes from the decade before last, the 2000s. Formally the backup existed, but in practice there was almost no chance of recovering from it. Not to mention that the backup software ran on nothing newer than Windows Server 2003.

It’s not enough to just record and duplicate cassettes. Tapes also need regular verification and rewinding, which reduces the likelihood of data loss but adds operational cost. Even so, tape remains noticeably cheaper than disk, and the newer the tape and LTO generation, the better the situation.

15 years ago, as part of preventive maintenance, we advised taking a closer look at a tape and possibly replacing it after 300 mounts, and after a thousand (in my experience) read/write errors often appeared. Those were SDLT cassettes or the LTO-2/LTO-3 generations. Starting somewhere around LTO-5, however, even a counter of 3–4 thousand mounts no longer meant anything, and the number of failures dropped by an order of magnitude.

More about backup disks

The direct opposite of tape is disk storage. It is used in different formats:

  • Servers with built-in disks (from 12 to 24 disks or more).

  • Entry-level storage systems.

  • Specialized storage appliances like HPE StoreOnce, Dell DataDomain, or Yadro's new Tatlin.Backup, which we've been testing all summer. Spoiler: there are some nuances, but overall everything is great. Wait for the article.

All of these devices offer compression, deduplication, and improved fault tolerance in one form or another. In the first two cases these are software implementations (by the way, we usually use NL-SAS disks; the main thing is not to overdo capacity and stick to 8/10/12 TB drives), while in the third they are handled by the storage itself. But despite all the differences, the foundation of all these systems is hard drives.

Compared to tape, disk storage systems have two key advantages:

  • Random access. You can read and restore any block at any time.

  • Greater fault tolerance. Disk systems are usually built with RAID groups and Hot Spare disks, so a backup remains available even if one or more disks fail (see the quick capacity estimate below).
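
To see what such a configuration yields in practice, here is a back-of-the-envelope capacity estimate in Python. The RAID 6 plus one hot spare layout is an illustrative assumption based on the disk counts and drive sizes mentioned above, not a sizing recommendation:

```python
# Rough usable capacity of a backup server with built-in disks.
# Assumes a single RAID 6 group (two parity disks) plus one hot spare;
# disk counts and drive sizes follow the figures in the text.

def usable_capacity_tb(disks: int, disk_tb: int,
                       parity_disks: int = 2, hot_spares: int = 1) -> int:
    """Usable capacity of one RAID group in TB (before formatting overhead)."""
    data_disks = disks - parity_disks - hot_spares
    return data_disks * disk_tb

for disks in (12, 24):
    for disk_tb in (8, 10, 12):
        print(f"{disks} disks x {disk_tb} TB -> "
              f"{usable_capacity_tb(disks, disk_tb)} TB usable")
```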

However, disks have several disadvantages for long-term storage. They often become the first victim of attacks or accidental deletions, since data on disk is available almost instantly and can be erased or overwritten just as quickly.

The situation is different in tape libraries. When deleting a backup, the management server only marks it as deleted in the database. Physically, the information remains on the tape until it is re-recorded. This feature has helped our clients out more than once. They believed that they had lost important data, but in fact the information was still stored on the tape, waiting to be overwritten by a new backup.

Tape libraries with dozens or hundreds of tapes offer a better chance of recovering erroneously deleted data. Some tapes may not be used for weeks, which increases the likelihood of successful data recovery.

One day our client needed to recover a critical document. We managed to find it in the library a month after the record had been deleted. To do this, we used the cassette import process: the backup software rescanned the library and found the old backup records. Admittedly, this also revealed a serious problem with their backup policy: new backups were not being created, while old ones were deleted once their retention period expired. Still, thanks to tape storage, the required file was restored.

With disk storage, this outcome is unlikely. We have repeatedly seen disk backups deleted accidentally or intentionally (to free up space). By the time the customer discovered they needed the deleted backup, it was too late to restore anything; nothing was left on the physical media.

In general, disks are the opposite of tapes: they are reliable and support deduplication, synthetic fulls, and instant recovery, but they consume more power and cost more per gigabyte stored. Building completely offline disk storage is harder, although some clients try to solve this by writing data to a storage system and then physically disconnecting it.

Despite these disadvantages, disks are more often used for online backups due to their high speed. It is expected that such copies will require frequent access. However, due to the high cost of disks, online backups are usually stored for a short time, only a few weeks.

Clouds as a means of storing backups. Immutable Storage

The new trend in backup is creating alienated copies in cloud object storage compatible with Amazon S3. This service is offered by various companies, including our colleagues from K2 Cloud. Yes, essentially these are the same hard drives, just located outside your infrastructure. In theory they offer the reliability of disk systems, but usually cost a little less than a storage system in your own server room.

The main disadvantage of cloud storage is the need to hand backup copies over to an external provider, which reduces your control over the data. For many companies this is a serious risk. However, modern technologies make it possible to minimize it: secure communication channels to the cloud provider are standard practice, and the backups themselves are additionally encrypted before being transferred to the cloud.

Another significant problem: it is difficult to implement fully alienated, disconnected storage in the cloud.

Modern object storage addresses this with the Immutable Storage feature, an implementation of the WORM (write once, read many) principle. Once data is written, it cannot be deleted, even with administrator rights, for a specified period.

Object storage, including our cloud solution, has this feature, and we actively test it for compatibility with various software. Much enterprise backup software already supports it.
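
To make this concrete, here is a minimal sketch of what Immutable Storage looks like at the S3 API level, using boto3 against any S3-compatible object storage. The endpoint, bucket, and file names are hypothetical, and the bucket is assumed to have been created with object lock enabled:

```python
# Upload a backup with a 30-day retention lock via the S3 object lock API.
# COMPLIANCE mode means nobody, not even an administrator, can delete or
# overwrite the object version before the retention date expires.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example-cloud.local")

retain_until = datetime.now(timezone.utc) + timedelta(days=30)

with open("2024-06-01-full.vbk", "rb") as backup:
    s3.put_object(
        Bucket="backup-vault",            # bucket created with object lock enabled
        Key="daily/2024-06-01-full.vbk",
        Body=backup,
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=retain_until,
    )

# Any delete_object call on this version before retain_until
# fails with an AccessDenied error.
```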

If Immutable Storage is not available, you can protect backups another way: create separate cloud accounts with their own logins and passwords (two-factor is our everything!), periodically upload backups there, and then disconnect from that storage. This method, of course, increases both the cost of storage and the likelihood of a leak.

How frequency and storage time of backups affect storage costs

And again we return to the cost of storing backups. Customers often make poor equipment choices because they misjudge how frequently backups should be made and how long they should be kept.

Of course, any customer strives to make backups as often as possible and store them for as long as possible. But it's expensive.

Let's roughly estimate that 1 GB of storage costs 2–3 rubles per month. The figure seems small, but hold that thought.

This has happened to us more than once. The customer asks us to make weekly backups and store them for at least a month. We propose a scheme: a full backup on weekends, incremental backups every day. That accumulates 4 full copies per month. The customer also wants monthly copies kept for a year: 12 more full copies. OK. On top of that, he wants annual copies kept for another 5 years. Fine, 5 more full copies.

The result is 4 + 12 + 5 = 21 copies that must be stored permanently. And then it turns out that the front end being backed up is 100 terabytes. That's more than 2 petabytes of storage just for backups!

The cost of such storage runs to several million rubles per month. When we calculate the total, taking full and incremental copies into account, the customer is surprised and goes away to think. He often expects the cost of backups to be limited to the backup process itself and forgets about the ongoing cost of storage.
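
To show where the "several million" comes from, here is the same back-of-the-envelope calculation in Python, using only the figures given above (21 full copies, a 100 TB front end, 2–3 rubles per GB per month):

```python
# Reproduce the storage cost estimate from the text.
FRONTEND_TB = 100                 # size of the protected data set
COPIES = 4 + 12 + 5               # monthly + yearly + 5-year full copies
PRICE_RUB_PER_GB_MONTH = (2, 3)   # rough price range per GB per month

total_gb = COPIES * FRONTEND_TB * 1024
low, high = (total_gb * price for price in PRICE_RUB_PER_GB_MONTH)

print(f"Stored:       {total_gb / 1024 / 1024:.2f} PB")   # ~2.05 PB
print(f"Monthly cost: {low / 1e6:.1f}-{high / 1e6:.1f} million rubles")
```

Incrementals, indexes, and deduplication would shift the numbers in either direction, but the order of magnitude is exactly what surprises customers.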

One of our customers wanted to store backups for ten years, but the calculations showed it would cost more than his entire product (the backup infrastructure would cost more than the infrastructure it protected). There are, however, a few recommendations that reduce the load on storage and cut costs at the same time:

  • Create a backup policy. To do this, classify all systems: mission critical, business critical, development and test environments. Then, for each category, define the recovery time objective (RTO), recovery point objective (RPO), retention periods, and backup frequency (see the sketch after this list).

  • Keep only “golden copies” for the long term, not every monthly backup. Golden copies should be kept for at least a year, and key data perhaps for 5–10 years. Anything longer is usually an extra burden, especially since that data will most likely never be restored. Unless you work in an archive like the Wayback Machine.

  • Create regulations for data recovery testing. If testing has not been carried out even once in 5 years, something will very likely go wrong during an actual restore. Also keep in mind that the retention period of backups may exceed the planned replacement cycle of the company's equipment and software. It happens that you need to restore information from archival media and end up hunting for the right hardware and software practically in a museum.
    In the case of that customer, we found out that storing data for one year was enough, not 5–10 years, with the exception of a few critical files. It also turned out that only a portion of the systems needed backing up, not all of them. In the end, thoughtful analysis cut equipment and storage costs by almost ten times.

  • Mix different storage types. For online backups kept for 2–4 weeks, disks are the optimal choice. If you want to store data on them longer, use deduplication (in software or on a specialized appliance). The classic approach to long-term storage is still tape, and cloud storage is a more flexible alternative.

  • Follow the multi-site rule. The minimum acceptable (though not the best) approach is cross-storing copies at a remote site, for example two data centers each holding the other's backups. The standard, reliable approach is storing backups at both sites (deduplication reduces the load on the channel, and specialized appliances can replicate backups directly). Ideally, another alienated copy is added to this pair: a cloud copy or cassettes taken to a third location.
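
As promised in the first bullet, here is a minimal sketch of what such a classification might look like in code. The tiers match the categories above, while the specific RPO/RTO and retention values are illustrative assumptions, not recommendations:

```python
# A toy backup policy table: each system class maps to its RPO, RTO,
# full-backup frequency, and retention. Values are illustrative only.
POLICY = {
    "mission critical":  {"rpo_h": 1,  "rto_h": 2,  "full": "weekly",  "keep": "1 year + golden copies"},
    "business critical": {"rpo_h": 24, "rto_h": 8,  "full": "weekly",  "keep": "6 months"},
    "dev/test":          {"rpo_h": 72, "rto_h": 72, "full": "monthly", "keep": "1 month"},
}

for tier, p in POLICY.items():
    print(f"{tier:17}  RPO {p['rpo_h']:>2} h  RTO {p['rto_h']:>2} h  "
          f"full: {p['full']:7}  retention: {p['keep']}")
```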

Planning does not always pay off many times over, but it is never useless. It must be done before selecting and purchasing backup equipment. This is the only way to use different types of storage optimally, taking their pros and cons into account.

To be continued…

In the next article I will tell you about software for backups. We will also discuss how to increase the performance of backups without extra costs. We will touch on the topics of deduplication, retention lock, encryption shortcomings and much more.

Traditionally, for conversations about backup, I am always available at: alzotov@k2.tech

More about backups and storage systems:

From media to regulations: how to build a secure backup architecture

Testing the ExaGrid EX18 storage system: was it possible to replace Dell DataDomain and HPE StoreOnce?
