Understanding what S3 is and making a simple object storage in Go

An object can be anything: an image, a video, or a text document. In addition to the identifier, metadata carries extra information about the object: creation date, file type, author, and other attributes. Unlike their counterparts in a file system, these attributes have almost no effect on the object's properties.
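
To make this concrete, here is a minimal sketch of what an object conceptually bundles together (the field names are purely illustrative, not any real storage's schema):

```go
// Object is a hypothetical illustration: a payload plus free-form
// metadata, all addressed by a single unique key.
type Object struct {
	Key      string            // unique identifier, e.g. "invoice-2024.pdf"
	Data     []byte            // the payload itself: image, video, document...
	Metadata map[string]string // creation date, content type, author, etc.
}
```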

The difference between object storage and traditional file systems

Conventional file systems (NTFS, ext4, and so on) organize files in a hierarchical folder structure. This model works well for small amounts of data and simple scenarios, when the number of files is relatively modest: a personal device, say, or a home file server. But once the data grows past terabytes, and the users are no longer a few family members connecting to the NAS over WiFi to watch a movie in the evening, scroll through photos, or download passport scans, the speed and resource cost of basic operations become a noticeable problem.

Without a rigid hierarchical structure, every data access boils down to looking up the desired object by key: no long descents through subfolders of subfolders, no access-rights checks along the way. And since the only hard requirement on an object is a unique identifier, the system scales trivially: just keep adding objects while there is space on the drive. If space runs out, expand on the fly – none of the fuss of file systems, which often demand formatting, or at least unmounting a partition, for the same trick.
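
As a rough sketch of that access model (purely illustrative), even “folder-looking” names are just flat strings, and every read is a single key lookup:

```go
package main

import "fmt"

func main() {
	// A flat key space: what looks like a path is simply part of the key.
	store := map[string][]byte{
		"photos/2024/cat.jpg": []byte("..."),
	}

	// One O(1) lookup instead of walking a directory tree.
	data, ok := store["photos/2024/cat.jpg"]
	fmt.Println(ok, len(data))
}
```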

Features of object storage

The advantages include:

Scalability. As mentioned earlier, object storage scales horizontally with ease: add new nodes and the capacity grows. The only job is to keep bringing new disks or storage racks into the server room fast enough.

Flexibility. Object storage can hold any type of data, from small text files to a 24-hour video of tap water being cut with scissors. All of it is equally easy to manage via the API.

Ease of management. No hierarchy headaches. Folders, subfolders, owners, access rights set via chmod – none of that applies to object storage. Metadata lives alongside the objects, which makes them easy and fast to index.

Reliability and availability. Object storage is fairly easy to decentralize across physically remote devices, with data replicated over multiple nodes that act as a single system. This keeps it reliable and available even if one or several nodes fail.

But object storage has its drawbacks too. Where you need the fastest possible operations, or work with a huge number of small files, block storage deserves a closer look.
In addition, object storage is by definition a poor fit for data that requires a strict hierarchy, fine-grained access rights, links, and so on. Not that this stops anyone from hammering nails with a microscope and turning object storage into a file system. Why – history is silent.

Using Object Storage

Object storage is widely used in various cloud services and platforms, where it is often necessary to store and perform operations with large volumes of information:

Backup and archiving. Object storage is ideal for long-term data storage, such as backups and archives. Especially when there are a lot of backups and they should be made once and forgotten until the next system crash.

Storage of media files. If you need a file dump for your online cinema – somewhere to pour in all the files without ceremony – object storage is your friend.

Cloud applications. Cloud services and applications based on the SaaS or PaaS model often use object storage to store user data, logs, reports and other unorganized data that will usually lie unattended until the second coming.

Containers and microservices. In containerized microservices environments, object storage is used to store and transfer data between different services, ensuring portability and decentralization of the system architecture.

What is S3?

Now that we've covered the basics, let's talk about S3 – a service, protocol, and technology that has essentially become synonymous with the phrase “object storage.”

S3 offers users a simple and scalable way to store data through a web interface. It supports various access protocols, including REST API, and integrates with other AWS services such as EC2, Lambda, and RDS. Due to its reliability, availability, and flexibility, S3 has become the de facto standard for cloud data storage.
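
To give a feel for that API, here is a minimal sketch using the official aws-sdk-go-v2 client; the bucket name is a placeholder, and credentials are assumed to come from the environment:

```go
package main

import (
	"context"
	"log"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	// Load region and credentials from the environment/config files.
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// PUT a small object; "my-bucket" is a placeholder name.
	_, err = client.PutObject(context.TODO(), &s3.PutObjectInput{
		Bucket: aws.String("my-bucket"),
		Key:    aws.String("hello.txt"),
		Body:   strings.NewReader("Hello, World!"),
	})
	if err != nil {
		log.Fatal(err)
	}
}
```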

It is interesting to note that since its introduction on March 14, 2006, S3 has become not just another AWS service, but the standard for all object storage. This has led to many companies and developers creating their own solutions compatible with the S3 API to provide users with the ability to use the same tools and applications as S3, but on other platforms.

Like S3 but not S3

When we say “S3,” we most often don't mean the service from AWS. S3 has gradually become what “Xerox” is to copiers or “Google” is to search engines – a household name. The term “S3” now refers to a whole class of object storage compatible with Amazon's original API.

The reason is simple: S3 appeared quite a long time ago – 18 years have passed – so the first-mover effect did its work. And the first mover happened to be Amazon: not just one of the richest mega-corporations, but one tirelessly working to squeeze the entire cloud market under itself. The upside for mere mortals is that the S3 API turned out to be extremely simple and clear to learn, which helped its wide adoption. As a result, the S3 API became a kind of universal language for talking to object storage.

However, unlike “Xerox,” where the term simply became a synonym for any copier, with S3 the situation is more involved. S3-compatible storages follow a common standard: they implement the same API as Amazon's original. Once you are familiar with the original in the AWS ecosystem, you can just as easily work with any other S3-compatible storage, be it Ceph, MinIO, etc.

This standardization of object storage around the S3 template produced an interesting market effect. Companies that didn't want to depend entirely on Amazon, or were looking for cheaper alternatives, began building their own S3-compatible object storages – which everyone simply calls S3 storage. If we were talking about food rather than IT, they would be labeled something like an “S3 product identical to the natural one,” or an S3 imitation. It is as if copier manufacturers were not merely called “Xerox,” but deliberately built on the documentation and standards of the originals from Xerox itself.

Ceph, for example, emulates S3 through its RADOS Gateway (RGW) so well that most applications originally written for AWS work with Ceph as if it were native. MinIO went even further, making S3 API compatibility its main selling point, so migrating from AWS to a self-hosted setup, or to a provider running MinIO as S3 storage, is even more seamless.
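
This compatibility is easy to see in code. With recent versions of aws-sdk-go-v2, pointing the very same client at a self-hosted backend is roughly a matter of overriding the endpoint – a sketch, assuming a typical local MinIO on localhost:9000:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}

	// Same SDK, different backend: aim the client at MinIO (or Ceph RGW)
	// instead of AWS. The endpoint below is an assumption about a local setup.
	client := s3.NewFromConfig(cfg, func(o *s3.Options) {
		o.BaseEndpoint = aws.String("http://localhost:9000")
		o.UsePathStyle = true // MinIO commonly uses path-style addressing
	})
	_ = client // from here on, the S3 calls look exactly the same
}
```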

But it is worth understanding that even though these lookalikes share the common S3 API and standard, their backends can differ significantly. Ceph and MinIO are two completely different stories, starting with performance and resource consumption.

Theory is great, but let's move on to practice and try writing our own object storage in Go. Why? Why not?

Step 1: Create and configure the project

Let's start with the basics. Let's create a directory for our project and initialize the Go module:

```bash
mkdir go-object-storage
cd go-object-storage
go mod init go-object-storage
```

Step 2: Writing the code

Now the fun part. Let's create a main.go file and start writing code:

Application architecture

Our application is a simple object storage. Let's look at its key components:

1 — Storage structure:

```go
type Storage struct {
    mu    sync.Mutex
    files map[string][]byte
}
```

This is the core of our storage. It uses a hash table (map) to keep objects in memory, where the key is the file name and the value is its contents as bytes. A mutex (sync.Mutex) ensures thread safety under concurrent access.

2 — Save and Load methods:

```go
func (s *Storage) Save(key string, data []byte)
func (s *Storage) Load(key string) ([]byte, bool)
```

These methods are responsible for saving and loading objects. Save writes the data both to RAM and to the file system, which gives it persistence. Load reads first from memory, and if the object isn't there, from disk.

  • In the current version, the application saves data to disk, but on server restart it is not loaded back into memory – so for now this is only a stub for a future implementation (see the sketch right below).
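
If you want to close that gap, a minimal sketch could look like the following; loadFromDisk is a hypothetical helper that is not part of the article's code, and it would be called once at startup:

```go
// loadFromDisk is a hypothetical helper: on startup it walks STORAGE_DIR
// and repopulates the in-memory map from disk.
func (s *Storage) loadFromDisk() error {
	entries, err := os.ReadDir(STORAGE_DIR)
	if err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, entry := range entries {
		if entry.IsDir() {
			continue // we store flat keys only, skip subdirectories
		}
		data, err := os.ReadFile(STORAGE_DIR + "/" + entry.Name())
		if err != nil {
			return err
		}
		s.files[entry.Name()] = data
	}
	return nil
}
```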

3 — HTTP handlers:

```go
func HandleUpload(w http.ResponseWriter, r *http.Request, storage *Storage)
func HandleDownload(w http.ResponseWriter, r *http.Request, storage *Storage)
func HandleList(w http.ResponseWriter, r *http.Request, storage *Storage)
```

These functions handle HTTP requests to upload, download, and list objects:

  • HandleUpload uploads data to the server and stores it in storage.

  • HandleDownload provides data from storage to the client upon request.

  • HandleList returns a list of all objects stored in the system.

4 — main function:

```go
func main() {...}
```

Initializes the store and starts the HTTP server.

The application itself

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"sync"
)

const (
	STORAGE_DIR         = "./storage"       // Directory for storing objects
	UPLOAD_PREFIX_LEN   = len("/upload/")   // Length of the upload route prefix
	DOWNLOAD_PREFIX_LEN = len("/download/") // Length of the download route prefix
)

// Storage — structure for keeping objects in memory
type Storage struct {
	mu    sync.Mutex        // Mutex for thread safety
	files map[string][]byte // Hash table holding object data
}

// NewStorage — constructor that creates a new storage
func NewStorage() *Storage {
	return &Storage{
		files: make(map[string][]byte),
	}
}

// Save — method that saves an object to the storage
func (s *Storage) Save(key string, data []byte) {
	s.mu.Lock()         // Acquire the mutex before writing
	defer s.mu.Unlock() // Release the mutex after writing

	// Store the data in memory
	s.files[key] = data

	// Also persist the data to disk
	err := os.WriteFile(STORAGE_DIR+"/"+key, data, 0644)
	if err != nil {
		log.Printf("Error saving file %s: %v", key, err)
	}
}

// Load — method that loads an object from the storage
func (s *Storage) Load(key string) ([]byte, bool) {
	s.mu.Lock()         // Acquire the mutex before reading
	defer s.mu.Unlock() // Release the mutex after reading

	// Check whether the object is in memory
	data, exists := s.files[key]
	if exists {
		return data, true
	}

	// If the object is not in memory, try to load it from disk
	data, err := os.ReadFile(STORAGE_DIR + "/" + key)
	if err != nil {
		return nil, false
	}

	// If the disk read succeeded, cache the object in memory
	s.files[key] = data
	return data, true
}

// HandleUpload — handler for uploading objects
func HandleUpload(w http.ResponseWriter, r *http.Request, storage *Storage) {
	if r.Method != http.MethodPost {
		http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
		return
	}

	// Extract the key (object name) from the URL
	key := r.URL.Path[UPLOAD_PREFIX_LEN:]

	// Read the request body (the object data)
	data, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "Error reading data", http.StatusInternalServerError)
		return
	}

	// Save the object to the storage
	storage.Save(key, data)

	// Send the response to the client
	w.WriteHeader(http.StatusOK)
	fmt.Fprintf(w, "Object %s saved successfully", key)
}

// HandleDownload — handler for downloading objects
func HandleDownload(w http.ResponseWriter, r *http.Request, storage *Storage) {
	if r.Method != http.MethodGet {
		http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
		return
	}

	// Extract the key (object name) from the URL
	key := r.URL.Path[DOWNLOAD_PREFIX_LEN:]

	// Load the object from the storage
	data, exists := storage.Load(key)
	if !exists {
		http.Error(w, "Object not found", http.StatusNotFound)
		return
	}

	// Send the object data to the client
	w.WriteHeader(http.StatusOK)
	w.Write(data)
}

// HandleList — handler that lists all objects
func HandleList(w http.ResponseWriter, r *http.Request, storage *Storage) {
	if r.Method != http.MethodGet {
		http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
		return
	}

	// Acquire the mutex to access the object hash table
	storage.mu.Lock()
	defer storage.mu.Unlock()

	// Build the list of keys (object names)
	keys := make([]string, 0, len(storage.files))
	for key := range storage.files {
		keys = append(keys, key)
	}

	// Encode the key list as JSON and send it to the client
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(keys)
}

func main() {
	// Make sure the object storage directory exists
	if _, err := os.Stat(STORAGE_DIR); os.IsNotExist(err) {
		err := os.Mkdir(STORAGE_DIR, 0755)
		if err != nil {
			log.Fatalf("Error creating directory %s: %v", STORAGE_DIR, err)
		}
	}

	// Create a new storage
	storage := NewStorage()

	// Set up routes for handling HTTP requests
	http.HandleFunc("/upload/", func(w http.ResponseWriter, r *http.Request) {
		HandleUpload(w, r, storage)
	})
	http.HandleFunc("/download/", func(w http.ResponseWriter, r *http.Request) {
		HandleDownload(w, r, storage)
	})
	http.HandleFunc("/list", func(w http.ResponseWriter, r *http.Request) {
		HandleList(w, r, storage)
	})

	// Start the HTTP server on port 8080
	log.Println("Server started on port 8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Compile and test

Now let's compile our application:

```bash
go build -o object-storage
```

And let's launch our freshly baked server:

```bash
./object-storage
```

Let's check how our creation works. Let's use curl for testing:

  1. Uploading an object:

```bash
curl -X POST -d "Hello, World!" http://localhost:8080/upload/hello.txt
```
  2. Downloading an object:

```bash
curl -O http://localhost:8080/download/hello.txt
```
  3. Getting a list of all objects:

```bash
curl http://localhost:8080/list
```
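
If you prefer automated checks to curl, here is a minimal sketch of a Go test (a hypothetical main_test.go next to main.go) that exercises the upload/download round trip via httptest:

```go
package main

import (
	"net/http/httptest"
	"strings"
	"testing"
)

func TestUploadDownload(t *testing.T) {
	storage := NewStorage()

	// Upload an object through the handler.
	upReq := httptest.NewRequest("POST", "/upload/hello.txt", strings.NewReader("Hello, World!"))
	upResp := httptest.NewRecorder()
	HandleUpload(upResp, upReq, storage)
	if upResp.Code != 200 {
		t.Fatalf("upload failed with status %d", upResp.Code)
	}

	// Download it back and compare the body.
	dlReq := httptest.NewRequest("GET", "/download/hello.txt", nil)
	dlResp := httptest.NewRecorder()
	HandleDownload(dlResp, dlReq, storage)
	if dlResp.Body.String() != "Hello, World!" {
		t.Fatalf("unexpected body: %q", dlResp.Body.String())
	}
}
```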

Voila! We have created a simple object storage in Go. Of course, this is just a basic implementation, and in the real world you would need a lot more functionality on top. But it is a good starting point for further experimentation and for learning about object storage within a pet project.

P.S. As you may have noticed, the project leans heavily on the Linux file system, even though I spent the whole post explaining how object storage differs from file systems and how “it's different”™. The thing is, there is a nuance. That was the theory; in practice, object storage is *drum roll* – an abstraction. Yes, that again. If you dig down to its essence and foundations, at the lowest level it still uses file systems to store the information.

Summary

The history of S3, and of object storage in general, shows the old and regularly repeating story of how a technology that appears at the right time, in the right company – riding the first-mover effect and a market of monstrous size – becomes an industry standard and a household name for its whole class.
But that alone would undersell S3's achievements. The success of both the service and the standard owes just as much to the simplicity and clarity of the API and the ecosystem as a whole, which let even beginners quickly wire it into the infrastructure of their services and applications. As a result, S3's position as the de facto standard for cloud storage has only grown stronger.

At cdnnow!, as you might guess after reading this article, we provide customers with access to various storages, including S3-compatible ones, built on our own Ceph-based implementation. This lets you manage data flexibly with familiar tools and processes – and not worry that the next round of sanctions will force you to say “bye” to your S3 storage inside the AWS ecosystem.
