SMTP protocol. Writing an SMTP server in C

Recently I wanted to improve my knowledge of the C language. I decided to write a small project using only the standard library of the language. To get even more benefit from this project, I decided to study a new technology for me. This technology was the SMTP application layer protocol, and the project was a small SMTP server. Today I will tell you how the SMTP protocol works, and how I implemented a server that works with it.

SMTP protocol

SMTP is an application-layer protocol designed to transfer email messages between a client and a server. SMTP was first specified in RFC 788 in 1981 and was superseded by RFC 821 the following year. The standard described the mechanism for transmitting email between ARPANET hosts. It is so old that it predates the modern Internet. Because technology developed rapidly, the standard had to be supplemented and redefined many times; we will talk about this a little later.

SMTP works on the client-server principle. After connecting to the server, the client must introduce itself with the HELO command. The client sends this command along with its host name so that the server can identify it. After this, the client and server are ready to exchange emails.

The basic procedure for the protocol is the SMTP transaction (SMTP mail transaction). It allows us to send emails. The transaction consists of three steps and three commands:

  • Defining the reverse path (reverse-path): the MAIL command,

  • Defining the paths to the recipients of the letter (forward-path): the RCPT command,

  • Defining the contents of the letter: the DATA command.

The reverse path is the path the recipient of the letter must take to send a response. The forward path is, accordingly, the path the letter takes before it reaches the recipient's mailbox.

For example, take the forward path <@foo.local,@bar.local:carl@quz.local> and the reverse path <john@domain.local>. The forward path contains the hostnames that the email must pass through before it reaches the recipient. The email first reaches the host foo.local, then bar.local, and finally arrives in the mailbox of the user carl on the host quz.local.

Intermediate servers must modify the reverse-path and forward-path fields after the email arrives at the host. The server on foo.local should forward the letter to the host bar.local with the reverse path <@foo.local:john@domain.local> and the forward path <@bar.local:carl@quz.local> (the server that received the letter is added to the reverse path and removed from the forward path).
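The rewriting that a relay performs can be sketched in a few lines of C. This is only an illustration of the rule described above, not code from the project: the helper names forward_path_strip_hop and reverse_path_add_hop are hypothetical, and the sketch assumes well-formed paths in angle brackets.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: removes the first hop from a forward path.
   A path without a route (no leading '@') is copied unchanged. */
void forward_path_strip_hop(const char *path, char *out, size_t out_size) {
    const char *comma = strchr(path, ',');
    const char *colon = strchr(path, ':');

    if (path[0] != '<' || path[1] != '@') {
        snprintf(out, out_size, "%s", path);        /* no route to strip */
    } else if (comma != NULL && (colon == NULL || comma < colon)) {
        snprintf(out, out_size, "<%s", comma + 1);  /* more hops remain */
    } else if (colon != NULL) {
        snprintf(out, out_size, "<%s", colon + 1);  /* that was the last hop */
    } else {
        snprintf(out, out_size, "%s", path);        /* malformed: leave as is */
    }
}

/* Hypothetical helper: adds `host` as the first hop of a reverse path. */
void reverse_path_add_hop(const char *path, const char *host,
                          char *out, size_t out_size) {
    if (path[0] != '<') {
        snprintf(out, out_size, "%s", path);        /* malformed: leave as is */
        return;
    }
    /* An existing route is extended with a comma; otherwise a colon
       separates the new hop from the mailbox. */
    char separator = (path[1] == '@') ? ',' : ':';
    snprintf(out, out_size, "<@%s%c%s", host, separator, path + 1);
}
```

Called on foo.local with the paths from the example, the first helper yields <@bar.local:carl@quz.local> and the second yields <@foo.local:john@domain.local>.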

A path can also consist of a single address, for example the reverse path <john@domain.local> and the forward path <carl@domain.local>. In this case the letter reaches the recipient without any relaying, because the sender and the recipient are located on the same server.

After defining the paths, we must define the content of the letter. This is the useful information that the sender wants to convey to the recipient. After the client sends the DATA command, the server starts recording the contents of the letter. To make the server stop recording, the client sends a line that contains a single period. As soon as the client has finished sending the contents of the letter, the server begins processing it.
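The end-of-data rule is easy to express in code. The sketch below is not taken from the project; is_end_of_data and unstuff_line are hypothetical helpers that assume the server reads the message one CRLF-terminated line at a time. The second helper illustrates the transparency rule from RFC 821 (section 4.5.2): a client adds an extra period to any body line that starts with a period, and the server removes it again.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `line` (one received line, including
   its CRLF terminator) is the end-of-data marker, i.e. a lone period. */
int is_end_of_data(const char *line) {
    return strcmp(line, ".\r\n") == 0;
}

/* Hypothetical helper: undoes dot-stuffing. A body line that starts with
   a period was sent with one extra leading period, so skip it. */
const char *unstuff_line(const char *line) {
    return (line[0] == '.') ? line + 1 : line;
}
```

A receive loop would check is_end_of_data first and only then pass the line through unstuff_line before appending it to the message.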

SMTP transaction

The letter consists of headers and text. One empty line is used to separate them. The headers of the letter usually contain the subject (Subject), the sender and recipient (From and To) and the date of sending the letter (Date). These fields are added by the client.
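Because a single empty line separates the two parts, finding the body comes down to searching for two consecutive line breaks. A minimal sketch, assuming CRLF line endings; find_body is a hypothetical helper, not a function from the project:

```c
#include <string.h>

/* Hypothetical helper: returns a pointer to the first character of the
   body, or NULL if the message contains no empty line. */
const char *find_body(const char *message) {
    const char *separator = strstr(message, "\r\n\r\n");
    return (separator != NULL) ? separator + 4 : NULL;
}
```

Everything before the returned pointer, minus the empty line, is the header block.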

Servers can also add their own headers. For example, after receiving a letter each server adds a Received header, which contains information about the sending host, the receiving host, and the date the message was received.
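As an illustration, a simplified Received header could be built like this. The function name format_received and the exact layout of the line are assumptions made for this sketch; real servers usually add more clauses (protocol, message id and so on), and the date is written in UTC here to keep the example short.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: writes a simplified Received header into `out`.
   Returns the number of characters written, or -1 on failure. */
int format_received(char *out, size_t out_size, const char *from_host,
                    const char *by_host, time_t when) {
    char date[64];
    struct tm *utc = gmtime(&when);
    if (utc == NULL) {
        return -1;
    }
    /* Date in the usual mail format, always reported as UTC (+0000) */
    if (strftime(date, sizeof date, "%a, %d %b %Y %H:%M:%S +0000", utc) == 0) {
        return -1;
    }
    return snprintf(out, out_size, "Received: from %s by %s; %s\r\n",
                    from_host, by_host, date);
}
```

A server would call it with the host name from HELO, its own host name, and time(NULL).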

Received letter

SMTP Extensions

The SMTP protocol has been extended several times. After its first introduction in RFC 821, the protocol was updated in RFC 2821. The most visible change was the EHLO (extended hello) command, which earlier extension RFCs such as RFC 1869 had introduced and RFC 2821 made part of the core protocol. It is used instead of HELO. If the server supports the new command, it replies to the greeting with a list of the extensions it supports, which are defined in other RFCs.

One of the first extensions was SIZE, which lets the server announce the maximum message size it accepts and lets the client declare the size of a message in advance. This extension is described in RFC 1870. The STARTTLS command, which establishes a TLS session between a client and a server, was introduced in RFC 3207. RFC 4954 added the AUTH command, which allows authentication on the SMTP server.

The latest major update to the protocol so far is RFC 5321. This is the standard that modern SMTP servers implement.

Writing an SMTP server in C

To implement my server, I chose one of the first SMTP standards, RFC 821, since the old standards have much less functionality to implement than the new ones.

The standard specifies the minimum set of commands that the server must implement:

  • HELO – a command to initialize a session,

  • MAIL – a command to specify the return path,

  • RCPT – a command to specify a forward path,

  • DATA – a command to get the contents of a letter,

  • RSET – a command to reset a transaction,

  • NOOP – a command for doing nothing,

  • QUIT – a command to close a session.

I chose C as the programming language for the reasons stated at the beginning. I used Visual Studio as the development environment. But this does not mean that the server only runs on Windows. I did not use any libraries other than the standard C library. Let's move on to the code.

main function

In the main function we load the configuration file and initialize the listening socket. Since I used Windows as the operating system, I used the Winsock2.h header for working with sockets. The alternative on Linux is the sys/socket.h header.
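Since the socket API is the only platform-specific part, a small compatibility layer is one way to build the same code on both systems. The sketch below is not part of the project; the names socket_t, SOCKET_INVALID, net_init, net_cleanup and socket_close are illustrative.

```c
#ifdef _WIN32
#include <winsock2.h>

typedef SOCKET socket_t;
#define SOCKET_INVALID INVALID_SOCKET

/* Winsock must be started before any socket call and cleaned up at exit */
int net_init(void) { WSADATA data; return WSAStartup(MAKEWORD(2, 2), &data); }
void net_cleanup(void) { WSACleanup(); }
int socket_close(socket_t sock) { return closesocket(sock); }

#else
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

typedef int socket_t;
#define SOCKET_INVALID (-1)

/* POSIX sockets need no global setup, so these are no-ops */
int net_init(void) { return 0; }
void net_cleanup(void) { }
int socket_close(socket_t sock) { return close(sock); }

#endif
```

With such a layer, the rest of the server would use socket_t and socket_close instead of SOCKET and closesocket, while socket, bind, listen and accept keep the same names on both platforms.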

For each new connection to the server, we create a service thread whose entry point is serve_connection. This function is where the main work with the client takes place.

int main(int argc, char* argv[]) {

    thrd_t new_thread;
    WSADATA wsa_data;

    if (argc < 2) {
        printf("Usage: %s config_file_path\n", argv[0]);
        return 1;
    }

    int status = WSAStartup(MAKEWORD(2, 2), &wsa_data);
    if (status != 0) {
        return 1;
    }

    config_parse_file(argv[1]);

    SOCKET sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (sock == INVALID_SOCKET) {
        WSACleanup();
        return 1;
    }

    struct sockaddr_in server_address = { 0 };
    server_address.sin_family = AF_INET;
    server_address.sin_addr.s_addr = INADDR_ANY;
    server_address.sin_port = htons(config_get_listen_port());

    // Stop if the port cannot be bound or listened on
    if (bind(sock, (struct sockaddr*)&server_address, sizeof(server_address)) == SOCKET_ERROR ||
        listen(sock, MAX_PENDING_CONNECTIONS) == SOCKET_ERROR) {
        closesocket(sock);
        WSACleanup();
        return 1;
    }

    while (1) {
        SOCKET new_sock = accept(sock, NULL, NULL);
        if (new_sock == INVALID_SOCKET) {
            continue;
        }
        // One service thread per connection
        thrd_create(&new_thread, serve_connection, new_sock);
    }

    return 0;
}

Serving the client

At the beginning of the serve_connection function we initialize the variables and structures we need and notify the client that we are ready to serve it. After that, we enter an infinite loop in which we accept commands from the client. Each command has its own handler. After the client has finished interacting with the server, we release the resources and close the socket. Let's go in order.

At the very beginning, the structures, the message exchange buffer, and the state variable are initialized.

void serve_connection(SOCKET sock) {
	int status = 0;
	enum SERVER_STATE current_state = DEFAULT;

	char* buffer = init_socket_buffer();

	struct smtp_request* smtp_request = init_smtp_request();
	struct server_session* server_session = init_server_session();

	send_response(sock, buffer, SERVICE_READY);

The most important structure here is smtp_request, which represents an SMTP transaction:

struct smtp_request {
	struct email_address* mail_from;
	struct smtp_request_recipient* rcpt_to_list;
	char* data;
};

This structure contains three main transaction fields: the reverse path, the list of forward paths, and the message content.

The email_address structure is used quite often. It is a simple wrapper for two strings, a user name and a domain, which can be quickly converted to a single string and back.

struct email_address {
	char* user;
	char* domain;
};
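The conversion in both directions might look like the following sketch. The struct is repeated so that the example is self-contained; the function names email_address_parse, email_address_format and email_address_free are hypothetical and are not the names used in the project.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct email_address {
    char *user;
    char *domain;
};

/* Copies `len` bytes of `text` into a new NUL-terminated string. */
static char *copy_part(const char *text, size_t len) {
    char *copy = malloc(len + 1);
    if (copy != NULL) {
        memcpy(copy, text, len);
        copy[len] = '\0';
    }
    return copy;
}

void email_address_free(struct email_address *address) {
    if (address == NULL) return;
    free(address->user);
    free(address->domain);
    free(address);
}

/* Splits "user@domain" at the last '@'. Returns NULL if either part is
   empty or memory runs out. */
struct email_address *email_address_parse(const char *text) {
    const char *at = strrchr(text, '@');
    if (at == NULL || at == text || at[1] == '\0') return NULL;

    struct email_address *address = calloc(1, sizeof *address);
    if (address == NULL) return NULL;

    address->user = copy_part(text, (size_t)(at - text));
    address->domain = copy_part(at + 1, strlen(at + 1));
    if (address->user == NULL || address->domain == NULL) {
        email_address_free(address);
        return NULL;
    }
    return address;
}

/* Joins the two parts into a newly allocated "user@domain" string. */
char *email_address_format(const struct email_address *address) {
    size_t size = strlen(address->user) + strlen(address->domain) + 2;
    char *text = malloc(size);
    if (text != NULL) {
        snprintf(text, size, "%s@%s", address->user, address->domain);
    }
    return text;
}
```

The caller owns both results and releases them with email_address_free and free.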

The smtp_request_recipient structure is a linked list node, since a letter can have multiple recipients.

struct smtp_request_recipient {
	struct email_address* email_address;
	struct list list;
};

To implement the linked list, I used a trick from the Linux kernel. The kernel implements all of its linked lists with a single structure, list_head, which contains pointers to other list_head nodes. This structure is embedded in any structure that needs to become a list element. A special macro then recovers the enclosing structure from a pointer to its list_head. This way we have one linked list implementation for all our structures.
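The idea can be shown with a short, self-contained sketch. The layout of struct list (a prev and a next pointer) and the helper functions are assumptions made for illustration; the project's own definitions may differ. The macro does the same job as the kernel's container_of: it subtracts the offset of the embedded member from the member's address.

```c
#include <stddef.h>

/* Assumed layout of the embedded node; the project's struct list may differ. */
struct list {
    struct list *prev;
    struct list *next;
};

/* Recovers the structure that embeds `member` from a pointer to that member. */
#define list_parent(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Example structure that becomes a list element by embedding a node. */
struct recipient {
    const char *name;
    struct list list;
};

/* Hypothetical helper: appends `node` after `tail`. */
void list_append(struct list *tail, struct list *node) {
    node->prev = tail;
    node->next = NULL;
    tail->next = node;
}

/* Hypothetical helper: walks backwards from the last recipient and
   returns the first one. */
struct recipient *first_recipient(struct recipient *last) {
    struct recipient *current = last;
    while (current->list.prev != NULL) {
        current = list_parent(current->list.prev, struct recipient, list);
    }
    return current;
}
```

The list functions only know about struct list, so the same list_append works for recipients, headers, or any other structure that embeds a node.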

Further in the serve_connection function there is an infinite command-processing loop. I removed some command handlers to keep the listing short.

while (1) {

    // If the connection was dropped, stop the loop
    status = get_message(sock, buffer);
    if (status == STATUS_ERROR) break;

    lower_buffer(buffer);

    // Other command handlers

    if (buffer_has_command("helo ", buffer)) {
        status = serve_helo(sock, buffer, server_session);
        if (status == STATUS_OK) {
            initialize_session(&smtp_request, &current_state);
        }
        continue;
    }

    if (buffer_has_command("mail from:", buffer)) {
        if (!validate_state(current_state, INITIALIZED)) {
            send_response(sock, buffer, BAD_SEQUENCE);
            continue;
        }

        status = serve_mail_from(sock, buffer, smtp_request);
        if (status == STATUS_OK) {
            set_state(&current_state, HAS_MAIL_FROM);
        }
        continue;
    }

    // Other command handlers

    if (buffer_has_command("data", buffer)) {
        if (!validate_state(current_state, HAS_RCPT_TO)) {
            send_response(sock, buffer, BAD_SEQUENCE);
            continue;
        }

        status = serve_data(sock, buffer, smtp_request);
        if (status == STATUS_OK) {
            process_smtp_request(smtp_request, server_session);

            initialize_session(&smtp_request, &current_state);
        }
        continue;
    }

    // Other command handlers

    // If the command was not handled, we do not support it,
    // so we return SYNTAX_ERROR

    send_response(sock, buffer, SYNTAX_ERROR);
}
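The loop above lowercases the whole buffer before matching commands. A different technique, shown here only as a sketch, is to compare the command prefix case-insensitively and leave the rest of the line untouched, which keeps the original capitalization of mailbox names. The function has_command_ci is hypothetical and is not the project's buffer_has_command.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if `buffer` starts with `command`,
   ignoring letter case, and 0 otherwise. */
int has_command_ci(const char *command, const char *buffer) {
    for (; *command != '\0'; command++, buffer++) {
        if (*buffer == '\0') {
            return 0;   /* buffer is shorter than the command */
        }
        if (tolower((unsigned char)*buffer) != tolower((unsigned char)*command)) {
            return 0;   /* first mismatching character */
        }
    }
    return 1;
}
```

The dispatch code stays the same: each if would call has_command_ci("mail from:", buffer) instead of matching against a lowercased copy.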

Before starting an SMTP transaction we must receive a HELO command. Once we have received it, we initialize a new session with the client. This happens in the initialize_session function:

static enum STATUS initialize_session(struct smtp_request** smtp_request, enum SERVER_STATE* current_state) {
	clean_smtp_request(*smtp_request);
	*smtp_request = init_smtp_request();
	set_state(current_state, INITIALIZED);
	return STATUS_OK;
}

The server has four possible states that determine which commands it may execute. When initializing a new session, we move to the INITIALIZED state. After the reverse path and the forward path are assigned, we change the state to HAS_MAIL_FROM and HAS_RCPT_TO respectively. You can reset the current state and start a new session using the RSET command.
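These transitions can be written down as one small function. The state names follow the text; the command enum, the function apply_command, and two decisions (repeated RCPT commands are accepted, and RSET before HELO leaves the state unchanged) are assumptions made for this sketch rather than the project's actual logic.

```c
/* State names as used in the article */
enum server_state { DEFAULT, INITIALIZED, HAS_MAIL_FROM, HAS_RCPT_TO };

/* Hypothetical command identifiers for this sketch */
enum command { CMD_HELO, CMD_MAIL, CMD_RCPT, CMD_DATA, CMD_RSET };

/* Returns 1 and updates *state if `cmd` is allowed in the current state,
   or returns 0 (bad sequence of commands) and leaves *state unchanged. */
int apply_command(enum server_state *state, enum command cmd) {
    switch (cmd) {
    case CMD_HELO:
        *state = INITIALIZED;
        return 1;
    case CMD_MAIL:
        if (*state != INITIALIZED) return 0;
        *state = HAS_MAIL_FROM;
        return 1;
    case CMD_RCPT:
        /* Assumption: several recipients may follow one MAIL command */
        if (*state != HAS_MAIL_FROM && *state != HAS_RCPT_TO) return 0;
        *state = HAS_RCPT_TO;
        return 1;
    case CMD_DATA:
        if (*state != HAS_RCPT_TO) return 0;
        *state = INITIALIZED;   /* transaction finished, ready for the next */
        return 1;
    case CMD_RSET:
        /* Assumption: RSET before HELO is accepted but changes nothing */
        if (*state != DEFAULT) *state = INITIALIZED;
        return 1;
    }
    return 0;
}
```

Keeping the rules in one place makes it easy to see that DATA is only reachable after at least one RCPT, and RCPT only after MAIL.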

State diagram

After the session is initialized, we are ready to process other commands. Let's look at how the MAIL command is processed. After checking the state, we run the command handler. In this case, it is the serve_mail_from function. Each command has its own handler, which is stored in a separate file.

The handlers look like this:

enum STATUS serve_mail_from(SOCKET sock, char* buffer, struct smtp_request* smtp_request) {
	if (validate_with_args(buffer, "mail from:", ":") == STATUS_NOT_OK) {
		send_response(sock, buffer, SYNTAX_ERROR_PARAMETERS);
		return STATUS_NOT_OK;
	}

	char* mail_from = get_value_from_buffer(buffer, ":");
	mail_from = trim_string(mail_from);

	if (validate_email_string(mail_from) == STATUS_NOT_OK) {
		send_response(sock, buffer, SYNTAX_ERROR_PARAMETERS);
		free(mail_from);
		return STATUS_NOT_OK;
	}

	struct email_address* mail_from_email_address = string_to_email_address(mail_from);
	free(mail_from);
	smtp_request_set_mail_from(smtp_request, mail_from_email_address);

	send_response(sock, buffer, ACTION_OK);

	return STATUS_OK;
}

The handler consists of three parts: basic checks (validate_with_args, or validate_without_args for commands without arguments), checks specific to the command, and the action itself. In this case, the action sets the mail_from field of the smtp_request structure.

Transaction processing begins immediately after the DATA command completes. After processing, we reinitialize the session, returning to the INITIALIZED state.

Usually, after the client has sent the letter, it immediately ends the session with the QUIT command. After the session is completed, we exit the loop, release the resources, and terminate the processing thread:

	clean_smtp_request(smtp_request);
	clean_server_session(server_session);
	clean_socket_buffer(buffer);
	socket_cleanup(sock);
}

Transaction processing

Transaction processing occurs in the process_smtp_request function. This function processes the body of the message and adds a timestamp, server headers, and session headers.

After processing, we send the letter to all recipients by iterating through the linked list. list_parent is the macro mentioned earlier, modeled on container_of from the Linux kernel. It returns the structure that contains a given list member.

void process_smtp_request(struct smtp_request* smtp_request, struct server_session* server_session) {
	struct mail* mail = init_mail();

	mail_parse_headers(mail, smtp_request->data);
	mail_add_timestamp(mail, smtp_request->mail_from);
	mail_add_server_headers(mail, smtp_request);
	mail_add_session_headers(mail, server_session);
	
	struct smtp_request_recipient* last_recipient = smtp_request->rcpt_to_list;
	char* final_text = build_mail(mail);
	while (1) {
		deliver_mail(final_text, last_recipient);
		if (last_recipient->list.prev == NULL) {
			break;
		}
		last_recipient = list_parent(last_recipient->list.prev, struct smtp_request_recipient, list);
	}
	free(final_text);

	clean_mail(mail);
}

The main structure here is mail, which represents a finished letter.

struct mail {
	char* text;
	char* timestamp;
	struct mail_header* headers_list;
};

The text field is the text of the letter without headers. The timestamp field is a mandatory timestamp that is placed at the beginning of each letter. headers_list is a linked list of headers:

struct mail_header {
	char* name;
	char* value;
	struct list list;
};

Header parsing is handled by the mail_parse_headers function. It converts the headers into a linked list and separates them from the main text of the letter. This lets us work with them easily, adding new ones and deleting old ones.

The mail_add_server_headers function relies on this: it adds the From and To headers if they are missing, and sets the Return-Path and X-Original-To headers.

enum STATUS mail_add_server_headers(struct mail* mail, struct smtp_request* smtp_request) {
	char* mail_from_string = email_address_to_string(smtp_request->mail_from);
	mail_add_header_if_not_exists(mail, "From", mail_from_string);
  
	char* mail_from_string_with_arrows = email_address_string_add_arrows(mail_from_string);
	mail_replace_header(mail, "Return-Path", mail_from_string_with_arrows);
  
	free(mail_from_string);

	char* all_recipients = get_all_recipients(smtp_request);
	mail_add_header_if_not_exists(mail, "To", all_recipients);
	mail_replace_header(mail, "X-Original-To", all_recipients);
  
	free(all_recipients);

	return STATUS_OK;
}

After adding all the headers, we can assemble the letter. This happens in the build_mail function. In it, we write all the headers and the body of the letter into one buffer, after which the letter is ready for delivery. Here we again iterate over the linked list with the list_parent macro.

char* build_mail(struct mail* mail) {
	char* result = calloc(MAIL_SIZE, sizeof(char));
	add_to_buffer(result, mail->timestamp);

	struct mail_header* current_header = mail->headers_list;

	while (1) {
		flush_to_buffer(result, 2, "%s: %s\r\n", current_header->name, current_header->value);
		if (current_header->list.prev == NULL) {
			break;
		}
		current_header = list_parent(current_header->list.prev, struct mail_header, list);
	}

	flush_to_buffer(result, 1, "\r\n%s\r\n", mail->text);

	return result;
}

Next comes the delivery of the letter to the user's mailbox. The mailbox is a file to which each new letter is appended. Delivery occurs in the deliver_mail function, which is a wrapper for the write_mail_to_file function.

The get_full_mail_path function prepends the mail folder path specified in the configuration to the recipient's name.

static enum STATUS deliver_mail(char* mail, struct smtp_request_recipient* smtp_request_recipient) {
	// Pass a failed write on to the caller instead of always reporting success
	return write_mail_to_file(smtp_request_recipient->email_address->user, mail);
}

static enum STATUS write_mail_to_file(char* recipient, char* buffer) {

	char* full_path = get_full_mail_path(recipient);

	// Append mode keeps the letters already stored in the mailbox
	FILE* file_ptr = fopen(full_path, "ab");

	// The path is no longer needed, so free it before any early return
	free(full_path);

	if (file_ptr == NULL) {
		return STATUS_ERROR;
	}

	fprintf(file_ptr, "%s", buffer);
	fclose(file_ptr);

	return STATUS_OK;
}

Configuration

To avoid recompiling the server every time we want to change the configuration, I added a simple module for processing configuration files.

The configuration is represented by the global config structure:

struct config {
	char* domain;
	char* mail_path;
	char* listen_port;
	struct user* users_list;
	char* hostname;
} config;

Before starting the server, we read and process the configuration file that was passed via the command line argument. The processing occurs in the function config_parse_file:

void config_parse_file(char* filename) {
	char* buffer = read_config_file(filename);

	config_parse_buffer(buffer);

	free(buffer);
}

void config_parse_buffer(char* buffer) {
	config.domain = get_config_param(buffer, "domain", "domain.local");
	config.mail_path = get_config_param(buffer, "mail_path", "./");
	config.listen_port = get_config_param(buffer, "listen_port", "25");
	config.users_list = get_config_users(buffer, "users", "");
	config.hostname = get_config_hostname();
}

Here we find the corresponding fields in the file and assign them to the fields of the config structure. If we do not find a required field in the file, we set the default value.
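A lookup with a default value might be implemented along these lines. This is a sketch, not the project's get_config_param: the name config_lookup is hypothetical, and it assumes one "key = value" pair per line with the key at the start of the line.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Copies `len` bytes starting at `start` into a new NUL-terminated string. */
static char *copy_range(const char *start, size_t len) {
    char *copy = malloc(len + 1);
    if (copy != NULL) {
        memcpy(copy, start, len);
        copy[len] = '\0';
    }
    return copy;
}

/* Hypothetical helper: returns a newly allocated copy of the value stored
   under `key`, or a copy of `fallback` if the key is not present. */
char *config_lookup(const char *buffer, const char *key, const char *fallback) {
    size_t key_len = strlen(key);
    const char *line = buffer;

    while (*line != '\0') {
        const char *newline = strchr(line, '\n');
        const char *line_end = newline ? newline : line + strlen(line);
        const char *p = line;

        if ((size_t)(line_end - line) > key_len && strncmp(p, key, key_len) == 0) {
            p += key_len;
            while (p < line_end && (*p == ' ' || *p == '\t')) p++;
            if (p < line_end && *p == '=') {
                p++;
                /* Trim whitespace (including a trailing '\r') around the value */
                while (p < line_end && isspace((unsigned char)*p)) p++;
                while (line_end > p && isspace((unsigned char)line_end[-1])) line_end--;
                return copy_range(p, (size_t)(line_end - p));
            }
        }
        line = newline ? newline + 1 : line_end;
    }
    return copy_range(fallback, strlen(fallback));
}
```

Returning a fresh copy in both cases means the caller can always free the result, whether it came from the file or from the default.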

To access the structure fields, I made separate functions:

char* config_get_domain();
char* config_get_mail_path();
int config_get_listen_port();
struct user* config_get_users();
char* config_get_hostname();
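Note that listen_port is stored as a string while config_get_listen_port returns an int, so a conversion has to happen somewhere. One possible way to do it safely is sketched below; parse_port is a hypothetical helper and the project may convert the value differently.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts a decimal string to a TCP port number.
   Returns the port (1..65535) or -1 if the text is not a valid port. */
int parse_port(const char *text) {
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);

    /* Reject empty input, trailing garbage, and out-of-range numbers */
    if (errno != 0 || end == text || *end != '\0') {
        return -1;
    }
    if (value < 1 || value > 65535) {
        return -1;
    }
    return (int)value;
}
```

Unlike atoi, strtol makes it possible to tell an invalid value apart from a legitimate number, so a typo in the configuration file can be reported instead of silently becoming port 0.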

This is what an example configuration file looks like:

domain = domain.local
mail_path = E:/
listen_port = 1025
users = john, carl, ann

Sending a letter via Mutt

To test the server, I used the Mutt SMTP client. It allows you to send emails without going into the details of the protocol. Before sending, you need to configure some settings. We need to set our email address, the server IP address, and disable SMTP protocol extensions.

The Mutt configuration file is stored at ~/.muttrc. To set the parameters we need, we must add the following lines to the file:

set from = john@domain.local
set smtp_url = smtp://172.18.160.1:1025
set ssl_starttls = no
set ssl_force_tls = no

Now we can send letters to our server. Let's write a test letter:

Test letter

After sending the letter, we see the following letter in carl's mailbox:

Received letter

Conclusion

This server supports all commands from the RFC 821 standard and, as the Mutt test shows, works with a modern SMTP client when protocol extensions are disabled. Perhaps in the future I will add commands from new standards and write a new article about it.

This is my first relatively large project in C. As I expected, I managed to improve my skills in this programming language. I also significantly deepened my knowledge of the SMTP protocol. I left the source code of the project on GitHub.

Project GitHub: https://github.com/Reedus0/MailServer
