Analyzing the structure of a letter in .eml format

Hi, my name is Elena Tikhomirova, I work as a systems analyst at the Sfera Platform, a development of T1 Holding. If you want to automate the processing of emails, but don’t know what exactly a letter consists of, this article is for you. I will describe the general structure and consider examples of its implementation depending on the content of the message.

General information

The email contains the following information:

  • subject;

  • addresses and names of senders and recipients;

  • text of the letter;

  • attached files;

  • various service information.

The familiar appearance of a letter in a mail client differs from its “code” as much as the display of a web page in a browser differs from an HTML file. In addition to the “payload”, the “code” of an email also contains technical information that allows the letter to be transmitted and lets mail clients interpret its structure and content.

The standard format for emails is MIME. This is the format (type) of the letter itself, while .eml is one of the file formats in which a letter can be saved. If you open such a file in a mail client, you will see a regular email, as if you had selected it in the Inbox folder. But if you open an .eml file in a text editor, the screen will display unusual content: a long list of parameters in English, each followed by a colon and a value, often in encoded form. These are the headers of the letter, some of which “say” who the sender is, when the letter was sent, and what the subject is. The remaining headers are of interest mainly to specialists and are outside the scope of this article. The headers are followed by the body of the letter, also encoded and divided into parts by “boundaries”. The parts can contain text in one of the text formats and attached files. My article is devoted to these parts, their delimitation and grouping.
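To make the split into headers and body more tangible, here is a minimal sketch that parses a small letter with Python's standard email package and prints the headers that “say” the most. The sample letter, the addresses and the function name summarize are invented for illustration; they are not taken from a real mailbox.

```python
from email import message_from_string, policy

# A tiny hand-written letter (hypothetical data, not from a real mailbox).
# The subject is an encoded word and the body is base64, as is common in .eml files.
RAW_EML = """\
From: Elena <elena@example.com>
To: Ivan <ivan@example.com>
Subject: =?utf-8?q?Caf=C3=A9_menu?=
Date: Tue, 21 Nov 2023 06:27:15 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

SGVsbG8h
"""


def summarize(raw_text):
    """Return the most familiar headers plus the decoded text of a simple letter."""
    # policy.default makes the parser decode encoded headers and body content.
    msg = message_from_string(raw_text, policy=policy.default)
    return {
        "subject": str(msg["Subject"]),
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "date": str(msg["Date"]),
        # get_content() works here because this sample is a single text part.
        "text": msg.get_content().strip(),
    }


if __name__ == "__main__":
    for name, value in summarize(RAW_EML).items():
        print(f"{name}: {value}")
```

For a file saved on disk, the same package offers email.message_from_binary_file, which takes a file object opened in binary mode.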

To practice as you read, you can download the letter in .eml format and try to find the specified parts in it. Instructions on how to do this are given at the end of the article.

Here are two diagrams of the structure of a MIME email:

[Figure: Letter structure multipart/mixed]

[Figure: Structure of MIME document objects]

Now I will give my interpretation of how the structure of a letter is filled depending on its content.

Parts of a letter

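The following structural elements are distinguished in an email (the numbers are used for reference below):

  1. the envelope, that is, the root of the letter with its headers;

  2. the body, which can include:

    2.1. the text of the letter (plaintext and/or HTML);

    2.2. attached files;

    2.3. images embedded in the text.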

All parts of the letter except the envelope are optional: as with regular mail, only the delivery information is required, while the contents of the envelope can be empty. However, most email clients always transmit the text part of the letter (2.1), even an empty one if the sender has not written a single character. In addition, unlike regular mail, the body of an email can contain not only a single message from the sender, but also the entire correspondence on a specific topic. In this case, the text part of the letter, regardless of the format (HTML or plaintext), includes the entire chain of messages in reverse chronological order. Such quoted messages also have headers, but only a minimal set: sender, recipients, date, subject.

Headers are present not only on the text parts and the envelope, but also on attached files and embedded images: this is how the metadata of these files is transmitted.

Attachments (2.2) contain files added by the last sender. One part is allocated per file.

Screenshots, logos, photographs and other images inserted into the text are transmitted as embedded images (2.3). Each image file is placed in a special part for embedded (inline) images, one part per file. In the text of the letter, at the insertion point, there is a reference to the required image file. Thanks to this, the mail client displays the images at the correct places in the text instead of showing the text and the images separately.

Another way of transmitting graphic files in an email is possible. This approach is most often used in email newsletters: the contents of a small file are inserted into the src attribute of the <img> tag in the HTML version of the letter. In the plaintext version (if available), a reference to the name of this file is given, taken from the alt attribute of the <img> tag. In this case, a separate inline part for the image is not required.
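One common way to put file contents into src is a base64 data: URI. The article does not name the exact mechanism, so treat this as an assumption. The sketch below shows the idea; the function names to_data_uri and img_tag are hypothetical helpers.

```python
import base64
import html


def to_data_uri(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a data: URI (assumed mechanism, see the note above)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def img_tag(image_bytes, alt_text, mime_type="image/png"):
    """Build an <img> tag that carries the image itself instead of a cid reference."""
    src = to_data_uri(image_bytes, mime_type)
    # The alt text is what a plaintext version could fall back to.
    return f'<img src="{src}" alt="{html.escape(alt_text, quote=True)}">'


if __name__ == "__main__":
    # Placeholder bytes stand in for a real (small) image file.
    print(img_tag(b"abc", "logo.png"))
```

Because the whole file travels inside the HTML text, this only makes sense for small images.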

Containers

Above we looked at parts of the letter that have specific content (paragraphs 2.1-2.3). There are also parts of the letter that act as additional envelopes – containers (unofficial term). Like folders in a file storage, they help logically organize other parts of the body and can themselves be nested inside each other, like nesting dolls.

A part of the letter is marked as a container by the type multipart, for example, Content-Type: multipart/alternative. The type of simpler parts (items 2.1-2.3) is usually related to their format:

Content-type: text/plain
Content-type: video/mpeg  
Content-type: image/jpeg  

How a mail client distributes the contents of a letter among containers:

  • If both versions of the text (plaintext and HTML) are transmitted, they are placed in a multipart/alternative container. If the letter is sent only in the HTML or only in the plaintext version and contains neither embedded images nor attached files, there are no containers, and the text is attached directly to the root (envelope).

  • Attached files are always placed in a multipart/mixed container. Like all parts, these files have a Content-Type header with the MIME content type, as well as an additional header specifying that the content is an attachment: Content-Disposition: attachment. Besides the files, the same multipart/mixed container holds the container with the text, multipart/alternative. Special case: if an .eml file is attached to the letter, it is placed as a separate part (disposition type attachment) in the same multipart/mixed container, and it carries its own headers, containers and the parts nested within them.

  • Images embedded in the text are “folded” into a multipart/related container, one part per picture; each part specifies its Content-Type and has the header Content-Disposition: inline. The text of the letter, itself wrapped in a multipart/alternative container, is placed in the same multipart/related container. If the letter also contains attached files, then multipart/related with all its contents (the text and the pictures from this text) is put into multipart/mixed.

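In other words, the filling of containers can be schematically described as follows (for a letter that has both text versions, embedded images and attached files; a level is omitted when the letter does not need it):

  multipart/mixed
    multipart/related
      multipart/alternative
        text/plain
        text/html
      embedded images (inline), one part per image
    attached files (attachment), one part per file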
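As a rough illustration of this nesting, the sketch below assembles a letter with both text versions, one embedded image and one attached file, and then prints its parts as an indented outline. It uses the classic email.mime classes from the Python standard library. The file names, the Content-ID and the placeholder bytes are invented, and the nesting order follows the description in this article rather than the behaviour of any particular mail client.

```python
from email.mime.application import MIMEApplication
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_letter():
    """Assemble mixed > related > alternative, as described in the text."""
    alternative = MIMEMultipart("alternative")
    alternative.attach(MIMEText("Hello", "plain", "utf-8"))
    alternative.attach(
        MIMEText('<p>Hello <img src="cid:logo@example"></p>', "html", "utf-8")
    )

    # Placeholder bytes; a real letter would carry the actual image file.
    image = MIMEImage(b"placeholder image bytes", _subtype="png")
    image.add_header("Content-ID", "<logo@example>")
    image.add_header("Content-Disposition", "inline", filename="logo.png")

    related = MIMEMultipart("related")
    related.attach(alternative)
    related.attach(image)

    attachment = MIMEApplication(b"placeholder file bytes", _subtype="octet-stream")
    attachment.add_header("Content-Disposition", "attachment", filename="report.bin")

    mixed = MIMEMultipart("mixed")
    mixed["Subject"] = "Demo"
    mixed.attach(related)
    mixed.attach(attachment)
    return mixed


def outline(part, depth=0):
    """Return one indented line per part, containers first, children below."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for child in part.get_payload():
            lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(build_letter())))
```

The same outline function can be applied to any parsed .eml file to see which containers a given mail client actually produced.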

Examples of headers of letter parts

Inline images: images embedded in the text, and their headers

An image inserted into the text of a letter is transmitted in the same multipart/related container as the text it is inserted into. All images embedded in the text are transmitted, regardless of whether they belong to a new message in the correspondence or to a quoted one.

Headers:

Content-Type: image/png; name="image001.png"
Content-Description: image001.png
Content-Disposition: inline; filename="image001.png"; size=3621;
  creation-date="Tue, 21 Nov 2023 06:27:15 GMT";
  modification-date="Tue, 21 Nov 2023 06:27:15 GMT"
Content-ID: <image001.png@01DA1C5C.E9556E30>
Content-Transfer-Encoding: base64

There is also a link to this file in the plaintext version section:

[cid:image001.png@01DA1C5C.E9556E30]

and in HTML:

<img naturalheight=3D"123" naturalwidth=3D"102" width=3D"102" height=3D"123=" id=3D"=D0=E8=F1=F3=ED=EE=EA_x0020_1" style=3D"width: 1.0625in; height: 1.=2847in; user-select: none;" src=3D"cid:image001.png@01DA1C5C.E9556E30">
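The HTML fragment above is shown as it is stored in the file, in quoted-printable encoding, which is why every = appears as =3D. The following sketch decodes a shortened copy of that fragment and extracts the cid reference so that it can be matched against the Content-ID header of the image part. The charset windows-1251 is an assumption (the article does not show the charset of this part), and the helper names are hypothetical.

```python
import quopri
import re

# A shortened copy of the <img> tag from the example, still quoted-printable encoded.
RAW_HTML = (
    b'<img width=3D"102" id=3D"=D0=E8=F1=F3=ED=EE=EA_x0020_1" '
    b'src=3D"cid:image001.png@01DA1C5C.E9556E30">'
)


def decode_qp_html(raw, charset):
    """Undo quoted-printable encoding, then decode the bytes with the part's charset."""
    return quopri.decodestring(raw).decode(charset)


def find_cid_references(html_text):
    """Return the identifiers used in src="cid:..." attributes."""
    return re.findall(r'src="cid:([^"]+)"', html_text)


def cid_matches(content_id_header, cid_reference):
    """Content-ID wraps the identifier in angle brackets; the cid reference does not."""
    return content_id_header.strip().strip("<>") == cid_reference


if __name__ == "__main__":
    # Assumed charset: windows-1251 (not stated in the article).
    html_text = decode_qp_html(RAW_HTML, "windows-1251")
    print(html_text)
    for ref in find_cid_references(html_text):
        print(ref, cid_matches("<image001.png@01DA1C5C.E9556E30>", ref))
```

When a letter is parsed with the email package and policy.default, this decoding is done for you by get_content(); the manual version is only meant to show what happens to the raw text.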

Attachments — files attached to the letter and their headers

Attachments are transferred in a multipart/mixed container, which may include:

  • a multipart/related container (it contains the text of the letter in plaintext and HTML, as well as embedded (inline) images);

  • the attached files themselves (one attachment part for each file).

Attached files are transferred only with the last reply in the chain. That is, when sending a reply, the user chooses which files to attach: their own or those already attached to the letter being replied to. The attached files from the entire history of the correspondence are NOT transferred.

The Content-Type header of the part with the attached file indicates the format of this file. The format can vary: application/octet-stream, image/jpeg, application/msword, text/asciidoc, text/json, text/plain, text/html, etc.

As we can see from the list, not only the text of the letter, but also the attached file can be in HTML or plaintext format. In this case, the main differences between such parts of the letter (body parts) as the text and the attached file are preserved:

  1. An attached file is always located inside a multipart/mixed container. This container appears only when there are attached files. However, the text of the letter can also sit directly inside multipart/mixed, without “intermediaries”. This happens only sometimes: when the text of the letter is transmitted in a single format (text/plain or text/html) rather than in both, and there are no embedded images, which is relatively rare. More often, the text of the letter is first placed in a multipart/alternative container, then in multipart/related, and only then in multipart/mixed.

  2. The headers and header parameters of an attached file differ from those of the letter text in the same format. For the file:

    Content-Type: text/plain; name="?.txt"
    Content-Description: ?.txt
    Content-Disposition: attachment; filename="?"; size=?; creation-date=?; modification-date=?

    For the text of the letter:

    Content-Type: text/plain; charset="?"

    or

    Content-Type: text/html; charset="?"
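The difference between the two kinds of text/plain parts can be checked programmatically. Below is a minimal sketch that walks over a small hand-written letter and labels each part by its Content-Disposition. The sample letter, the boundary name and the labels returned by classify are invented for illustration, and the rule for “body text” is a simplification that real mail may not always follow.

```python
from email import message_from_string, policy

# Hypothetical letter: the text and the attached file are both text/plain.
RAW = """\
MIME-Version: 1.0
Subject: Report
Content-Type: multipart/mixed; boundary="BOUND"

--BOUND
Content-Type: text/plain; charset="utf-8"

See the attached notes.
--BOUND
Content-Type: text/plain; name="notes.txt"
Content-Disposition: attachment; filename="notes.txt"

These are the notes.
--BOUND--
"""


def classify(part):
    """Label a part as a container, an attachment, an inline part or body text."""
    if part.is_multipart():
        return "container"
    disposition = part.get_content_disposition()  # 'attachment', 'inline' or None
    if disposition == "attachment":
        return "attachment"
    if disposition == "inline":
        return "inline"
    # Simplification: a text part without a disposition is treated as the letter text.
    return "body text" if part.get_content_maintype() == "text" else "other"


def describe(raw_text):
    """Return (content type, label, file name) for every part, root included."""
    msg = message_from_string(raw_text, policy=policy.default)
    return [
        (part.get_content_type(), classify(part), part.get_filename())
        for part in msg.walk()
    ]


if __name__ == "__main__":
    for row in describe(RAW):
        print(row)
```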

More examples of attached file headers:

  1. Image:

    Content-Type: image/png; name="Notifications architecture.png"
    Content-Description: Notifications architecture.png
    Content-Disposition: attachment; filename="Notifications architecture.png";
      size=105312; creation-date="Mon, 27 May 2024 14:12:44 GMT";
      modification-date="Mon, 27 May 2024 14:12:44 GMT"
    Content-Transfer-Encoding: base64

  2. Letter in .eml format:

    Content-Type: application/octet-stream; name="original_msg (1).eml"
    Content-Description: original_msg (1).eml
    Content-Disposition: attachment; filename="original_msg (1).eml"; size=14108;
      creation-date="Mon, 27 May 2024 18:45:26 GMT";
      modification-date="Mon, 27 May 2024 18:45:26 GMT"
    Content-Transfer-Encoding: base64
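Since an attached .eml file arrives as application/octet-stream in base64, its contents have to be decoded and parsed as a separate letter with its own headers and parts. The sketch below first builds such a letter (so that the example is self-contained) and then extracts the nested one. All subjects, addresses and file names are invented, and the check on the .eml extension is a simplifying assumption.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_outer_with_eml():
    """Create a letter that carries another letter as an .eml attachment."""
    inner = EmailMessage()
    inner["Subject"] = "Original subject"
    inner["From"] = "alice@example.com"
    inner.set_content("Original text")

    outer = EmailMessage()
    outer["Subject"] = "Fwd: see attached letter"
    outer.set_content("The original letter is attached.")
    # Attaching bytes turns the outer letter into multipart/mixed.
    outer.add_attachment(
        inner.as_bytes(),
        maintype="application",
        subtype="octet-stream",
        filename="original_msg.eml",
    )
    return outer


def extract_attached_letters(msg):
    """Parse every attachment whose file name ends with .eml into a message object."""
    letters = []
    for part in msg.iter_attachments():
        filename = part.get_filename() or ""
        if filename.lower().endswith(".eml"):
            raw = part.get_payload(decode=True)  # undoes the base64 encoding
            letters.append(message_from_bytes(raw, policy=policy.default))
    return letters


if __name__ == "__main__":
    # Serialize and re-parse to imitate reading the outer letter from a file.
    parsed = message_from_bytes(build_outer_with_eml().as_bytes(), policy=policy.default)
    for letter in extract_attached_letters(parsed):
        body = letter.get_body(preferencelist=("plain",))
        print(letter["Subject"], "|", body.get_content().strip())
```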

This concludes my brief overview. I hope it has helped you better understand the nested structure of an email and to see it not just as a sorted set of parts, but as a hierarchical tree. In conclusion, let me remind you that you can see for yourself what the “code” of an email received from a colleague or from an email newsletter consists of. To do this, follow the instructions below.

How to download a letter in .eml format

The file in .eml format contains the full original letter, including all headers and attachments. In this format, you can download a letter from some mail clients, such as Gmail or Thunderbird. Everything is simple there: viewing the original and downloading are in the letter options. You just need to open it and click “More”.

Outlook, popular in the corporate environment, has its limitations, but it also allows you to download a letter in the required format. From the Outlook web interface (OWA):

  1. Open your mailbox in your browser.

  2. Click on “Compose” and select “Message” or open a draft message.

  3. Drag the message you want to save in .eml format from the list of messages to the open draft of the new message. The transferred message will appear in the new one as an attached file.

  4. Click the down arrow on the attachment and select “Download”. Save the file with the extension “.eml”.

From the desktop Outlook application on macOS, you can drag the letter to the desired local folder, and it will be saved in .eml format. This method does not work on Windows: there the letter is saved in .msg format.
