How does the HLS protocol work?

For the past several weeks I have been building server-side support for short videos at Bluesky.

The main purpose of this feature is to provide streaming of short (maximum 90 seconds) videos. Playback should be free for users and not too expensive for us to serve.

To accommodate these constraints, we decided to use a content delivery network (CDN) that could bear the brunt of the bandwidth required for on-demand video streaming.

While this CDN is a full-featured product, we didn't want to be too vendor-specific, so we decided to make some improvements to our streaming platform to expand the range of services it offers and get creative with the implementation of streaming video protocols.

Here are some things we wanted to provide that are not provided out of the box:

  • Keep track of views, user sessions, and how long a session lasts. We need all of this to get more accurate feedback on how the video is performing.

  • Provide dynamic support for closed captions, and make this feature flexible enough to be automated later.

  • Store a transcoded version of the original videos somewhere so that we have a long-lasting “source of truth” for the video content that we can refer to when needed.

  • Attach a short "trailer" to the end of each streamed video, built from roughly 3-second fragments, in the style of TikTok.

In this post, we'll focus on the aforementioned HLS-related features, namely view time tracking, closed captions, and trailers.

HLS is just a set of text files

HTTP Live Streaming (HLS) is a standard developed by Apple in 2009. It provides adaptive-bitrate streaming for both live broadcasts and video on demand (VOD).

For the purpose of this post, only HLS VOD streaming will be explained.

A player implementing the HLS protocol can automatically adjust the quality of the video being transmitted based on network conditions. In addition, a server implementing the HLS protocol must provide one or more variants of the streaming media, adapting to changing network conditions. This approach allows for a smooth reduction in the quality of the streaming transmission without interrupting video playback.

Here's how HLS does this. It creates a series of “playlist” files, written in plain text (.m3u8). From these files, the player knows what bitrate and resolution options the server is providing. This way, the player can “decide” which variant to stream.

HLS distinguishes between two types of “playlist” files: Master Playlists and Media Playlists.

Master Playlists

The Master Playlist is the first file selected by your video player. It contains a series of variants that point to child Media Playlists. It also describes the approximate bitrate of the source files of a particular variant, as well as the codecs and resolutions used in those sources.

$ curl https://my.video.host.com/video_15/playlist.m3u8

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=688540,CODECS="avc1.64001e,mp4a.40.2",RESOLUTION=640x360
360p/video.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=1921217,CODECS="avc1.64001f,mp4a.40.2",RESOLUTION=1280x720
720p/video.m3u8

In the file above, pay attention first of all to the RESOLUTION parameters and to the {res}/video.m3u8 links.

Typically, the media player will try the lowest resolution first, then move on to higher and higher resolutions as the network connection between you and the server improves.

The links in this file are pointers to Media Playlists, usually specified as relative paths from the Master Playlist. For example, to fetch the 720p Media Playlist, the player would follow the link https://my.video.host.com/video_15/720p/video.m3u8.
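That relative-path resolution is exactly what `urljoin` from Python's standard library does; a minimal sketch, using the hypothetical host from the examples above:

```python
from urllib.parse import urljoin

# The Master Playlist URL from the example above.
master_url = "https://my.video.host.com/video_15/playlist.m3u8"

# Relative links found inside the Master Playlist.
variants = ["360p/video.m3u8", "720p/video.m3u8"]

# Resolve each relative link against the Master Playlist URL,
# just as an HLS player does.
media_playlist_urls = [urljoin(master_url, v) for v in variants]
print(media_playlist_urls)
```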

The Master Playlist can also contain multi-track audio directives (for closed captioning), but for now let's take a closer look at what the Media Playlist is.

Media Playlists

A Media Playlist is another plain-text file. It gives your video player two key pieces of information: first, a list of media Segments (video files with the .ts extension), and second, headers for each Segment that tell the player how that media should be handled.

$ curl https://my.video.host.com/video_15/720p/video.m3u8

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:4
#EXTINF:4.000,
video0.ts
#EXTINF:4.000,
video1.ts
#EXTINF:4.000,
video2.ts
#EXTINF:4.000,
video3.ts
#EXTINF:4.000,
video4.ts
#EXTINF:2.800,
video5.ts

This Media Playlist describes a video that is 22.8 seconds long (five 4-second segments + one 2.8-second segment).

The playlist declares the VOD type, so we know it contains all the media information the player needs.

The TARGETDURATION tag communicates the maximum duration of any Segment, so the player knows how many Segments to buffer in advance. During a live broadcast, it also tells the player how often to refresh the playlist file to detect new Segments.

Finally, each EXTINF tag specifies the duration of the .ts Segment file that follows it, and the relative video#.ts paths tell the player where to load each media file from.
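To make the EXTINF/Segment pairing concrete, here is a minimal sketch that extracts the Segment list from the Media Playlist above and sums its durations (a hand-rolled parser for illustration, not a full m3u8 implementation):

```python
playlist = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:4
#EXTINF:4.000,
video0.ts
#EXTINF:4.000,
video1.ts
#EXTINF:4.000,
video2.ts
#EXTINF:4.000,
video3.ts
#EXTINF:4.000,
video4.ts
#EXTINF:2.800,
video5.ts
"""

segments = []   # (filename, duration) pairs, in playback order
duration = None
for line in playlist.splitlines():
    if line.startswith("#EXTINF:"):
        # "#EXTINF:4.000," -> 4.0; the trailing comma precedes an optional title
        duration = float(line[len("#EXTINF:"):].rstrip(",").split(",")[0])
    elif line and not line.startswith("#"):
        # A non-tag line is the Segment URI for the preceding EXTINF
        segments.append((line, duration))

total = sum(d for _, d in segments)
print(f"{len(segments)} segments, {total:.1f} seconds total")
```

Running this reports six Segments totalling 22.8 seconds, matching the playlist above.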

Where are the specific media files located?

At this point, the video player has loaded two .m3u8 playlist files and received a lot of metadata about how to play the video, but it hasn't loaded any actual media files yet.

It is the .ts files referenced in the Media Playlist that actually store the media we are interested in. So if we want to manage the playlists ourselves but let the CDN serve the media files, we can simply forward those video#.ts requests to our CDN.

.ts files are short media fragments encoded in the MPEG-2 Transport Stream format, which can carry video and audio together.

Tracking views

Here's how we can track views in our HLS streams. As we know, any video player must first load the Master Playlist.

When a user requests a Master Playlist, you can dynamically modify the results by providing a SessionID in each response. This allows you to track user sessions without using cookies or headers:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=688540,CODECS="avc1.64001e,mp4a.40.2",RESOLUTION=640x360
360p/video.m3u8?session_id=12345
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=1921217,CODECS="avc1.64001f,mp4a.40.2",RESOLUTION=1280x720
720p/video.m3u8?session_id=12345
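One way to produce such a playlist on the fly is to rewrite each URI line as the playlist passes through your server. A minimal sketch (the `inject_session_id` helper is a name of our own, not part of any library):

```python
from urllib.parse import urlencode

def inject_session_id(playlist_text: str, session_id: str) -> str:
    """Append a session_id query parameter to every URI line of a playlist."""
    out = []
    for line in playlist_text.splitlines():
        # Tag lines start with '#'; any other non-empty line is a URI.
        if line and not line.startswith("#"):
            sep = "&" if "?" in line else "?"
            line = f"{line}{sep}{urlencode({'session_id': session_id})}"
        out.append(line)
    return "\n".join(out)

master = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=688540,CODECS="avc1.64001e,mp4a.40.2",RESOLUTION=640x360
360p/video.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=1921217,CODECS="avc1.64001f,mp4a.40.2",RESOLUTION=1280x720
720p/video.m3u8"""

print(inject_session_id(master, "12345"))
```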

Now, when a video player selects Media Playlists, it will also include a query string that will allow us to identify the streaming session, ensure that we don't count any double views of the video, and keep track of which parts of the video were downloaded in a given session.

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:4
#EXTINF:4.000,
video0.ts?session_id=12345&duration=4
#EXTINF:4.000,
video1.ts?session_id=12345&duration=4
#EXTINF:4.000,
video2.ts?session_id=12345&duration=4
#EXTINF:4.000,
video3.ts?session_id=12345&duration=4
#EXTINF:4.000,
video4.ts?session_id=12345&duration=4
#EXTINF:2.800,
video5.ts?session_id=12345&duration=2.8

Finally, once the video player fetches the Segment media files, we can record each Segment view before forwarding the request to the CDN with a 302 redirect. This way we can see how much video (in seconds) was downloaded during a session, as well as which Segments were downloaded.

This method has its limitations. For example, the fact that the player has downloaded a Segment does not mean it has actually shown it to the user. But this is the best we can do without instrumenting the media player itself.
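Aggregating those query strings server-side might look like this: a sketch with in-memory storage (real code would persist sessions, and `record_segment_request` is our own hypothetical helper):

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

watched = defaultdict(float)  # seconds of video delivered per session
fetched = defaultdict(set)    # which Segments each session fetched

def record_segment_request(path_with_query: str) -> None:
    """Record a Segment request before 302-redirecting it to the CDN."""
    parsed = urlparse(path_with_query)
    params = parse_qs(parsed.query)
    session = params["session_id"][0]
    segment = parsed.path.rsplit("/", 1)[-1]
    if segment not in fetched[session]:   # don't double-count retries
        fetched[session].add(segment)
        watched[session] += float(params["duration"][0])

for req in [
    "/video_15/720p/video0.ts?session_id=12345&duration=4",
    "/video_15/720p/video1.ts?session_id=12345&duration=4",
    "/video_15/720p/video1.ts?session_id=12345&duration=4",  # retried request
]:
    record_segment_request(req)

print(watched["12345"])  # seconds of video delivered in this session
```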

Adding subtitles

Subtitles are included in the Master Playlist as a variant, and then referenced in each video variant so the player knows where to load the subtitles from.

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="en_subtitle",DEFAULT=NO,AUTOSELECT=yes,LANGUAGE="en",FORCED="no",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="subtitles/en.m3u8"
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=688540,CODECS="avc1.64001e,mp4a.40.2",RESOLUTION=640x360,SUBTITLES="subs"
360p/video.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=1921217,CODECS="avc1.64001f,mp4a.40.2",RESOLUTION=1280x720,SUBTITLES="subs"
720p/video.m3u8

Just like with the video variants, we need a Media Playlist file for the subtitles. From it, the player learns where to load the subtitle source files from and what portion of the stream's length each one covers.

$ curl https://my.video.host.com/video_15/subtitles/en.m3u8

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:22.8
#EXTINF:22.800,
en.vtt

In our case, since we are only serving one short video, we can provide a single Segment pointing to a WebVTT subtitle file that covers the entire length of the video.

If you look into the en.vtt file, you will see something like this:

$ curl https://my.video.host.com/video_15/subtitles/en.vtt

WEBVTT

00:00.000 --> 00:02.000
According to all known laws
of aviation,

00:02.000 --> 00:04.000
there is no way a bee
should be able to fly.

00:04.000 --> 00:06.000
Its wings are too small to get
its fat little body off the ground.

...

The media player is capable of reading files in WebVTT format and displaying each line of subtitles to the user exactly when needed.

When working with longer videos, you may want to further split your VTT files into smaller segments and update the subtitles in the Media Playlist accordingly.

To provide subtitles in different versions and languages, simply add additional EXT-X-MEDIA:TYPE=SUBTITLES lines to the Master Playlist and set the appropriate NAME, LANGUAGE (if it differs), and URI values for each additional subtitle variant.

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="en_subtitle",DEFAULT=NO,AUTOSELECT=yes,LANGUAGE="en",FORCED="no",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="subtitles/en.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="fr_subtitle",DEFAULT=NO,AUTOSELECT=yes,LANGUAGE="fr",FORCED="no",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="subtitles/fr.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="ja_subtitle",DEFAULT=NO,AUTOSELECT=yes,LANGUAGE="ja",FORCED="no",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="subtitles/ja.m3u8"
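Since these lines differ only in the language code, generating them is simple string templating. A sketch (the `subtitle_media_lines` helper is our own name, and it assumes the subtitles/{code}.m3u8 layout used above):

```python
def subtitle_media_lines(language_codes):
    """Build one EXT-X-MEDIA subtitles line per language code."""
    template = (
        '#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="{code}_subtitle",'
        'DEFAULT=NO,AUTOSELECT=yes,LANGUAGE="{code}",FORCED="no",'
        'CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",'
        'URI="subtitles/{code}.m3u8"'
    )
    return [template.format(code=code) for code in language_codes]

for line in subtitle_media_lines(["en", "fr", "ja"]):
    print(line)
```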

Appending trailers

To promote a brand (or in other cases, such as inserting advertising), we can add video Segments to the playlist that change what gets played, without separately attaching new content and re-encoding the entire source file.

Luckily, when working with HLS, you can easily insert segments into the Media Playlist using this neat trick:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:4
#EXTINF:4.000,
video0.ts
#EXTINF:4.000,
video1.ts
#EXTINF:4.000,
video2.ts
#EXTINF:4.000,
video3.ts
#EXTINF:4.000,
video4.ts
#EXTINF:2.800,
video5.ts
#EXT-X-DISCONTINUITY
#EXTINF:3.337,
trailer0.ts
#EXTINF:1.201,
trailer1.ts
#EXTINF:1.301,
trailer2.ts
#EXT-X-ENDLIST

In this Media Playlist we use the HLS tag EXT-X-DISCONTINUITY, which tells the media player that the following Segments may have a different bitrate, resolution, or aspect ratio.

By providing the DISCONTINUITY tag, we can append extra Segments at this point alongside the usual ones, and they can point to an entirely different media source split into .ts files.

As a reminder, HLS in this case allows both absolute and relative paths, so we could either provide the full URL for these trailer#.ts files here, or virtually route them so that they retain the path context to the currently viewed video.

Please note: the DISCONTINUITY tag is not strictly required here, and we could simply name the trailer files video{6-8}.ts if we wanted. But for clarity, and for the player to function correctly, the DISCONTINUITY tag is best used whenever the bitrate and resolution of the trailer do not match those of the other Segments of the video.

When the video player plays the video, it will transition from video5.ts to trailer0.ts without any hesitation, so the trailer will appear to be part of the original video.

With this approach, we can dynamically change the trailer's contents for all videos, aggressively cache the trailer's .ts Segment files to improve performance, and avoid embedding the trailer at the end of every source video file.
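The playlist surgery itself is a few lines of text manipulation. A sketch (the `append_trailer` helper is our own name; it assumes a VOD playlist like the ones above):

```python
def append_trailer(playlist_text, trailer_segments):
    """Append trailer Segments to a VOD Media Playlist after an
    EXT-X-DISCONTINUITY tag, then close the playlist."""
    lines = [ln for ln in playlist_text.rstrip().splitlines()
             if ln != "#EXT-X-ENDLIST"]   # re-added at the end
    lines.append("#EXT-X-DISCONTINUITY")
    for uri, duration in trailer_segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"

playlist = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:4
#EXTINF:4.000,
video4.ts
#EXTINF:2.800,
video5.ts"""

trailer = [("trailer0.ts", 3.337), ("trailer1.ts", 1.201), ("trailer2.ts", 1.301)]
print(append_trailer(playlist, trailer))
```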

Conclusion

So, we have a video streaming service that we can use to track views, see how long sessions last, dynamically support closed captions, and embed promotional trailers into videos to help grow the platform.

The HLS protocol is not incredibly complex. The vast majority of its information lives in human-readable plain-text files, which are easy to study without special tooling and whose behavior is easy to trace in production.

When I started this project I knew next to nothing about the protocol itself, but I was able to write a few .m3u8 files and then dig into them to figure out how exactly the protocol worked. From there I was able to write my own implementation of an HLS server that could do all the things needed to stream video for Bluesky.

To learn more about HLS, you can read the official RFC. It describes all the features discussed above, and more.

I hope this post inspires you to explore other protocols that you work with on a daily basis. Play around with them, try downloading files that your browser usually interprets for you, and see for yourself how simple these systems are that at first glance seem so complex.
