Overpay twice or go “integrated”?

Our test of the NVIDIA A4000 all but confirmed that it can encode up to 16 independent FullHD video streams in H264 format. Can you multiply that performance with a professional graphics card that costs twice as much? Let’s find out.

In our second article about encoding (the A4000 test) we overlooked the fact that a video stream can have a higher resolution, so it is worth testing 4K encoding as well. For the sake of completeness, we will also compare the NVIDIA solutions with an integrated GPU from Intel. Some professionals believe that it is enough to build FFmpeg with QuickSync enabled, and a discrete video card will not be needed. Let’s check this claim.

We will not describe the testing process for NVIDIA video cards in detail, or explain why we need FFmpeg, since that is covered in the previous articles (first and second parts). Let’s focus on the new results and useful life hacks.

A4000 vs A5000

We use the same test bench built from available HOSTKEY servers, but install an NVIDIA A5000 video card in it, with more encoders, 24 GB of VRAM and higher power consumption.

To begin with, let’s check how it behaves at the number of streams that turned out to be the limit for the A4000 in the previous test:

14 streams

gpu   pwr  gtemp  mtemp   sm  mem  enc  dec  mclk  pclk    fb  bar1
idx     W      C      C    %    %    %    %   MHz   MHz    MB    MB
  0    97     47      -   92    3  100    0  7600  1920  3502    33

frame=1015 fps=31 q=28.0 Lsize= 9056kB time=00:00:33.80 bitrate=2194.8kbits/s speed=1.02x

Marvelous! The figures are comparable to those of the A4000. Despite the higher chip frequency, the larger amount of video memory in use and the higher power consumption, the A5000 managed to encode only 14 streams and choked on the fifteenth. This fiasco proves once again that professional video adapters are designed for other purposes.
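The monitoring table above has the column layout of nvidia-smi dmon output. Assuming that is where the numbers come from, the following sketch flags rows in which the encoder block is saturated. The function name enc_saturated and the default 95% threshold are illustrative choices, not part of the original test.

```shell
#!/bin/sh
# Sketch: read dmon-style text on stdin and report rows whose "enc" column
# is at or above a threshold (first argument, default 95 %).
# Assumption: a header row names the columns and contains the word "enc".
enc_saturated() {
    awk -v limit="${1:-95}" '
        {
            is_header = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "enc") {
                    # a leading "#" in the header shifts the names by one field
                    col = ($1 == "#") ? i - 1 : i
                    is_header = 1
                }
            }
            if (is_header) next
        }
        # data rows only: the units row has "%" in this column and is skipped
        col && $col ~ /^[0-9]+$/ && $col + 0 >= limit {
            printf "gpu %s: encoder at %s%%\n", $1, $col
        }
    '
}
```

On a machine with the NVIDIA driver installed, something like `nvidia-smi dmon -s pucm -c 1 | enc_saturated` should feed it live data.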

Turn on 4K

Now let’s try to broadcast a stream with a resolution of 3840×2160 (aka 4K), since the rabbit test file is available in that version. Encoding on the central processor alone choked on a single stream once the amount of data increased:

frame= 2902 fps=27 q=29.0 size=104448kB time=00:01:33.56 bitrate=9144.7kbits/s dup=436 drop=0 speed=0.878x
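As a rough yardstick for how much more work a 4K frame represents, the raw pixel counts of the two resolutions can be compared with a few lines of shell arithmetic. This sketch only counts pixels; the actual encoder load also depends on the content and the encoder settings.

```shell
#!/bin/sh
# Raw pixels per frame at each resolution (plain integer arithmetic).
fullhd=$((1920 * 1080))
uhd=$((3840 * 2160))
echo "FullHD: $fullhd px, 4K: $uhd px, ratio: $((uhd / fullhd))x"
```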

What can the GPU do (remember, the results of the A4000 and A5000 are comparable)? Three streams.

gpu   pwr  gtemp  mtemp   sm  mem  enc  dec  mclk  pclk    fb  bar1
idx     W      C      C    %    %    %    %   MHz   MHz    MB    MB
  0    96     46      -  100    3   96    0  7600  1920  1112     9

As you can see from the power consumption and the load on the encoding units, the video chip is clearly not having an easy time, although only about 1 GB of video memory is used in this case.

The FFmpeg output confirms that the video card is coping:

frame= 1465 fps=33 q=35.0 Lsize=12584kB time=00:00:48.80 bitrate=2112.4kbits/s dup=159 drop=0 speed=1.09x

But the adapter cannot handle 4 streams. Although the hardware load stays at roughly the same values, the frame rate starts to sag:

frame= 614 fps= 26 q=35.0 Lsize=4978kB time=00:00:20.43 bitrate=1995.6kbits/s speed=0.858x

Building FFmpeg with QuickSync support

If the developer’s statement is to be believed, QuickSync technology works by “using the special media processing capabilities of Intel® Graphics Technologies to accelerate decoding and encoding, allow the processor to perform other tasks in parallel and improve system performance.”

For the tests, we needed a suitable Intel processor (we found a machine with a Core i9-9900K CPU @ 3.60GHz) and an FFmpeg build with Quick Sync support. The first part was no problem (a chip newer than the 6th generation with an integrated GPU is enough, which is easy to check), but building FFmpeg on the test Ubuntu 20.04 machine evoked persistent associations with a practical study of the Kama Sutra. To save you precious time, we will describe how we managed to solve the problem.

Since the packages in the repositories are broken, the first step is to build and install the gmmlib and libva libraries, as well as the latest versions of the Intel media driver and Media SDK. To do this, create a GIT folder in your home directory, go into it and run the following commands in sequence (if any dependencies are missing, install them from the repository; we recommend running sudo apt install autoconf automake build-essential cmake pkg-config):
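The check for an integrated GPU mentioned above can be sketched as follows. This is a minimal example that assumes lspci (from the pciutils package) is installed; the function name has_intel_gpu is ours, and the match strings are the device classes lspci normally prints for graphics adapters.

```shell
#!/bin/sh
# Sketch: decide from `lspci` text (on stdin) whether an Intel GPU is visible.
has_intel_gpu() {
    grep -Ei 'VGA compatible controller|Display controller' | grep -qi 'Intel'
}

# Only query real hardware when lspci is available.
if command -v lspci >/dev/null 2>&1; then
    if lspci | has_intel_gpu; then
        echo "Intel GPU found"
    else
        echo "no Intel GPU visible"
    fi
fi
```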

# Start every block from the GIT folder so the repositories sit side by side
# (without this, each clone would land inside the previous build directory)
cd ~/GIT
git clone https://github.com/intel/gmmlib.git && cd gmmlib
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=/usr/lib/x86_64-linux-gnu ..
make -j8
sudo make install

cd ~/GIT
git clone https://github.com/intel/libva.git && cd libva
./autogen.sh --prefix=/usr --libdir=/usr/lib/x86_64-linux-gnu
make -j8
sudo make install

cd ~/GIT
git clone https://github.com/intel/media-driver.git && cd media-driver
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=/usr/lib/x86_64-linux-gnu ..
make -j8
sudo make install

cd ~/GIT
git clone https://github.com/Intel-Media-SDK/MediaSDK.git && cd MediaSDK
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=/usr/lib/x86_64-linux-gnu ..
make -j8
sudo make install
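Before moving on to FFmpeg, it can help to confirm that the freshly installed libraries are visible to pkg-config, since FFmpeg’s configure script relies on it to find them. The sketch below assumes the builds register the module names igdgmm (gmmlib), libva and libmfx (Media SDK); the helper name check_module is ours.

```shell
#!/bin/sh
# Sketch: report whether pkg-config can see each freshly installed library.
# Assumption: the builds register these .pc names (igdgmm, libva, libmfx).
check_module() {
    if pkg-config --exists "$1" 2>/dev/null; then
        echo "$1: found ($(pkg-config --modversion "$1"))"
    else
        echo "$1: missing"
    fi
}

for mod in igdgmm libva libmfx; do
    check_module "$mod"
done
```

If a module is reported as missing, the corresponding make install step most likely did not complete or used a different prefix.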

Then you need to build FFmpeg with a few magic commands:

cd ~/GIT
git clone https://github.com/ffmpeg/ffmpeg
cd ffmpeg
./configure --enable-libmfx --enable-vaapi --enable-opencl --enable-libvorbis --enable-libvpx --enable-libdrm --enable-gpl --cpu=native --enable-libfdk-aac --enable-libx264 --enable-libx265 --extra-libs=-lpthread --enable-nonfree
make -j8
sudo make install

It is worth making sure that we have support for Quick Sync:

ffmpeg -decoders|grep qsv

The output of the command should be something like this:

V....D av1_qsv              AV1 video (Intel Quick Sync Video acceleration) (codec av1)
V....D h264_qsv             H264 video (Intel Quick Sync Video acceleration) (codec h264)
V....D hevc_qsv             HEVC video (Intel Quick Sync Video acceleration) (codec hevc)
V....D mjpeg_qsv            MJPEG video (Intel Quick Sync Video acceleration) (codec mjpeg)
V....D mpeg2_qsv            MPEG2VIDEO video (Intel Quick Sync Video acceleration) (codec mpeg2video)
V....D vc1_qsv              VC1 video (Intel Quick Sync Video acceleration) (codec vc1)
V....D vp8_qsv              VP8 video (Intel Quick Sync Video acceleration) (codec vp8)
V....D vp9_qsv              VP9 video (Intel Quick Sync Video acceleration) (codec vp9)
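The list above covers decoders, while the tests below rely on the h264_qsv encoder, so it may be worth checking the encoder list as well. A minimal sketch, with has_encoder as an illustrative helper name:

```shell
#!/bin/sh
# Sketch: check that the h264_qsv *encoder* (not only the decoder) is built in.
has_encoder() {
    # stdin: output of `ffmpeg -encoders`; $1: encoder name to look for
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Only run the real check when an ffmpeg binary is on the PATH.
if command -v ffmpeg >/dev/null 2>&1; then
    if ffmpeg -hide_banner -encoders 2>/dev/null | has_encoder h264_qsv; then
        echo "h264_qsv encoder available"
    else
        echo "h264_qsv encoder missing"
    fi
fi
```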

Hooray! Everything is ready for the tests.

Encoding testing with Quick Sync

First, let’s check how the processor copes with FullHD video encoding without Quick Sync: it withstands a maximum of 4 streams, with all cores loaded at 100%:

frame= 1461 fps= 33 q=29.0 size=24064kB time=00:00:46.33 bitrate=4254.7kbits/s speed=1.05x

The processor cannot cope with a fifth stream, so we can safely move on to the test with Quick Sync. To do this, replace the encoder in the script from the previous article with h264_qsv, so that it takes the following form (you can read more about using QuickSync with FFmpeg here):

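#!/bin/bash
# $1 is the number of parallel streams to start
for (( i = 0; i < $1; i++ )); do
   # no audio (-an), hardware H264 encoding via Quick Sync, overwrite output (-y)
   ffmpeg -i http://78.0.75.110:5454/ -an -vcodec h264_qsv -y "Output-File-$i.mp4" &
done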
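The FFmpeg status lines quoted throughout this article end with a speed= field, and a value of at least 1.0x means the encoder keeps up with the stream in real time. The sketch below pulls that field out of each status line and labels it; the function name check_speed and the labels are illustrative.

```shell
#!/bin/sh
# Sketch: read FFmpeg status lines on stdin and print, for each line that has
# a "speed=" field, its value and whether it is at least 1.0 (real time).
check_speed() {
    sed -n 's/.*speed= *\([0-9.]*\)x.*/\1/p' |
    awk '{ printf "%s %s\n", $1, ($1 + 0 >= 1.0 ? "ok" : "too slow") }'
}
```

FFmpeg writes its status to stderr and separates updates with carriage returns, so a live check would look something like `ffmpeg ... 2>&1 | tr '\r' '\n' | check_speed`.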

We start straight away with 6 streams (+2 compared to the CPU-only test):

frame=291 fps=55 q=29.0 size=1280kB time=00:00:10.13 bitrate=1034.8kbits/s dup=2 drop=0 speed=1.93x

The difference is obvious: the processor load does not exceed 50%, and the remaining headroom lets us predict a final figure of 11-12 streams.

We start 11 streams:

frame=157 fps=30 q=38.0 Lsize=628kB time=00:00:05.69 bitrate=903.0kbits/s dup=2 drop=0 speed=1.09x

The processor load increases slightly, but the GPU is already approaching its limit. A twelfth stream lowers the bitrate and drops the processing speed to 24-28 frames per second.

Now let’s check 4K streams. Unlike the AMD processor, our Intel processor easily handles a single stream at this resolution even without hardware acceleration:

frame=655 fps=31 q=-1.0 Lsize=30637kB time=00:00:21.73 bitrate=11547.9kbits/s speed=1.03x

Unfortunately, it cannot do more than that. With Quick Sync enabled, the test computer was able to pull three streams at 4K resolution:

frame= 509 fps=31 q=33.0 Lsize=8010kB time=00:00:17.42 bitrate=3764.7kbits/s dup=2 drop=0 speed=1.07x

It gave up only on the fourth, but the NVIDIA A5000 video card handled exactly the same number.

Unfortunately, this solution also has its drawbacks. When a BMC module is in use (for example, when the machine is controlled through IPMI), you will not get access to all the hardware acceleration features, even if the integrated GPU is detected in the system. You will have to choose between the convenience of remote control and the full benefits of Quick Sync.

Results

You can draw your own conclusions. We will only note that for video encoding, the difference in the power of video cards is not always reflected in their price, and for some tasks it is worth paying attention to the specialized technologies inside central processors. We used H264 for testing, but the HEVC (H265) or AV1 codecs should in theory give better results, especially at 4K resolutions. If you run similar tests yourself with HEVC (AV1 is currently available in mass-market hardware only for decoding), share the results in the comments.

___________

How much money?

The cost of the experiments described above is easy to measure: use our calculator-configurator on this page.

For example, in the simplest configuration, it is as follows:

  • a machine with an A4000 will cost 22,000 rubles; at 12 streams that is about 1,800 rubles per stream per month;

  • a machine with an A5000 will cost 31,000 rubles; at 14 streams that is 2,214 rubles per stream per month;

  • an i9-9900K server with QuickSync (QSV) will cost 5,000-6,000 rubles; at 11 streams that is about 450 rubles per stream.

    Servers for this purpose need to be assembled on motherboards without remote control, which we can do. Contact us!

    By the way, all HOSTKEY servers come with our full IPMI remote server management module and a server control panel with an API. We described how the latter works in this article.
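The per-stream figures above are simply the monthly server price divided by the number of streams. A small sketch that reproduces the arithmetic with the prices and stream counts quoted in the list (integer division rounds down, so the results differ slightly from the rounded figures in the text):

```shell
#!/bin/sh
# Sketch: monthly price per stream = monthly server price / number of streams.
# Figures are the ones quoted in the list above; integer division rounds down.
per_stream() {
    echo $(( $1 / $2 ))
}

echo "A4000:    $(per_stream 22000 12) rubles per stream"
echo "A5000:    $(per_stream 31000 14) rubles per stream"
echo "i9-9900K: $(per_stream 5000 11)-$(per_stream 6000 11) rubles per stream"
```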
