We connect to Excel GPU and speed up Excel 300 times

I came across an optimization problem, and since I’m a big fan of Excel, the choice of tool was quick. The only bad thing: Excel is wildly slow. So, one iteration took at least 35 minutes, and it was planned to make 1275 such iterations (at least)!

The purpose of this small project is to speed up the execution of VBA scripts using all the hardware available to me: GPU and CPU. Well, to the heap, since my library, multitasking was implemented.

For those who like to read only the code and do not like “spreading thoughts along the tree”, the code is hereinstaller herearchived library and examples (Excel/VBscript) here. Examples are also included in the installation (“demo” folder).

Forewarning

Requirements:

  • Excel (it is possible without it, see about abnormal programming at the very end)

  • .net framework v4.0.

  • Windows/System32/ should contain opencl.dll.

Structure

The internals of the library consist of two non-overlapping parts: the configuration of the available equipment (only information, nothing can be changed there) and, in fact, the programming of the devices found.

Configuration

The configuration simply dumps all available information about devices into the table, “down to the very last rotten one.” Logically, OpenCl divides the computer into platforms, within which there are devices. So, on my home computer, only one platform was found with a CPU for 4 processes (Pentium 4417U 2.30GHz) and a 12-core GPU (Intel HD Graphics 610). But on the wife’s computer, as many as 2 platforms were found: the first, with the i3-7100U and 23-core Intel HD Graphics 620, and the second, again with the i3-7100U and the 5-core Hainan GPU. Interestingly, the internet says that the Intel HD Graphics 620 comes with 24 cores. Either OpenCl was missing, or the kernel fell off.

Programming was implemented on two examples: performance evaluation and parallel code execution.

Performance

Performance was evaluated in two ways:

1. Multiplication of large matrices.

2. Code borrowed from CodeProject (“How to Use Your GPU in .NET“).

The time taken to multiply two 2000 by 2000 matrices (double precision calculations) is shown in the figure below. Pay attention to the logarithmic scale!

The time taken to multiply two 2000 by 2000 matrices.
The time taken to multiply two 2000 by 2000 matrices.

The designations CPU and GPU correspond to the use of OpenCl on the CPU and GPU. Native CPU – the case of using the simplest C# program for matrix multiplication. Well, VBA is also in Africa VBA. In addition to measuring time, the results of OpenCl calculations were compared with the results of VBA (everyone agreed!).

A rather unexpected discovery for me was that OpenCl on all CPUs calculates 8 times faster than C # on a single processor. Since I have 2 processors with 4 threads, one would expect a fourfold reduction in execution time. Apparently, it plays a significant role here that the processor driver that compiles the CL code compiles directly to machine code, and does not use an interpreter, like C#.

Well, the results of the performance test in full:

VBA calculation time:

3.1

minutes.

OpenCl on CPU

73

times faster than VBA.

OpenCl on GPU

327

times faster than VBA.

OpenCl on GPU

4.5

times faster than the CPU.

OpenCl on CPU

8.3

times faster than C#, CPU.

C#, CPU

8.9

times faster than VBA.

The next performance test was borrowed from CodeProject. For my home Pentium 4417U (2 x 2.30 GHz) it gave me the following performance results in Gflops:

GFlops, single

GFlops, double

CPU

18.1

9.1

GPU

358.5

89.7

Asynchrony

As everyone probably knows, Excel is painfully single-tasking. To run two tasks at the same time, you need to make quite a lot of effort. Everything described above was launched by the ExecuteSync command. However, in addition to the ExecuteSync function, there are also the ExecuteAsync and ExecuteBackground functions. The difference between the first function and the second is that ExecuteAsync uses a callback, while the second returns information about the completion of execution in the ExecutionCompleted (True / False) property, which must be polled from time to time.

In principle, the ExecuteAsync function was created first, but as we tested it, it was noticed that it was quite difficult to work with callback:

Excel does not like and collapses:

Therefore, in the example I use ExecuteBackground. The result of working in asynchronous mode is simply stunning! The progress bar runs, the processes are executed in parallel on the CPU and GPU, they are completed one by one, and there are 4 GPU processes for one completed CPU process, as predicted by the matrix multiplication performance calculations.

Well, the promised abnormal programming (and everything written before that was normal programming?).

As it turned out, Excel is not really needed! VBscript also supports COM, but with its own limitations: the lack of typing leads to the fact that arrays must be passed through object or ArrayList. However, the configuration works as is, even without additional gestures:

Device configuration via VBscript.
Device configuration via VBscript.

And finally, all in one place:

GitHub

installation

Everything in the archive.

Examples in the “demo” folder.