ISA does not forgive mistakes


“X86 is a historical misunderstanding.” Even at 80, the master had lost none of his polemical fervor.
“Actually, it owns 95% of the server market,” I replied listlessly. I didn’t want to argue about a topic that had been rehashed a hundred times.
“And I got lost in all those prefixes long ago,” the academician would not relent. “15 bytes per instruction, that is unthinkable!”
“Well, it’s not great, of course. But who is better?”
“Anyone. ARM, at the very least.”
“I still don’t get it. There have to be additions and multiplications?”
“There do.”
“And shifts and logical operations?”
“Yes.”
“And loads and stores too. So what difference does it make what they are called and how they are encoded?”

As usual, I realized that the teacher was right much later, when I sat down on a dreary winter evening to write an instruction decoder just to amuse myself. I managed to knock together a simple ARM decoder in VHDL (which I know only so-so) in a couple of days. True, I confess, I had a cheat sheet. 🙂
image

For X86 I could not manage it in a week, or in two, or in a month... Not even for the base instruction set.

The difference here is not only that ARM is RISC (Reduced Instruction Set Computing) and X86 is CISC (Complex Instruction Set Computing). It lies rather in the path of historical development. When it all began in 1978, more than 40 years ago, the X86 ISA (Instruction Set Architecture) was a fairly compact instruction set with its own internal logic. But time passed, bus widths grew, registers were widened (SIMD ones included), and the number of instructions kept growing. One guy made an interesting attempt to simply count the instructions in x86. Either he was too lazy to open the manual and count them, or he knew about the existence of “nameless” opcodes (instructions that do not even have a name), or he believed in the omnipotence of logic. It is easy to guess that logic turned out to be powerless. 🙂 By the way, I tried to find a chart of the growth in the number of X86 instructions by year (or by generation). I haven’t found one yet (maybe someone has it?). But I did find this picture.

image

To be honest, I don’t know how many instructions there are in the X86 ISA. But I remember well all the ones I contributed myself, without realizing that every new bit in an ISA carries the label “keep forever”. And there were several thousand “creators” like me at Intel. There were never enough bits to encode all our fantasies. 🙂 So more and more prefixes were bolted onto the existing instruction set: REX, VEX... The latest extension, the four-byte EVEX prefix, was introduced for AVX-512. By the way, an entire ARM instruction (even with SVE taken into account) fits into those same 4 bytes. ARM took a different path from the start, building its instruction set on the basic principles of simplicity, compactness and extensibility.
The difference becomes clear if we switch from the programmer’s point of view to the circuit designer’s. The transistor is a great thing, but it switches slowly and dissipates energy shamelessly. Other things being equal, the fewer of them, the better. Now take a look at the x86 instruction decoding algorithm (the problem I was trying to solve).
image
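To give a feel for the gap, here is a minimal sketch in Python of only the very first step of decoding: finding where the prefixes end. It is an illustration, not the decoder from the picture above. It assumes 64-bit mode (where the bytes 0xC4, 0xC5 and 0x62 always start a VEX or EVEX prefix), it does not check which prefix combinations are legal, and it does not touch opcodes, ModRM, SIB, displacements or immediates, all of which a real decoder must also parse before it knows the instruction length. The function names are made up for this example.

```python
# Legacy one-byte prefixes; any number of them may precede the opcode.
LEGACY_PREFIXES = frozenset({
    0xF0, 0xF2, 0xF3,                     # LOCK, REPNE, REP
    0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65,   # segment overrides
    0x66, 0x67,                           # operand-size, address-size
})
MAX_X86_LEN = 15  # architectural limit on the length of one instruction


def x86_prefix_length(code: bytes) -> tuple:
    """Return (prefix byte count, extension prefix kind) for one instruction.

    Assumption: 64-bit mode. No validity checks on prefix combinations.
    """
    i = 0
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        i += 1
        if i >= MAX_X86_LEN:
            raise ValueError("instruction would exceed the 15-byte limit")
    if i == len(code):
        raise ValueError("ran out of bytes before reaching an opcode")
    b = code[i]
    if 0x40 <= b <= 0x4F:      # REX: one byte, right before the opcode
        return i + 1, "REX"
    if b == 0xC5:              # two-byte VEX
        return i + 2, "VEX2"
    if b == 0xC4:              # three-byte VEX
        return i + 3, "VEX3"
    if b == 0x62:              # four-byte EVEX (AVX-512)
        return i + 4, "EVEX"
    return i, "none"


def aarch64_boundaries(code: bytes) -> list:
    """On AArch64 every instruction is 4 bytes, so boundaries are trivial."""
    return list(range(0, len(code), 4))


print(x86_prefix_length(bytes.fromhex("4889c3")))        # mov rbx, rax
print(x86_prefix_length(bytes.fromhex("62f17c4858c1")))  # vaddps zmm0, zmm0, zmm1
print(aarch64_boundaries(bytes(12)))                     # three instructions
```

On the x86 side the position of the opcode depends on the bytes before it, so the decoder has to walk them one by one (or speculate in parallel, at the cost of more logic). On the ARM side the boundaries are known before a single bit has been examined.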

And, as they say, feel the difference between the input paths (front ends) of ARM and X86. One could object that the front end is a very small part of the core, only about 10%. True, but do not forget that all this logic is multiplied by the number of cores on the chip, and that is serious. Another consideration is that the front end is exactly the part responsible for backward compatibility. The back end can safely be reworked or even thrown out. Intel, by the way, did just that in the mid-2000s, replacing the NetBurst (P4) microarchitecture with Core, a descendant of the P6 (Pentium III, Pentium M) line. But the front end cannot be trimmed much. Although sometimes one really wants to, because only about 20% of the existing X86 ISA is actually used today. The rest are remnants of the past.
Another disadvantage of very long instructions is relatively frequent instruction cache misses. Roughly speaking, for the same cache size, the longer the instructions, the more misses there will be. Of course, the caches can be made larger. But then again, those are transistors that would not have been needed if the instructions were shorter. For the same reason, I am very wary of all kinds of VLIW (Very Long Instruction Word) architectures. They also have the drawback that it is very hard to develop compilers for them. Until now I have deliberately avoided the topic of compilers, since I have never developed one myself. I would like to hear what knowledgeable people have to say about the delights of developing compilers for RISC, CISC and VLIW.
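The cache argument above can be put into rough numbers. The sketch below only divides a cache capacity by an average instruction length. The 32 KiB capacity and the lengths tried are illustrative assumptions, not measurements of any real processor or workload, and real code density also depends on how much work each instruction does.

```python
def instructions_per_cache(cache_bytes: int, avg_instr_bytes: float) -> int:
    """How many instructions of a given average length fit in the cache."""
    return int(cache_bytes // avg_instr_bytes)


CACHE_BYTES = 32 * 1024  # hypothetical 32 KiB instruction cache

for avg_len in (4, 8, 15):  # illustrative average lengths in bytes
    print(avg_len, instructions_per_cache(CACHE_BYTES, avg_len))
```

With these assumed numbers, going from 4-byte to 8-byte instructions halves the number of instructions the cache can hold, which is the effect described above.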
And yet, despite the huge accumulated burden of legacy, and perhaps thanks to it, the X86 developers got the main thing right. The principle of backward compatibility has been upheld in the architecture from the very beginning. All existing software runs on new hardware out of the box. This is what made it possible to build such a deep and mature ecosystem around the architecture. One can only take one’s hat off to the engineers at Intel and AMD who, despite a far from perfect ISA design, have held on to leadership in the server segment for many years. However, the legacy burden keeps getting heavier.

