I have already lamented the x86-64 exception of the DAA/DAS BCD commands and wept the cancellation of the INTO integer overflow check command. Now it’s the turn to cry about the discarded BOUND command. As the saying goes, ladies and gentlemen, put your waistcoats and cleavage on. I start crying.
The BOUND command for checking if an index is out of bounds is unique in the x86 architecture in its own way. Because it is explicitly designed to support high-level languages on the processor. In this part, perhaps, only she is one such. The command has two operands, the first of which indicates the register with the current index value, and the second – the memory address of the “tuple” of a pair of boundaries. If the signed number in the first operand is not within the range specified by the second operand, an exception is thrown. Please note that this is not only support for some C, since the array boundaries can be any, even negative.
I was faced with the need to control indexes in those distant times, when punched cards were still in use. Once I could not understand why my program was doing something strange. A colleague suggested turning on index control. In BESM-Algol, there was a standard punched card for this, with a punched diamond symbol and the word KONT. I just soared from such a proposal: “I looked through everything, the indexes are everywhere in order!” But he still put this standard card into the deck. And – oops! – the error was immediately found! Because, being a young and naive programmer, I looked into the text of my program and saw in it only what I wanted to see, and not what is written there in reality.
Soon another story happened. In those golden days of the development of the Energia carrier, we had an employee who was involved in the calculation of dynamic loads. He had to calculate many options, and each calculation took an hour and a half. Now, for some reason, such tasks are called “number grinders” with a touch of disdain. But in those days, almost all programs were like that. So, he ordered two hours of work at BESM-6 for every night, and in the morning he took the result. One evening, two other employees entered the computer room in the hope of running their programs faster and accidentally shoved the deck of the one I’m talking about waiting for input. The deck fell to the floor and crumbled. Well, how it crumbled – only a few punched cards flew off. They picked it up, and it turned out that one of the cards that flew off was the same card with a rhombus and the text KONT. The employees realized that the owner of the deck, of course, does not use it for his long-established program, but simply stores it so as not to look if he starts to change something. To do this, punched cards had to be placed at the very end of the deck behind the last significant punched card, which was officially called the “end of input on A3”, and in jargon it was even indecent – “long end” due to the long punched column of holes. From the end of the deck, she flew off. The KONT card was inserted into its place behind the “long end”, the incident was over.
However, the next morning, the owner of the deck ran around and asked who and what was doing with his program? It turned out that he had no idea about any cards with diamonds. A couple of years ago, he once made up a “passport” of the deck (with a KONT card), and he never touched anything there. And yesterday KONT, it turns out, was thrown out. As a result of the fact that the index control was turned off, his program produced a (correct) result not in an hour and a half, but in a few minutes.
Therefore, from the very beginning, I perceived, firstly, the importance of index control, since an incorrect array index is the easiest way to mess up a program. And secondly, the fact that checks should be treated sparingly, since these are expensive (long-term) operations.
And so, when moving from x86 to x86-64, BOUND died. I met on the network various suggestions why it was thrown out. Say, interfered with the predictor of transitions. But throwing an exception is not a transition at all. Say, there are not enough codes, but here the code 62H is occupied. Also strange. Well, the codes of the discarded PUSHAD / POPAD commands would have occupied, they are not so sorry. They say that the boundaries recorded in memory can be easily corrupted by malware. Yeah, but the stack can’t be corrupted by malware? Etc. etc. Up to the argument: “no one uses it.” But this is a vicious circle – they threw it out, because they don’t use it, but you can’t use it, because they threw it away.
The BOUND team is generally somehow unlucky. For example, in its description in the official documentation of INTEL bullshit is written. I’m not exaggerating. Probably, as a result of a technical error, several words were missing and it turned out that the current index value should be “less than or equal to the upper bound plus the size of the operand.” Just like that, in English on white, it says “…and less than or equal to the upper bound plus the operand size in bytes”.
Moreover, in translation into Russian, in some reference books they showed creativity and replaced the word “plus” with “amount”. So they write that the index must be less than or equal to the sum of the upper bound and the size of the operand. But it is easy to see that no operand size is added to the upper bound in BOUND. For example, if the bound is in memory 10 and the register index is 11, BOUND will throw an exception.
But no matter how much I cry, x86-64 no longer provides for BOUND and you have to live without it. And, as in the case of INTO, the question arises: how to quickly and better check if the index is out of bounds of the array?
Back in the old days, the solution was to specify the valid range of the index directly in the type of the integer variable used for the index. For example, if you describe a variable as something like i int [1..100], specifying the allowable range to be used as an array index with boundaries 1:100, then the index control will go “by itself”. True, at the same time, checks in the program still do not disappear. Although they may become less than if you check the index each time you access the array. But there is another complication here. For example, when sorting, you need to rearrange the i-th and i+1-th elements of the array. Even if i does not go beyond the boundaries of the array, i + 1 can go out, and the allowable range specified for i does not help in this case. Of course, you can create a variable j of type i, assign j=i+1; and then use indexes only as variables, without expressions. But this is ugly, not visual, cumbersome and requires unnecessary actions.
Therefore, I settled on the implementation of an index check when accessing an array, and not an index variable.
Since an array is a set of identical elements, accessing an array element by index will necessarily require multiplying the index by the size of the array element. The x86 has a convenient form of signed multiplication with three operands: a register to put the result in, the first factor (may be a variable in memory), and the second constant factor. To access the elements of an array, this is just what you need. For example, access to the array x by index i if the size of one element of the array is 100 bytes is performed by commands like:
mov esi, offset x ;начало массива x imul eax, i,100 ;умножаем размер элемента на индекс add esi,eax ;получаем в esi адрес i-ого элемента
Explicit checks are now required on x86-64 to control index out of bounds instead of using BOUND. However, directly adding comparison commands and conditional jumps to each access to an array element is too bloated. Therefore, I used a system call that needs to be passed both the current index value and the allowed limits.
Some difficulty is that in the example above, the index does not appear separately, it turned out to be right inside the multiplication instruction. But after all, you can check not the index itself, as such, but already the result of multiplication. And then check it against the bounds, also already multiplied at compile time by the size of the element. This eliminates the need for a separate command to get the index value from the variable. It is also inconvenient that the register where the result of the imul multiplication is written may be due to the distribution of registers by the compiler, and you will have to transfer, for example, this register itself through the stack.
Therefore, I decided to use the “good old” imul in the form with one operand-factor, if necessary, to control the index, since its second factor is always ax / eax / rax, and the result is always written in ax: dx / eax: edx / rax: rdx. In this case, it is not required to pass the index value to the system call – it will always be in this pair of registers after multiplication. And in this case, the system call needs only constant boundaries to be passed. Moreover, the highest part of the product, falling into the register dx / edx / rdx, in general, should always be zero or completely filled with single bits (with a negative index). And therefore, it can not be transmitted as a constant, but only implied.
Let’s consider a simple example. Let there be an array with bounds from 1 to 100 and the size of each element is 100 bytes. It is required to assign some value to the i-th element. An integer variable of 8 bytes is allocated for the index.
dcl a(1:100) char(100); dcl i fixed(63); a(i)='12345678';
In this case, the following commands are obtained without index control (? SMCCM is a line filling system call):
486B3D0028000064 imul q rdi,I,100 BEE8000000 mov q rsi,offset @000000E8h 488DBF8C000000 lea rdi,A+0FFFFFF9Ch[rdi] B064 mov al,100 B108 mov cl,8 E800000000 call ?SMCCM
The same action, but with index control:
6A6458 movsxq rax,100 48F72D00280000 imul q I 6A64 push 100 ;нижняя граница 6810270000 push 10000 ;верхняя граница E800000000 call ?SERV3 ;системный вызов контроля индекса BEE8000000 mov q rsi,offset @000000E8h 488DB88C000000 lea rdi,A+0FFFFFF9Ch[rax] B064 mov al,100 B108 mov cl,8 E800000000 call ?SMCCM
Two boundaries multiplied by the element size are passed to the ?SERV3 system call through the stack, while RAX:RDX contains the product of index i and the element size.
If the array’s lower bound were zero-based, a different kind of index check system call would be used, since it is not necessary to pass the always-zero lower bound:
6A6458 movsxq rax,100 48F72D68280000 imul q I 6810270000 push 10000 ;верхняя граница E800000000 call ?SERV4 ;системный вызов контроля BEE8000000 mov q rsi,offset @000000E8h 488DB8F0000000 lea rdi,A[rax] B064 mov al,100 B108 mov cl,8 E800000000 call ?SMCCM
Within these control system calls, the number written to rax:rdx is compared against the range passed by the constants on the stack:
PUBLIC ?SERV3: PUSH 0 ;ДОБАВЛЯЕМ НОЛЬ ;---- НЕ НИЖЕ НИЖНЕЙ ГРАНИЦЫ ---- SUB [RSP]+18H,RAX SBB [RSP]+000,RDX JL @ CMP Q PTR [RSP]+18H,0 JNZ ERR ;---- НЕ ВЫШЕ ВЕРХНЕЙ ГРАНИЦЫ ---- @: MOV Q PTR [RSP],0 SUB [RSP]+10H,RAX SBB [RSP]+000,RDX JL ERR ;---- КОНТРОЛЬ ПРОШЕЛ ---- ADD RSP,8 ;ВЫБРОСИЛИ НОЛЬ ИЗ СТЕКА RET 16 PUBLIC ?SERV4: PUSH 0 ;ДОБАВЛЯЕМ НУЛИ PUSH 0 ;---- НЕ НИЖЕ НИЖНЕЙ ГРАНИЦЫ ---- SUB [RSP]+08H,RAX SBB [RSP]+000,RDX JL @ CMP Q PTR [RSP]+08H,0 JNZ ERR ;---- НЕ ВЫШЕ ВЕРХНЕЙ ГРАНИЦЫ ---- @: MOV Q PTR [RSP],0 SUB [RSP]+18H,RAX SBB [RSP]+000,RDX JL ERR ;---- КОНТРОЛЬ ПРОШЕЛ ---- ADD RSP,16 ;ВЫБРОСИЛИ ДВА НУЛЯ ИЗ СТЕКА RET 8
It may seem that checking the higher half of the product in the RDX register is redundant. It is unlikely that in practice the index will be able to exceed 263. However, as a result of program errors, the index variable may also turn out to be just a hodgepodge of zeros and ones. Therefore, it is still more reliable to check the older half of the product. Just in case.
Of course, there are cases where index monitoring is not required. For example, if the program has a loop with the title
do i=lbound(a) to hbound(a);
and no one interferes with changing the index i inside this loop, then using the built-in functions lbound and hbound guarantees that the index does not go beyond the bounds of the array. The problem is that proving this requires painstaking analysis of the program by the compiler. For simple compilers, such complete analysis is often unattainable.
Therefore, I use a simpler way to reduce checks, which consists in looking for linear sections of code, i.e. sections where there are no subroutine calls, jumps or labels that can be controlled from anywhere in the program. If in such a linear section there are calls to array elements by index, then for the first call the index check is always set, and for the next calls it is checked that these are the same boundaries and the same index (and the index has not changed since the previous check). In this case, the check is no longer needed, since it has already been performed earlier.
For example, arrays have different element sizes but the same bounds:
dcl a(1:100) char(100); dcl b(1:100) char(50); dcl i fixed(63); a(i)=b(i);
Then only one index check is enough, which shortens the code and speeds up execution:
6A6458 movsxq rax,100 48F72D803B0000 imul q I 488BF8 mov q rdi,rax 6A64 push 100 ;нижняя граница 6810270000 push 10000 ;верхняя граница E800000000 call ?SERV3 ;контроль индекса 486B35803B000032 imul q rsi,I,50 ;умножение уже без контроля 488DB6C6270000 lea rsi,B+0FFFFFFCEh[rsi] 488DBF84000000 lea rdi,A+0FFFFFF9Ch[rdi] B064 mov al,100 B132 mov cl,50 E800000000 call ?SMCCM
The number of different commands in x86-64 is already approaching a thousand. In particular, recently added commands related to checking the validity of addresses BNDCL, BNDCU, BNDCN, BNDLDX, BNDMK, BNDMOV, BNDSTX in the abbreviations of which the word BOUND is guessed (and, by the way, also generating exceptions when the checked address goes out of the valid range), but the simplest BOUND command, the only command to support high-level language constructs that has been working since the days of 80286 on all types of processors, was somehow considered unnecessary. Although, in my opinion, it would be more rational to simply expand it with the REX prefix to 64-bit values.
As with DAA/DAS and INTO, the BOUND command can, of course, be functionally replaced by a combination of several other commands. Although it bloats the code. You can also leave them in the code and treat exceptions from these commands as if they were forbidden. Though it slows down the execution. It’s also a shame that the great idea of supporting high-level languages by the processor does not find understanding.