Subtraction is functionally complete

More specifically, then fully functional floating point subtraction IEEE-754 . This means that you can create any binary circuit using floating point subtraction alone.

To understand how to do this, you need to start from the bottom. Quote from Section 6.3 of IEEE 754-2019:

6.3 Sign bit

[…] When neither the input nor the result is NaN, […]; sum or difference sign xyconsidered as the sum x+(−y), the sign differs from at most one of the terms; […]. These rules must apply even when the operands or results are zero or infinity.

When the sum of two operands with opposite signs (or the difference of two operands with the same sign) is zero, the sign of that sum (or difference) must be +0 for all rounding direction attributes except roundTowardNegative; with this attribute, the sign of the zero sum (or difference) must be −0.

Let’s break this down.

  1. Subtraction xy treated as a sum x+(−y).

  2. Zero can have a sign, −0 and 0 are different entities (although when compared using == they are considered equal).

  3. If both terms have the same sign, then the result must have that sign. However for xy this means that if x And y have different signs, the result must have a sign x.

  4. If x And y have the same sign and xy equals zero, the result will be +0 for all rounding modes except roundTowardNegativeat which it will be −0.

Since in almost every context the rounding mode is used by default roundTiesToEven, we will assume that it applies. However, everything works similarly even for roundTowardNegative.

Truth table

So what does this give us when subtracting zeros?

-0 - -0 = +0  # Одинаковый знак, должен быть +0.
-0 - +0 = -0  # Разные знаки, знак от первого аргумента.
+0 - -0 = +0  # Разные знаки, знак от первого аргумента.
+0 - +0 = +0  # Одинаковый знак, должен быть +0.

Interesting… What if we designate −0 as false and +0 as true?

A B | O
----+--
0 0 | 1
0 1 | 0
1 0 | 1
1 1 | 1

The resulting truth table will be equivalent to A∨¬Bor BA (also known as valve IMPLY, only with rearranged arguments). It turns out that our truth table functionally completethat is, with the help of this one gate you can create arbitrary circuits.

Subtraction circuits

Let’s write a demo in Python. First, let’s define the constants and prepare their beautiful output:

Although they are separate entities, IEEE 754 floating point rules treat +0 and −0 as equal when compared. That is, to distinguish them, we need to extract the sign before comparing with 0.

import math

f_false = -0.0
f_true = 0.0
f_repr = lambda x: True if math.copysign(1, x) > 0 else False

Now we can create a NOT gate by taking advantage of the fact that −0−x reverses the sign to zero x:

f_not = lambda x: f_false - x

Let’s test this:

>>> f_repr(f_not(f_false))
True
>>> f_repr(f_not(f_true))
False

Great! We can also create an OR gate by noting that if we reverse the sign of the second argument before subtracting, we always get +0 (true) unless both arguments are −0 (false):

f_or = lambda a, b: a - f_not(b)

Let’s test this:

>>> f_repr(f_or(f_false, f_false))
False
>>> f_repr(f_or(f_true,  f_false))
True
>>> f_repr(f_or(f_false, f_true))
True
>>> f_repr(f_or(f_true, f_true))
True

Now that we have OR and NOT, we can make all the other gates, for example:

f_and = lambda a, b: f_not(f_or(f_not(a), f_not(b)))
f_xor = lambda a, b: f_or(f_and(f_not(a), b), f_and(a, f_not(b)))
>>> f_repr(f_and(f_false, f_false))
False
>>> f_repr(f_and(f_true,  f_false))
False
>>> f_repr(f_and(f_false, f_true))
False
>>> f_repr(f_and(f_true, f_true))
True

>>> f_repr(f_xor(f_false, f_false))
False
>>> f_repr(f_xor(f_true,  f_false))
True
>>> f_repr(f_xor(f_false, f_true))
True
>>> f_repr(f_xor(f_true, f_true))
False

Program integers

You may have heard of soft-float – software implementations of floating point using integers. Let’s do the opposite: create a software implementation of integers using floating point operations alone. Let’s implement it in Rust so that you can look at the finished assembly code and make sure it’s terrible slowness splendor.

type Bit = f32;
const ZERO: Bit = -0.0;
const ONE: Bit = 0.0;

fn not(x: Bit) -> Bit { ZERO - x }
fn or(a: Bit, b: Bit) -> Bit { a - not(b) }
fn and(a: Bit, b: Bit) -> Bit { not(or(not(a), not(b))) }
fn xor(a: Bit, b: Bit) -> Bit { or(and(not(a), b), and(a, not(b))) }
fn adder(a: Bit, b: Bit, c: Bit) -> (Bit, Bit) {
    let s = xor(xor(a, b), c);
    let c = or(and(xor(a, b), c), and(a, b));
    (s, c)
}

type SoftU8 = [Bit; 8];

pub fn softu8_add(a: SoftU8, b: SoftU8) -> SoftU8 {
    let (s0, c) = adder(a[0], b[0], ZERO);
    let (s1, c) = adder(a[1], b[1], c);
    let (s2, c) = adder(a[2], b[2], c);
    let (s3, c) = adder(a[3], b[3], c);
    let (s4, c) = adder(a[4], b[4], c);
    let (s5, c) = adder(a[5], b[5], c);
    let (s6, c) = adder(a[6], b[6], c);
    let (s7, _) = adder(a[7], b[7], c);
    [s0, s1, s2, s3, s4, s5, s6, s7]
}

// Хм? u8? Что это? Тс-с-с-с....
pub fn to_softu8(x: u8) -> SoftU8 {
    std::array::from_fn(|i| if (x >> i) & 1 == 1 { ONE } else { ZERO })
}

pub fn from_softu8(x: SoftU8) -> u8 {
    (0..8).filter(|i| x[*i].signum() > 0.0).map(|i| 1 << i).sum()
}

fn main() {
    let a = to_softu8(23);
    let b = to_softu8(19);
    println!("{}", from_softu8(softu8_add(a, b)));
}

It’s terrible, but it works and outputs 42 correctly. And to add two 8-bit integers you need Total ≈120 floating point instructions:

On x86-64 there is no floating point negation instruction; instead, the compiler simply generates an XOR with a mask that toggles the most significant bit, that is, the sign bit of the IEEE-754 floating point number.

example::softu8_add:
        mov     rax, rdi
        movups  xmm2, xmmword ptr [rsi]
        movups  xmm0, xmmword ptr [rdx]
        movaps  xmm3, xmm2
        subps   xmm3, xmm0
        movaps  xmm4, xmm0
        subps   xmm4, xmm2
        movaps  xmm1, xmmword ptr [rip + .LCPI0_0]
        xorps   xmm4, xmm1
        subps   xmm4, xmm3
        xorps   xmm3, xmm3
        subss   xmm3, xmm4
        movaps  xmm6, xmm0
        xorps   xmm6, xmm1
        subss   xmm6, xmm2
        xorps   xmm6, xmm1
        subss   xmm6, xmm3
        movaps  xmm3, xmm4
        shufps  xmm3, xmm4, 85
        movaps  xmm5, xmm6
        subss   xmm5, xmm3
        xorps   xmm5, xmm1
        movaps  xmm10, xmm6
        xorps   xmm10, xmm1
        subss   xmm10, xmm3
        movaps  xmm7, xmm0
        shufps  xmm7, xmm0, 85
        xorps   xmm7, xmm1
        movaps  xmm3, xmm2
        shufps  xmm3, xmm2, 85
        subss   xmm7, xmm3
        xorps   xmm7, xmm1
        movaps  xmm8, xmm4
        unpckhpd        xmm8, xmm4
        movaps  xmm3, xmm0
        unpckhpd        xmm3, xmm0
        xorps   xmm3, xmm1
        movaps  xmm9, xmm2
        unpckhpd        xmm9, xmm2
        subss   xmm3, xmm9
        xorps   xmm3, xmm1
        xorps   xmm11, xmm11
        movaps  xmm9, xmm4
        shufps  xmm9, xmm4, 255
        subss   xmm7, xmm10
        movaps  xmm10, xmm7
        xorps   xmm10, xmm1
        subss   xmm10, xmm8
        subss   xmm3, xmm10
        unpcklps        xmm7, xmm3
        shufps  xmm6, xmm7, 64
        addps   xmm11, xmm4
        movlhps xmm5, xmm4
        subps   xmm4, xmm6
        movss   xmm4, xmm11
        subps   xmm7, xmm8
        xorps   xmm7, xmm1
        shufps  xmm5, xmm7, 66
        subps   xmm5, xmm4
        xorps   xmm3, xmm1
        subss   xmm3, xmm9
        shufps  xmm0, xmm0, 255
        xorps   xmm0, xmm1
        shufps  xmm2, xmm2, 255
        subss   xmm0, xmm2
        xorps   xmm0, xmm1
        movups  xmmword ptr [rdi], xmm5
        movups  xmm2, xmmword ptr [rdx + 16]
        movaps  xmm5, xmm2
        xorps   xmm5, xmm1
        movups  xmm7, xmmword ptr [rsi + 16]
        subss   xmm5, xmm7
        xorps   xmm5, xmm1
        movaps  xmm4, xmm2
        shufps  xmm4, xmm2, 85
        xorps   xmm4, xmm1
        movaps  xmm6, xmm2
        movaps  xmm8, xmm7
        movaps  xmm9, xmm7
        subps   xmm9, xmm2
        subps   xmm2, xmm7
        shufps  xmm7, xmm7, 85
        subss   xmm4, xmm7
        xorps   xmm4, xmm1
        movhlps xmm6, xmm6
        xorps   xmm6, xmm1
        movhlps xmm8, xmm8
        subss   xmm6, xmm8
        xorps   xmm6, xmm1
        xorps   xmm2, xmm1
        subps   xmm2, xmm9
        subss   xmm0, xmm3
        movaps  xmm3, xmm0
        xorps   xmm3, xmm1
        subss   xmm3, xmm2
        subss   xmm5, xmm3
        unpcklps        xmm0, xmm5
        xorps   xmm5, xmm1
        movaps  xmm3, xmm2
        shufps  xmm3, xmm2, 85
        subss   xmm5, xmm3
        subss   xmm4, xmm5
        movaps  xmm3, xmm4
        xorps   xmm3, xmm1
        movaps  xmm5, xmm2
        unpckhpd        xmm5, xmm2
        subss   xmm3, xmm5
        subss   xmm6, xmm3
        unpcklps        xmm4, xmm6
        movlhps xmm0, xmm4
        movaps  xmm3, xmm2
        subps   xmm3, xmm0
        subps   xmm0, xmm2
        xorps   xmm0, xmm1
        subps   xmm0, xmm3
        movups  xmmword ptr [rdi + 16], xmm0
        ret

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *