Algorithm for dividing 2W-bit numbers using operations with numbers of bit depth W

Using 32-bit integers as an example, we consider a scalable division algorithm that uses numbers with half the width (16 bits). To illustrate the performance of the algorithm, the code of a test application in C++ is given.

Description of the algorithm

Let's imagine some {2W}-bit number X as:

X = (A, B) = A \cdot {2}^{W} + B,

Where A, B – senior and junior parts {W}-bit components of the number, respectively.

Then division of two numbers, X And Ycan be written as a fraction:

Z = {{X} \over {Y}} = {{A \cdot {2}^{W} + B} \over {C \cdot {2}^{W} + D}}.

Note that if C>0″ src=”https://habrastorage.org/getpro/habr/upload_files/4fb/660/495/4fb6604952678952f3d37508f77ceaf3.svg” width=”50″ height=”17″/>then the result of division is some <img data-lazyloaded=-bit number. Let's limit ourselves to this case. For C=0in the example below, an algorithm for dividing a “wide” number by a “narrow” one is implemented in C++, based on the representation of the dividend N as a product of the quotient Q and divisor D plus the remainder term R:

N=Q \cdot D + R.

We further assume that A\geqC, otherwise, the result of division is zero. Let's imagine a number A as:

A = \left \lfloor {{A} \over {C}} \right \rfloor \cdot C + A \mod C = q \cdot C + r.

Here q = \left \lfloor {{A} \over {C}} \right \rfloor – integer part from division, and r = A\mod C – remainder of the division A on C.

Let's rewrite the fraction:

Z = {{q \cdot C \cdot {2}^{W} + B + r \cdot {2}^{W} } \over {C \cdot {2}^{W} + D}}.

Let's neglect the term Bto get a lower bound for the division result plus further simplify the formula:

Z_1 = {(q \cdot C + r)}{{{2}^{W}} \over {C \cdot {2}^{W} + D}} \leq Z.

Let's select the term q as the main component:

Z_1 = q + {{r \cdot {2}^{W} - D \cdot q} \over {C \cdot {2}^{W} + D}}.

Let's make a change of variables \Delta=2W-D The point is that “severe” cases correspond to the parameter D large enough, so the entered delta parameter will be small and the formula will quickly lead to the result:

Z_1 = q + {{(rq) \cdot {2}^{W} + \Delta \cdot q} \over {(C+1) \cdot {2}^{W} - \Delta}}.

The numerator and denominator of the fraction contain “wide” numbers (with digits 2W). Since it is possible to use the algorithm for dividing a wide number by a narrow one, we will achieve the “narrowness” of the number in the denominator by removing the multiplier C+1 outside the brackets and performing the division sequentially:

Z_1 = q + { { {(rq) \cdot {2}^{W} + \Delta \cdot q} \over {{2}^{W} - {{\Delta} \over {C+1}} } } \over {C + 1} }.

Thus, the “broad” numerator sequentially divide by “narrow” denominators. The first denominator can sometimes be equal to the multiplier 2^W, which needs to be monitored in the algorithm: in the CPU registers the number will actually be reset to zero and an exception will occur. Alternatively, one can subtract one in advance and not worry about boundary conditions, since for this algorithm C>0″ src=”https://habrastorage.org/getpro/habr/upload_files/99b/2d7/ac9/99b2d7ac9de96008d7b7389903514908.svg” width=”50″ height=”17″/>.  We do the same with the variable <img data-lazyloaded=which will ultimately give the final version:

\Delta = 2^W - 1 - D,$$$$Z_1 = q + {{ {(rq) \cdot {2}^{W} + \Delta \cdot q} \over {{2}^{W } - 1 - {{\Delta} \over {C+1}}} } \over {C + 1} },$$ $$q = \left \lfloor {{A} \over {C}} \right \rfloor,$$ $$r = A \mod C.

A numerical experiment showed that one iteration described above is sufficient. Physically, this is explained by the fact that the algorithm is designed specifically for C>0″ src=”https://habrastorage.org/getpro/habr/upload_files/b52/48d/bdb/b5248dbdbdd7efd5880883f7be179a15.svg” width=”50″ height=”17″/>, which leads to a fairly large number in the denominator and a relatively small quotient.  However, to obtain the final result, “fine tuning” is required — sequential addition or subtraction of one depending on the current division error, which can be done in a loop and in one step get the remainder of the division.  To do this, it is necessary to implement an operator for multiplying a “wide” number by a “narrow” one, and additionally, one should monitor for overflow, as a result of which it will be necessary to reduce the quotient by one.</p><p>Note that the proposed algorithm is not tied to the type of numbers: integer, floating point, in an alternative number system, etc. The basis of the algorithm assumes the possibility of dividing an arbitrary number into two halves: high and low.  Signed bits are handled by separate logic so that unsigned numbers are actually processed.</p><h3>Example implementation of the division algorithm in C++</h3><pre><code>#include <limits.h>
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <list>
#include <random>
#include <string>

using u64 = uint64_t;
using i64 = int64_t;
using u32 = uint32_t;
using u16 = uint16_t;

struct Quadrupole {
    u16 A;
    u16 B;
    u16 C;
    u16 D;
};

struct Signess {
    int s1;
    int s2;
};

static auto const seed = std::random_device{}();

/***
 * Генератор случайных чисел.
 */
auto rollu16 = [urbg = std::mt19937{seed},
                distr = std::uniform_int_distribution<u16>{}]() mutable -> u16 {
    return distr(urbg);
};

static constexpr char DIGITS[10]{'0', '1', '2', '3', '4',
                                 '5', '6', '7', '8', '9'};

// High/Low структура 32-битного числа со знаком и флагом переполнения.
// Для иллюстрации алгоритма деления двух U32 чисел реализованы основные
// арифметические операторы, кроме умножения двух U32 чисел.
// Если дополнительно реализовать полноценное умножение, то структура
// позволяет себя масштабировать: реализовывать 128-битные числа, 
// 256-битные и т.д.
struct U32 {
    static constexpr int mHalfWidth = (sizeof(u16) * CHAR_BIT) / 2;
    u16 mHigh = 0;
    u16 mLow = 0;
    int mSign = 0;
    int mOverflow = 0;

/**
 * Оператор проверки на ненулевое значение.
 */
bool operator()() {
    return (mLow != 0) || (mHigh != 0) || (mOverflow != 0);
}

U32 operator+(U32 rhs) const {
    U32 res{};
    U32 X = *this;
    if (X.mSign != 0 && rhs.mSign == 0) {
        X.mSign = 0;
        res = rhs - X;
        return res;
    }
    if (X.mSign == 0 && rhs.mSign != 0) {
        rhs.mSign = 0;
        res = X - rhs;
        return res;
    }
    res.mLow = X.mLow + rhs.mLow;
    const u16 c1 = res.mLow < std::min(X.mLow, rhs.mLow);
    res.mHigh = X.mHigh + rhs.mHigh;
    const int c2 = res.mHigh < std::min(X.mHigh, rhs.mHigh);
    auto tmp = res.mHigh;
    res.mHigh = tmp + c1;
    const int c3 = res.mHigh < std::min(tmp, c1);
    res.mOverflow = c2 || c3;
    if (X.mSign != 0 && rhs.mSign != 0) {
        res.mSign = 1;
    }
    return res;
}

U32 operator-(U32 rhs) const {
    U32 res{};
    U32 X = *this;
    if (mSign != 0 && rhs.mSign == 0) {
        rhs.mSign = 1;
        res = rhs + X;
        return res;
    }
    if (mSign == 0 && rhs.mSign != 0) {
        rhs.mSign = 0;
        res = X + rhs;
        return res;
    }
    if (mSign != 0 && rhs.mSign != 0) {
        rhs.mSign = 0;
        X.mSign = 0;
        res = rhs - X;
        return res;
    }
    res.mLow = X.mLow - rhs.mLow;
    res.mHigh = X.mHigh - rhs.mHigh;
    const int c1 = X.mLow < rhs.mLow;
    if (c1 != 0) {
        res.mLow -= 1 u;
        res.mLow += 1 u;
        res.mHigh -= 1 u;
        const int c2 = X.mHigh < (rhs.mHigh + 1 u);
        if (c2 != 0) {
            res.mHigh = -res.mHigh - 1 u;
            res.mLow = -res.mLow;
            res.mSign = 1;
        }
    } else if (X.mHigh < rhs.mHigh) {
        res.mHigh = (res.mLow != 0) ? -res.mHigh - 1 u : -res.mHigh;
        res.mLow = -res.mLow;
        res.mSign = 1;
    }
    return res;
}

U32 mult16(u16 x, u16 y) const {
    constexpr u16 mask = (1 u << mHalfWidth) - 1;
    auto x_low = x & mask;
    auto y_low = y & mask;
    auto x_high = x >> mHalfWidth;
    auto y_high = y >> mHalfWidth;
    auto t1 = x_low * y_low;
    auto t = t1 >> mHalfWidth;
    auto t21 = x_low * y_high;
    auto q = t21 >> mHalfWidth;
    auto p = t21 & mask;
    auto t22 = x_high * y_low;
    auto s = t22 >> mHalfWidth;
    auto r = t22 & mask;
    auto t3 = x_high * y_high;
    U32 res{};
    res.mLow = t1;
    auto div = (q + s) + ((p + r + t) >> mHalfWidth);
    auto mod = (t21 << mHalfWidth) + (t22 << mHalfWidth);
    res.mLow += mod;
    res.mHigh += div;
    res.mHigh += t3;
    res.mOverflow = res.mHigh < t3 ? 1 : 0;
    return res;
}

U32 operator*(u16 rhs) {
    U32 tmpB = mult16(mLow, rhs);
    U32 tmpA = mult16(mHigh, rhs);
    tmpA.mHigh = tmpA.mLow;
    tmpA.mLow = 0;
    tmpB = tmpB + tmpA;
    if (this->mSign != 0) {
        tmpB.mSign = 1;
    }
    return tmpB;
}

U32 operator/(u16 y) const {
    U32 rem = *this;
    auto iter1 = [&rem](u16 v) {
        U32 res{};
        auto q0 = rem.mLow / v;
        auto r0 = rem.mLow % v;
        auto q1 = rem.mHigh / v;
        auto r1 = rem.mHigh % v;
        res.mLow = q0;
        res.mHigh = q1;
        rem.mLow = r0;
        rem.mHigh = r1;
        return res;
    };
    auto iter2 = [&rem](u16 v) {
        U32 res{};
        auto d = (1 u << mHalfWidth) / v;
        auto t = (1 u << mHalfWidth) % v;
        res.mLow = d * rem.mHigh << mHalfWidth;
        res.mLow += ((t * rem.mHigh << mHalfWidth) + rem.mLow) / v;
        rem.mLow = 0;
        rem.mHigh = 0;
        return res;
    };
    U32 result{};
    result = result + iter1(y);
    result = result + iter2(y);
    if (this->mSign != 0) {
        result.mSign = 1;
    }
    return result;
}

U32& operator/=(u16 y) {
    *this = *this / y;
    return *this;
}

U32 operator/(const U32 other) const {
    U32 X = *this;
    U32 Y = other;
    constexpr U32 ZERO{.mHigh = 0, .mLow = 0, .mSign = 0, .mOverflow = 0};
    constexpr U32 UNIT{.mHigh = 0, .mLow = 1, .mSign = 0, .mOverflow = 0};
    constexpr U32 UNIT_NEG{
        .mHigh = 0, .mLow = 1, .mSign = 1, .mOverflow = 0};
    if (mHigh < Y.mHigh) {
        return ZERO;
    }
    if (Y.mHigh == 0) {
        U32 res = X / Y.mLow;
        if (Y.mSign != 0) {
            res.mSign = res.mSign != 0 ? 0 : 1;
        }
        return res;
    }
    if (X.mSign != 0 && Y.mSign != 0) {
        X.mSign = 0;
        Y.mSign = 0;
    }
    bool make_sign_inverse = false;
    if ((X.mSign != 0 && Y.mSign == 0) || (X.mSign == 0 && Y.mSign != 0)) {
        X.mSign = 0;
        Y.mSign = 0;
        make_sign_inverse = true;
    }
    assert((Y.mHigh != 0) || (Y.mLow != 0));
    const u16 Q = mHigh / Y.mHigh;
    const u16 R = mHigh % Y.mHigh;
    const u16 Delta = (1 u << 2 * mHalfWidth) - 1 - Y.mLow;
    const U32 DeltaQ = mult16(Delta, Q);
    U32 W1{};
    W1.mHigh = R >= Q ? R - Q : Q - R;
    W1.mLow = 0;
    W1.mSign = R >= Q ? 0 : 1;
    W1 = W1 + DeltaQ;
    u16 C1 = Y.mHigh + 1 u;
    C1 = C1 != 0 ? C1 : (1 u << 2 * mHalfWidth) - 1;
    u16 W2 = (1 u << 2 * mHalfWidth) - 1 - Delta / C1;
    U32 Quotient = W1 / W2;
    Quotient = Quotient / C1;
    U32 result = U32{.mHigh = 0, .mLow = Q} + Quotient;
    U32 N = Y * result.mLow;
    if (N.mOverflow != 0) {
        result.mLow -= 1;
        N = Y * result.mLow;
    }
    U32 Error = (result.mSign == 0) ? X - N : X + N;
    U32 More = Error - Y;
    bool do_inc = More.mSign == 0;
    bool do_dec = Error.mSign == 1;
    while (do_dec || do_inc) {
        result = result + (do_inc ? UNIT : (do_dec ? UNIT_NEG : ZERO));
        if (do_dec) {
            Error = Error + Y;
        }
        if (do_inc) {
            Error = Error - Y;
        }
        More = Error - Y;
        do_inc = More.mSign == 0;
        do_dec = Error.mSign == 1;
    }
    if (make_sign_inverse) {
        result.mSign = result.mSign != 0 ? 0 : 1;
    }
    return result;
}

/**
 * Возвращает строковое представление числа.
 */
std::string value() const {
    std::string result{};
    if (this->mOverflow != 0) {
        result =

conclusions

An algorithm for dividing numbers consisting of high and low halves has been proposed and tested, scaling to an arbitrary bit depth in multiples {2}^{N}. This algorithm, in a sense, is a variant of “smart” long division: first, the first approximation is calculated, equal to the division of the higher halves of the number, and then the second, equal to the corrected first. The corrector is equal to the sequential division of a certain wide number into two narrow ones.

The proposed algorithm easily scales to the 128-bit version using built-in 64-bit arithmetic. However, the option with scaling, for example, to 256-bits, requires the implementation of a full multiplication in the U128 structure, which can be done by scaling the implemented “half” multiplication operator: U32 by u16. You will also need to implement the bit shift operator. Ultimately, when all the necessary operators are implemented, it is possible to implement a template structure with an arbitrary bit depth 2^Nrecursively (by dividing in half) into the arithmetic of 64-bit built-in numbers.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *