## Saturday, July 19, 2008

### Cpp bignum arithmetic: part II

After having studied addition, subtraction and multiplication of C preprocessor bignums, let us now turn to division. To gain some insight into the mechanics of the division algorithm we reproduce a division layout the way we were taught to do at school:

‎         20553
‎8879265 |432
‎ 239
‎ 2392
‎ 2326
‎ 1665
‎ 369

To divide 8879265 by 432, we begin by isolating the first 3-digit group of the dividend, 887, and calculate how many times the divisor 432 makes it into it; we have that 887 = 432·2 + 23, so 2 is the first digit of our result, and we write down the remainder 23. Next, we append the next digit of the dividend, 9 , to the remainder and proceed iteratively:

• 887 = 432·2 + 23,
• 239 = 432·0 + 239,
• 2392 = 432·5 + 232,
• 2326 = 432·5 + 166,
• 1665 = 432·3 + 369

Let us assume we have a macro BN_QDIGIT_REMAINDER(bn1,bn2) that returns a sequence (q)(r), where q is the greatest digit between 0 and 9 such that bn2·qbn1 and r is the bignum representing the value bn1bn2·q. We can then implement the division of bn1 = (bn11)···(b1n) by bn2 (bn21)···(b2m)by folding over reverse((bn1m)···(b1n)) with the following recursive state definition:

state
0 = ((bn11)···(b1m-1), bn2,empty sequence),

where the functions first and second simply return the first and second component of a sequence of size 2. This is the complete Boost.Preprocessor-based algorithm:

‎#define BN_DIV(bn1,bn2) \
‎BOOST_PP_IF( \
‎ BN_LESS(bn1,bn2), \
‎ BN_DIV_CASE1, \
‎ BN_DIV_CASE2)(bn1,bn2)

‎#define BN_DIV_CASE1(bn1,bn2) (0)

‎#define BN_DIV_CASE2(bn1,bn2) \
‎BN_DIV_POSTPROCESS( \
‎ BOOST_PP_SEQ_TAIL(BN_DIV_STATE_RESULT(\
‎ BOOST_PP_SEQ_FOLD_LEFT( \
‎ BN_DIV_OP, \
‎ ( \
‎ (0) \
‎ BOOST_PP_SEQ_FIRST_N( \
‎ BOOST_PP_DEC(BOOST_PP_SEQ_SIZE(bn2)),bn1)) \
‎ (bn2) \
‎ ((~)), \
‎ BOOST_PP_SEQ_REST_N( \
‎ BOOST_PP_DEC(BOOST_PP_SEQ_SIZE(bn2)),bn1)))))

‎#define BN_DIV_POSTPROCESS(result) \
‎BOOST_PP_IF( \
‎ BOOST_PP_EQUAL(0,BOOST_PP_SEQ_ELEM(0,result)), \
‎ BOOST_PP_SEQ_TAIL, \
‎ BN_DIV_POSTPROCESS_CASE2)(result)

‎#define BN_DIV_POSTPROCESS_CASE2(result) result

‎#define BN_DIV_OP(s,state,x) \
‎BN_DIV_OP_IMPL( \
‎ BN_QDIGIT_REMAINDER( \
‎ BN_DIV_STATE_BN2(state)), \
‎ BN_DIV_STATE_BN2(state), \
‎ BN_DIV_STATE_RESULT(state))

‎#define BN_DIV_OP_IMPL(qr,bn2,result) \
‎(BOOST_PP_SEQ_ELEM(1,qr)) \
‎(bn2) \
‎(result(BOOST_PP_SEQ_ELEM(0,qr)))

‎#define BN_DIV_STATE_BN2(state) BOOST_PP_SEQ_ELEM(1,state)
‎#define BN_DIV_STATE_RESULT(state) BOOST_PP_SEQ_ELEM(2,state)

We are then left with the task of implementing the macro BN_QDIGIT_REMAINDER(bn1,bn2). A brute-force approach is to try the digits 9,8,...,0 until we met the first one satisfying bn2·qbn1, but this is unnecesarily wasteful: we find in Brinch Hansen's paper "Multiple-length division revisited" the following nice theorem: if bn1 and bn2 are represented in such a way that length(bn1) = length(bn2) +1 (even if that means bn1 has to be padded with some some leading zeros) and we define

qe =min(9, floor((bn11·100 + bn12·10 + bn13)/(bn21·10 + bn22))),

then

qqeq + 1,

that is, we can use the first three digits of bn1 and the first two digits of bn2 to estimate q and will be off at most by one:

‎#define BN_QDIGIT_REMAINDER(bn1,bn2) \
‎BN_QDIGIT_REMAINDER_IMPL( \
‎ BN_ZERO_SEQ( \
‎ BOOST_PP_IF( \
‎ BOOST_PP_LESS_EQUAL( \
‎ BOOST_PP_SEQ_SIZE(bn1), \
‎ BOOST_PP_SEQ_SIZE(bn2)), \
‎ BOOST_PP_INC( \
‎ BOOST_PP_SUB( \
‎ BOOST_PP_SEQ_SIZE(bn2), \
‎ BOOST_PP_SEQ_SIZE(bn1))), \
‎ 0))bn1, \
‎ bn2)

‎#define BN_ZERO_SEQ(n) \
‎BOOST_PP_REPEAT(n,BN_ZERO_SEQ_OP,~)

‎#define BN_ZERO_SEQ_OP(z,n,data) (0)

‎#define BN_QDIGIT_REMAINDER_IMPL(bn1,bn2) \
‎BOOST_PP_IF( \
‎ BOOST_PP_LESS(BOOST_PP_SEQ_SIZE(bn2),2), \
‎ BN_QDIGIT_REMAINDER_CASE1, \
‎ BN_QDIGIT_REMAINDER_CASE2)(bn1,bn2)

‎#define BN_QDIGIT_REMAINDER_CASE1(bn1,bn2) \
‎BN_QDIGIT_REMAINDER_CASE1_RETURN( \
‎ BOOST_PP_MUL(BOOST_PP_SEQ_ELEM(0,bn1),10), \
‎ BOOST_PP_SEQ_ELEM(1,bn1)), \
‎ BOOST_PP_SEQ_ELEM(0,bn2))

‎#define BN_QDIGIT_REMAINDER_CASE1_RETURN(n1,n2) \
‎(BOOST_PP_DIV(n1,n2))((BOOST_PP_MOD(n1,n2)))

‎#define BN_QDIGIT_REMAINDER_CASE2(bn1,bn2) \
‎BN_QDIGIT_REMAINDER_CASE2_TEST( \
‎ bn1,bn2, \
‎ BN_DIV_RRR_DD( \
‎ BOOST_PP_MUL(BOOST_PP_SEQ_ELEM(0,bn1),10), \
‎ BOOST_PP_SEQ_ELEM(1,bn1)), \
‎ BOOST_PP_SEQ_ELEM(2,bn1), \
‎ BOOST_PP_SEQ_ELEM(0,bn2), \
‎ BOOST_PP_SEQ_ELEM(1,bn2)))

‎#define BN_QDIGIT_REMAINDER_CASE2_TEST(bn1,bn2,q) \
‎BN_QDIGIT_REMAINDER_CASE2_TEST_IMPL(bn1,bn2,q,BN_DIGIT_MUL(bn2,q))

‎#define BN_QDIGIT_REMAINDER_CASE2_TEST_IMPL(bn1,bn2,q,bn2q) \
‎BOOST_PP_IF( \
‎ BN_LESS_WITH_ZERO(bn1,bn2q), \
‎ BN_QDIGIT_REMAINDER_CASE2_TEST_IMPL_CASE1, \
‎ BN_QDIGIT_REMAINDER_CASE2_TEST_IMPL_CASE2)(bn1,bn2,q,bn2q)

‎#define BN_QDIGIT_REMAINDER_CASE2_TEST_IMPL_CASE1(bn1,bn2,q,bn2q) \
‎(BOOST_PP_DEC(q))(BN_SUB(bn1,BN_SUB(bn2q,bn2)))

‎#define BN_QDIGIT_REMAINDER_CASE2_TEST_IMPL_CASE2(bn1,bn2,q,bn2q) \
‎(q)(BN_SUB(bn1,bn2q))

‎#define BN_LESS_WITH_ZERO(bn1,bn2) \
‎BOOST_PP_IF( \
‎ BOOST_PP_EQUAL(BOOST_PP_SEQ_ELEM(0,bn1),0), \
‎ BN_LESS_WITH_ZERO_CASE1, \
‎ BN_LESS)(bn1,bn2)

‎#define BN_LESS_WITH_ZERO_CASE1(bn1,bn2) \
‎BN_LESS(BOOST_PP_SEQ_TAIL(bn1),bn2)

The code above invokes the macro BN_DIV_RRR_DD(r1r2,r3,d1,d2) whose behavior is defined by:

BN_DIV_RRR_DD(r1r2,r3,d1,d2) = floor((r1r2·10 + r3)/(d1·10 + d2)).

The operation cannot be done directly because Boost.Preprocessor-based arithmetics are limited to numbers between 0 and BOOST_PP_LIMIT_MAG = 256. Instead, floor(r1r2/d1) is used as an initial upper-bound estimation which is then adjusted to the correct value using arithmetic operations that never exceed the BOOST_PP_LIMIT_MAG limit. The implementation of BN_DIV_RRR_DD is somewhat cumbersome but conceptually simple.

The complete implementation of all four arithmetic operations --addition, subtraction, multiplication and division-- is provided. Cpp bignum arithmetic is not very fast; for instance, given the following calculation of 30!:

‎BN_TO_LITERAL(
‎ BN_MUL((3)(0),BN_MUL((2)(9),BN_MUL((2)(8),BN_MUL((2)(7),
‎ BN_MUL((2)(6),BN_MUL((2)(5),BN_MUL((2)(4),BN_MUL((2)(3),
‎ BN_MUL((2)(2),BN_MUL((2)(1),BN_MUL((2)(0),BN_MUL((1)(9),
‎ BN_MUL((1)(8),BN_MUL((1)(7),BN_MUL((1)(6),BN_MUL((1)(5),
‎ BN_MUL((1)(4),BN_MUL((1)(3),BN_MUL((1)(2),BN_MUL((1)(1),
‎ BN_MUL((1)(0),BN_MUL((9) ,BN_MUL((8) ,BN_MUL((7),
‎ BN_MUL((6) ,BN_MUL((5) ,BN_MUL((4) ,BN_MUL((3),
‎ (2))))))))))))))))))))))))))))))

it takes around 20 seconds for this expression to expand to

265252859812191058636308480000000

using GCC 3.4.4 for Cygwin in a Windows box with a 1.66 GHz Intel Core 2 Duo and 1 GB RAM. Within the same environment, the long division

‎BN_TO_LITERAL(
‎ BN_DIV(
‎ (2)(6)(5)(2)(5)(2)(8)(5)(9)(8)(1)(2)(1)(9)(1)(0)(5)
‎ (8)(6)(3)(6)(3)(0)(8)(4)(8)(0)(0)(0)(0)(0)(0)(0),
‎ (1)(8)(3)(7)(0)(8)(0)(0)))

takes around 12 seconds. In a later entry, we will explore ways to improve the performance of bignum multiplication and division through memoization techniques.