Bannalia: trivial notes on themes diverse: Cpp bignum arithmetic: part II

After having studied addition, subtraction and multiplication of C preprocessor bignums, let us now turn to division. To gain some insight into the mechanics of the division algorithm we reproduce a division layout the way we were taught to do at school:

‎         20553
‎8879265 |432
‎ 239
‎ 2392
‎  2326
‎   1665
‎    369

To divide 8879265 by 432, we begin by isolating the first 3-digit group of the dividend, 887, and calculate how many times the divisor 432 makes it into it; we have that 887 = 432·2 + 23, so 2 is the first digit of our result, and we write down the remainder 23. Next, we append the next digit of the dividend, 9 , to the remainder and proceed iteratively:

887 = 432·2 + 23,
239 = 432·0 + 239,
2392 = 432·5 + 232,
2326 = 432·5 + 166,
1665 = 432·3 + 369

Let us assume we have a macro BN_QDIGIT_REMAINDER(bn1,bn2) that returns a sequence (q)(r), where q is the greatest digit between 0 and 9 such that bn2·q ≤ bn1 and r is the bignum representing the value bn1 − bn2·q. We can then implement the division of bn1 = (bn11)···(b1n) by bn2 (bn21)···(b2m)by folding over reverse((bn1m)···(b1n)) with the following recursive state definition:

state_i = (lead_i,bn2,res_i);
state₀ = ((bn11)···(b1m-1), bn2,empty sequence),
lead_i+1 = first(BN_QDIGIT_REMAINDER(lead_i,bn2)),
res_i+1 = append(res_i, second(BN_QDIGIT_REMAINDER(lead_i,bn2))),

where the functions first and second simply return the first and second component of a sequence of size 2. This is the complete Boost.Preprocessor-based algorithm:

‎#define BN_DIV(bn1,bn2) \
‎BOOST_PP_IF( \
‎  BN_LESS(bn1,bn2), \
‎  BN_DIV_CASE1, \
‎  BN_DIV_CASE2)(bn1,bn2)
‎
‎#define BN_DIV_CASE1(bn1,bn2) (0)
‎
‎#define BN_DIV_CASE2(bn1,bn2) \
‎BN_DIV_POSTPROCESS( \
‎  BOOST_PP_SEQ_TAIL(BN_DIV_STATE_RESULT(\
‎    BOOST_PP_SEQ_FOLD_LEFT( \
‎      BN_DIV_OP, \
‎      ( \
‎        (0) \
‎        BOOST_PP_SEQ_FIRST_N( \
‎         BOOST_PP_DEC(BOOST_PP_SEQ_SIZE(bn2)),bn1)) \
‎      (bn2) \
‎      ((~)), \
‎      BOOST_PP_SEQ_REST_N( \
‎        BOOST_PP_DEC(BOOST_PP_SEQ_SIZE(bn2)),bn1)))))
‎
‎#define BN_DIV_POSTPROCESS(result) \
‎BOOST_PP_IF( \
‎  BOOST_PP_EQUAL(0,BOOST_PP_SEQ_ELEM(0,result)), \
‎  BOOST_PP_SEQ_TAIL, \
‎  BN_DIV_POSTPROCESS_CASE2)(result)
‎
‎#define BN_DIV_POSTPROCESS_CASE2(result) result
‎
‎#define BN_DIV_OP(s,state,x) \
‎BN_DIV_OP_IMPL( \
‎  BN_QDIGIT_REMAINDER( \
‎    BN_DIV_STATE_LEAD(state)(x), \
‎    BN_DIV_STATE_BN2(state)), \
‎  BN_DIV_STATE_BN2(state), \
‎  BN_DIV_STATE_RESULT(state))
‎
‎#define BN_DIV_OP_IMPL(qr,bn2,result) \
‎(BOOST_PP_SEQ_ELEM(1,qr)) \
‎(bn2) \
‎(result(BOOST_PP_SEQ_ELEM(0,qr)))
‎
‎#define BN_DIV_STATE_LEAD(state) BOOST_PP_SEQ_ELEM(0,state)
‎#define BN_DIV_STATE_BN2(state) BOOST_PP_SEQ_ELEM(1,state)
‎#define BN_DIV_STATE_RESULT(state) BOOST_PP_SEQ_ELEM(2,state)

We are then left with the task of implementing the macro BN_QDIGIT_REMAINDER(bn1,bn2). A brute-force approach is to try the digits 9,8,...,0 until we met the first one satisfying bn2·q ≤ bn1, but this is unnecesarily wasteful: we find in Brinch Hansen's paper "Multiple-length division revisited" the following nice theorem: if bn1 and bn2 are represented in such a way that length(bn1) = length(bn2) +1 (even if that means bn1 has to be padded with some some leading zeros) and we define

q_e =min(9, floor((bn11·100 + bn12·10 + bn13)/(bn21·10 + bn22))),

then

q ≤ q_e ≤ q + 1,

that is, we can use the first three digits of bn1 and the first two digits of bn2 to estimate q and will be off at most by one:

‎#define BN_QDIGIT_REMAINDER(bn1,bn2) \
‎BN_QDIGIT_REMAINDER_IMPL( \
‎  BN_ZERO_SEQ( \
‎    BOOST_PP_IF( \
‎      BOOST_PP_LESS_EQUAL( \
‎        BOOST_PP_SEQ_SIZE(bn1), \
‎        BOOST_PP_SEQ_SIZE(bn2)), \
‎      BOOST_PP_INC( \
‎        BOOST_PP_SUB( \
‎          BOOST_PP_SEQ_SIZE(bn2), \
‎          BOOST_PP_SEQ_SIZE(bn1))), \
‎      0))bn1, \
‎  bn2)
‎
‎#define BN_ZERO_SEQ(n) \
‎BOOST_PP_REPEAT(n,BN_ZERO_SEQ_OP,~)
‎
‎#define BN_ZERO_SEQ_OP(z,n,data) (0)
‎
‎#define BN_QDIGIT_REMAINDER_IMPL(bn1,bn2) \
‎BOOST_PP_IF( \
‎  BOOST_PP_LESS(BOOST_PP_SEQ_SIZE(bn2),2), \
‎  BN_QDIGIT_REMAINDER_CASE1, \
‎  BN_QDIGIT_REMAINDER_CASE2)(bn1,bn2)
‎
‎#define BN_QDIGIT_REMAINDER_CASE1(bn1,bn2) \
‎BN_QDIGIT_REMAINDER_CASE1_RETURN( \
‎  BOOST_PP_ADD( \
‎    BOOST_PP_MUL(BOOST_PP_SEQ_ELEM(0,bn1),10), \
‎    BOOST_PP_SEQ_ELEM(1,bn1)), \
‎  BOOST_PP_SEQ_ELEM(0,bn2))
‎
‎#define BN_QDIGIT_REMAINDER_CASE1_RETURN(n1,n2) \
‎(BOOST_PP_DIV(n1,n2))((BOOST_PP_MOD(n1,n2)))
‎
‎#define BN_QDIGIT_REMAINDER_CASE2(bn1,bn2) \
‎BN_QDIGIT_REMAINDER_CASE2_TEST( \
‎  bn1,bn2, \
‎  BN_DIV_RRR_DD( \
‎    BOOST_PP_ADD( \
‎      BOOST_PP_MUL(BOOST_PP_SEQ_ELEM(0,bn1),10), \
‎      BOOST_PP_SEQ_ELEM(1,bn1)), \
‎    BOOST_PP_SEQ_ELEM(2,bn1), \
‎    BOOST_PP_SEQ_ELEM(0,bn2), \
‎    BOOST_PP_SEQ_ELEM(1,bn2)))
‎
‎#define BN_QDIGIT_REMAINDER_CASE2_TEST(bn1,bn2,q) \
‎BN_QDIGIT_REMAINDER_CASE2_TEST_IMPL(bn1,bn2,q,BN_DIGIT_MUL(bn2,q))
‎
‎#define BN_QDIGIT_REMAINDER_CASE2_TEST_IMPL(bn1,bn2,q,bn2q) \
‎BOOST_PP_IF( \
‎  BN_LESS_WITH_ZERO(bn1,bn2q), \
‎  BN_QDIGIT_REMAINDER_CASE2_TEST_IMPL_CASE1, \
‎  BN_QDIGIT_REMAINDER_CASE2_TEST_IMPL_CASE2)(bn1,bn2,q,bn2q)
‎
‎#define BN_QDIGIT_REMAINDER_CASE2_TEST_IMPL_CASE1(bn1,bn2,q,bn2q) \
‎(BOOST_PP_DEC(q))(BN_SUB(bn1,BN_SUB(bn2q,bn2)))
‎
‎#define BN_QDIGIT_REMAINDER_CASE2_TEST_IMPL_CASE2(bn1,bn2,q,bn2q) \
‎(q)(BN_SUB(bn1,bn2q))
‎
‎#define BN_LESS_WITH_ZERO(bn1,bn2) \
‎BOOST_PP_IF( \
‎  BOOST_PP_EQUAL(BOOST_PP_SEQ_ELEM(0,bn1),0), \
‎  BN_LESS_WITH_ZERO_CASE1, \
‎  BN_LESS)(bn1,bn2)
‎
‎#define BN_LESS_WITH_ZERO_CASE1(bn1,bn2) \
‎BN_LESS(BOOST_PP_SEQ_TAIL(bn1),bn2)

The code above invokes the macro BN_DIV_RRR_DD(r1r2,r3,d1,d2) whose behavior is defined by:

BN_DIV_RRR_DD(r1r2,r3,d1,d2) = floor((r1r2·10 + r3)/(d1·10 + d2)).

The operation cannot be done directly because Boost.Preprocessor-based arithmetics are limited to numbers between 0 and BOOST_PP_LIMIT_MAG = 256. Instead, floor(r1r2/d1) is used as an initial upper-bound estimation which is then adjusted to the correct value using arithmetic operations that never exceed the BOOST_PP_LIMIT_MAG limit. The implementation of BN_DIV_RRR_DD is somewhat cumbersome but conceptually simple.

The complete implementation of all four arithmetic operations --addition, subtraction, multiplication and division-- is provided. Cpp bignum arithmetic is not very fast; for instance, given the following calculation of 30!:

‎BN_TO_LITERAL(
‎  BN_MUL((3)(0),BN_MUL((2)(9),BN_MUL((2)(8),BN_MUL((2)(7),
‎  BN_MUL((2)(6),BN_MUL((2)(5),BN_MUL((2)(4),BN_MUL((2)(3),
‎  BN_MUL((2)(2),BN_MUL((2)(1),BN_MUL((2)(0),BN_MUL((1)(9),
‎  BN_MUL((1)(8),BN_MUL((1)(7),BN_MUL((1)(6),BN_MUL((1)(5),
‎  BN_MUL((1)(4),BN_MUL((1)(3),BN_MUL((1)(2),BN_MUL((1)(1),
‎  BN_MUL((1)(0),BN_MUL((9)   ,BN_MUL((8)   ,BN_MUL((7),
‎  BN_MUL((6)   ,BN_MUL((5)   ,BN_MUL((4)   ,BN_MUL((3),
‎  (2))))))))))))))))))))))))))))))

it takes around 20 seconds for this expression to expand to

265252859812191058636308480000000

using GCC 3.4.4 for Cygwin in a Windows box with a 1.66 GHz Intel Core 2 Duo and 1 GB RAM. Within the same environment, the long division

‎BN_TO_LITERAL(
‎  BN_DIV(
‎    (2)(6)(5)(2)(5)(2)(8)(5)(9)(8)(1)(2)(1)(9)(1)(0)(5)
‎    (8)(6)(3)(6)(3)(0)(8)(4)(8)(0)(0)(0)(0)(0)(0)(0),
‎    (1)(8)(3)(7)(0)(8)(0)(0)))

takes around 12 seconds. In a later entry, we will explore ways to improve the performance of bignum multiplication and division through memoization techniques.

Bannalia: trivial notes on themes diverse

Saturday, July 19, 2008

Cpp bignum arithmetic: part II

No comments :

Post a Comment