Real Talk: Integer Arithmetic

Time for real talk, boys and girls. We gotta address the bread and butter of we who massage code into machines: basic inte­ger arith­metic. That’s right,

  • absolute value |x|
  • two-inte­ger aver­age \frac{a + b}{2}

I know, I know, you take your vita­min 0s and 1s and SELECT steel-cut rows for break­fast, but this is more funda­men­tal than that; it’s about the core algos you learned in univer­sity, and how you’re imple­ment­ing them with seri­ous bugs because you thought these two were so obvi.

Keep your suspenders in suspense though, because we’re not going to wave our hands and say anything is outside of the scope of our recruiter’s 30-minute phone screen. Today, we’re gonna ditch our rock­star coder company hood­ies and strap on fake embed­ded engi­neer neck­beards to look at those corner cases. Over­flows matter, we care about round­ing and fence­posts, and INT_MIN and INT_MAX are defi­nitely gonna be tested inputs.

abs

Let’s start simple. What’s the big deal with abs? Well, for one, the C stan­dard doesn’t actu­ally define this func­tion over all possi­ble inte­ger inputs1. In partic­u­lar, it isn’t defined for INT_MIN—better known to C++ coders as std::numeric_limits::min()2. Let’s give it a shot:

cout << numeric_limits<int>::min() << "\n";
cout << abs(numeric_limits<int>::min()) << "\n";

If we run this on a regu­lar x86 computer, we get this some­what unex­pected result:

-2147483648
-2147483648

That’s right; the absolute value of the most nega­tive 32-bit inte­ger is… the most nega­tive 32-bit inte­ger. This actu­ally happens for any bit width, and it’s really because of the way abs works on our 2’s comple­ment comput­ers. Let’s try to see it better:

\text{abs}(x) = \begin{cases} -x, & \text{if }x < 0 \text{, in other words } x \in [-2^{31}, 0) \\ x, & \text{if }x \geq 0 \text{, in other words } x \in [0, 2^{31}-1] \end{cases}

I included the 32-bit inte­ger limits to demon­strate the ranges of numbers that abs is forced to map one another. Let’s try to visu­al­ize how this mapping works:

\text{abs}(x) = \begin{cases} x, & \text{if }x \geq 0 \\ 1, & \text{if }x = -1 \\ 2, & \text{if }x = -2 \\ \vdots \\ 2^{31} - 2, & \text{if }x = -2^{31} + 2 \\ 2^{31} - 1, & \text{if }x = -2^{31} + 1 \text{ (note: }2^{31} - 1 \text{ is maximum signed 32-bit integer)} \\ ???, & \text{if }x = -2^{31} \end{cases}

That “???” is because 2^{31} can’t be repre­sented as a signed 32-bit inte­ger, yet \text{abs}(-2^{31}) needs to equal some­thing. Strictly follow­ing our earlier defi­n­i­tion of abs, we’d try to negate it anyways. By a cruel trick of binary arith­metic, we get -(-2^{31}) = -2^{31}.

Damn it abs. You had one job!

Well isn’t that just great? Because there are more nega­tive numbers than posi­tive numbers in conven­tional computer inte­ger math, it’s impos­si­ble to define an absolute value func­tion where nega­tive inputs always turn into their posi­tive coun­ter­parts. This inescapable conse­quence of includ­ing zero in a number system (binary) that has equal quan­ti­ties of even and odd numbers always both­ered me.

Now that I’ve rocked your world, what am I going to do about it? Well… I can’t fix abs. Sorry. But, I can raise aware­ness!

No wait, I got it. Real talk: we’re gonna use nega­tive absolute value, or nabs from now on. It’s like upside-down abs; it takes in inte­gers and negates the posi­tive ones. Check it:

\text{nabs}(x) = \begin{cases} -x, & \text{if }x > 0 \text{, in other words } x \in (0, 2^{31}-1]\\ x, & \text{if }x \leq 0 \text{, in other words } x \in [-2^{31}, 0] \end{cases}

BOOM. Posi­tive inte­gers map to their nega­tives, and nega­tive inte­gers stay where they are. Check it:

\text{nabs}(x) = \begin{cases} x, & \text{if }x \leq 0 \\ -1, & \text{if }x = 1 \\ -2, & \text{if }x = 2 \\ \vdots \\ -2^{31} + 2, & \text{if }x = 2^{31} - 2 \\ -2^{31} + 1, & \text{if }x = 2^{31} - 1 \\ \end{cases}

See? No prob­lems. Now you hate your­self for ever using abs. Here, have a docu­mented opti­mized branch-free imple­men­ta­tion on me (click to expand).

/**
 * Negative absolute value. Used to avoid undefined behavior for most negative
 * integer (see C99 standard 7.20.6.1.2 and footnote 265 for the description of
 * abs/labs/llabs behavior).
 *
 * @param i 32-bit signed integer
 * @return negative absolute value of i; defined for all values of i
 */
int32_t nabs(int32_t i) {
#if (((int32_t)-1) >> 1) == ((int32_t)-1)
    // signed right shift sign-extends (arithmetic)
    const int32_t negSign = ~(i >> 31); // splat sign bit into all 32 and complement
    // if i is positive (negSign is -1), xor will invert i and sub will add 1
    // otherwise i is unchanged
    return (i ^ negSign) - negSign;
#else
    return i < 0 ? i : -i;
#endif
}

But wait, you say. How do I use this pizza shate func­tion in my code? It gives me nasty nelly nega­tive numbers! WHAT I DO HERE LOL.

No worries! Think about when you normally use absolute value. Maybe you’re compar­ing the magni­tude of some possi­bly nega­tive number with a constant?

const int a = 51;
const int b = 85;

if (abs(a - b) < 100) {
    std::cout << "a and b are within 100 of each other." << std::endl;
}

This works just as well with our bullet­proof nabs. Just negate the constant and flip the logic!

if (nabs(a - b) > -100) {
    std::cout << "a and b are within 100 of each other." << std::endl;
}

Or maybe you’re comput­ing SAD on some pixels?

int sad = 0;
for (size_t i = 0; i < size; ++i) {
    sad += abs(p1[i] - p2[i]);
}

You could instead flip the sign and use nabs!

int sad = 0;
for (size_t i = 0; i < size; ++i) {
    sad -= nabs(p1[i] - p2[i]);
}

To be fair, this specific exam­ple didn’t fix the over­flow prob­lem. See, the only case for which abs and nabs are differ­ent is with INT_MIN. However, the expres­sions sad + abs(INT_MIN) and sad - nabs(INT_MIN) are actu­ally equiv­a­lent.

OK, maybe not that kind of sad.

Never­the­less, there are cases when this is useful. In this case, SAD is frequently computed on 8-bit pixels using a 16- or 32-bit accu­mu­la­tor for the differ­ence. So our nabs func­tion would be for 8-bit inte­gers, and sad + abs_8(CHAR_MIN) and sad - nabs_8(CHAR_MIN) would certainly not be equiv­a­lent. See?

int8_t abs_8(int8_t);
int8_t nabs_8(int8_t);

int sum = 0;
sum += abs_8(-128); // WRONG: this is like subtracting 128 from sad
sum -= nabs_8(-128); // RIGHT: this is like adding 128 to sad

Don’t forget about embed­ded systems coding either: on micro­con­trollers, you’ll use values of differ­ent bit widths all the time. What’s more, the signals coming in from ADCs and timer captures can be highly dynamic, often chang­ing non-deter­min­is­ti­cally. As in all cases when you can’t predict the inputs to your program, it’s crit­i­cal to use robust routines like nabs to safe­guard your robot or device from math errors.

avg

Oh look, I ran out of words. I guess I’ll have to post about over­flow-safe, round­ing-correct avg next time! Your broken merge­sort will just have to wait. 🙁

  1. C99 stan­dard 7.20.6.1.2 and foot­note 265 []
  2. The ++ in C++ is for extra verbosity. []