Real Talk: Integer Arithmetic
Time for real talk, friends-o. We gotta address the bread and butter of us who massage code into machines: basic integer arithmetic. That’s right,
- absolute value
- two-integer average
I know, I know, you take your vitamin 0s and 1s and SELECT steel-cut rows for breakfast, but this is more fundamental than that; it’s about the core algos you learned in university, and how you’re implementing them with serious bugs because you thought these two were so obvi.
Keep your suspenders in suspense though, because we’re not going to wave our hands and say anything is outside of the scope of our recruiter’s 30-minute phone screen. Today, we’re gonna ditch our rockstar coder company hoodies and strap on fake embedded engineer neckbeards to look at those corner cases. Overflows matter, we care about rounding and fenceposts, and INT_MIN and INT_MAX are definitely gonna be tested inputs.
abs §
Let’s start simple. What’s the big deal with abs? Well, for one, the C standard doesn’t actually define this function over all possible integer inputs*. In particular, it isn’t defined for INT_MIN—better known to C++ coders as std::numeric_limits<int>::min()†. Let’s give it a shot:
cout << numeric_limits<int>::min() << "\n";
cout << abs(numeric_limits<int>::min()) << "\n";
If we run this on a regular x86 computer, we get this somewhat unexpected result:
-2147483648
-2147483648
That’s right; the absolute value of the most negative 32-bit integer is... the most negative 32-bit integer. This actually happens for any bit width, and it’s really because of the way abs works on our 2’s complement computers. Let’s try to see it better. A 32-bit int runs from INT_MIN (-2147483648) to INT_MAX (2147483647), so abs is forced to map both halves of that range onto the non-negative half. Let’s try to visualize how this mapping works:

 2147483647 →  2147483647
          1 →           1
          0 →           0
         -1 →           1
-2147483647 →  2147483647
-2147483648 →  ???

That “???” is because abs has nowhere to send INT_MIN: positive 2147483648 doesn’t fit in 32 signed bits. If we try to negate it anyway, a cruel trick of binary arithmetic (flip the bits, add one) hands us right back -2147483648.

Well isn’t that just great? Because there are more negative numbers than positive numbers in conventional computer integer math, it’s impossible to define an absolute value function where negative inputs always turn into their positive counterparts. It’s an inescapable consequence of squeezing zero plus an even count of nonzero values into a fixed number of bits, and it has always bothered me.
Now that I’ve rocked your world, what am I going to do about it? Well… I can’t fix abs. Sorry. But, I can raise awareness!

No wait, I got it. Real talk: we’re gonna use negative absolute value, or nabs, from now on. It’s like upside-down abs; it takes in integers and negates the positive ones. Check it:

 2147483647 → -2147483647
          1 →          -1
          0 →           0
         -1 →          -1
-2147483648 → -2147483648

BOOM. Positive integers map to their negatives, and negative integers stay where they are. Every input has a well-defined output. See? No problems. Now you hate yourself for ever using abs. Here, have a documented branch-free implementation on me.
/* SPDX-License-Identifier: MIT OR Apache-2.0 */
#include <stdint.h>
/**
* Negative absolute value. Used to avoid undefined behavior for most negative
* integer (see C99 standard 7.20.6.1.2 and footnote 265 for the description of
* abs/labs/llabs behavior).
*
* @param i 32-bit signed integer
* @return negative absolute value of i; defined for all values of i
*/
int32_t nabs(int32_t i) {
#if (((int32_t)-1) >> 1) == ((int32_t)-1)
// signed right shift sign-extends (arithmetic)
const int32_t negSign = ~(i >> 31); // splat sign bit into all 32 and complement
// if i is positive (negSign is -1), xor will invert i and sub will add 1
// otherwise i is unchanged
return (i ^ negSign) - negSign;
#else
return i < 0 ? i : -i;
#endif
}
But wait, you say. How do I use this pizza shate function in my code? It gives me nasty nelly negative numbers! WHAT I DO HERE LOL.
No worries! Think about when you normally use absolute value. Maybe you’re comparing the magnitude of some possibly negative number with a constant?
const int a = 51;
const int b = 85;
if (abs(a - b) < 100) {
std::cout << "a and b are within 100 of each other." << std::endl;
}
This works just as well with our bulletproof nabs. Just negate the constant and flip the logic!
if (nabs(a - b) > -100) {
std::cout << "a and b are within 100 of each other." << std::endl;
}
Or maybe you’re computing SAD on some pixels?
int sad = 0;
for (size_t i = 0; i < size; ++i) {
sad += abs(p1[i] - p2[i]);
}
You could instead flip the sign and use nabs!
int sad = 0;
for (size_t i = 0; i < size; ++i) {
sad -= nabs(p1[i] - p2[i]);
}
To be fair, this specific example didn’t fix the overflow problem. See, the only input for which abs and nabs produce different values is INT_MIN. And since negating INT_MIN wraps right back to INT_MIN, the expressions sad + abs(INT_MIN) and sad - nabs(INT_MIN) are actually equivalent.
Nevertheless, there are cases when this is useful. In practice, SAD is frequently computed on 8-bit pixels using a 16- or 32-bit accumulator for the sum. So our nabs function would be for 8-bit integers, and sad + abs_8(INT8_MIN) and sad - nabs_8(INT8_MIN) would certainly not be equivalent. See?
int8_t abs_8(int8_t);
int8_t nabs_8(int8_t);
int sad = 0;
sad += abs_8(-128); // WRONG: abs_8(-128) wraps back to -128, subtracting 128 from sad
sad -= nabs_8(-128); // RIGHT: nabs_8(-128) is -128, so this adds 128 to sad
Don’t forget about embedded systems coding either: on microcontrollers, you’ll use values of different bit widths all the time. What’s more, the signals coming in from ADCs and timer captures can be highly dynamic, often changing non-deterministically. As in all cases when you can’t predict the inputs to your program, it’s critical to use robust routines like nabs to safeguard your robot or device from math errors.
avg §
Oh look, I ran out of words. I guess I’ll have to post about overflow-safe, rounding-correct avg next time! Your broken mergesort will just have to wait. 🙁