Real Talk: Integer Arithmetic

Time for real talk, boys and girls. We got­ta address the bread and but­ter of we who mas­sage code into machi­nes: basic inte­ger arith­metic. That’s right,

• absolute val­ue
• two-inte­ger aver­age

I know, I know, you take your vit­a­m­in 0s and 1s and SELECT steel-cut rows for break­fast, but this is more fun­da­men­tal than that; it’s about the core algos you learned in uni­ver­si­ty, and how you’re imple­ment­ing them with seri­ous bugs because you thought the­se two were so obvi.

Keep your sus­penders in sus­pense though, because we’re not going to wave our hands and say any­thing is out­side of the scope of our recruiter’s 30-min­ute phone screen. Today, we’re gonna ditch our rock­star coder com­pa­ny hood­ies and strap on fake embed­ded engi­neer neck­beards to look at those cor­ner cas­es. Over­flows mat­ter, we care about round­ing and fen­ce­posts, and INT_MIN and INT_MAX are def­i­nite­ly gonna be test­ed inputs.

abs

Let’s start sim­ple. What’s the big deal with abs? Well, for one, the C stan­dard doesn’t actu­al­ly define this func­tion over all pos­si­ble inte­ger inputs1. In par­tic­u­lar, it isn’t defined for INT_MIN—bet­ter known to C++ coders as std::numeric_limits::min()2. Let’s give it a shot:

cout << numeric_limits<int>::min() << "\n";
cout << abs(numeric_limits<int>::min()) << "\n";

If we run this on a reg­u­lar x86 com­put­er, we get this some­what unex­pect­ed result:

-2147483648
-2147483648

That’s right; the absolute val­ue of the most neg­a­tive 32-bit inte­ger is… the most neg­a­tive 32-bit inte­ger. This actu­al­ly hap­pens for any bit width, and it’s real­ly because of the way abs works on our 2’s com­ple­ment com­put­ers. Let’s try to see it bet­ter:

I includ­ed the 32-bit inte­ger lim­its to demon­strate the ranges of num­bers that abs is forced to map one anoth­er. Let’s try to visu­al­ize how this map­ping works:

That “???” is because can’t be rep­re­sent­ed as a signed 32-bit inte­ger, yet needs to equal some­thing. Strict­ly fol­low­ing our ear­lier def­i­n­i­tion of abs, we’d try to negate it any­ways. By a cru­el trick of bina­ry arith­metic, we get .

Well isn’t that just great? Because there are more neg­a­tive num­bers than pos­i­tive num­bers in con­ven­tion­al com­put­er inte­ger math, it’s impos­si­ble to define an absolute val­ue func­tion where neg­a­tive inputs always turn into their pos­i­tive coun­ter­parts. This inescapable con­se­quence of includ­ing zero in a num­ber sys­tem (bina­ry) that has equal quan­ti­ties of even and odd num­bers always both­ered me.

Now that I’ve rocked your world, what am I going to do about it? Well… I can’t fix abs. Sor­ry. But, I can raise aware­ness!

No wait, I got it. Real talk: we’re gonna use neg­a­tive absolute val­ue, or nabs from now on. It’s like upside-down abs; it takes in inte­gers and negates the pos­i­tive ones. Check it:

BOOM. Pos­i­tive inte­gers map to their neg­a­tives, and neg­a­tive inte­gers stay where they are. Check it:

See? No prob­lems. Now you hate your­self for ever using abs. Here, have a doc­u­ment­ed opti­mized branch-free imple­men­ta­tion on me (click to expand).

/**
* Negative absolute value. Used to avoid undefined behavior for most negative
* integer (see C99 standard 7.20.6.1.2 and footnote 265 for the description of
* abs/labs/llabs behavior).
*
* @param i 32-bit signed integer
* @return negative absolute value of i; defined for all values of i
*/
int32_t nabs(int32_t i) {
#if (((int32_t)-1) >> 1) == ((int32_t)-1)
// signed right shift sign-extends (arithmetic)
const int32_t negSign = ~(i >> 31); // splat sign bit into all 32 and complement
// if i is positive (negSign is -1), xor will invert i and sub will add 1
// otherwise i is unchanged
return (i ^ negSign) - negSign;
#else
return i < 0 ? i : -i;
#endif
}

But wait, you say. How do I use this piz­za shate func­tion in my code? It gives me nasty nel­ly neg­a­tive num­bers! WHAT I DO HERE LOL.

No wor­ries! Think about when you nor­mal­ly use absolute val­ue. May­be you’re com­par­ing the mag­ni­tude of some pos­si­bly neg­a­tive num­ber with a con­stant?

const int a = 51;
const int b = 85;

if (abs(a - b) < 100) {
std::cout << "a and b are within 100 of each other." << std::endl;
}

This works just as well with our bul­let­proof nabs. Just negate the con­stant and flip the log­ic!

if (nabs(a - b) > -100) {
std::cout << "a and b are within 100 of each other." << std::endl;
}

Or may­be you’re com­put­ing SAD on some pix­els?

int sad = 0;
for (size_t i = 0; i < size; ++i) {
sad += abs(p1[i] - p2[i]);
}

You could instead flip the sign and use nabs!

int sad = 0;
for (size_t i = 0; i < size; ++i) {
sad -= nabs(p1[i] - p2[i]);
}

To be fair, this speci­fic exam­ple didn’t fix the over­flow prob­lem. See, the only case for which abs and nabs are dif­fer­ent is with INT_MIN. How­ev­er, the expres­sions sad + abs(INT_MIN) and sad - nabs(INT_MIN) are actu­al­ly equiv­a­lent.

Nev­er­the­less, there are cas­es when this is use­ful. In this case, SAD is fre­quent­ly com­put­ed on 8-bit pix­els using a 16- or 32-bit accu­mu­la­tor for the dif­fer­ence. So our nabs func­tion would be for 8-bit inte­gers, and sad + abs_8(CHAR_MIN) and sad - nabs_8(CHAR_MIN) would cer­tain­ly not be equiv­a­lent. See?

int8_t abs_8(int8_t);
int8_t nabs_8(int8_t);

int sum = 0;
sum += abs_8(-128); // WRONG: this is like subtracting 128 from sad
sum -= nabs_8(-128); // RIGHT: this is like adding 128 to sad

Don’t for­get about embed­ded sys­tems cod­ing either: on micro­con­trollers, you’ll use val­ues of dif­fer­ent bit widths all the time. What’s more, the sig­nals com­ing in from ADCs and timer cap­tures can be high­ly dynam­ic, often chang­ing non-deter­min­is­ti­cal­ly. As in all cas­es when you can’t pre­dict the inputs to your pro­gram, it’s crit­i­cal to use robust rou­ti­nes like nabs to safe­guard your robot or device from math errors.

avg

Oh look, I ran out of words. I guess I’ll have to post about over­flow-safe, round­ing-cor­rect avg next time! Your bro­ken merge­sort will just have to wait. 🙁

1. C99 stan­dard 7.20.6.1.2 and foot­note 265 []
2. The ++ in C++ is for extra ver­bosi­ty. []