Real Talk: Integer Arithmetic
Time for real talk, friends-o. We gotta address the bread and butter of us who massage code into machines: basic integer arithmetic. That’s right,
- absolute value
- two-integer average
I know, I know, you take your vitamin 0s and 1s and SELECT steel-cut rows for breakfast, but this is more fundamental than that; it’s about the core algos you learned in university, and how you’re implementing them with serious bugs because you thought these two were so obvi.
Keep your suspenders in suspense though, because we’re not going to wave our hands and say anything is outside of the scope of our recruiter’s 30-minute phone screen. Today, we’re gonna ditch our rockstar coder company hoodies and strap on fake embedded engineer neckbeards to look at those corner cases. Overflows matter, we care about rounding and fenceposts, and INT_MIN and INT_MAX are definitely gonna be tested inputs.
abs §
Let’s start simple. What’s the big deal with abs? Well, for one, the C standard doesn’t actually define this function over all possible integer inputs*. In particular, it isn’t defined for INT_MIN—better known to C++ coders as std::numeric_limits<int>::min()†. Let’s give it a shot:
cout << numeric_limits<int>::min() << "\n";
cout << abs(numeric_limits<int>::min()) << "\n";
If we run this on a regular x86 computer, we get this somewhat unexpected result:
-2147483648
-2147483648
That’s right; the absolute value of the most negative 32-bit integer is... the most negative 32-bit integer. This actually happens for any bit width, and it’s really because of the way abs works on our 2’s complement computers. Let’s try to see it better. A 32-bit int runs from INT_MIN (-2147483648) to INT_MAX (2147483647), so abs is forced to map both halves of that range onto the non-negative half. Let’s try to visualize how this mapping works:

 2147483647 →  2147483647
          1 →           1
          0 →           0
         -1 →           1
-2147483647 →  2147483647
-2147483648 →  ???

That “???” is because abs has nowhere to send INT_MIN: positive 2147483648 doesn’t fit in 32 signed bits. If we try to negate it anyway, a cruel trick of binary arithmetic (flip the bits, add one) hands us right back -2147483648.

Well isn’t that just great? Because there are more negative numbers than positive numbers in conventional computer integer math, it’s impossible to define an absolute value function where negative inputs always turn into their positive counterparts. It’s an inescapable consequence of squeezing zero plus an even count of nonzero values into a fixed number of bits, and it has always bothered me.
Now that I’ve rocked your world, what am I going to do about it? Well… I can’t fix abs. Sorry. But, I can raise awareness!

No wait, I got it. Real talk: we’re gonna use negative absolute value, or nabs, from now on. It’s like upside-down abs; it takes in integers and negates the positive ones. Check it:

 2147483647 → -2147483647
          1 →          -1
          0 →           0
         -1 →          -1
-2147483648 → -2147483648

BOOM. Positive integers map to their negatives, and negative integers stay where they are. Every input has a well-defined output. See? No problems. Now you hate yourself for ever using abs. Here, have a documented branch-free implementation on me.
/* SPDX-License-Identifier: MIT OR Apache-2.0 */
#include <stdint.h>
/**
* Negative absolute value. Used to avoid undefined behavior for most negative
* integer (see C99 standard 7.20.6.1.2 and footnote 265 for the description of
* abs/labs/llabs behavior).
*
* @param i 32-bit signed integer
* @return negative absolute value of i; defined for all values of i
*/
int32_t nabs(int32_t i) {
#if (((int32_t)-1) >> 1) == ((int32_t)-1)
// signed right shift sign-extends (arithmetic)
const int32_t negSign = ~(i >> 31); // splat sign bit into all 32 and complement
// if i is positive (negSign is -1), xor will invert i and sub will add 1
// otherwise i is unchanged
return (i ^ negSign) - negSign;
#else
return i < 0 ? i : -i;
#endif
}
But wait, you say. How do I use this pizza shate function in my code? It gives me nasty nelly negative numbers! WHAT I DO HERE LOL.
No worries! Think about when you normally use absolute value. Maybe you’re comparing the magnitude of some possibly negative number with a constant?
const int a = 51;
const int b = 85;
if (abs(a - b) < 100) {
std::cout << "a and b are within 100 of each other." << std::endl;
}
This works just as well with our bulletproof nabs. Just negate the constant and flip the logic!
if (nabs(a - b) > -100) {
std::cout << "a and b are within 100 of each other." << std::endl;
}
Or maybe you’re computing SAD on some pixels?
int sad = 0;
for (size_t i = 0; i < size; ++i) {
sad += abs(p1[i] - p2[i]);
}
You could instead flip the sign and use nabs!
int sad = 0;
for (size_t i = 0; i < size; ++i) {
sad -= nabs(p1[i] - p2[i]);
}
To be fair, this specific example didn’t fix the overflow problem. See, the only input for which abs and nabs produce different values is INT_MIN. And since negating INT_MIN wraps right back to INT_MIN, the expressions sad + abs(INT_MIN) and sad - nabs(INT_MIN) are actually equivalent.
Nevertheless, there are cases when this is useful. In practice, SAD is frequently computed on 8-bit pixels using a 16- or 32-bit accumulator for the sum. So our nabs function would be for 8-bit integers, and sad + abs_8(INT8_MIN) and sad - nabs_8(INT8_MIN) would certainly not be equivalent. See?
int8_t abs_8(int8_t);
int8_t nabs_8(int8_t);
int sad = 0;
sad += abs_8(-128); // WRONG: abs_8(-128) wraps back to -128, subtracting 128 from sad
sad -= nabs_8(-128); // RIGHT: nabs_8(-128) is -128, so this adds 128 to sad
Don’t forget about embedded systems coding either: on microcontrollers, you’ll use values of different bit widths all the time. What’s more, the signals coming in from ADCs and timer captures can be highly dynamic, often changing non-deterministically. As in all cases when you can’t predict the inputs to your program, it’s critical to use robust routines like nabs to safeguard your robot or device from math errors.
avg §
Oh look, I ran out of words. I guess I’ll have to post about overflow-safe, rounding-correct avg next time! Your broken mergesort will just have to wait. 🙁