C++ Corrections
It takes some courage and a heap of self-importance to keep a blog alive for two decades without giving in to the urge to burn it all then either:
- reboot—not remake—the whole site with a fresh WordPress installation and a “Hello, World!” page that stands solo for four years, collecting vulnerabilities
- run screaming into the hills and let the sea swallow up the DNS caches
If you’re reading this, then I held fast against both compulsions*. The most terrifying part of preserving the teen-aged content in this new packaging has to be stumbling upon the sins of your past self. In my case, my past self also happened to be teenaged and even worse, a C++ embedded systems coder, and hence has a world to atone for.
My 2010 post on decoding Wii Nunchuk state packets made the cut for this site’s third decade. But first, I gotta set the record as straight as my third decade of writing C++ allows.
The Code’s Goals §
The layout of the I2C state reports from the Nunchuk make for an interesting problem: the fields of the report are quite different in size and their bits are splashed around in the report. As a reminder of what it looks like:
Byte ↓ | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
---|---|---|---|---|---|---|---|---|
0 | Jx[7:0] | |||||||
1 | Jy[7:0] | |||||||
2 | Ax[9:2] | |||||||
3 | Ay[9:2] | |||||||
4 | Az[9:2] | |||||||
5 | Az[1:0] | Ay[1:0] | Ax[1:0] | BC | BZ | |||
Bit → | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
Legend §
- J = joystick axis (unsigned, i.e. 27 at center)
- A = accelerometer axis (unsigned, i.e. 211 under zero acceleration)
- B = button state (0 when down)
This defined bit layout is what I’ll refer to as an example of a wire protocol or wire packet, where the “wire” is arguably a misnomer because we aren’t dealing with the physical order that the bits are transmitted; I2C is abstracting that away for us.
In my original code†, I needed to do two things to decode this 6-byte packet:
- Map fields to their locations in raw bytes by name
- Extract and assemble bits, i.e. fields smaller than a byte
typedef struct __attribute__((__packed__)) {
union {
struct {
uint8_t joyX;
uint8_t joyY;
uint8_t accelX_HI;
uint8_t accelY_HI;
uint8_t accelZ_HI;
uint8_t buttonZ : 1;
uint8_t buttonC : 1;
uint8_t accelX_LO : 2;
uint8_t accelY_LO : 2;
uint8_t accelZ_LO : 2;
};
uint8_t buffer[6];
};
} NunchukRegisters_t;
Both goals were (supposedly) fulfilled by writing to the buffer
field in this union, then reading out the fields of the anonymous internal struct, some of which have sub-octet‡ bit-widths defined. This looked great, and the code very much mirrored the structure of the data.
nunchukData.joyX = nunchukRegs.joyX;
nunchukData.joyY = nunchukRegs.joyY;
nunchukData.accelX = (nunchukRegs.accelX_HI << 2) | nunchukRegs.accelX_LO;
nunchukData.accelY = (nunchukRegs.accelY_HI << 2) | nunchukRegs.accelY_LO;
nunchukData.accelZ = (nunchukRegs.accelZ_HI << 2) | nunchukRegs.accelZ_LO;
nunchukData.buttonZ = !nunchukRegs.buttonZ;
nunchukData.buttonC = !nunchukRegs.buttonC;
Unfortunately, it was not so simple. Important nuances in C and C++ specifications and compiler writers’ direction made this kind of raw data destructuring a crime to write and a gamble to run. And as we’ll discover, the whole “struct
aliasing” approach to serialization is unsalvageable.
Problem 1: Type aliasing by union §
This problem is (or at least should be) well-known by systems programmers now: it’s undefined behavior (UB) in C++ to perform type-punning of an object by reading from a union member other than the one last written to§. This has to do with strict aliasing rules that dedicate memory used through the lifetime of an object to only be used for that object, hence fixing the type at that memory.
This kind of type punning is defined in C¶, so this isn’t a necessarily a problem with the original HÄRDBÖRD code, but I doubt 2010 Xo made that distinction.
This also used to work when compiler authors decided to enforce strict aliasing in pursuit of unlocking optimizations, but of course said authors still needed a way for coders to express the same intent to move bits. At this point in C++’s saga, the best way to express that you wish to move the bit representation of a type around while satisfying the rules around object lifetime is to create the object using std::bit_cast
or (less-ideally) create an object then memcpy
to and from it. The “right way” also enjoys proper optimization now—a difficult thing to come by in the transitional period when I first encountered this (GCC 4-ish).
A major advantage to using std::bit_cast
is that it can be constexpr
friendly:
constexpr unsigned char kRawData[] = {0xff, 0x00, 0x01, 0xa0, 0x04, 0x5b};
constexpr auto kRegister = std::bit_cast<NunchukRegisters_t>(kRawData);
However, if we run this through a few compilers, we find that Clang doesn’t yet support constexpr
std::bit_cast
with bit-fields and MSVC only supported it starting in version 16.10. This low level of support and general neglect of bit-fields from compiler writers is a great reason to avoid them.
Problem 2: Struct packing §
The original code relied on all the fields in the inner anonymous struct to be laid out tightly-packed in memory in the same order as declared.
“Same order as declared” is actually guaranteed by both C and C++♠, but the tight packing is not. In order to maintain alignment for different types, the compiler is free to add padding bytes that occupy space in the struct but are not storage for the elements.
With that said, the uint8
types in the original HÄRDBÖRD code do actually pack tightly regardless of the __attribute__((__packed__))
because no padding needs to be added to bring them into their natural alignment (which is 1). You can see that even in C++, this struct ends up at 6 bytes long, implying tight packing.
However, there’s a glaring error that the attribute is actually applied to the wrong struct
: the outer anonymous struct (which is itself superfluous♥) doesn’t directly hold the fields of the data, so applying the attribute means nothing.
To show this, I’ll add a test field to the struct:
typedef struct __attribute__((__packed__)) {
union {
struct {
uint8_t joyX;
+ uint16_t newField;
uint8_t joyY;
…
};
uint8_t buffer[6];
};
} NunchukRegisters_t;
In the same Godbolt workspace, we can see that this new two-byte field adds four bytes to the original struct: two as storage for the new newField
field, one as padding ahead of newField
to align it, and yet one more to ensure that the struct
itself is aligned to its greatest alignment member (in this case, making its size even for the two-byte uint16_t newField
).
To fix both the superfluous struct
wrapper and more importantly, the incorrect packing attribute placement, I’ll move it into the inner anonymous struct:
-typedef struct __attribute__((__packed__)) {
- union {
- struct {
+typedef union {
+ struct [[gnu::packed]] {
uint8_t joyX;
uint16_t newField;
uint8_t joyY;
…
};
uint8_t buffer[6];
- };
} NunchukRegisters_t;
At the same time, I’m also using the C++11 attribute syntax [[gnu::packed]]
♦ in place of __attribute__((__packed__))
because of its consistent placement for declaring types and gentle fallback for unsupported attributes.
Problem 3: Bit-fields bad §
As I showed in an earlier example, constexpr
, std::bit_cast
, and bit-fields don’t play nicely together. What else don’t work nicely together with bit-fields?
C/C++ addressing system—the entirety of C/C++’s memory system is byte-oriented: sizes are in byte units and addresses increment by one byte at a time. Using bits and bit-widths (regardless of whether they’re bit-fields or otherwise) introduces additional mental overhead that lives outside of that byte-sized world and that means bugs. Remember, bit positions and bit widths are not simply “addresses/sizes multiplied by
CHAR_BIT
” but in fact totally separate addressing systems that each exist within each integer object: when addressing bits of a byte, it’s not as if you can address the first bit of the next byte as the “ninth” bit.Parallel access—because bit-fields that are placed in the same storage unit or object are, from the C/C++ perspective, sharing a memory location♣, writing to bit-fields can’t be guaranteed to be atomic, making them a hazard to use for across threads or (more pertinently) interrupt contexts. This is usually because writing to a bit-field is a non-atomic “read-modify-write¤” combination rather than a single atomic operation, even in cases where the latter is available in the hardware**.
Packing/straddling—the C standard has rules about bit-fields being packed into storage units when possible, but it’s also intentionally lax about what happens when a bit-field doesn’t fit††:
If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined.
Of course, there’s no warning for when you’re overflowing the current storage unit with bit-fields, like if we did this:
typedef union { struct [[gnu::packed]] { … uint8_t buttonZ : 1; uint8_t buttonC : 1; uint8_t accelX_LO : 2; uint8_t accelY_LO : 2; - uint8_t accelZ_LO : 2; + uint8_t accelZ_LO : 3; // 1+1+2+2+3 bits overflows the byte }; uint8_t buffer[6]; } NunchukRegisters_t;
Moreover, we can’t even introspectively detect where the
accelZ_LO
field ends up because it’s UB to take the address of a bit-field‡‡, to whichoffsetof
is no exception.Ordering of the bit-fields relative to each other—have you noticed that the bit-fields in the last byte of the struct are written in the reverse order of how it was written in the report layout table? This isn’t only unfortunate, but it’s yet again a case of implementation-defined behavior. The spec says§§:
The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined.
But getting the bits from the right locations was the whole point of writing the Nunchuk decoder like this! The idea was to have the
struct
layout mirror the real wire layout, but given that bit-fields are the only way to represent fields smaller than a byte in C/C++, this restriction alone makes that goal impossible to achieve.
So not only are bit-fields a neglected stepchild of C/C++ and its compilers, incurring large mental cost and incompatibilities with other features, but also fail to deliver on the actual promise of reliably establishing a struct layout for wire transmission. There isn’t really a fix here for writing a struct that looks like the Nunchuk wire packet, other than looking to languages like Zig¶¶.
As far as I know, to implement wire protocol decoding at the bit level in C/C++, bitmasking is the only way to go. In the Nunchuk example, reading the lowest 2 bits of the y-axis acceleration might look like:
buffer[5] >> 4 & 0b11
When combined with code for assembling the most significant 8 bits, this doesn’t look quite so great:
constexpr int16_t GetJoyY(std::array<uint8_t, 6> report_bytes) {
return (uint16_t{report_bytes[1]} << 2) | (report_bytes[5] >> 4 & 0b11);
}
And it’s even worse if you hide some of the magic number constants and put it under a class indentation.
class NunchukReport {
…
constexpr int16_t joyX() const {
return (uint16_t{bytes_[JOY_X]} << kAccelLoBitWidth) |
(bytes_[REG_5] >> kRegister5AccelXLoBits &
MakeBitMask(kAccelLoBitWidth));
}
…
};
But, we chose C++ and this is our life now.
Conclusion §
If I had to write this Nunchuk wire protocol decoder today, I’d be careful to work with the wire bytes directly rather than type punning them to a loosely-packed struct
(even with std::bit_cast
available). Given that bit-fields are not practical to use to refer to the smallest fields in the layout, we might as well address the bytes by offset using named constants:
enum RegisterOffset {
JOY_X,
JOY_Y,
ACCEL_X_HI,
ACCEL_Y_HI,
ACCEL_Z_HI,
REG_5,
};
I would then massage the bits out using the tried-and-true methods of shifting and masking. Basically, this is a barely-dressed up way to programmatically reach into the byte buffer and pull out the bits we want rather than aliasing them into a named, formatted schema.
What if I wanted to handle lots of different wire packet formats? I’m not sure if there is a good solution. At Google, Emboss looked great for systems that could handle more tooling than a barebones CMake + compiler setup. Kaitai Struct and Protlr are similar in that they are also standalone tools defining a declarative schema that necessarily add a code generation stage. But, I’ve yet to see any (let alone a compelling) library that allows the developer to define a schema in only C++♠♠. Libraries like Frozen certainly prove that compile-time data is practical to traverse and hence use for generating run-time code, so there is hope.
Finally, at a higher level of code organization, I would make sure to inject the underlying I2C library dependency then separate the wire communication part of the code from the wire decoding part. This would allow the I2C flow sequencing and data decryption to be tested separately from the hardware as well as separately from the bit decoding described above. Testing how bits move from the raw buffer into a 16-bit output should not require a real microcontroller hardware harnessed to a real Nunchuk.
However, with all that said, the original code does actually work without UB even though that might have been an afterthought: union type punning is defined in C, struct packedness isn’t necessary given the layout, and bit-field order is consistent for a given compiler and target. At any rate, 2010 Xo definitely acquitted himself by writing the whole project in less time than it took for 2023 Xo to write this post. ∎