C++ Corrections

2023/10/24

It takes some courage and a heap of self-importance to keep a blog alive for two decades without giving in to the urge to burn it all then either:

reboot—not remake—the whole site with a fresh WordPress installation and a “Hello, World!” page that stands solo for four years, collecting vulnerabilities
run screaming into the hills and let the sea swallow up the DNS caches

If you’re reading this, then I held fast against both compulsions^*. The most terrifying part of preserving the teen-aged content in this new packaging has to be stumbling upon the sins of your past self. In my case, my past self also happened to be teenaged and even worse, a C++ embedded systems coder, and hence has a world to atone for.

My 2010 post on decoding Wii Nunchuk state packets made the cut for this site’s third decade. But first, I gotta set the record as straight as my third decade of writing C++ allows.

The Code’s Goals §

The layout of the I²C state reports from the Nunchuk make for an interesting problem: the fields of the report are quite different in size and their bits are splashed around in the report. As a reminder of what it looks like:

Byte ↓	7	6	5	4	3	2	1	0
0	J_x[7:0]
1	J_y[7:0]
2	A_x[9:2]
3	A_y[9:2]
4	A_z[9:2]
5	A_z[1:0]		A_y[1:0]		A_x[1:0]		B_C	B_Z
Bit →	7	6	5	4	3	2	1	0

Legend §

J = joystick axis (unsigned, i.e. 2⁷ at center)
A = accelerometer axis (unsigned, i.e. 2¹¹ under zero acceleration)
B = button state (0 when down)

This defined bit layout is what I’ll refer to as an example of a wire protocol or wire packet, where the “wire” is arguably a misnomer because we aren’t dealing with the physical order that the bits are transmitted; I²C is abstracting that away for us.

In my original code^†, I needed to do two things to decode this 6-byte packet:

Map fields to their locations in raw bytes by name
Extract and assemble bits, i.e. fields smaller than a byte

typedef struct __attribute__((__packed__)) {
  union {
    struct {
      uint8_t joyX;
      uint8_t joyY;
      uint8_t accelX_HI;
      uint8_t accelY_HI;
      uint8_t accelZ_HI;
      uint8_t buttonZ   : 1;
      uint8_t buttonC   : 1;
      uint8_t accelX_LO : 2;
      uint8_t accelY_LO : 2;
      uint8_t accelZ_LO : 2;
    };

    uint8_t buffer[6];
  };
} NunchukRegisters_t;

Both goals were (supposedly) fulfilled by writing to the buffer field in this union, then reading out the fields of the anonymous internal struct, some of which have sub-octet^‡ bit-widths defined. This looked great, and the code very much mirrored the structure of the data.

nunchukData.joyX = nunchukRegs.joyX;
nunchukData.joyY = nunchukRegs.joyY;
nunchukData.accelX = (nunchukRegs.accelX_HI << 2) | nunchukRegs.accelX_LO;
nunchukData.accelY = (nunchukRegs.accelY_HI << 2) | nunchukRegs.accelY_LO;
nunchukData.accelZ = (nunchukRegs.accelZ_HI << 2) | nunchukRegs.accelZ_LO;
nunchukData.buttonZ = !nunchukRegs.buttonZ;
nunchukData.buttonC = !nunchukRegs.buttonC;

Unfortunately, it was not so simple. Important nuances in C and C++ specifications and compiler writers’ direction made this kind of raw data destructuring a crime to write and a gamble to run. And as we’ll discover, the whole “struct aliasing” approach to serialization is unsalvageable.

Problem 1: Type aliasing by union §

This problem is (or at least should be) well-known by systems programmers now: it’s undefined behavior (UB) in C++ to perform type-punning of an object by reading from a union member other than the one last written to^§. This has to do with strict aliasing rules that dedicate memory used through the lifetime of an object to only be used for that object, hence fixing the type at that memory.

This kind of type punning is defined in C^¶, so this isn’t a necessarily a problem with the original HÄRDBÖRD code, but I doubt 2010 Xo made that distinction.

This also used to work when compiler authors decided to enforce strict aliasing in pursuit of unlocking optimizations, but of course said authors still needed a way for coders to express the same intent to move bits. At this point in C++’s saga, the best way to express that you wish to move the bit representation of a type around while satisfying the rules around object lifetime is to create the object using std::bit_cast or (less-ideally) create an object then memcpy to and from it. The “right way” also enjoys proper optimization now—a difficult thing to come by in the transitional period when I first encountered this (GCC 4-ish).

A major advantage to using std::bit_cast is that it can be constexpr friendly:

constexpr unsigned char kRawData[] = {0xff, 0x00, 0x01, 0xa0, 0x04, 0x5b};
constexpr auto kRegister = std::bit_cast<NunchukRegisters_t>(kRawData);

However, if we run this through a few compilers, we find that Clang doesn’t yet support constexpr std::bit_cast with bit-fields and MSVC only supported it starting in version 16.10. This low level of support and general neglect of bit-fields from compiler writers is a great reason to avoid them.

Problem 2: Struct packing §

The original code relied on all the fields in the inner anonymous struct to be laid out tightly-packed in memory in the same order as declared.

“Same order as declared” is actually guaranteed by both C and C++^♠, but the tight packing is not. In order to maintain alignment for different types, the compiler is free to add padding bytes that occupy space in the struct but are not storage for the elements.

With that said, the uint8 types in the original HÄRDBÖRD code do actually pack tightly regardless of the __attribute__((__packed__)) because no padding needs to be added to bring them into their natural alignment (which is 1). You can see that even in C++, this struct ends up at 6 bytes long, implying tight packing.

However, there’s a glaring error that the attribute is actually applied to the wrong struct: the outer anonymous struct (which is itself superfluous^♥) doesn’t directly hold the fields of the data, so applying the attribute means nothing.

To show this, I’ll add a test field to the struct:

 typedef struct __attribute__((__packed__)) {
   union {
     struct {
       uint8_t joyX;
+      uint16_t newField;
       uint8_t joyY;
       …
     };

     uint8_t buffer[6];
   };
 } NunchukRegisters_t;

In the same Godbolt workspace, we can see that this new two-byte field adds four bytes to the original struct: two as storage for the new newField field, one as padding ahead of newField to align it, and yet one more to ensure that the struct itself is aligned to its greatest alignment member (in this case, making its size even for the two-byte uint16_t newField).

To fix both the superfluous struct wrapper and more importantly, the incorrect packing attribute placement, I’ll move it into the inner anonymous struct:

-typedef struct __attribute__((__packed__)) {
-  union {
-    struct {
+typedef union {
+  struct [[gnu::packed]] {
       uint8_t joyX;
       uint16_t newField;
       uint8_t joyY;
 …
     };

     uint8_t buffer[6];
-  };
 } NunchukRegisters_t;

At the same time, I’m also using the C++11 attribute syntax [[gnu::packed]]^♦ in place of __attribute__((__packed__)) because of its consistent placement for declaring types and gentle fallback for unsupported attributes.

Problem 3: Bit-fields bad §

As I showed in an earlier example, constexpr, std::bit_cast, and bit-fields don’t play nicely together. What else don’t work nicely together with bit-fields?

C/C++ addressing system—the entirety of C/C++’s memory system is byte-oriented: sizes are in byte units and addresses increment by one byte at a time. Using bits and bit-widths (regardless of whether they’re bit-fields or otherwise) introduces additional mental overhead that lives outside of that byte-sized world and that means bugs. Remember, bit positions and bit widths are not simply “addresses/sizes multiplied by CHAR_BIT” but in fact totally separate addressing systems that each exist within each integer object: when addressing bits of a byte, it’s not as if you can address the first bit of the next byte as the “ninth” bit.
Parallel access—because bit-fields that are placed in the same storage unit or object are, from the C/C++ perspective, sharing a memory location^♣, writing to bit-fields can’t be guaranteed to be atomic, making them a hazard to use for across threads or (more pertinently) interrupt contexts. This is usually because writing to a bit-field is a non-atomic “read-modify-write^¤” combination rather than a single atomic operation, even in cases where the latter is available in the hardware^**.
Packing/straddling—the C standard has rules about bit-fields being packed into storage units when possible, but it’s also intentionally lax about what happens when a bit-field doesn’t fit^††:
If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined.
Of course, there’s no warning for when you’re overflowing the current storage unit with bit-fields, like if we did this:
```
 typedef union {
   struct [[gnu::packed]] {
     …
     uint8_t buttonZ   : 1;
     uint8_t buttonC   : 1;
     uint8_t accelX_LO : 2;
     uint8_t accelY_LO : 2;
-    uint8_t accelZ_LO : 2;
+    uint8_t accelZ_LO : 3;  // 1+1+2+2+3 bits overflows the byte
   };

   uint8_t buffer[6];
 } NunchukRegisters_t;
```
Moreover, we can’t even introspectively detect where the accelZ_LO field ends up because it’s UB to take the address of a bit-field^‡‡, to which offsetof is no exception.
Ordering of the bit-fields relative to each other—have you noticed that the bit-fields in the last byte of the struct are written in the reverse order of how it was written in the report layout table? This isn’t only unfortunate, but it’s yet again a case of implementation-defined behavior. The spec says^§§:
The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined.
But getting the bits from the right locations was the whole point of writing the Nunchuk decoder like this! The idea was to have the struct layout mirror the real wire layout, but given that bit-fields are the only way to represent fields smaller than a byte in C/C++, this restriction alone makes that goal impossible to achieve.

So not only are bit-fields a neglected stepchild of C/C++ and its compilers, incurring large mental cost and incompatibilities with other features, but also fail to deliver on the actual promise of reliably establishing a struct layout for wire transmission. There isn’t really a fix here for writing a struct that looks like the Nunchuk wire packet, other than looking to languages like Zig^¶¶.

As far as I know, to implement wire protocol decoding at the bit level in C/C++, bitmasking is the only way to go. In the Nunchuk example, reading the lowest 2 bits of the y-axis acceleration might look like:

buffer[5] >> 4 & 0b11

When combined with code for assembling the most significant 8 bits, this doesn’t look quite so great:

constexpr int16_t GetJoyY(std::array<uint8_t, 6> report_bytes) {
    return (uint16_t{report_bytes[1]} << 2) | (report_bytes[5] >> 4 & 0b11);
}

And it’s even worse if you hide some of the magic number constants and put it under a class indentation.

class NunchukReport {
…
  constexpr int16_t joyX() const {
    return (uint16_t{bytes_[JOY_X]} << kAccelLoBitWidth) |
           (bytes_[REG_5] >> kRegister5AccelXLoBits &
            MakeBitMask(kAccelLoBitWidth));
  }
…
};

But, we chose C++ and this is our life now.

Conclusion §

If I had to write this Nunchuk wire protocol decoder today, I’d be careful to work with the wire bytes directly rather than type punning them to a loosely-packed struct (even with std::bit_cast available). Given that bit-fields are not practical to use to refer to the smallest fields in the layout, we might as well address the bytes by offset using named constants:

enum RegisterOffset {
  JOY_X,
  JOY_Y,
  ACCEL_X_HI,
  ACCEL_Y_HI,
  ACCEL_Z_HI,
  REG_5,
};

I would then massage the bits out using the tried-and-true methods of shifting and masking. Basically, this is a barely-dressed up way to programmatically reach into the byte buffer and pull out the bits we want rather than aliasing them into a named, formatted schema.

What if I wanted to handle lots of different wire packet formats? I’m not sure if there is a good solution. At Google, Emboss looked great for systems that could handle more tooling than a barebones CMake + compiler setup. Kaitai Struct and Protlr are similar in that they are also standalone tools defining a declarative schema that necessarily add a code generation stage. But, I’ve yet to see any (let alone a compelling) library that allows the developer to define a schema in only C++^♠♠. Libraries like Frozen certainly prove that compile-time data is practical to traverse and hence use for generating run-time code, so there is hope.

Finally, at a higher level of code organization, I would make sure to inject the underlying I²C library dependency then separate the wire communication part of the code from the wire decoding part. This would allow the I²C flow sequencing and data decryption to be tested separately from the hardware as well as separately from the bit decoding described above. Testing how bits move from the raw buffer into a 16-bit output should not require a real microcontroller hardware harnessed to a real Nunchuk.

However, with all that said, the original code does actually work without UB even though that might have been an afterthought: union type punning is defined in C, struct packedness isn’t necessary given the layout, and bit-field order is consistent for a given compiler and target. At any rate, 2010 Xo definitely acquitted himself by writing the whole project in less time than it took for 2023 Xo to write this post. ∎

Feetnote §

For now. Though, this site’s “dark year” while I contemplated the latter has cut its “link juice” (“SEO mojo?” “Google voodoo?”) to a trickle. ⏎
Shown here using <cstdint>’s uint8_t instead of uint8, which was provided by the Reese PIC24 startup code/HAL I used at the time. ⏎
As in, less than 8. ⏎
By way of exclusion from the list at ISO/IEC 14882:2017 §8.2.1 [basic.lval] ¶11 ⏎
ISO/IEC 9899:2018 §6.5.2.3 ¶3, footnote 97 ⏎
And notably not by Rust, by default. ⏎
There might be some value to ensuring that the wrapper is ultimately struct instead of union but not in this case of a inner type used by the decoder routine. ⏎
The gnu namespacing for attributes is accepted by both Clang and GCC. ⏎
Which are, again, in byte-sized units. ⏎
Not to be confused with an atomic read-modify-write, which is largely an abstraction that’s likely generated using a looped atomic compare-and-swap (CAS) anyhow. ⏎
This doesn’t really apply to the Nunchuk decoder because the struct’s bit-fields aren’t written to, but it would be an issue in the reverse (encoding) direction. ⏎
ISO/IEC 9899:2018 §6.7.2.1 ¶11 ⏎
ISO/IEC 9899:2018 §6.5.3.2 ¶1 ⏎
ISO/IEC 9899:2018 §6.7.2.1 ¶11 ⏎
Which has first-class support for packed structs and arbitrary fixed-width integers, making “bit-fields” look just like other primitives in a struct. ⏎
Understandable, given that expressing some of the concepts in those schemas like arithmetic encoding and substructures can be gnarly enough in a from-scratch domain-specific language (DSL), let alone wrapped up in template and macro magic. ⏎