Skip to main content
geeks have feelings

C++ Corrections

It takes some courage and a heap of self-importance to keep a blog alive for two decades without giving in to the urge to burn it all then either:

If you’re reading this, then I held fast against both compulsions*. The most terrifying part of preserving the teen-aged content in this new packaging has to be stumbling upon the sins of your past self. In my case, my past self also happened to be teenaged and even worse, a C++ embedded systems coder, and hence has a world to atone for.

My 2010 post on decoding Wii Nunchuk state packets made the cut for this site’s third decade. But first, I gotta set the record as straight as my third decade of writing C++ allows.

The Code’s Goals §

The layout of the I2C state reports from the Nunchuk make for an interesting problem: the fields of the report are quite different in size and their bits are splashed around in the report. As a reminder of what it looks like:

Byte ↓76543210
0Jx[7:0]
1Jy[7:0]
2Ax[9:2]
3Ay[9:2]
4Az[9:2]
5Az[1:0]Ay[1:0]Ax[1:0]BCBZ
Bit →76543210

Legend §

This defined bit layout is what I’ll refer to as an example of a wire protocol or wire packet, where the “wire” is arguably a misnomer because we aren’t dealing with the physical order that the bits are transmitted; I2C is abstracting that away for us.

In my original code, I needed to do two things to decode this 6-byte packet:

  1. Map fields to their locations in raw bytes by name
  2. Extract and assemble bits, i.e. fields smaller than a byte
typedef struct __attribute__((__packed__)) {
  union {
    struct {
      uint8_t joyX;
      uint8_t joyY;
      uint8_t accelX_HI;
      uint8_t accelY_HI;
      uint8_t accelZ_HI;
      uint8_t buttonZ   : 1;
      uint8_t buttonC   : 1;
      uint8_t accelX_LO : 2;
      uint8_t accelY_LO : 2;
      uint8_t accelZ_LO : 2;
    };

    uint8_t buffer[6];
  };
} NunchukRegisters_t;

Both goals were (supposedly) fulfilled by writing to the buffer field in this union, then reading out the fields of the anonymous internal struct, some of which have sub-octet bit-widths defined. This looked great, and the code very much mirrored the structure of the data.

nunchukData.joyX = nunchukRegs.joyX;
nunchukData.joyY = nunchukRegs.joyY;
nunchukData.accelX = (nunchukRegs.accelX_HI << 2) | nunchukRegs.accelX_LO;
nunchukData.accelY = (nunchukRegs.accelY_HI << 2) | nunchukRegs.accelY_LO;
nunchukData.accelZ = (nunchukRegs.accelZ_HI << 2) | nunchukRegs.accelZ_LO;
nunchukData.buttonZ = !nunchukRegs.buttonZ;
nunchukData.buttonC = !nunchukRegs.buttonC;

Unfortunately, it was not so simple. Important nuances in C and C++ specifications and compiler writers’ direction made this kind of raw data destructuring a crime to write and a gamble to run. And as we’ll discover, the whole “struct aliasing” approach to serialization is unsalvageable.

Problem 1: Type aliasing by union §

This problem is (or at least should be) well-known by systems programmers now: it’s undefined behavior (UB) in C++ to perform type-punning of an object by reading from a union member other than the one last written to§. This has to do with strict aliasing rules that dedicate memory used through the lifetime of an object to only be used for that object, hence fixing the type at that memory.

This kind of type punning is defined in C, so this isn’t a necessarily a problem with the original HÄRDBÖRD code, but I doubt 2010 Xo made that distinction.

This also used to work when compiler authors decided to enforce strict aliasing in pursuit of unlocking optimizations, but of course said authors still needed a way for coders to express the same intent to move bits. At this point in C++’s saga, the best way to express that you wish to move the bit representation of a type around while satisfying the rules around object lifetime is to create the object using std::bit_cast or (less-ideally) create an object then memcpy to and from it. The “right way” also enjoys proper optimization now—a difficult thing to come by in the transitional period when I first encountered this (GCC 4-ish).

A major advantage to using std::bit_cast is that it can be constexpr friendly:

constexpr unsigned char kRawData[] = {0xff, 0x00, 0x01, 0xa0, 0x04, 0x5b};
constexpr auto kRegister = std::bit_cast<NunchukRegisters_t>(kRawData);

However, if we run this through a few compilers, we find that Clang doesn’t yet support constexpr std::bit_cast with bit-fields and MSVC only supported it starting in version 16.10. This low level of support and general neglect of bit-fields from compiler writers is a great reason to avoid them.

Problem 2: Struct packing §

The original code relied on all the fields in the inner anonymous struct to be laid out tightly-packed in memory in the same order as declared.

“Same order as declared” is actually guaranteed by both C and C++, but the tight packing is not. In order to maintain alignment for different types, the compiler is free to add padding bytes that occupy space in the struct but are not storage for the elements.

With that said, the uint8 types in the original HÄRDBÖRD code do actually pack tightly regardless of the __attribute__((__packed__)) because no padding needs to be added to bring them into their natural alignment (which is 1). You can see that even in C++, this struct ends up at 6 bytes long, implying tight packing.

However, there’s a glaring error that the attribute is actually applied to the wrong struct: the outer anonymous struct (which is itself superfluous) doesn’t directly hold the fields of the data, so applying the attribute means nothing.

To show this, I’ll add a test field to the struct:

 typedef struct __attribute__((__packed__)) {
   union {
     struct {
       uint8_t joyX;
+      uint16_t newField;
       uint8_t joyY;
      };

     uint8_t buffer[6];
   };
 } NunchukRegisters_t;

In the same Godbolt workspace, we can see that this new two-byte field adds four bytes to the original struct: two as storage for the new newField field, one as padding ahead of newField to align it, and yet one more to ensure that the struct itself is aligned to its greatest alignment member (in this case, making its size even for the two-byte uint16_t newField).

To fix both the superfluous struct wrapper and more importantly, the incorrect packing attribute placement, I’ll move it into the inner anonymous struct:

-typedef struct __attribute__((__packed__)) {
-  union {
-    struct {
+typedef union {
+  struct [[gnu::packed]] {
       uint8_t joyX;
       uint16_t newField;
       uint8_t joyY;
      };

     uint8_t buffer[6];
-  };
 } NunchukRegisters_t;

At the same time, I’m also using the C++11 attribute syntax [[gnu::packed]] in place of __attribute__((__packed__)) because of its consistent placement for declaring types and gentle fallback for unsupported attributes.

Problem 3: Bit-fields bad §

As I showed in an earlier example, constexpr, std::bit_cast, and bit-fields don’t play nicely together. What else don’t work nicely together with bit-fields?

So not only are bit-fields a neglected stepchild of C/C++ and its compilers, incurring large mental cost and incompatibilities with other features, but also fail to deliver on the actual promise of reliably establishing a struct layout for wire transmission. There isn’t really a fix here for writing a struct that looks like the Nunchuk wire packet, other than looking to languages like Zig¶¶.

As far as I know, to implement wire protocol decoding at the bit level in C/C++, bitmasking is the only way to go. In the Nunchuk example, reading the lowest 2 bits of the y-axis acceleration might look like:

buffer[5] >> 4 & 0b11

When combined with code for assembling the most significant 8 bits, this doesn’t look quite so great:

constexpr int16_t GetJoyY(std::array<uint8_t, 6> report_bytes) {
    return (uint16_t{report_bytes[1]} << 2) | (report_bytes[5] >> 4 & 0b11);
}

And it’s even worse if you hide some of the magic number constants and put it under a class indentation.

class NunchukReport {constexpr int16_t joyX() const {
    return (uint16_t{bytes_[JOY_X]} << kAccelLoBitWidth) |
           (bytes_[REG_5] >> kRegister5AccelXLoBits &
            MakeBitMask(kAccelLoBitWidth));
  }};

But, we chose C++ and this is our life now.

Conclusion §

If I had to write this Nunchuk wire protocol decoder today, I’d be careful to work with the wire bytes directly rather than type punning them to a loosely-packed struct (even with std::bit_cast available). Given that bit-fields are not practical to use to refer to the smallest fields in the layout, we might as well address the bytes by offset using named constants:

enum RegisterOffset {
  JOY_X,
  JOY_Y,
  ACCEL_X_HI,
  ACCEL_Y_HI,
  ACCEL_Z_HI,
  REG_5,
};

I would then massage the bits out using the tried-and-true methods of shifting and masking. Basically, this is a barely-dressed up way to programmatically reach into the byte buffer and pull out the bits we want rather than aliasing them into a named, formatted schema.

What if I wanted to handle lots of different wire packet formats? I’m not sure if there is a good solution. At Google, Emboss looked great for systems that could handle more tooling than a barebones CMake + compiler setup. Kaitai Struct and Protlr are similar in that they are also standalone tools defining a declarative schema that necessarily add a code generation stage. But, I’ve yet to see any (let alone a compelling) library that allows the developer to define a schema in only C++♠♠. Libraries like Frozen certainly prove that compile-time data is practical to traverse and hence use for generating run-time code, so there is hope.

Finally, at a higher level of code organization, I would make sure to inject the underlying I2C library dependency then separate the wire communication part of the code from the wire decoding part. This would allow the I2C flow sequencing and data decryption to be tested separately from the hardware as well as separately from the bit decoding described above. Testing how bits move from the raw buffer into a 16-bit output should not require a real microcontroller hardware harnessed to a real Nunchuk.

However, with all that said, the original code does actually work without UB even though that might have been an afterthought: union type punning is defined in C, struct packedness isn’t necessary given the layout, and bit-field order is consistent for a given compiler and target. At any rate, 2010 Xo definitely acquitted himself by writing the whole project in less time than it took for 2023 Xo to write this post. ∎

Every Post by Year

  1. 2023
    1. Ducati Timing Belt Replacement
    2. C++ Corrections
  2. 2016
    1. Liftlord
    2. Sensorless Brushless Can’t Even
  3. 2015
    1. Big Data: Test & Refresh
  4. 2014
    1. The Orange Involute
    2. Big Data EVT
  5. 2013
    1. Integer Arithmetic Continued
    2. Real Talk: Integer Arithmetic
    3. Why Microsoft’s 3D Printing Rocks
    4. Flapjack Stator Thoughts
    5. Delicious Axial Flux Flapjack
  6. 2012
    1. How to teach how to PCB?
    2. Fixed-point atan2
    3. It Was Never About the Mileage
    4. Trayrace
    5. BabyCorntrolling
    6. Conkers
    7. BabyCorntroller
    8. Templated numerical integrators in C++
  7. 2011
    1. Bringing up Corntroller
    2. Assembly-izing Tassel
    3. Corn-Troller: Tassel
    4. 5 V to 3.3 V with Preferred Resistors
  8. 2010
    1. HÄRDBÖRD: Interesting Bits
    2. HÄRDBÖRD: Hardcore Electric Longboard
    3. Mistakes to Make on a Raytracer
    4. US International Dvorak
  9. 2009
    1. Raxo
    2. Better Spheres, Fewer Triangles
    3. Donald Knuth Finally Sells Out
    4. Harpy – Sumo Bots 2009