When I started trialling VS Code and ended up distracted reverse-engineering extensions, my goal was to document a simple mechanism to re-order the way architectures are presented to the loader in universal Mach-O binaries. This is that post.

Why would this be necessary? For me, it was the need to test a custom Mach-O loader and make sure it can handle certain special cases. The exercise ended up being a straightforward, practical introduction to the Mach-O file format, which in some ways was more valuable, and is why I decided to share the process.

Intro to universal Mach-O binaries

This post focuses on universal (a.k.a. “fat”) Mach-O binaries unless otherwise noted, specifically those containing object files for the x86_64 (Intel) and arm64 [1] (Apple Silicon) architectures. Such binaries can contain other architectures, since Apple also used the format when moving from PowerPC to Intel. As we’ll see, the specific architectures don’t really matter when it comes to reordering, but since I did not test with older binaries, I won’t claim this works as-is for those combinations.

The header describing universal binaries is defined in mach-o/fat.h, and the essential information is rather short:

#define FAT_MAGIC       0xcafebabe
#define FAT_CIGAM       0xbebafeca      /* NXSwapLong(FAT_MAGIC) */

struct fat_header {
        uint32_t        magic;          /* FAT_MAGIC */
        uint32_t        nfat_arch;      /* number of structs that follow */
};

struct fat_arch {
        cpu_type_t      cputype;        /* cpu specifier (int) */
        cpu_subtype_t   cpusubtype;     /* machine specifier (int) */
        uint32_t        offset;         /* file offset to this object file */
        uint32_t        size;           /* size of this object file */
        uint32_t        align;          /* alignment as a power of 2 */
};

Each universal binary starts with a header that is always stored in big-endian format on disk, and which contains the magic number (0xcafebabe) and the total number of architectures the binary contains. Headers for each architecture then follow, specifying the CPU, its subtype, an absolute offset into the binary for the object file corresponding to the architecture, that object file’s size, and an alignment.
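
As a rough illustration of that layout, here is a minimal Python sketch that unpacks the two structures with the struct module. The function name parse_fat_headers and the synthetic sample bytes are made up for this example; the CPU type constants are the values from mach/machine.h. It assumes the 32-bit fat_arch layout shown above and nothing else.

```python
import struct

FAT_MAGIC = 0xCAFEBABE                 # from mach-o/fat.h
FAT_HEADER = struct.Struct(">II")      # magic, nfat_arch (big-endian)
FAT_ARCH = struct.Struct(">iiIII")     # cputype, cpusubtype, offset, size, align


def parse_fat_headers(data):
    """Return one (cputype, cpusubtype, offset, size, align) tuple per arch."""
    magic, nfat_arch = FAT_HEADER.unpack_from(data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal binary")
    return [
        FAT_ARCH.unpack_from(data, FAT_HEADER.size + i * FAT_ARCH.size)
        for i in range(nfat_arch)
    ]


# Synthetic header bytes for illustration only (no object files attached).
CPU_TYPE_X86_64 = 0x01000007
CPU_TYPE_ARM64 = 0x0100000C
sample = (
    FAT_HEADER.pack(FAT_MAGIC, 2)
    + FAT_ARCH.pack(CPU_TYPE_X86_64, 3, 0x4000, 0x1000, 14)
    + FAT_ARCH.pack(CPU_TYPE_ARM64, 0, 0x8000, 0x1000, 14)
)

for cputype, cpusubtype, offset, size, align in parse_fat_headers(sample):
    # align is stored as a power of 2, so 14 means 16384-byte alignment
    print(hex(cputype), cpusubtype, hex(offset), hex(size), 2 ** align)
```

Note that each fat_arch entry is exactly 20 bytes (five 32-bit fields), which is the number the reordering code relies on later.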

To list the architectures in a binary on macOS, use either the file utility or lipo -archs; the output differs, but both parse and display the headers in the order they appear in the binary.

Reordering architecture headers

The order as displayed is in fact entirely determined by how these headers are laid out inside the binary. Think of the universal binary as a container for its multiple architectures. Each one of those is a (non-universal) binary in its own right. In fact, lipo can “thin” a binary down to a single specified architecture: it essentially parses these headers, identifies the one corresponding to the requested architecture, skips to the offset indicated in that header, and dumps the next size bytes out.
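
The thinning idea can be sketched in a few lines of Python. This is not how lipo is implemented, only an approximation of the description above; extract_slice and the synthetic blob are hypothetical names and data for this example. It matches on cputype alone, so it would not distinguish arm64 from arm64e, which share a CPU type and differ only in subtype.

```python
import struct

FAT_MAGIC = 0xCAFEBABE


def extract_slice(data, wanted_cputype):
    """Return the bytes of the first object file whose cputype matches."""
    magic, narchs = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal binary")
    for i in range(narchs):
        # each fat_arch is 20 bytes and follows the 8-byte fat_header
        cputype, _subtype, offset, size, _align = struct.unpack_from(
            ">iiIII", data, 8 + 20 * i
        )
        if cputype == wanted_cputype:
            return data[offset:offset + size]
    raise LookupError("architecture not present")


# Synthetic container: two 4-byte stand-ins for object files, placed
# right after the 48 bytes of headers (8 + 2 * 20).
blob = (
    struct.pack(">II", FAT_MAGIC, 2)
    + struct.pack(">iiIII", 0x01000007, 3, 48, 4, 0)   # x86_64
    + struct.pack(">iiIII", 0x0100000C, 0, 52, 4, 0)   # arm64
    + b"X86!"
    + b"ARM!"
)

print(extract_slice(blob, 0x0100000C))
```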

┌────────────────────────────┐
│ ┌──────────┐               │
│┌┤fat_header├──────────────┐│
││└──────────┘              ││
││┌────────────────────────┐││
│││magic                   │││
││├────────────────────────┤││
│││num archs               │││
││└────────────────────────┘││
│└──────────────────────────┘│
│ ┌────────────┐             │
│┌┤arch_headers├────────────┐│
││└────────────┘            ││
││┌──────────────────────┬─┐││
│││ cputype              │0│││
│││ cpusubtype           └─┤││
│││ offset                 │││
│││ size                   │││
│││ align                  │││
││├──────────────────────┬─┤││
│││ cputype              │1│││
│││ cpusubtype           └─┤││
│││ offset                 │││
│││ size                   │││
│││ align                  │││
││└──────┬───────┬─────────┘││
││       │   …   │          ││
││       └───────┘          ││
│└──────────────────────────┘│
└────────────────────────────┘

This means that the simplest way to change the order in which the architectures in the binary are processed is to reorder the headers into the desired sequence. It is possible to move the actual architecture code within the binary as well, but doing so is not required and does not by itself change the order. There’s no benefit either, except a (dubious) performance claim that an earlier offset means less seeking to get to the relevant code. That’s out of scope for this post, but certainly something one can try.

lipo has a way to carve out architectures from a binary, and also supports creating a universal binary from single-architecture binaries, but the ordering is fixed, based on a sort by alignment, to save space [2], so it’s unsuitable for reordering. The best option is a small tool that can shift the arch headers as needed. Fortunately, we don’t need to understand the arch headers (not that it’s hard) for this task; it’s sufficient to figure out how many there are, read 20 bytes for each, and save them to a new file in a different order.

The Python code is fairly straightforward, though it doesn’t allow for anything fancy like specifying the order in which the architectures should be output. It simply shifts the headers “left”: for the file binary, which has the following 3 architectures on my macOS install, the first shift goes from 0 1 2 to 1 2 0, and so on, as indicated below:

arch     | x86_64 arm64 arm64e -> arm64 arm64e x86_64 -> arm64e x86_64 arm64
position |    0     1      2   ->   1      2      0   ->    2      0     1

Obviously (and reassuringly), shifting all the way back around to the original order yields the same SHA256 hash as the original binary.

To begin, read in the fat_header to confirm this has multiple architectures, then read how many architectures we’re dealing with:

offset = 0
magic = struct.unpack(">I", inbin.read(4))[0]
offset += 4
inbin.seek(offset)

if magic != MAGIC:
    print("not a universal binary")
    return 2

# next value tells us how many archs the binary contains
narchs = struct.unpack(">I", inbin.read(4))[0]
offset += 4
inbin.seek(offset)

if narchs < 2:
    print("not enough archs: %d" % narchs)
    return 3

Once we know how many architectures are in the binary, we can simply read their headers, 20 bytes at a time, without needing to parse them. This would be different if we also had to reorder the object files inside the binary, since offsets would need to be adjusted, but that’s not the case here.

headers = []
for _ in range(narchs):
    headers.append(inbin.read(20))
    offset += 20  # promise
    inbin.seek(offset)

Then write to the output binary, maintaining the fat_header and using a simple shift in the loop to write the arch headers themselves. Finally, copy the remainder of the file:

# write back magic
outbin.write(struct.pack(">I", magic))
# write out how many narchs
outbin.write(struct.pack(">I", narchs))
# put headers in, shifting "left"
for idx in range(1, narchs + 1):
    outbin.write(headers[idx % narchs])
# inefficiently copy the remaining bytes
outbin.write(inbin.read())
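
To tie the fragments together and check the round-trip property mentioned earlier, here is a self-contained, in-memory sketch of the same shift. It works on bytes rather than file handles, and shift_left, the size constants, and the synthetic input are names and data invented for this example, not part of the original tool. The input is not a valid Mach-O; the headers are treated as opaque 20-byte records, exactly as above.

```python
import hashlib
import struct

FAT_MAGIC = 0xCAFEBABE
FAT_HEADER_SIZE = 8    # magic + nfat_arch
FAT_ARCH_SIZE = 20     # five 32-bit fields per struct fat_arch


def shift_left(data):
    """Return a copy of data with the arch headers shifted left by one."""
    magic, narchs = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal binary")
    if narchs < 2:
        raise ValueError("not enough archs: %d" % narchs)
    end = FAT_HEADER_SIZE + narchs * FAT_ARCH_SIZE
    headers = [
        data[pos:pos + FAT_ARCH_SIZE]
        for pos in range(FAT_HEADER_SIZE, end, FAT_ARCH_SIZE)
    ]
    # same index arithmetic as the loop above: 1, 2, ..., 0
    shifted = [headers[idx % narchs] for idx in range(1, narchs + 1)]
    # fat_header and everything after the arch headers are left untouched
    return data[:FAT_HEADER_SIZE] + b"".join(shifted) + data[end:]


# Synthetic stand-in for a real binary: three opaque headers plus payload.
original = (
    struct.pack(">II", FAT_MAGIC, 3)
    + b"A" * 20 + b"B" * 20 + b"C" * 20
    + b"payload"
)

rotated = original
for _ in range(3):          # one full cycle for three architectures
    rotated = shift_left(rotated)

same = hashlib.sha256(rotated).digest() == hashlib.sha256(original).digest()
print("round trip preserves SHA256:", same)
```

Running the same loop against a copy of a real universal binary, with the range set to its number of architectures, is a cheap sanity check that nothing but the header order changes.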

Final notes

There is not much to this once there’s an understanding of how the headers in the binary are laid out. This is somewhat documented in the open source code Apple publishes, though resources like Jonathan Levin’s books make for much easier references to learn from.

As for follow-up work (besides improved error checking) a better way to understand the Mach-O format is to also enable object file reordering in the binary, and/or have a way to specify the sequence for header reordering.
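
For the second of those ideas, one possible shape is a function that takes an explicit permutation instead of always shifting left. The sketch below is only that, a sketch: reorder_headers and its order parameter are hypothetical, the input is synthetic, and the object files themselves are still left where they are.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
FAT_HEADER_SIZE = 8
FAT_ARCH_SIZE = 20


def reorder_headers(data, order):
    """Place original header order[i] at position i; object files stay put."""
    magic, narchs = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal binary")
    # order must name every header index exactly once
    if sorted(order) != list(range(narchs)):
        raise ValueError("order must be a permutation of 0..%d" % (narchs - 1))
    end = FAT_HEADER_SIZE + narchs * FAT_ARCH_SIZE
    headers = [
        data[pos:pos + FAT_ARCH_SIZE]
        for pos in range(FAT_HEADER_SIZE, end, FAT_ARCH_SIZE)
    ]
    reordered = b"".join(headers[i] for i in order)
    return data[:FAT_HEADER_SIZE] + reordered + data[end:]


# Synthetic input with three opaque 20-byte headers.
demo = struct.pack(">II", FAT_MAGIC, 3) + b"A" * 20 + b"B" * 20 + b"C" * 20
swapped = reorder_headers(demo, [2, 0, 1])
print(swapped[FAT_HEADER_SIZE:])
```

With this shape, the left shift from earlier is just the special case order = [1, 2, ..., 0].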


  1. Frequently this will in fact be arm64e, but in most cases discussed here, it is not necessary to make the distinction.

  2. Search for * create_fat to find the function in lipo.c