Reordering Architecture Headers in a Universal Mach-O Binary
When I started trialling VS Code and ended up distracted reverse-engineering extensions, my goal was to document a simple mechanism to reorder the way architectures are presented to the loader in universal Mach-O binaries. This is that post.
Why would this be necessary? For me, it was the need to test a custom Mach-O loader, to make sure it can handle certain special cases. This exercise ended up being a straightforward, practical introduction to the Mach-O file format, which in some ways was more valuable, and is why I decided to share the process.
Intro to universal Mach-O binaries
This post focuses on universal (a.k.a. “fat”) Mach-O binaries, unless otherwise noted. Specifically, those containing object files for the x86_64 (Intel) and arm64¹ (Apple Silicon) architectures. There are other potential architectures inside such binaries, since Apple also used the format when moving from PowerPC to Intel. As we’ll see, the specific architectures don’t really matter when it comes to reordering, but since I did not test with older binaries, I won’t claim this works as-is for those combinations.
The header describing universal binaries is defined in mach-o/fat.h, and the essential information is rather short.
Each universal binary starts with a header that is always stored in big-endian format on disk and contains the magic number (0xcafebabe) and the total number of architectures the binary contains. Headers for each architecture then follow, specifying the CPU type, its subtype, an absolute offset into the binary for the object file corresponding to the architecture, that object file’s size, and an alignment (stored as a power of 2).
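As a rough illustration of that layout (not the header file itself), the two structures can be expressed as Python struct formats. This is a minimal sketch assuming the classic 0xcafebabe format with 20-byte entries; fat.h also defines a 64-bit variant (FAT_MAGIC_64) with larger entries, which is not handled here. The names FAT_HEADER, FAT_ARCH and parse_fat_headers are my own, and cputype/cpusubtype are read as unsigned bit patterns even though they are signed ints in C.

```python
import struct

FAT_MAGIC = 0xCAFEBABE

# ">" selects big-endian byte order with no padding, matching the on-disk format.
FAT_HEADER = struct.Struct(">II")   # magic, nfat_arch
FAT_ARCH = struct.Struct(">IIIII")  # cputype, cpusubtype, offset, size, align


def parse_fat_headers(data: bytes):
    """Return (nfat_arch, [fat_arch tuples]) parsed from the start of `data`."""
    magic, nfat_arch = FAT_HEADER.unpack_from(data, 0)
    if magic != FAT_MAGIC:
        raise ValueError(f"not a universal binary: magic {magic:#010x}")
    archs = [
        FAT_ARCH.unpack_from(data, FAT_HEADER.size + i * FAT_ARCH.size)
        for i in range(nfat_arch)
    ]
    return nfat_arch, archs


# Made-up header for two architectures, just to exercise the parser.
sample = FAT_HEADER.pack(FAT_MAGIC, 2)
sample += FAT_ARCH.pack(0x01000007, 3, 0x1000, 0x2000, 12)  # x86_64
sample += FAT_ARCH.pack(0x0100000C, 0, 0x4000, 0x3000, 14)  # arm64

count, archs = parse_fat_headers(sample)
print(FAT_HEADER.size, FAT_ARCH.size)  # 8 20
print(count)                           # 2
```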
To list the architectures in a binary on macOS, either the file utility or lipo -archs can be used; the output differs, but both parse and display the headers in the order they’re present in the binary.
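To see why both tools agree on the order, here is a minimal sketch of what listing architectures boils down to: walk the arch headers in file order and name each one. The name table is deliberately partial and my own; its values are the CPU_TYPE_*/CPU_SUBTYPE_* constants from mach/machine.h as I understand them, and anything missing from it is printed as raw numbers. It assumes 20-byte entries, and list_archs is a hypothetical name.

```python
import struct

FAT_MAGIC = 0xCAFEBABE

# Partial (cputype, cpusubtype) -> name table; values from mach/machine.h.
CPU_NAMES = {
    (0x01000007, 3): "x86_64",   # CPU_TYPE_X86_64, CPU_SUBTYPE_X86_64_ALL
    (0x01000007, 8): "x86_64h",  # CPU_TYPE_X86_64, CPU_SUBTYPE_X86_64_H
    (0x0100000C, 0): "arm64",    # CPU_TYPE_ARM64, CPU_SUBTYPE_ARM64_ALL
    (0x0100000C, 2): "arm64e",   # CPU_TYPE_ARM64, CPU_SUBTYPE_ARM64E
}


def list_archs(data: bytes) -> list:
    """Architecture names, in the order their headers appear in `data`."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal binary")
    names = []
    for i in range(nfat_arch):
        cputype, cpusubtype = struct.unpack_from(">II", data, 8 + i * 20)
        # The top byte of cpusubtype holds capability flags (CPU_SUBTYPE_MASK);
        # arm64e slices typically store 0x80000002. Drop them before the lookup.
        subtype = cpusubtype & 0x00FFFFFF
        names.append(CPU_NAMES.get((cputype, subtype), f"{cputype:#x}/{subtype:#x}"))
    return names
```

Reading a binary such as /usr/bin/file into memory and passing its bytes to list_archs should give the same order that lipo -archs reports.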
Reordering architecture headers
The order as displayed is in fact entirely determined by how these headers are laid out inside the binary. Think of the universal binary as a container for its multiple architectures, each of which is a (non-universal) binary in its own right. In fact, lipo can be used to “thin” a binary down to a single specified architecture, which essentially means parsing these headers, identifying the one corresponding to the requested architecture, skipping to the offset indicated in that header, and dumping out as many bytes as its size field specifies.
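The thinning step just described can be sketched in a few lines. This is not lipo’s implementation, only a minimal illustration of the idea, again assuming 20-byte entries; the function name thin and its arguments are hypothetical.

```python
import struct

FAT_MAGIC = 0xCAFEBABE


def thin(data: bytes, cputype: int, cpusubtype: int) -> bytes:
    """Return the object file for one architecture: find its arch header,
    jump to that header's offset and take the next `size` bytes."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal binary")
    for i in range(nfat_arch):
        ctype, csub, offset, size, _align = struct.unpack_from(">IIIII", data, 8 + i * 20)
        # Compare subtypes without the capability flags in the top byte.
        if ctype == cputype and (csub & 0x00FFFFFF) == cpusubtype:
            return data[offset:offset + size]
    raise KeyError(f"architecture {cputype:#x}/{cpusubtype:#x} not found")
```

The returned bytes are just the slice the header points at, which is why each slice can be treated as a standalone binary.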
┌────────────────────────────┐
│ ┌──────────┐               │
│┌┤fat_header├──────────────┐│
││└──────────┘              ││
││┌────────────────────────┐││
│││magic                   │││
││├────────────────────────┤││
│││num archs               │││
││└────────────────────────┘││
│└──────────────────────────┘│
│ ┌────────────┐             │
│┌┤arch_headers├────────────┐│
││└────────────┘            ││
││┌──────────────────────┬─┐││
│││ cputype              │0│││
│││ cpusubtype           └─┤││
│││ offset                 │││
│││ size                   │││
│││ align                  │││
││├──────────────────────┬─┤││
│││ cputype              │1│││
│││ cpusubtype           └─┤││
│││ offset                 │││
│││ size                   │││
│││ align                  │││
││└──────┬───────┬─────────┘││
││       │   …   │          ││
││       └───────┘          ││
│└──────────────────────────┘│
└────────────────────────────┘
This means that, in order to reorder the way architectures in the binary are processed, the simplest approach is to reorder the headers to get the desired sequence. It is also possible to move the object files themselves within the binary (though that alone would not change the order, which is determined by the headers), but there’s no benefit, except a (dubious) performance claim that if the offset is earlier, less seeking is necessary to get to the relevant code. That’s out of scope for this post, but certainly something one can try.
lipo has a way to carve out architectures from a binary, and it also supports creating a universal binary from single-architecture binaries, but the ordering is fixed (sorted by alignment, to save space²), so it’s unsuitable for reordering. The best option is a small tool that can shift the arch headers as needed. Fortunately, we don’t need to understand the arch headers (not that it’s hard) for this task; it’s sufficient to figure out how many there are, read 20 bytes for each, and save them to a new file in a different order.
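Since the entries are only moved, never interpreted, any ordering can be produced the same way. The tool in this post only shifts left; the sketch below is a hypothetical variation (the function reorder and its order parameter are my own) that takes an explicit permutation and treats each 20-byte entry as an opaque chunk. The offsets inside the entries are untouched, so each entry still points at its own object file, and everything after the header table is copied verbatim.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
ARCH_SIZE = 20  # sizeof(struct fat_arch): five 32-bit fields


def reorder(data: bytes, order: list) -> bytes:
    """Rearrange the arch headers of a universal binary into `order`,
    a permutation of 0..n-1. Entries are moved as opaque 20-byte chunks."""
    magic, n = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal binary")
    if sorted(order) != list(range(n)):
        raise ValueError(f"order must be a permutation of 0..{n - 1}")
    table_end = 8 + n * ARCH_SIZE
    chunks = [data[8 + i * ARCH_SIZE : 8 + (i + 1) * ARCH_SIZE] for i in range(n)]
    # fat_header + reordered entries + untouched remainder (padding, object files).
    return data[:8] + b"".join(chunks[i] for i in order) + data[table_end:]
```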
The Python code is fairly straightforward, though it doesn’t allow for anything fancy, such as specifying the order in which the architectures should be output. It simply shifts the headers “left”. For example, the binary of the file utility has the following 3 architectures on my macOS install, so the first shift goes from 0 1 2 to 1 2 0, and so on, as indicated below:
arch     | x86_64 arm64  arm64e -> arm64  arm64e x86_64 -> arm64e x86_64 arm64
position | 0      1      2      -> 1      2      0      -> 2      0      1
Obviously (and reassuringly), shifting all the way back to the original order (three shifts for three architectures) yields the same SHA-256 hash for the binary.
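The rotation itself is easy to check at the list level. This small sketch (shift_left is my own name) applies one left shift per step to the architecture names from the table above; after as many shifts as there are architectures, the original order comes back, which is why the final file is byte-identical.

```python
def shift_left(items):
    """Rotate a list one position to the left: position 1 becomes 0,
    and so on, while position 0 wraps around to the end."""
    return items[1:] + items[:1]


archs = ["x86_64", "arm64", "arm64e"]
history = [archs]
for _ in range(len(archs)):
    history.append(shift_left(history[-1]))

for step in history:
    print(" ".join(step))
# x86_64 arm64 arm64e
# arm64 arm64e x86_64
# arm64e x86_64 arm64
# x86_64 arm64 arm64e
```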
To begin, read in the fat_header and check its magic number to confirm this is a universal binary, then read how many architectures we’re dealing with.
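A minimal sketch of this first step, assuming the input is opened in binary mode; read_fat_header is a hypothetical name, not taken from the original script. It keeps the raw 8 bytes so they can be written back unchanged later.

```python
import io
import struct
from typing import BinaryIO

FAT_MAGIC = 0xCAFEBABE


def read_fat_header(f: BinaryIO):
    """Read the 8-byte fat_header, verify the magic number and return
    (raw header bytes, number of architectures)."""
    raw = f.read(8)
    if len(raw) != 8:
        raise ValueError("file too short to hold a fat_header")
    magic, nfat_arch = struct.unpack(">II", raw)
    if magic != FAT_MAGIC:
        # For example a thin (single-architecture) Mach-O, or another file type.
        raise ValueError(f"not a universal binary (magic {magic:#010x})")
    return raw, nfat_arch


# In-memory stand-in for a file holding a three-architecture fat_header.
header, count = read_fat_header(io.BytesIO(struct.pack(">II", FAT_MAGIC, 3)))
print(count)  # 3
```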
Once we know how many architectures are in the binary, we can simply read their headers, 20 bytes at a time, without needing to parse them. This would be different if we also had to reorder the object files inside the binary, since offsets would need to be adjusted, but that’s not the case here.
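Continuing under the same assumptions (20-byte entries; read_arch_headers is a hypothetical name), the headers can be collected as raw byte strings. Nothing inside them is decoded, because their offset fields stay valid as long as the object files themselves don’t move.

```python
import io
import struct
from typing import BinaryIO

ARCH_SIZE = 20  # sizeof(struct fat_arch) for the classic 0xcafebabe format


def read_arch_headers(f: BinaryIO, count: int) -> list:
    """Read `count` arch headers as opaque 20-byte chunks, without parsing.
    Assumes `f` is positioned just past the 8-byte fat_header."""
    headers = []
    for i in range(count):
        chunk = f.read(ARCH_SIZE)
        if len(chunk) != ARCH_SIZE:
            raise ValueError(f"truncated arch header {i}")
        headers.append(chunk)
    return headers


# Stand-in input: a fat_header for two architectures, two dummy entries, then data.
buf = io.BytesIO(struct.pack(">II", 0xCAFEBABE, 2) + b"A" * 20 + b"B" * 20 + b"payload")
buf.seek(8)  # skip the fat_header
chunks = read_arch_headers(buf, 2)
print([c[:1] for c in chunks])  # [b'A', b'B']
print(buf.read())               # b'payload'
```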
Then write to the output binary, keeping the fat_header as-is and using a simple shift in the loop to write the arch headers themselves. Finally, copy the remainder of the file unchanged.
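Putting the steps together, here is a self-contained sketch of the whole shift. It is not the original script: shift_archs is a hypothetical name, it assumes the 20-byte-entry format, and it works on file-like objects so it can be tried in memory. The loop writes header (i + 1) mod n into slot i, which is the left shift from the table earlier, and shutil.copyfileobj copies whatever follows the header table (padding and object files) unchanged. The demo shifts a made-up three-architecture binary three times and compares SHA-256 hashes.

```python
import hashlib
import io
import shutil
import struct
from typing import BinaryIO

FAT_MAGIC = 0xCAFEBABE
ARCH_SIZE = 20  # sizeof(struct fat_arch)


def shift_archs(src: BinaryIO, dst: BinaryIO) -> None:
    """Copy a universal binary from src to dst with its arch headers
    rotated one position to the left; all other bytes are copied as-is."""
    fat_header = src.read(8)
    magic, count = struct.unpack(">II", fat_header)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal binary")
    headers = [src.read(ARCH_SIZE) for _ in range(count)]

    dst.write(fat_header)             # unchanged: same magic, same count
    for i in range(count):            # slot i gets header i + 1; header 0 wraps to the end
        dst.write(headers[(i + 1) % count])
    shutil.copyfileobj(src, dst)      # remainder of the file, untouched


# Demo on a made-up binary: dummy cputypes 1, 2, 3 and a fake payload.
entries = [struct.pack(">IIIII", cputype, 0, 0, 0, 0) for cputype in (1, 2, 3)]
original = struct.pack(">II", FAT_MAGIC, 3) + b"".join(entries) + b"object files"

data = original
for _ in range(3):
    out = io.BytesIO()
    shift_archs(io.BytesIO(data), out)
    data = out.getvalue()

print(hashlib.sha256(data).hexdigest() == hashlib.sha256(original).hexdigest())  # True
```

For real files, src and dst would be opened with open(path, "rb") and open(path, "wb"); the new file does not inherit the original’s permissions, so something like shutil.copymode may be needed to make it executable again.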
Final notes
There is not much to this once there’s an understanding of how the headers in the binary are laid out. This is somewhat documented in the open source code Apple publishes, though resources like Jonathan Levin’s books make for much easier references to learn from.
As for follow-up work (besides improved error checking), a good way to understand the Mach-O format better would be to also support reordering the object files within the binary, and/or to allow specifying the sequence for header reordering.