<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>※ hackd</title><description>hackd - security &amp; risk</description><link>https://hackd.net/</link><language>en-us</language><item><title>Choices and responsibility</title><link>https://hackd.net/posts/choice-responsibility/</link><guid isPermaLink="true">https://hackd.net/posts/choice-responsibility/</guid><description>Brief notes on tools we choose to use, their impacts and risks, and who bears the cost.</description><pubDate>Thu, 12 Feb 2026 18:13:53 GMT</pubDate><content:encoded>&lt;p&gt;Every time new technology comes out and becomes widely available, there is a
difficult balance to be found between the exuberance of early adopters—keen to
drive broad usage and land early wins unlocked by the new thing—&lt;sup&gt;&lt;a href=&quot;#user-content-fn-em&quot; id=&quot;user-content-user-content-fnref-em&quot; data-footnote-ref aria-describedby=&quot;user-content-footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; and those
who would seek to better understand the downsides before going too deep. There
are essential (and obvious) risks no one disagrees on, but these are few. The real
challenge is in negotiating the middle ground: all the second+ order effects.
It&apos;s fair that neither side prevails unchallenged&lt;sup&gt;&lt;a href=&quot;#user-content-fn-balance&quot; id=&quot;user-content-user-content-fnref-balance&quot; data-footnote-ref aria-describedby=&quot;user-content-footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;My primary perspective comes from doing Security Engineering work across
companies of different sizes (and cultures), some defense but mostly offense. On
either side though, I&apos;ve largely thought about what the new technology amplifies
in terms of risk. Today, of course, it&apos;s LLMs, but the problem space is fractal:
we run into variants&lt;sup&gt;&lt;a href=&quot;#user-content-fn-inject&quot; id=&quot;user-content-user-content-fnref-inject&quot; data-footnote-ref aria-describedby=&quot;user-content-footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; of the same meta issues all the time in security,
because that&apos;s the nature of our work. Something shiny, often useful, almost
never designed with safety&lt;sup&gt;&lt;a href=&quot;#user-content-fn-safety&quot; id=&quot;user-content-user-content-fnref-safety&quot; data-footnote-ref aria-describedby=&quot;user-content-footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; as a primary property.&lt;/p&gt;
&lt;p&gt;Yet, in accepting that, under certain time constraints, we cannot wait for
good-enough guardrails around new technology, we also &lt;strong&gt;do not get to abdicate our
responsibility&lt;/strong&gt; for the choice of using such tech. We might not be aware of all
the specific second+ order effects, but we can be sure they exist&lt;sup&gt;&lt;a href=&quot;#user-content-fn-macro&quot; id=&quot;user-content-user-content-fnref-macro&quot; data-footnote-ref aria-describedby=&quot;user-content-footnote-label&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;So it is with LLMs and all they&apos;ve recently unlocked in terms of working with
software and systems (e.g., code assistance, vulnerability discovery, semi- or
fully-autonomous personal agents). Broadly, we&apos;ve been able to convert A LOT of
ideas into running code, and got some pretty funny and clever things out, too.
Quite fast, generally. Yet, all of this software remains the responsibility of
whatever human actor is ultimately at the top of the pyramid, and we should
neither pretend otherwise, nor enable unaccountability in this regard.&lt;/p&gt;
&lt;p&gt;There are situations where simply using whatever code the LLM generated is
fine without paying it too much attention: prototypes, one-off scripts, even
tools that would only ever impact that human if there was a problem (i.e., skin
in the game), etc.&lt;/p&gt;
&lt;p&gt;Yet for code that&apos;s to be shared with other people, it remains the human&apos;s
responsibility to review it and ensure its quality, out of both respect and
empathy, before sharing it in the first place. That human is still
responsible.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;I really enjoyed reading (and recommend) both Russ Cox&apos;s
&lt;a href=&quot;https://groups.google.com/g/golang-dev/c/4Li4Ovd_ehE/m/8L9s_jq4BAAJ&quot;&gt;post in &lt;code&gt;golang-dev@&lt;/code&gt;&lt;/a&gt;
on suggestions for how to approach aspects of this issue in that project, and
the &lt;a href=&quot;https://rfd.shared.oxide.computer/rfd/0576&quot;&gt;Oxide RFD (576)&lt;/a&gt; he mentions.&lt;/p&gt;
&lt;section data-footnotes class=&quot;footnotes&quot;&gt;&lt;h2 class=&quot;sr-only&quot; id=&quot;user-content-footnote-label&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-user-content-fn-em&quot;&gt;
&lt;p&gt;I happen to know how to use em-dashes; this article is 100% organic. &lt;a href=&quot;#user-content-fnref-em&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 1&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-user-content-fn-balance&quot;&gt;
&lt;p&gt;Even this concession is uncomfortable for some cultures, which would much
rather build up a high degree of confidence/containment before allowing
anything. This moves into a different discussion on risk, and while that
conversation is interesting and worth iterating on, it is not how things
currently work in practice. &lt;a href=&quot;#user-content-fnref-balance&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 2&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-user-content-fn-inject&quot;&gt;
&lt;p&gt;Injections of all kinds are as old as computing, as is the unsafe mixing of
code and data. &lt;a href=&quot;#user-content-fnref-inject&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 3&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-user-content-fn-safety&quot;&gt;
&lt;p&gt;I think of safety in general, not just security. They are deeply
intertwined, yet the actual, meaningful property we should strive to deliver
upon has to be safety; security is a critical and related, but not
identical, attribute. &lt;a href=&quot;#user-content-fnref-safety&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 4&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-user-content-fn-macro&quot;&gt;
&lt;p&gt;At a macro scale this gets complex, and we see plenty of imbalances in who
pays for these externalities; it&apos;s why we have regulations for some things,
frequently the result of socialized negative outcomes we want to avoid
repeating. &lt;a href=&quot;#user-content-fnref-macro&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 5&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;</content:encoded><author>roguesys</author></item><item><title>Mimestream Private Push</title><link>https://mimestream.com/trust/private-push</link><guid isPermaLink="true">https://mimestream.com/trust/private-push</guid><pubDate>Sat, 07 Feb 2026 02:19:10 GMT</pubDate><content:encoded>&lt;p&gt;A nice bit of technical detail on how to achieve a privacy-preserving
implementation for email push. I hope many people care about such details,
though realistically I&apos;m not sure they do. It&apos;s one of those things that can be done
quickly, easily, and poorly; or with a little more effort and attention, quite
well.&lt;/p&gt;</content:encoded><author>roguesys</author></item><item><title>Capabilities for agent delegation</title><link>https://niyikiza.com/posts/capability-delegation/</link><guid isPermaLink="true">https://niyikiza.com/posts/capability-delegation/</guid><pubDate>Fri, 09 Jan 2026 00:55:52 GMT</pubDate><content:encoded>&lt;p&gt;This is one of the more interesting ideas I&apos;ve come across in terms of securing
agentic workloads.&lt;/p&gt;</content:encoded><author>roguesys</author></item><item><title>Atuin and Tailscale and containers and 1Password</title><link>https://hackd.net/posts/atuin-and-tailscale-and-containers-and-1password/</link><guid isPermaLink="true">https://hackd.net/posts/atuin-and-tailscale-and-containers-and-1password/</guid><description>Self-hosting an Atuin sync server on macOS (feat. Tailscale, Podman containers, 1Password)</description><pubDate>Thu, 19 Jun 2025 17:47:24 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;m not sure how it took me this long to discover&lt;sup&gt;&lt;a href=&quot;#user-content-fn-1&quot; id=&quot;user-content-user-content-fnref-1&quot; data-footnote-ref aria-describedby=&quot;user-content-footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; &lt;a href=&quot;https://atuin.sh&quot;&gt;atuin&lt;/a&gt; but I&apos;ve been
missing out for sure. Atuin is essentially (much) &quot;better shell history&quot; in a
bunch of ways, as it:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;uses a database to enable more kinds of queries on the history&lt;/li&gt;
&lt;li&gt;has per-directory (or workspace) contextual filters, to better limit what is
shown&lt;/li&gt;
&lt;li&gt;syncs your shell history across devices, via an end-to-end encrypted sync
server&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That last feature was one of the things I was most excited about, although 2 is
really great as well in practice. While there is a public sync server available,
I thought I should run and host my own, given that&apos;s an option and I&apos;m that sort
of person.&lt;/p&gt;
&lt;p&gt;So, we&apos;re setting up a self-hosted Atuin sync server reachable via Tailscale,
running in containers (Podman, but Docker would work), with 1Password providing
secrets management, running on macOS. The completed configuration is available
&lt;a href=&quot;https://github.com/axtl/atuin-self-server&quot;&gt;on GitHub&lt;/a&gt;, if you want to skip ahead. The documentation from Atuin is
pretty thorough, but I did have to figure out a bit more glue, hence this post.&lt;/p&gt;
&lt;h2&gt;Containers (Podman, Docker etc.)&lt;/h2&gt;
&lt;p&gt;I use Podman as most container workloads don&apos;t need root. In this particular
setup, we&apos;ll deploy 3 containers via &lt;code&gt;podman-compose&lt;/code&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Atuin server&lt;/li&gt;
&lt;li&gt;the Postgres database for the Atuin server&lt;/li&gt;
&lt;li&gt;Tailscale routing so we can reach the Atuin server from our tailnet&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Setting up the Postgres and Atuin containers is pretty well-covered in the
&lt;a href=&quot;https://docs.atuin.sh/self-hosting/docker/#docker-compose&quot;&gt;official self-hosting docs&lt;/a&gt;, though I did have to figure out a
small thing and ended up submitting a &lt;a href=&quot;https://github.com/atuinsh/docs/pull/87&quot;&gt;PR&lt;/a&gt; to fix the upstream docs.&lt;/p&gt;
&lt;h3&gt;Atuin server&lt;/h3&gt;
&lt;p&gt;Annotated container config:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;atuin:
  image: ghcr.io/atuinsh/atuin:latest
  # start after the DB and Tailscale
  depends_on:
    - db
    - tailscale
  restart: always
  command: server start
  # Make config persistent
  volumes:
    - &apos;./config:/config&apos;
  # Map the ports
  ports:
    - 8888:8888
  environment:
    # Listen on all interfaces.
    ATUIN_HOST: &apos;0.0.0.0&apos;
    # same port as above
    ATUIN_PORT: 8888
    # I don&apos;t keep registration open generally, but you probably need to set
    # this to &quot;true&quot; once to set up a first account.
    ATUIN_OPEN_REGISTRATION: &apos;false&apos;
    # These env vars will be injected later. Make sure that `db` matches your
    # database container&apos;s name!
    ATUIN_DB_URI: postgres://${ATUIN_DB_USERNAME}:${ATUIN_DB_PASSWORD}@db/${ATUIN_DB_NAME}
    RUST_LOG: info,atuin_server=debug
  network_mode: service:tailscale
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The main trouble I had was the DB container name not matching the &lt;code&gt;ATUIN_DB_URI&lt;/code&gt;
string. Now that the upstream docs are fixed, it&apos;s unlikely you&apos;d run into this
issue.&lt;/p&gt;
&lt;p&gt;Given that we&apos;re using Tailscale to access the server, I don&apos;t worry about
setting up TLS.&lt;/p&gt;
&lt;h3&gt;Postgres database&lt;/h3&gt;
&lt;p&gt;Annotated container config:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;db:
  image: postgres:17
  depends_on:
    - tailscale
  restart: unless-stopped
  volumes:
    # Don&apos;t remove permanent storage for index database files!
    - &apos;./database:/var/lib/postgresql/data/&apos;
    # If the WAL ever gets corrupt (e.g., due to abrupt container stops), fix it with:
    # docker run -it -v ./database:/var/lib/postgresql/data/ postgres:17 /bin/bash
    # su postgres -c &apos;pg_resetwal /var/lib/postgresql/data&apos;
  environment:
    POSTGRES_USER: ${ATUIN_DB_USERNAME}
    POSTGRES_PASSWORD: ${ATUIN_DB_PASSWORD}
    POSTGRES_DB: ${ATUIN_DB_NAME}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Tailscale&lt;/h3&gt;
&lt;p&gt;We want our Tailscale container to be able to route to and from our Atuin
server, while being able to independently authenticate to the network so that we
don&apos;t lose connectivity periodically. Tailscale docs are pretty good on &lt;a href=&quot;https://tailscale.com/blog/docker-tailscale-guide&quot;&gt;how to
set up containers&lt;/a&gt;, including OAuth configuration.&lt;/p&gt;
&lt;p&gt;Annotated container config:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;tailscale:
  image: ghcr.io/tailscale/tailscale:latest
  restart: always
  # TS_AUTHKEY from the OAuth config
  environment:
    TS_AUTHKEY: ${TS_AUTHKEY}
    TS_HOSTNAME: atuin
    TS_STATE_DIR: /var/lib/tailscale
    TS_EXTRA_ARGS: --advertise-tags=tag:container
  volumes:
    - &apos;./tailscale:/var/lib/tailscale/&apos;
  devices:
    - /dev/net/tun:/dev/net/tun
  cap_add:
    - net_admin
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;1Password&lt;/h2&gt;
&lt;p&gt;I&apos;ve known for a while that 1Password has a CLI integration, allowing secrets
access from the command line, in scripts etc. This seemed like a good spot to
try that. My approach was to reference the secrets as environment variables (via
&lt;a href=&quot;http://mise.jdx.dev&quot;&gt;mise&lt;/a&gt;), have their values be paths into a 1Password vault, and start my
cluster by having &lt;code&gt;op&lt;/code&gt; inject the values at start-up time.&lt;/p&gt;
&lt;p&gt;So, for example, to set the database password for Atuin, in &lt;code&gt;mise.toml&lt;/code&gt; I have:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-toml&quot;&gt;[env]
ATUIN_DB_PASSWORD = &quot;op://Personal/atuin/password&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;All together now (hopefully)&lt;/h2&gt;
&lt;p&gt;With the finished &lt;code&gt;compose.yaml&lt;/code&gt; for the cluster, and having set up the
respective secrets in 1Password, all that&apos;s left to do is:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;op run -- podman compose up -d
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This should prompt for authentication with 1Password, and then bring up all the
containers and have everything ready to go! As mentioned, you may need to allow
for registration once, to set up one account that you&apos;ll use. Or keep it on if
you&apos;re doing this for a team, friend group, polycule, etc. Given the server has
to be reached over Tailscale, it won&apos;t matter too much if you leave this on.&lt;/p&gt;
&lt;p&gt;If you have issues, check the logs for clues. I&apos;ve had the occasional issue with
the DB WAL.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;podman compose logs -n -t -f
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can also confirm the 1Password integration (and &lt;code&gt;mise&lt;/code&gt; populating the env) is
working as expected by printing the environment unmasked:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;op run --no-masking -- printenv # might want to pipe into grep ATUIN
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Atuin client&lt;/h2&gt;
&lt;p&gt;Get Atuin (the client) from the &lt;a href=&quot;https://atuin.sh&quot;&gt;official place&lt;/a&gt; or use &lt;code&gt;homebrew&lt;/code&gt; /
another package manager of your choice. You&apos;ll need a little configuration to
tell your client which is the right server to use.&lt;/p&gt;
&lt;p&gt;Here&apos;s my (annotated) configuration as an example, but only the &lt;code&gt;sync_address&lt;/code&gt;
is really needed. For the rest, find what works for you, and make sure to look
at the &lt;a href=&quot;https://docs.atuin.sh/self-hosting/docker/#docker-compose&quot;&gt;Atuin docs&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ini&quot;&gt;# Tailnet address and correct port; remember this is HTTP.
sync_address = &quot;http://atuin.&amp;#x3C;your-tailnet&gt;.ts.net:8888&quot;
# Workspaces use per-git-repository history, rather than directory-only
workspaces = true
# Prefer to see the workspace history by default.
filter_mode_shell_up_key_binding = &quot;workspace&quot; # or global, host, directory, etc
# Run commands directly; tab to edit.
enter_accept = true
# Use Ctrl-0 .. Ctrl-9 instead of Alt-0 .. Alt-9 UI shortcuts; better on macOS.
ctrl_n_shortcuts = true
# Invert window?
invert = false
# Window style
style = &quot;auto&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At this point you should be able to run &lt;code&gt;atuin status&lt;/code&gt; and see you&apos;re green and
connecting to the server you&apos;ve just configured!&lt;/p&gt;
&lt;section data-footnotes class=&quot;footnotes&quot;&gt;&lt;h2 class=&quot;sr-only&quot; id=&quot;user-content-footnote-label&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-user-content-fn-1&quot;&gt;
&lt;p&gt;It was probably
&lt;a href=&quot;https://hachyderm.io/@b0rk@jvns.ca/114083615859693297&quot;&gt;this comic&lt;/a&gt; by Julia
Evans that finally made me look. &lt;a href=&quot;#user-content-fnref-1&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 1&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;</content:encoded><author>roguesys</author></item><item><title>macOS reflective code loading analysis</title><link>https://hackd.net/posts/macos-reflective-code-loading-analysis/</link><guid isPermaLink="true">https://hackd.net/posts/macos-reflective-code-loading-analysis/</guid><description>Fileless, artifact-free, in-memory code execution… not quite.</description><pubDate>Tue, 08 Feb 2022 02:45:24 GMT</pubDate><content:encoded>&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://attack.mitre.org/techniques/T1620/&quot;&gt;Reflective code loading&lt;/a&gt; is an interesting attack technique, useful when
there&apos;s a desire to conceal or protect the code that&apos;s being executed on a
system. This is primarily accomplished by avoiding the creation of any files on
disk (e.g., downloading the binary that&apos;s then going to run) or other execution
artifacts that would become indicators of behavior. Instead, code can be
downloaded and executed directly in the memory of an otherwise benign process.
While not completely interchangeable, sometimes we see this technique also
referred to as &quot;fileless&quot;, &quot;artifact-free&quot;, and/or &quot;in-memory (only)&quot; as those
are desirable characteristics.&lt;/p&gt;
&lt;p&gt;Probably the best known implementation of this technique on macOS relies on
functionality exposed by &lt;code&gt;dyld&lt;/code&gt;, and first documented in &lt;a href=&quot;https://www.wiley.com/en-us/The+Mac+Hacker%27s+Handbook-p-9780470395363&quot;&gt;&quot;The Mac Hacker&apos;s
Handbook&quot;&lt;/a&gt; by Dino Dai Zovi and Charlie Miller in 2009, though it&apos;s also
been discussed/popularized by &lt;a href=&quot;https://www.blackhat.com/docs/us-15/materials/us-15-Wardle-Writing-Bad-A-Malware-For-OS-X.pdf&quot;&gt;Patrick Wardle at BlackHat 2015&lt;/a&gt;, and by
&lt;a href=&quot;https://vimeo.com/215195101&quot;&gt;Stephanie Archibald at INFILTRATE &apos;17&lt;/a&gt;. The high-level approach is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;receive code over a socket&lt;/li&gt;
&lt;li&gt;if it&apos;s a binary, change the &lt;code&gt;filetype&lt;/code&gt; field in the Mach-O header from &lt;code&gt;MH_EXECUTE&lt;/code&gt; to
&lt;code&gt;MH_BUNDLE&lt;/code&gt; - needed for the subsequent steps as the necessary APIs expect a
bundle&lt;sup&gt;&lt;a href=&quot;#user-content-fn-bundle&quot; id=&quot;user-content-user-content-fnref-bundle&quot; data-footnote-ref aria-describedby=&quot;user-content-footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;use &lt;code&gt;NSCreateObjectFileImageFromMemory&lt;/code&gt; to, well, create an object file from
the memory region that contains the binary&lt;/li&gt;
&lt;li&gt;use &lt;code&gt;NSLinkModule&lt;/code&gt; to link in the necessary shared libraries&lt;/li&gt;
&lt;li&gt;call functions or pass execution to the loaded code (after figuring out the
entry point, in the case it was actually a binary)&lt;/li&gt;
&lt;/ol&gt;
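&lt;p&gt;The high-level approach can be sketched roughly as follows. This is a minimal, macOS-only illustration of the deprecated APIs, not a drop-in loader: error handling is trimmed, and the &lt;code&gt;_execute&lt;/code&gt; entry-point name is borrowed from the PoC used later in this post:&lt;/p&gt;

```c
// Minimal sketch: run a Mach-O image already sitting in memory (e.g.,
// received over a socket into `buf`). macOS-only, deprecated dyld APIs;
// error handling trimmed for brevity.
#include <mach-o/dyld.h>
#include <mach-o/loader.h>

static void run_from_memory(void *buf, unsigned long len) {
    // Step 2: patch the header so the APIs accept the image as a bundle.
    struct mach_header_64 *mh = (struct mach_header_64 *)buf;
    if (mh->filetype == MH_EXECUTE)
        mh->filetype = MH_BUNDLE;

    // Step 3: create an object file image from the memory region.
    NSObjectFileImage ofi;
    if (NSCreateObjectFileImageFromMemory(buf, len, &ofi) != NSObjectFileImageSuccess)
        return;

    // Step 4: link the module into the process.
    NSModule mod = NSLinkModule(ofi, "in_memory", NSLINKMODULE_OPTION_PRIVATE);

    // Step 5: resolve an entry point and transfer execution to it.
    NSSymbol sym = NSLookupSymbolInModule(mod, "_execute");
    if (sym)
        ((void (*)(void))NSAddressOfSymbol(sym))();

    NSUnLinkModule(mod, NSUNLINKMODULE_OPTION_NONE);
    NSDestroyObjectFileImage(ofi);
}
```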
&lt;p&gt;This technique has been used by malware authors rather recently, notably the
&lt;a href=&quot;https://objective-see.com/blog/blog_0x51.html&quot;&gt;Lazarus Group in the 2019 version of Apple Jeus&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There&apos;s just one thing: this technique hasn&apos;t been truly fileless for… some
time.&lt;/p&gt;
&lt;h2&gt;How it (really) works today&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/bluec0re&quot;&gt;@bluec0re&lt;/a&gt; initially got me looking into this in
depth, after noticing during a debug run that artifacts &lt;strong&gt;are&lt;/strong&gt;, in fact,
created when these specific functions are called. Given the public perception
around this in-memory loading technique, this was a bit of a surprise, so I
spent some time doing a bit more research and putting it together for this
analysis.&lt;/p&gt;
&lt;p&gt;I used &lt;a href=&quot;https://twitter.com/its_a_feature_&quot;&gt;@its_a_feature&lt;/a&gt;&apos;s simple
&lt;a href=&quot;https://github.com/its-a-feature/macos_execute_from_memory&quot;&gt;macos_execute_from_memory&lt;/a&gt; PoC for testing (there&apos;s also Archibald&apos;s
&lt;a href=&quot;https://github.com/CylanceVulnResearch/osx_runbin&quot;&gt;PoC&lt;/a&gt;, for historical reference). For convenience/ease of
iteration, these PoCs do actually load a target binary from disk, rather than
over the network, but the execution is meant to be in-memory only.&lt;/p&gt;
&lt;p&gt;To figure out what&apos;s going on, a good starting place is to run the code while
printing anything that has to do with &lt;code&gt;dyld&lt;/code&gt;, given that&apos;s where the two
functions are. There are some helpful environment variables to set before
running the code:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-fish&quot;&gt;clang -g -o main main.c
DYLD_PRINT_APIS=1 DYLD_PRINT_LIBRARIES=1 ./main
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-c++&quot;&gt;// some output elided
dyld[80184]: NSCreateObjectFileImageFromMemory(0x105010000, 0x00008258)
dyld[80184]: NSCreateObjectFileImageFromMemory() copy 0x105010000 to 0x10501c000
dyld[80184]: NSLinkModule(0x6000029d4270, module)
dyld[80184]: dlopen(&quot;/var/folders/q8/28dylf2973q_22bqzy2_lplr0000gn/T/NSCreateObjectFileImageFromMemory-VTd6S38q&quot;, 0x80000080)
dyld[80184]: &amp;#x3C;DDBBB7CE-78F7-3E78-AD09-2C79ED030E2A&gt; /private/var/folders/q8/28dylf2973q_22bqzy2_lplr0000gn/T/NSCreateObjectFileImageFromMemory-VTd6S38q
dyld[80184]:       dlopen(NSCreateObjectFileImageFromMemory-VTd6S38q) =&gt; 0x20a24d4a0
dyld[80184]: NSLinkModule(0x6000029d4270, module) =&gt; 0x20a24d4a0
dyld[80184]: NSLookupSymbolInModule(0x20a24d4a0, _execute)
dyld[80184]: NSLookupSymbolInModule(0x20a24d4a0, _execute) =&gt; 0x105037f84
old timey mode: Executed!
dyld[80184]: NSUnLinkModule(0x20a24d4a0)
dyld[80184]: dlclose(0x20a24d4a0)
dyld[80184]: NSDestroyObjectFileImage(0x6000029d4270)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Looks like &lt;code&gt;NSLinkModule&lt;/code&gt; uses &lt;code&gt;dlopen&lt;/code&gt; to load the
&lt;code&gt;NSCreateObjectFileImageFromMemory-VTd6S38q&lt;/code&gt; file from a temporary location. To
confirm and get a bit more detail, I monitored the execution of &lt;code&gt;./main&lt;/code&gt; using
Patrick Wardle&apos;s &lt;a href=&quot;https://objective-see.com/products/utilities.html&quot;&gt;File Monitor&lt;/a&gt; to see what files are being accessed. To
narrow the output, I&apos;m piping this through &lt;code&gt;jq&lt;/code&gt; to filter out for events related
to the PoC, and extracting only the type and the file destination for each
event:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-fish&quot;&gt;sudo /Applications/FileMonitor.app/Contents/MacOS/FileMonitor | \
jq  &apos;.
   | select(.file.process.path | contains(&quot;macos_execute_from_memory/main&quot;))
   | {event: .event, dest: .file.destination}&apos; | \
tee filemon-filtered.json # also saving for further analysis
# start ./main in another terminal, Ctrl-C the FileMonitor process when done
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &quot;event&quot;: &quot;ES_EVENT_TYPE_NOTIFY_OPEN&quot;,
  &quot;dest&quot;: &quot;…/macos_execute_from_memory/test.bundle&quot;
}
{
  &quot;event&quot;: &quot;ES_EVENT_TYPE_NOTIFY_CLOSE&quot;,
  &quot;dest&quot;: &quot;…/macos_execute_from_memory/test.bundle&quot;
}
{
  &quot;event&quot;: &quot;ES_EVENT_TYPE_NOTIFY_CREATE&quot;,
  &quot;dest&quot;: &quot;/private/var/folders/q8/28dylf2973q_22bqzy2_lplr0000gn/T/NSCreateObjectFileImageFromMemory-7zEgh32K&quot;
}
{
  &quot;event&quot;: &quot;ES_EVENT_TYPE_NOTIFY_WRITE&quot;,
  &quot;dest&quot;: &quot;/private/var/folders/q8/28dylf2973q_22bqzy2_lplr0000gn/T/NSCreateObjectFileImageFromMemory-7zEgh32K&quot;
}
{
  &quot;event&quot;: &quot;ES_EVENT_TYPE_NOTIFY_CLOSE&quot;,
  &quot;dest&quot;: &quot;/private/var/folders/q8/28dylf2973q_22bqzy2_lplr0000gn/T/NSCreateObjectFileImageFromMemory-7zEgh32K&quot;
}
{
  &quot;event&quot;: &quot;ES_EVENT_TYPE_NOTIFY_OPEN&quot;,
  &quot;dest&quot;: &quot;/private/var/folders/q8/28dylf2973q_22bqzy2_lplr0000gn/T/NSCreateObjectFileImageFromMemory-7zEgh32K&quot;
}
{
  &quot;event&quot;: &quot;ES_EVENT_TYPE_NOTIFY_CLOSE&quot;,
  &quot;dest&quot;: &quot;/private/var/folders/q8/28dylf2973q_22bqzy2_lplr0000gn/T/NSCreateObjectFileImageFromMemory-7zEgh32K&quot;
}
// open-close twice more
{
  &quot;event&quot;: &quot;ES_EVENT_TYPE_NOTIFY_UNLINK&quot;,
  &quot;dest&quot;: &quot;/private/var/folders/q8/28dylf2973q_22bqzy2_lplr0000gn/T/NSCreateObjectFileImageFromMemory-7zEgh32K&quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;During the execution of &lt;code&gt;./main&lt;/code&gt; a file is created, accessed a few times, then
deleted. The name changes slightly with every run (the &lt;code&gt;7zEgh32K&lt;/code&gt; above,
compared to &lt;code&gt;VTd6S38q&lt;/code&gt; previously) and based on the path it&apos;s fairly obvious
this is a temporary file. I wanted to understand more about when it&apos;s created,
and if its presence means this particular technique is no longer as interesting
as we once thought: it can be replaced with a much less brittle combo of
&lt;code&gt;dlopen&lt;/code&gt; and &lt;code&gt;dlsym&lt;/code&gt;, but it&apos;s also not artifact-free.&lt;/p&gt;
&lt;p&gt;So next I ran &lt;code&gt;./main&lt;/code&gt; in &lt;code&gt;lldb&lt;/code&gt; with a breakpoint on
&lt;code&gt;NSCreateObjectFileImageFromMemory&lt;/code&gt;, stepping through instructions to figure out
what&apos;s going on:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-fish&quot;&gt;(lldb) b NSCreateObjectFileImageFromMemory
Breakpoint 1: where = libdyld.dylib`NSCreateObjectFileImageFromMemory, address = 0x0000000180314fa0
(lldb) r
Process 94604 launched: &apos;macos_execute_from_memory/main&apos; (arm64)
Process 94604 stopped
* thread #1, queue = &apos;com.apple.main-thread&apos;, stop reason = breakpoint 1.1
    frame #0: 0x000000018e3b8fa0 libdyld.dylib`NSCreateObjectFileImageFromMemory
libdyld.dylib`NSCreateObjectFileImageFromMemory:
-&gt;  0x18e3b8fa0 &amp;#x3C;+0&gt;:  mov    x3, x2
    0x18e3b8fa4 &amp;#x3C;+4&gt;:  mov    x2, x1
    0x18e3b8fa8 &amp;#x3C;+8&gt;:  mov    x1, x0
    0x18e3b8fac &amp;#x3C;+12&gt;: adrp   x8, 366208
Target 0: (main) stopped.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After stepping for a while we eventually get to
&lt;code&gt;dyld4::APIs::NSCreateObjectFileImageFromMemory(void const*, unsigned long, __NSObjectFileImage**)&lt;/code&gt;
which superficially looks like the function called in &lt;code&gt;main.c&lt;/code&gt; but, as we&apos;ll see
a bit later on, is not.&lt;/p&gt;
&lt;p&gt;Continuing on we reach first
&lt;code&gt;libdyld.dylib`NSCreateObjectFileImageFromMemory&lt;/code&gt; and then
&lt;code&gt;dyld`dyld4::APIs::NSLinkModule(__NSObjectFileImage*, char const*, unsigned int)&lt;/code&gt;
in a similar fashion. Here we&apos;ll see a call to &lt;code&gt;mkstemp&lt;/code&gt; that&apos;s a give-away
we&apos;re likely creating a temporary file:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-fish&quot;&gt;(lldb)
Process 97220 stopped
* thread #1, queue = &apos;com.apple.main-thread&apos;, stop reason = instruction step into
    frame #0: 0x000000018e3bdea8 libdyld.dylib`dyld4::LibSystemHelpers::getenv(char const*) const
libdyld.dylib`dyld4::LibSystemHelpers::getenv:
-&gt;  0x18e3bdea8 &amp;#x3C;+0&gt;: mov    x0, x1
    0x18e3bdeac &amp;#x3C;+4&gt;: b      0x18e3c329c               ; symbol stub for: getenv

libdyld.dylib`dyld4::LibSystemHelpers::mkstemp:
    0x18e3bdeb0 &amp;#x3C;+0&gt;: mov    x0, x1
    0x18e3bdeb4 &amp;#x3C;+4&gt;: b      0x18e3c336c               ; symbol stub for: mkstemp
Target 0: (main) stopped.
(lldb)
Process 97220 stopped
* thread #1, queue = &apos;com.apple.main-thread&apos;, stop reason = instruction step into
    frame #0: 0x000000018e3bdeac libdyld.dylib`dyld4::LibSystemHelpers::getenv(char const*) const + 4
libdyld.dylib`dyld4::LibSystemHelpers::getenv:
-&gt;  0x18e3bdeac &amp;#x3C;+4&gt;: b      0x18e3c329c               ; symbol stub for: getenv

libdyld.dylib`dyld4::LibSystemHelpers::mkstemp:
    0x18e3bdeb0 &amp;#x3C;+0&gt;: mov    x0, x1
    0x18e3bdeb4 &amp;#x3C;+4&gt;: b      0x18e3c336c               ; symbol stub for: mkstemp

libdyld.dylib`dyld4::LibSystemHelpers::getTLVGetAddrFunc:
    0x18e3bdeb8 &amp;#x3C;+0&gt;: adrp   x16, -5
Target 0: (main) stopped.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Right after the return from &lt;code&gt;mkstemp&lt;/code&gt;, I checked the temporary file location to
see a file created there:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-fish&quot;&gt;fd NSCreateObjectFileImageFromMemory /private/var/folders/
/private/var/folders/q8/28dylf2973q_22bqzy2_lplr0000gn/T/NSCreateObjectFileImageFromMemory-wySJJ7WJ
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The SHA-256 checksum of this temp file matched the SHA-256 checksum of the file
that was loaded in-memory by the PoC, so it&apos;s pretty clear at this point that
&lt;code&gt;NSLinkModule&lt;/code&gt; writes the file out, then uses &lt;code&gt;dlopen&lt;/code&gt; to load it back to do the
rest. &lt;code&gt;dlopen&lt;/code&gt; is part of a stable API, unlike the long-deprecated
&lt;code&gt;NSCreateObjectFileImageFromMemory&lt;/code&gt; and &lt;code&gt;NSLinkModule&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Tracing the source code&lt;/h2&gt;
&lt;p&gt;Apple publishes &lt;a href=&quot;https://github.com/apple-oss-distributions/dyld&quot;&gt;the source code to dyld&lt;/a&gt;, though with some delay
after the corresponding OS release that introduces a new version. For example,
the latest available source is &lt;a href=&quot;https://github.com/apple-oss-distributions/dyld/tree/rel/dyld-852&quot;&gt;852&lt;/a&gt;, which corresponds to macOS 11, but
the current macOS 12.2 version is &lt;code&gt;dyld-941.5&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-fish&quot;&gt;strings /usr/lib/dyld | rg &quot;dyld-\b\d\d\d\b&quot; # also /System/Library/dyld/dyld_shared_cache_arm64e
@(#)PROGRAM:dyld  PROJECT:dyld-941.5
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An interesting fact is that the function we saw earlier as
&lt;code&gt;dyld4::APIs::NSCreateObjectFileImageFromMemory&lt;/code&gt; is from the &lt;code&gt;dyld4&lt;/code&gt; namespace,
which is not present in any of the source code releases (there&apos;s only &lt;code&gt;dyld&lt;/code&gt; and
&lt;code&gt;dyld3&lt;/code&gt;). During debugging I saw a few functions called that are
part of the &lt;code&gt;dyld3&lt;/code&gt; namespace, so my guess is that all of these will coexist, at
least for a while. For the purposes of this analysis, I&apos;ll focus on the
&lt;a href=&quot;https://github.com/apple-oss-distributions/dyld/tree/rel/dyld-852&quot;&gt;852&lt;/a&gt; version of &lt;code&gt;dyld&lt;/code&gt;, but I&apos;ll keep an eye out for whenever 941 drops,
to check out &lt;code&gt;dyld4&lt;/code&gt; there.&lt;/p&gt;
&lt;p&gt;There are a few implementations of &lt;code&gt;NSCreateObjectFileImageFromMemory&lt;/code&gt; in the
source for &lt;a href=&quot;https://github.com/apple-oss-distributions/dyld/tree/rel/dyld-852&quot;&gt;852&lt;/a&gt;, in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/apple-oss-distributions/dyld/blob/rel/dyld-852/src/dyldAPIs.cpp#L904&quot;&gt;&lt;code&gt;src/dyldAPIs.cpp&lt;/code&gt;&lt;/a&gt;
added with the &lt;code&gt;dyld-43&lt;/code&gt; release, about 17 years ago.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/apple-oss-distributions/dyld/blob/rel/dyld-852/src/dyldAPIsInLibSystem.cpp#L778&quot;&gt;&lt;code&gt;src/dyldAPIsInLibSystem.cpp&lt;/code&gt;&lt;/a&gt;
added by &lt;code&gt;dyld-519.2.1&lt;/code&gt; in 2017&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/apple-oss-distributions/dyld/blob/rel/dyld-852/dyld3/APIs_macOS.cpp#L94&quot;&gt;&lt;code&gt;dyld3/APIs_macOS.cpp&lt;/code&gt;&lt;/a&gt;
where the relevant portion was also added with &lt;code&gt;dyld-519.2.1&lt;/code&gt; in 2017.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The same is true for &lt;code&gt;NSLinkModule&lt;/code&gt;, just a few lines below in those same files.&lt;/p&gt;
&lt;p&gt;It looks like &lt;code&gt;gUseDyld3&lt;/code&gt; controls whether the version that&apos;s being used is the
one in the &lt;code&gt;dyld3&lt;/code&gt; namespace, versus the &quot;original&quot; from the &lt;code&gt;dyld-43&lt;/code&gt; release.
There are a few places this is assigned to throughout the code base, though only
one spot where it&apos;s set to &lt;code&gt;true&lt;/code&gt;, in
&lt;a href=&quot;https://github.com/apple-oss-distributions/dyld/blob/rel/dyld-852/dyld3/libdyldEntryVector.cpp#L61-L82&quot;&gt;&lt;code&gt;libdyldEntryVector.cpp&lt;/code&gt;&lt;/a&gt;.
Following the path further through the build config files in Xcode and the
source itself, I believe the entry vector is a build dependency of
&lt;code&gt;libdyld.dylib&lt;/code&gt;. So that&apos;s how we get to the code path that uses the new
(&lt;code&gt;dyld3&lt;/code&gt;) implementation of &lt;code&gt;NSCreateObjectFileImageFromMemory&lt;/code&gt; and
&lt;code&gt;NSLinkModule&lt;/code&gt;, the latter of which creates a temp file and loads it via
&lt;code&gt;dlopen&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;How long has this been the case?&lt;/h2&gt;
&lt;p&gt;I admit that, while researching, I went through a few different theories as to
when this behavior changed. Given I hadn&apos;t seen anyone discuss it, it seemed like a
recent change. But when I began tracing through the &lt;code&gt;dyld&lt;/code&gt; source, I figured
maybe this changed around 2017 when the initial &lt;code&gt;dyld3&lt;/code&gt; code was added, and
somehow nobody noticed. I even went to search VirusTotal for files named
according to this pattern, and found quite a few, dating back to 2018.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./rcl-vti-tmp.png&quot; alt=&quot;VirusTotal search results for name:NSCreateObjectFileImageFromMemory*&quot;&gt;&lt;/p&gt;
&lt;p&gt;Somehow though, nobody noticing just… didn&apos;t seem right?&lt;/p&gt;
&lt;h3&gt;&quot;Carbon&quot; dating &lt;code&gt;dyld&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;While the source for version 941.5 isn&apos;t out yet, I was curious to see how execution of this
code path looks now, so I opened my local copy of &lt;code&gt;dyld&lt;/code&gt; in Ghidra. Searching
for &lt;code&gt;NSLinkModule&lt;/code&gt; returned a single location, in the &lt;code&gt;dyld4&lt;/code&gt; namespace as we
expected. The decompiled&lt;sup&gt;&lt;a href=&quot;#user-content-fn-demangler&quot; id=&quot;user-content-user-content-fnref-demangler&quot; data-footnote-ref aria-describedby=&quot;user-content-footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; code does bear resemblance to the source in
version
&lt;a href=&quot;https://github.com/apple-oss-distributions/dyld/blob/rel/dyld-852/dyld3/APIs_macOS.cpp#L147-L214&quot;&gt;852&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c++&quot;&gt;void dyld4::APIs::NSLinkModule(__NSObjectFileImage*, char const*, unsigned int)

{
  int iVar1;
  char *pcVar2;
  size_t sVar3;
  undefined8 uVar4;
  long *plVar5;
  char acStack1096 [1024];
  long local_48;

  local_48 = ___stack_chk_guard;
  if (*(char *)(param_1[1] + 0x90) != &apos;\0&apos;) {
    dyld4::RuntimeState::log(param_1,&quot;NSLinkModule(%p, %s)\n&quot;);
  }
  if (param_2[1] == (char *)0x0) {
    uVar4 = 0;
  }
  else {
    *param_2 = (char *)0x0;
    pcVar2 = (char *)(**(code **)(*(long *)param_1[0xd] + 0x80))((long *)param_1[0xd],&quot;TMPDIR&quot;);
    if ((pcVar2 == (char *)0x0) || (sVar3 = _strlen(pcVar2), sVar3 &amp;#x3C; 3)) {
      _strlcpy(acStack1096,&quot;/tmp/&quot;,0x400);
    }
    else {
      _strlcpy(acStack1096,pcVar2,0x400);
      sVar3 = _strlen(pcVar2);
      if (pcVar2[sVar3 - 1] != &apos;/&apos;) {
        _strlcat(acStack1096,&quot;/&quot;,0x400);
      }
    }
    _strlcat(acStack1096,&quot;NSCreateObjectFileImageFromMemory-XXXXXXXX&quot;,0x400);
    iVar1 = (**(code **)(*(long *)param_1[0xd] + 0x88))((long *)param_1[0xd],acStack1096);
    if (iVar1 != -1) {
      pcVar2 = (char *)_pwrite(iVar1,param_2[1],(size_t)param_2[2],0);
      if (pcVar2 == param_2[2]) {
        plVar5 = (long *)param_1[0xd];
        sVar3 = _strlen(acStack1096);
        pcVar2 = (char *)(**(code **)(*plVar5 + 8))(plVar5,sVar3 + 1);
        *param_2 = pcVar2;
        _strcpy(pcVar2,acStack1096);
      }
      _close(iVar1);
    }
    uVar4 = 0x80000080;
  }
  if (*param_2 != (char *)0x0) {
    pcVar2 = (char *)(**(code **)(*param_1 + 0x70))(param_1,*param_2,uVar4);
    param_2[4] = pcVar2;
    if (pcVar2 != (char *)0x0) {
      pcVar2 = (char *)dyld4::Loader::loadAddress(dyld4::RuntimeState&amp;#x26; pcVar2 &gt;&gt; 1,param_1);
      param_2[3] = pcVar2;
      if (param_2[1] != (char *)0x0) {
        _unlink(*param_2);
      }
      if (*(char *)(param_1[1] + 0x90) != &apos;\0&apos;) {
        dyld4::RuntimeState::log(param_1,&quot;NSLinkModule(%p, %s) =&gt; %p\n&quot;);
      }
      pcVar2 = param_2[4];
      goto LAB_00029c40;
    }
    if (*(char *)(param_1[1] + 0x90) != &apos;\0&apos;) {
      (**(code **)(*param_1 + 0x80))(param_1);
      dyld4::RuntimeState::log(param_1,&quot;NSLinkModule(%p, %s) =&gt; NULL (%s)\n&quot;);
    }
  }
  pcVar2 = (char *)0x0;
LAB_00029c40:
  if (___stack_chk_guard != local_48) {
                    /* WARNING: Subroutine does not return */
    ___stack_chk_fail(pcVar2);
  }
  return;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Line 32 confirms the template for the temporary name is the same as what we saw
during execution.&lt;/p&gt;
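&lt;p&gt;The temp-path construction in the listing (the &lt;code&gt;TMPDIR&lt;/code&gt; lookup, the fallback to &lt;code&gt;/tmp/&lt;/code&gt; when the variable is unset or shorter than 3 characters, and the trailing-slash handling) reads more easily mirrored in Python. This is my paraphrase of the decompile, not Apple&apos;s code:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def temp_template(tmpdir):
    # mirror of the TMPDIR handling in the decompiled NSLinkModule
    if not tmpdir or len(tmpdir) &amp;#x3C; 3:
        base = &quot;/tmp/&quot;
    else:
        base = tmpdir if tmpdir.endswith(&quot;/&quot;) else tmpdir + &quot;/&quot;
    # the XXXXXXXX suffix gets replaced by mkstemp at runtime
    return base + &quot;NSCreateObjectFileImageFromMemory-XXXXXXXX&quot;
&lt;/code&gt;&lt;/pre&gt;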
&lt;p&gt;Line 51 would be around where we expect a &lt;code&gt;dlopen&lt;/code&gt; call to happen, although in
this case there&apos;s an indirection to resolve an address for a function from a
jump table. Unfortunately Ghidra isn&apos;t able to resolve that table, so we&apos;re left
speculating if that&apos;ll end up being &lt;code&gt;dlopen&lt;/code&gt; or not. But I&apos;m pretty sure that&apos;s
where we end up eventually, given what we&apos;ve observed during live debugging.&lt;/p&gt;
&lt;p&gt;I wanted to compare this version with some of the older ones, but lacking at the
moment any non-Monterey installs, I had to resort to grabbing copies of
&lt;code&gt;/usr/lib/dyld&lt;/code&gt; from VMs and, eventually, downloading them from VirusTotal. I was
quite surprised to see that every one of them, as far back as 551, and including
852, does in fact use a non-&lt;code&gt;dlopen&lt;/code&gt; version of the code, basically the true
fileless implementation. I can&apos;t rule out that maybe all of these versions are
off VMs, where maybe something is different with &lt;code&gt;dyld&lt;/code&gt;… I think it&apos;s rather
curious that I didn&apos;t find any that use the &lt;code&gt;dyld3&lt;/code&gt; implementation.&lt;/p&gt;
&lt;p&gt;The thing that throws me off the most—if this behavior really was only enabled
in Monterey—is that so many files matching the &lt;code&gt;mkstemp&lt;/code&gt; pattern of
&lt;code&gt;/private/var/tmp/NSCreateObjectFileImageFromMemory-XXXXXX&lt;/code&gt; show up in
VirusTotal since 2018. I&apos;m not convinced either way just yet.&lt;/p&gt;
&lt;h2&gt;Future work&lt;/h2&gt;
&lt;p&gt;The predictable file name and location mean XDR can easily grab these files
whenever they&apos;re created, since what&apos;s written out is exactly the code that will
run (so any obfuscation would most likely happen beforehand). There are some benign
uses of these APIs, certainly, but there are bound to be some cool findings too.
I haven&apos;t spent any time looking through such files on VirusTotal to see if
anything good is there, but it&apos;s certainly tempting.&lt;/p&gt;
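&lt;p&gt;As a starting point, a sweep for such droppings is only a few lines (a polling sketch; real XDR tooling would hook file-creation events rather than scan on a timer, and the list of temp directories to check is up to the caller):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import glob
import os

def find_droppings(temp_dirs):
    # match the fixed mkstemp prefix used by NSLinkModule
    pattern = &quot;NSCreateObjectFileImageFromMemory-*&quot;
    hits = []
    for d in temp_dirs:
        hits.extend(glob.glob(os.path.join(d, pattern)))
    return hits
&lt;/code&gt;&lt;/pre&gt;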
&lt;p&gt;Once the code for the latest version of &lt;code&gt;dyld&lt;/code&gt; is out, I hope to better
understand the specifics of how the &lt;code&gt;dyld4&lt;/code&gt; namespace is enabled for use. Of
course, I also want to see what direction Apple is taking for these APIs, given
the new namespace.&lt;/p&gt;
&lt;p&gt;On the offensive tooling side, I&apos;m curious if it&apos;s possible to develop a pure
in-memory (true fileless) way to execute code on macOS. Mostly that seems to
require writing a Mach-O loader, and I bet significant parts of the logic can be
borrowed from both the deprecated code, and the &lt;code&gt;dlopen&lt;/code&gt; implementation.&lt;/p&gt;
&lt;p&gt;If you happen to come across any interesting implementations of true in-memory
loaders for macOS, have figured out some of the missing parts I didn&apos;t get to in
this analysis, or found any issues/mistakes etc. please let me know via
&lt;a href=&quot;https://twitter.com/roguesys&quot;&gt;@roguesys&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Thanks for reading!&lt;/p&gt;
&lt;section data-footnotes class=&quot;footnotes&quot;&gt;&lt;h2 class=&quot;sr-only&quot; id=&quot;user-content-footnote-label&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-user-content-fn-bundle&quot;&gt;
&lt;p&gt;If we&apos;re to go by Apple&apos;s old documentation, the motivation for this API is
to allow loading of plug-ins in applications that might want them. Bundles
(in the Mach-O sense) fit this purpose rather well, though since it&apos;s
trivial to turn a binary into a bundle as shown &lt;a href=&quot;#user-content-fnref-bundle&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 1&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-user-content-fn-demangler&quot;&gt;
&lt;p&gt;The GNU demangler that&apos;s part of Ghidra threw a bunch of errors and did not
demangle any names, so I ran the decompiled code through &lt;code&gt;c++filt&lt;/code&gt; manually
and adjusted as necessary. Still, I may have made some errors re:
parameters, which was fine for my purposes but may not be for yours. &lt;a href=&quot;#user-content-fnref-demangler&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 2&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;</content:encoded><author>roguesys</author></item><item><title>Reordering Architecture Headers in a Universal Mach-O Binary</title><link>https://hackd.net/posts/reorder-arch-headers-universal-macho/</link><guid isPermaLink="true">https://hackd.net/posts/reorder-arch-headers-universal-macho/</guid><description>A simple way to re-order platform architectures in universal Mach-O binaries.</description><pubDate>Thu, 08 Jul 2021 16:24:49 GMT</pubDate><content:encoded>&lt;p&gt;When I started trialling VS Code and ended up
&lt;a href=&quot;/posts/vscode-licensing-ext/&quot;&gt;distracted reverse-engineering extensions&lt;/a&gt;, my
goal was to document a simple mechanism to re-order the way architectures are
presented to the loader in universal Mach-O binaries. This is that post.&lt;/p&gt;
&lt;p&gt;Why would this be necessary? For me, it was the need to test a custom Mach-O
loader, to make sure it can handle certain special cases. This exercise ended up
being a straightforward, practical introduction to the Mach-O file format, which
in some ways was more valuable, and why I decided to share the process.&lt;/p&gt;
&lt;h2&gt;Intro to universal Mach-O binaries&lt;/h2&gt;
&lt;p&gt;This post focuses on universal (a.k.a. &quot;fat&quot;) Mach-O binaries, unless otherwise
noted. Specifically, those containing object files for &lt;code&gt;x86_64&lt;/code&gt; (Intel) and
&lt;code&gt;arm64&lt;/code&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-arm64&quot; id=&quot;user-content-user-content-fnref-arm64&quot; data-footnote-ref aria-describedby=&quot;user-content-footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; (Apple Silicon) architectures. There are other potential
architectures inside such binaries, since Apple also used the format when moving
from PowerPC to Intel. As we&apos;ll see, the specific architectures don&apos;t really
matter when it comes to reordering, but since I did not test with older
binaries, I won&apos;t claim this works as-is for those combinations.&lt;/p&gt;
&lt;p&gt;The header describing universal binaries is defined in
&lt;a href=&quot;https://github.com/apple/darwin-xnu/blob/main/EXTERNAL_HEADERS/mach-o/fat.h&quot;&gt;&lt;code&gt;mach-o/fat.h&lt;/code&gt;&lt;/a&gt;,
and the essential information is rather short:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c++&quot;&gt;#define FAT_MAGIC       0xcafebabe
#define FAT_CIGAM       0xbebafeca      /* NXSwapLong(FAT_MAGIC) */

struct fat_header {
        uint32_t        magic;          /* FAT_MAGIC */
        uint32_t        nfat_arch;      /* number of structs that follow */
};

struct fat_arch {
        cpu_type_t      cputype;        /* cpu specifier (int) */
        cpu_subtype_t   cpusubtype;     /* machine specifier (int) */
        uint32_t        offset;         /* file offset to this object file */
        uint32_t        size;           /* size of this object file */
        uint32_t        align;          /* alignment as a power of 2 */
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each universal binary starts with a header that is always stored in
&lt;a href=&quot;https://github.com/apple/darwin-xnu/blob/main/EXTERNAL_HEADERS/mach-o/fat.h#L36-L37&quot;&gt;big-endian&lt;/a&gt;
format on disk, and which contains the magic number (&lt;code&gt;0xcafebabe&lt;/code&gt;) and the total
number of architectures the binary contains. Headers for each architecture then
follow, specifying the CPU, its subtype, an absolute offset into the binary for
the object file corresponding to the architecture, that object file&apos;s size, and
an alignment.&lt;/p&gt;
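&lt;p&gt;Parsing these headers takes only a few lines of Python; the sketch below handles the 32-bit &lt;code&gt;FAT_MAGIC&lt;/code&gt; layout described above (not the 64-bit &lt;code&gt;FAT_MAGIC_64&lt;/code&gt; variant, which uses wider offsets):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import struct

FAT_MAGIC = 0xCAFEBABE

def parse_fat_archs(data):
    # fat_header: big-endian magic, then the number of archs
    magic, narchs = struct.unpack_from(&quot;&gt;II&quot;, data, 0)
    if magic != FAT_MAGIC:
        raise ValueError(&quot;not a universal binary&quot;)
    archs = []
    for i in range(narchs):
        # each fat_arch is five big-endian uint32s (20 bytes)
        fields = struct.unpack_from(&quot;&gt;5I&quot;, data, 8 + i * 20)
        names = (&quot;cputype&quot;, &quot;cpusubtype&quot;, &quot;offset&quot;, &quot;size&quot;, &quot;align&quot;)
        archs.append(dict(zip(names, fields)))
    return archs
&lt;/code&gt;&lt;/pre&gt;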
&lt;p&gt;To list architectures in a binary on macOS, either the &lt;code&gt;file&lt;/code&gt; utility, or
&lt;code&gt;lipo -archs&lt;/code&gt; can be used; the output is different, but both parse and display
the headers, in the order they&apos;re present in the binary.&lt;/p&gt;
&lt;h2&gt;Reordering architecture headers&lt;/h2&gt;
&lt;p&gt;The order as displayed is in fact entirely determined by how these headers are
laid out inside the binary. Think of the universal binary as a container for its
multiple architectures. Each one of those is a (non-universal) binary in its own
right. In fact, &lt;code&gt;lipo&lt;/code&gt; can be used to &quot;thin&quot; a binary to only a specified
architecture, which essentially parses these headers, identifies the one
corresponding to the architecture that was requested, skips to the offset
indicated in the header, and dumps the next &lt;code&gt;size&lt;/code&gt; bytes out.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;┌────────────────────────────┐
│ ┌──────────┐               │
│┌┤fat_header├──────────────┐│
││└──────────┘              ││
││┌────────────────────────┐││
│││magic                   │││
││├────────────────────────┤││
│││num archs               │││
││└────────────────────────┘││
│└──────────────────────────┘│
│ ┌────────────┐             │
│┌┤arch_headers├────────────┐│
││└────────────┘            ││
││┌──────────────────────┬─┐││
│││ cputype              │0│││
│││ cpusubtype           └─┤││
│││ offset                 │││
│││ size                   │││
│││ align                  │││
││├──────────────────────┬─┤││
│││ cputype              │1│││
│││ cpusubtype           └─┤││
│││ offset                 │││
│││ size                   │││
│││ align                  │││
││└──────┬───────┬─────────┘││
││       │   …   │          ││
││       └───────┘          ││
│└──────────────────────────┘│
└────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;
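&lt;p&gt;The &quot;thinning&quot; walk described above is short enough to sketch directly (assuming the same 20-byte &lt;code&gt;fat_arch&lt;/code&gt; layout; picking the right &lt;code&gt;cputype&lt;/code&gt; constant is left to the caller):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import struct

def thin(data, wanted_cputype):
    # read fat_header, then scan the arch headers for a cputype match
    magic, narchs = struct.unpack_from(&quot;&gt;II&quot;, data, 0)
    assert magic == 0xCAFEBABE, &quot;not a universal binary&quot;
    for i in range(narchs):
        cputype, _sub, offset, size, _align = struct.unpack_from(
            &quot;&gt;5I&quot;, data, 8 + i * 20)
        if cputype == wanted_cputype:
            # dump the embedded object file: size bytes from offset
            return data[offset:offset + size]
    return None
&lt;/code&gt;&lt;/pre&gt;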
&lt;p&gt;This means that, in order to reorder the way architectures in the binary are
processed, the simplest approach is to reorder the headers to get the desired
sequence. It is also possible to shift the actual architecture
code in the binary, but there&apos;s no benefit, except a (dubious)
performance claim that if the offset is earlier, there&apos;s less seeking necessary
to get to the relevant code. That&apos;s out of scope for this post, but certainly
something one can try.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;lipo&lt;/code&gt; has a way to carve out architectures from a binary, and also supports
creating a universal binary from specific architecture binaries, but the
ordering is fixed, based on a sort by alignment, to save space&lt;sup&gt;&lt;a href=&quot;#user-content-fn-lipo-create&quot; id=&quot;user-content-user-content-fnref-lipo-create&quot; data-footnote-ref aria-describedby=&quot;user-content-footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, so
it&apos;s unsuitable for reordering. The best option is a small tool that can shift
the arch headers as needed. Fortunately, we don&apos;t need to &lt;em&gt;understand&lt;/em&gt; the arch
headers (not that it&apos;s hard) for this task; it&apos;s sufficient to figure out how
many there are, read 20 bytes for each, and save to a new file with a different
ordering.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;/samples/macho-header-swap.py&quot;&gt;Python code&lt;/a&gt; is fairly straightforward,
though it doesn&apos;t allow for anything fancy like specifying what order the
architectures should be output in. It simply shifts &quot;left&quot;; for example, for
the &lt;code&gt;file&lt;/code&gt; binary, which has the following 3 architectures on my macOS install,
the first shift would go from &lt;code&gt;0 1 2&lt;/code&gt; to &lt;code&gt;1 2 0&lt;/code&gt;, and so on, as indicated below:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;arch     | x86_64 arm64 arm64e -&gt; arm64 arm64e x86_64 -&gt; arm64e x86_64 arm64
position |    0     1      2   -&gt;   1      2      0   -&gt;    2      0     1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Obviously (and reassuringly), shifting all the way back around to the original order yields the
same SHA-256 hash for the binary.&lt;/p&gt;
&lt;p&gt;To begin, read in the &lt;code&gt;fat_header&lt;/code&gt; to confirm this has multiple architectures,
then read how many architectures we&apos;re dealing with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;offset = 0
magic = struct.unpack(&quot;&gt;I&quot;, inbin.read(4))[0]
offset += 4
inbin.seek(offset)

if magic != MAGIC:
    print(&quot;not a universal binary&quot;)
    return 2

# next value tells us how many archs the binary contains
narchs = struct.unpack(&quot;&gt;I&quot;, inbin.read(4))[0]
offset += 4
inbin.seek(offset)

if narchs &amp;#x3C; 2:
    print(&quot;not enough archs: %d&quot; % narchs)
    return 3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once we know how many architectures are in the binary, we can simply read their
headers, 20 bytes at a time, but we do not need to parse them. This would be
different if we also had to reorder the object files inside the binary, since
offsets would need to be adjusted, but that&apos;s not the case here.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;headers = []
for _ in range(narchs):
    headers.append(inbin.read(20))
    offset += 20  # each fat_arch struct is 20 bytes
    inbin.seek(offset)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then write to the output binary, maintaining the &lt;code&gt;fat_header&lt;/code&gt; and using a simple
shift in the loop to write the arch headers themselves. Finally, copy the
remainder of the file:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# write back magic
outbin.write(struct.pack(&quot;&gt;I&quot;, magic))
# write out how many narchs
outbin.write(struct.pack(&quot;&gt;I&quot;, narchs))
# put headers in, shifting &quot;left&quot;
for idx in range(1, narchs + 1):
    outbin.write(headers[idx % narchs])
# inefficiently copy the remaining bytes
outbin.write(inbin.read())
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Final notes&lt;/h2&gt;
&lt;p&gt;There is not much to this once there&apos;s an understanding of how the headers in
the binary are laid out. This is somewhat documented in the open source code
Apple publishes, though resources like Jonathan Levin&apos;s books make for much
easier references to learn from.&lt;/p&gt;
&lt;p&gt;As for follow-up work (besides improved error checking) a better way to
understand the Mach-O format is to also enable object file reordering in the
binary, and/or have a way to specify the sequence for header reordering.&lt;/p&gt;
&lt;section data-footnotes class=&quot;footnotes&quot;&gt;&lt;h2 class=&quot;sr-only&quot; id=&quot;user-content-footnote-label&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-user-content-fn-arm64&quot;&gt;
&lt;p&gt;Frequently this will in fact be &lt;code&gt;arm64e&lt;/code&gt;, but in most cases discussed here,
it is not necessary to make the distinction. &lt;a href=&quot;#user-content-fnref-arm64&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 1&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-user-content-fn-lipo-create&quot;&gt;
&lt;p&gt;Search for &lt;code&gt;* create_fat&lt;/code&gt; to find the function in
&lt;a href=&quot;https://opensource.apple.com/source/cctools/cctools-973.0.1/misc/lipo.c.auto.html&quot;&gt;lipo.c&lt;/a&gt; &lt;a href=&quot;#user-content-fnref-lipo-create&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 2&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;</content:encoded><author>roguesys</author></item><item><title>Reverse Engineering License Validation in a VS Code Extension</title><link>https://hackd.net/posts/vscode-licensing-ext/</link><guid isPermaLink="true">https://hackd.net/posts/vscode-licensing-ext/</guid><description>A short example of reverse engineering the license check for a fairly compact VS Code extension.</description><pubDate>Mon, 31 May 2021 16:47:11 GMT</pubDate><content:encoded>&lt;p&gt;I recently switched, on a trial basis, to VS Code, and spent some time tweaking a
few things by way of extensions. VS Code has most of the features I care about
by default, but there are a few things I wanted a bit different, which is how I
eventually came about this particular (unnamed) extension. After using it for a
while, the extension nags the user for a license. The interface to enter it
requires the user to provide an email address and the license key issued by the
vendor upon payment. If valid, the extension unlocks permanently on that
installation of VS Code (it does not appear that the license state syncs between
connected VS Code instances).&lt;/p&gt;
&lt;p&gt;Since this all happens client-side, I wondered what kind of mechanisms are there
to do so securely, and if there are any APIs in VS Code that enable this sort of
thing. So here we are.&lt;/p&gt;
&lt;h2&gt;Disclaimer&lt;/h2&gt;
&lt;p&gt;I did this after purchasing the extension, for fun, out of curiosity, and to
check my assumptions about implementing such code to begin with (e.g., &quot;is there
a secure storage/settings API in VS Code?&quot;). As a newcomer to the editor and its
ecosystem, I took this opportunity to reverse engineer some extension code so I
could familiarize myself with implementation patterns, as well.&lt;/p&gt;
&lt;h2&gt;Walkthrough&lt;/h2&gt;
&lt;p&gt;While registering the extension with the license key, I traced process and
file accesses using &lt;a href=&quot;https://objective-see.com/products/utilities.html#ProcessMonitor&quot;&gt;Process Monitor&lt;/a&gt; and &lt;a href=&quot;https://objective-see.com/products/utilities.html#FileMonitor&quot;&gt;File Monitor&lt;/a&gt; from
Objective-See. These didn&apos;t end up helping, but I think it&apos;s worth collecting
traces like that early on. Given that I had a valid license to begin with, I
figured these traces might help see which files etc. change during the
activation process.&lt;/p&gt;
&lt;p&gt;The source code for VS Code extensions (on macOS and Linux) is in
&lt;code&gt;$HOME/.vscode/extensions/&amp;#x3C;ext_name&gt;&lt;/code&gt;, and in particular for this extension the
two important files were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;js/main.js&lt;/code&gt; is the extension loader as required by VS Code;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;js/app.js&lt;/code&gt; contains the minified source code of the extension.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I opened the second file in a VS Code buffer and ran Prettier to make it
somewhat more readable. Since a lot of function and variable names get removed
or changed to nondescript ones during the minification process, it&apos;s still not
amazing to work with, but at least the code is not all on one line.&lt;/p&gt;
&lt;p&gt;One thing that stands out at the top of the file is the license text for an MD5
implementation in JavaScript. That&apos;s a pretty big hint we might see MD5 in use
somewhere.&lt;/p&gt;
&lt;p&gt;Looking for license-related strings in the code, the function that checks the
license comes up, included below. The comments are my own annotations as I was
making sense of the code.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;{
    key: &quot;isValidLicense&quot;,
    value: function () {
    var e = // email
        arguments.length &gt; 0 &amp;#x26;&amp;#x26; void 0 !== arguments[0]
            ? arguments[0]
            : &quot;&quot;,
        t = // token (the license key)
        arguments.length &gt; 1 &amp;#x26;&amp;#x26; void 0 !== arguments[1]
            ? arguments[1]
            : &quot;&quot;;
    if (!e || !t) return !1;
    // do something with UUID+email
    var o = s()(&quot;&quot;.concat(i.APP.UUID).concat(e)),
        // split into 5-char slices
        r = o.match(/.{1,5}/g),
        // join with dashes
        n = r.slice(0, 5).join(&quot;-&quot;);
    // check t(oken) matches n
    return t === n;
    },
},
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;i&lt;/code&gt; and &lt;code&gt;s()&lt;/code&gt; get defined elsewhere. Fortunately, &quot;Go to Definition&quot; worked for
&lt;code&gt;s()&lt;/code&gt; because there was exactly one declaration. Even without that, &lt;code&gt;UUID&lt;/code&gt; was
only present in one other location in the code, inside the &lt;code&gt;i&lt;/code&gt; object.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;// let&apos;s call this the `main` function given it declares APP etc.
function (e, t, o) {
&quot;use strict&quot;;
    var i = {
        APP: {
            NAME: &quot;[REDACTED]&quot;,
            UUID: &quot;[REDACTED]&quot;,
        },
    },
r = o(1),
s = o.n(r);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At this point, it looked like I should figure out what gets passed as the &lt;code&gt;o&lt;/code&gt;
argument to this function, to then figure out what &lt;code&gt;s&lt;/code&gt; was. Since nothing had
names, I turned to &lt;a href=&quot;https://semgrep.dev&quot;&gt;semgrep&lt;/a&gt; to figure out where functions with 3 arguments get
called from. This would not return &lt;strong&gt;just&lt;/strong&gt; the call sites for the function I
care about, but it would cut the scope significantly:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;❯ semgrep --lang=js -e &quot;\$FUNC(\$E, \$T, \$O)&quot; app.js
21:        o.o(e, t) || Object.defineProperty(e, t, { enumerable: !0, get: i });
--------------------------------------------------------------------------------
26:          Object.defineProperty(e, Symbol.toStringTag, { value: &quot;Module&quot; }),
--------------------------------------------------------------------------------
27:          Object.defineProperty(e, &quot;__esModule&quot;, { value: !0 });
--------------------------------------------------------------------------------
35:          Object.defineProperty(i, &quot;default&quot;, { enumerable: !0, value: e }),
--------------------------------------------------------------------------------
39:            o.d(
40:              i,
41:              r,
42:              function (t) {
43:                return e[t];
44:              }.bind(null, r)
45:            );
--------------------------------------------------------------------------------
57:        return o.d(t, &quot;a&quot;, t), t;
--------------------------------------------------------------------------------
1067:          const p = s.spawn(i, l, f);
--------------------------------------------------------------------------------
1145:            Object.defineProperty(e, i.key, i);
--------------------------------------------------------------------------------
1230:                e !== t.colorTheme &amp;#x26;&amp;#x26; i.update(&quot;colorTheme&quot;, e, !0),
--------------------------------------------------------------------------------
1233:                    i.update(&quot;iconTheme&quot;, o, !0),
--------------------------------------------------------------------------------
1326:            Object.defineProperty(e, i.key, i);
--------------------------------------------------------------------------------
1329:      o.d(t, &quot;default&quot;, function () {
1330:        return u;
1331:      });
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(A couple more results elided for brevity and to avoid revealing which extension
this is.)&lt;/p&gt;
&lt;p&gt;Just as I was about to start trying to figure out which of these entries might
be the one calling my &quot;&lt;code&gt;main&lt;/code&gt;&quot; function, I thought on a whim to just assume
&lt;code&gt;s()&lt;/code&gt; is an MD5 implementation, and try it out. While I didn&apos;t think that would
work, well…&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;❯ echo -n &quot;&amp;#x3C;APP.UUID&gt;my-test-email@example.com&quot; | md5
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Bingo! Well, this lacks the dashes at every 5 characters, but the rest matches
the license I received. At first I forgot to use &lt;code&gt;-n&lt;/code&gt;, but without it there&apos;s a
newline emitted that leads to an incorrect hash.&lt;/p&gt;
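&lt;p&gt;Putting the pieces together, the whole check reduces to a few lines of Python (the UUID argument stands in for the redacted &lt;code&gt;APP.UUID&lt;/code&gt; value; the rest mirrors the minified &lt;code&gt;isValidLicense&lt;/code&gt; logic above):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import hashlib

def make_key(uuid, email):
    # md5(UUID + email), split into 5-char groups, keep the first
    # five groups joined with dashes
    digest = hashlib.md5((uuid + email).encode()).hexdigest()
    groups = [digest[i:i + 5] for i in range(0, len(digest), 5)]
    return &quot;-&quot;.join(groups[:5])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Assuming the UUID really is global to the extension rather than per-install, a key generated this way for any email should validate.&lt;/p&gt;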
&lt;p&gt;I don&apos;t know if the UUID changes per installation/user or if it&apos;s global for the
extension, but at any rate, it should be possible to &quot;crack&quot; the extension once
installed locally, even without going through the entire flow described above.
Extracting the UUID from the un-minified file, using
&lt;a href=&quot;https://github.com/BurntSushi/ripgrep&quot;&gt;ripgrep&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;❯ rg -o &quot;UUID:\&quot;.*?\&quot;&quot; app.js
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;A Better License Check?&lt;/h2&gt;
&lt;p&gt;It&apos;s necessary to consider both the threat model and the economics of
implementing a solution before deciding to iterate and improve the license
check. Investing resources into building up sufficient protections may not
always make rational sense.&lt;/p&gt;
&lt;p&gt;The threat model with licensing revolves primarily around piracy. As implemented
now, it is trivial to create a license key generator, as well as a &quot;nag
remover&quot; that patches out the validation code. Another threat involves sharing of
license keys. In this threat model, the attacker is the end user, who controls
the extension, the VS Code installation, and the operating system as a whole.&lt;/p&gt;
&lt;p&gt;The economics are less obvious to me since I don&apos;t have profit and revenue
numbers. There are certainly solutions available that might further reduce the
likelihood of piracy, but at the risk of upsetting legitimate users (through
increasingly complicated steps required to use the extension, even when paid for)
and of spending resources on a solution disproportionate to the potential
additional revenue.&lt;/p&gt;
&lt;p&gt;While at first I was going to list a few ideas and their shortcomings, I think
there are lots of approaches with varying degrees of complexity to try, none
foolproof, all with trade-offs that I can&apos;t analyze with the partial information
I have.&lt;/p&gt;
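&lt;p&gt;To make the trade-offs concrete, here is one common approach among those many:
the vendor signs the license with a private key and ships only the public key in
the extension, so producing a keygen requires the private key rather than a hash
anyone can compute. A minimal sketch with the OpenSSL CLI (hypothetical; this is
not what the extension does):&lt;/p&gt;

```shell
# Sketch of a signed-license scheme, using RSA via the openssl CLI.
# The vendor keeps vendor.pem private and embeds vendor.pub in the
# extension; the license "key" is the signature over the user's email.

# vendor side: generate a keypair and sign a license for this email
openssl genrsa -out vendor.pem 2048 2>/dev/null
openssl rsa -in vendor.pem -pubout -out vendor.pub 2>/dev/null
email='my-test-email@example.com'
printf '%s' "$email" | openssl dgst -sha256 -sign vendor.pem -out license.sig

# extension side: verify with the embedded public key
# prints "Verified OK" for a valid email/signature pair
printf '%s' "$email" | openssl dgst -sha256 -verify vendor.pub -signature license.sig
```

&lt;p&gt;Even this only raises the bar for keygens: since the attacker controls the
host, patching the verification call out of the bundle defeats it entirely,
which is exactly the &quot;none foolproof&quot; point.&lt;/p&gt;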
&lt;p&gt;It&apos;s also true that folks who don&apos;t want to pay for this particular extension
have lots of alternatives. Some may value this particular extension and be fine
paying for it, and some may not. I do not think pirated copies equal lost sales,
but there is an emotional aspect to anti-piracy efforts, not just a
rational/economic one, and I can&apos;t speak to these developers&apos; state of mind on
that topic. Maybe they already considered everything I&apos;ve written about here,
and this is the result.&lt;/p&gt;
&lt;p&gt;It&apos;s rather common to assume, from the outside, that imperfect implementations
are the result of ignorance of security principles, when sometimes they are in
fact the result of a deliberate risk assessment, with its trade-offs.&lt;/p&gt;</content:encoded><author>roguesys</author></item></channel></rss>