hackd

Reverse Engineering License Validation in a VS Code Extension


I recently switched to VS Code on a trial basis, and spent some time tweaking a few things by way of extensions. VS Code has most of the features I care about by default, but there are a few things I wanted a bit different, which is how I eventually came across this particular (unnamed) extension. After using it for a while, the extension nags the user for a license. The interface to enter it requires the user to provide an email address and the license key provided by the vendor upon payment. If valid, the extension unlocks permanently on that installation of VS Code (the license state does not appear to sync between connected VS Code instances).

Since this all happens client-side, I wondered what mechanisms exist to do this securely, and whether VS Code has any APIs that enable this sort of thing. So here we are.

Disclaimer

I did this after purchasing the extension, for fun, out of curiosity, and to check my assumptions about implementing such code to begin with (e.g., “is there a secure storage/settings API in VS Code?”). As a newcomer to the editor and its ecosystem, I also took this opportunity to reverse engineer some extension code so I could familiarize myself with its implementation patterns.

Walkthrough

While registering the extension with the license key, I traced process and file accesses using Process Monitor and File Monitor from Objective-See. These didn’t end up helping, but I think it’s worth collecting traces like that early on. Since I had a valid license to begin with, I figured the traces might show which files change during the activation process.

The source code for VS Code extensions (on macOS and Linux) is in $HOME/.vscode/extensions/<ext_name>, and in particular for this extension the two important files were:

  • js/main.js is the extension loader as required by VS Code;
  • js/app.js contains the minified source code of the extension.

I opened the second file in a VS Code buffer and ran Prettier to make it somewhat more readable. Since a lot of function and variable names get removed or changed to nondescript ones during the minification process, it’s still not amazing to work with, but at least the code is not all on one line.

One thing that stands out at the top of the file is the license text for an MD5 implementation in JavaScript. That’s a pretty big hint we might see MD5 in use somewhere.

Looking for license-related strings in the code, the function that checks the license comes up, included below. The comments are my own annotations as I was making sense of the code.

{
    key: "isValidLicense",
    value: function () {
        var e = // email
                arguments.length > 0 && void 0 !== arguments[0]
                    ? arguments[0]
                    : "",
            t = // token (the license key)
                arguments.length > 1 && void 0 !== arguments[1]
                    ? arguments[1]
                    : "";
        // reject a missing email or token
        if (!e || !t) return !1;
        // do something with UUID+email
        var o = s()("".concat(i.APP.UUID).concat(e)),
            // split into 5-char slices
            r = o.match(/.{1,5}/g),
            // join the first 5 slices with dashes
            n = r.slice(0, 5).join("-");
        // check t(oken) matches n
        return t === n;
    },
},

i and s() get defined elsewhere. Fortunately, “Go to Definition” worked for s() because there was exactly one declaration. Even without that, UUID was only present in one other location in the code, inside the i object.

// let's call this the `main` function given it declares APP etc.
function (e, t, o) {
    "use strict";
    var i = {
            APP: {
                NAME: "[REDACTED]",
                UUID: "[REDACTED]",
            },
        },
        r = o(1),
        s = o.n(r);

At this point, it looked like I should figure out what gets passed as the o argument to this function, to then figure out what s was. Since nothing had names, I turned to semgrep to figure out where functions with 3 arguments get called from. This would not return just the call sites for the function I care about, but it would cut the scope significantly:

❯ semgrep --lang=js -e "\$FUNC(\$E, \$T, \$O)" app.js
21:        o.o(e, t) || Object.defineProperty(e, t, { enumerable: !0, get: i });
--------------------------------------------------------------------------------
26:          Object.defineProperty(e, Symbol.toStringTag, { value: "Module" }),
--------------------------------------------------------------------------------
27:          Object.defineProperty(e, "__esModule", { value: !0 });
--------------------------------------------------------------------------------
35:          Object.defineProperty(i, "default", { enumerable: !0, value: e }),
--------------------------------------------------------------------------------
39:            o.d(
40:              i,
41:              r,
42:              function (t) {
43:                return e[t];
44:              }.bind(null, r)
45:            );
--------------------------------------------------------------------------------
57:        return o.d(t, "a", t), t;
--------------------------------------------------------------------------------
1067:          const p = s.spawn(i, l, f);
--------------------------------------------------------------------------------
1145:            Object.defineProperty(e, i.key, i);
--------------------------------------------------------------------------------
1230:                e !== t.colorTheme && i.update("colorTheme", e, !0),
--------------------------------------------------------------------------------
1233:                    i.update("iconTheme", o, !0),
--------------------------------------------------------------------------------
1326:            Object.defineProperty(e, i.key, i);
--------------------------------------------------------------------------------
1329:      o.d(t, "default", function () {
1330:        return u;
1331:      });

(A couple more results elided for brevity and to avoid revealing which extension this is.)
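
As an aside, several of these matches (o.d, o.o, and the return o.d(t, "a", t), t at line 57) look like the module bootstrap that a bundler such as webpack emits. If that reading is right, which is an assumption on my part and not something confirmed from the extension's build setup, then o in the “main” function is the bundle's internal require function, o(1) loads module number 1, and o.n(r) wraps it in a getter so that s() returns that module's default export. The sketch below is a hand-written miniature of that pattern, not the extension's code; requireModule, modules, and the placeholder module 1 are all hypothetical names.

```javascript
// Miniature of a webpack-style bundle: an array of module functions, each
// called with (module, exports, require). All names here are illustrative.
const modules = [
  // Module 0 plays the role of the "main" function from the article.
  function (e, t, o) {
    const r = o(1); // load module 1
    const s = o.n(r); // getter for its default export
    e.exports = { transform: (input) => s()(input) };
  },
  // Module 1 is a placeholder for whatever library s() resolves to.
  function (e, t, o) {
    e.exports = (input) => "transformed:" + input;
  },
];

const cache = {};

function requireModule(id) {
  if (cache[id]) return cache[id].exports;
  const mod = (cache[id] = { exports: {} });
  modules[id](mod, mod.exports, requireModule);
  return mod.exports;
}

// Similar in spirit to the `n` helper: pick `.default` for ES modules,
// or the module itself for CommonJS-style exports.
requireModule.n = function (mod) {
  return mod && mod.__esModule ? () => mod.default : () => mod;
};

console.log(requireModule(0).transform("abc")); // transformed:abc
```

Under this reading, finding s() would only require locating the second entry in the bundle's module list, without tracing call sites at all.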

Just as I was about to start figuring out which of these entries might be the one calling my “main” function, I thought on a whim to just assume s() is an MD5 implementation, and try it out. While I didn’t think that would work, well…

❯ echo -n "<APP.UUID>my-test-email@example.com" | md5

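Bingo! Well, the raw digest lacks the dashes every 5 characters, and per the slice(0, 5) in the validation code the key only uses its first 25 hex characters, but otherwise it matches the license I received. At first I forgot to use -n; without it, echo emits a trailing newline that leads to an incorrect hash.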
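
To make the check easier to follow, here is a readable reconstruction of isValidLicense in Node.js. It is a sketch under the assumption that s() behaves like a standard MD5-to-hex function, which the shell experiment suggests but does not prove. PLACEHOLDER_UUID, md5Hex, and expectedKey are names I made up for illustration, and the UUID value is a dummy since the real one is redacted.

```javascript
const { createHash } = require("node:crypto");

// Placeholder only; the real value is redacted in the article.
const PLACEHOLDER_UUID = "00000000-0000-0000-0000-000000000000";

function md5Hex(input) {
  return createHash("md5").update(input, "utf8").digest("hex");
}

// Mirrors the minified logic: hash UUID + email, cut the 32-character hex
// digest into 5-character slices, keep the first five, join with dashes.
function expectedKey(uuid, email) {
  const digest = md5Hex("".concat(uuid).concat(email));
  return digest.match(/.{1,5}/g).slice(0, 5).join("-");
}

function isValidLicense(email = "", token = "", uuid = PLACEHOLDER_UUID) {
  if (!email || !token) return false;
  return token === expectedKey(uuid, email);
}

const email = "my-test-email@example.com";
const key = expectedKey(PLACEHOLDER_UUID, email);
console.log(key.length); // 29: 25 hex characters plus 4 dashes
console.log(isValidLicense(email, key)); // true
// A trailing newline, as plain `echo` would add, yields a different key.
console.log(expectedKey(PLACEHOLDER_UUID, email + "\n") === key); // false
```

The last line shows the same pitfall as the missing -n: the hashed input has to match byte for byte.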

I don’t know if the UUID changes per installation/user or if it’s global for the extension, but at any rate, it should be possible to “crack” the extension once installed locally, even without going through the entire flow described above. The UUID can be extracted from the file with ripgrep (note that the pattern expects no whitespace after the colon, as in the minified original; the Prettier-formatted copy has a space there):

❯ rg -o "UUID:\".*?\"" app.js
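
The same extraction can be sketched in JavaScript with a pattern that tolerates optional whitespace after the colon, so it works on both the minified and the formatted file. This assumes the property is always written as UUID followed by a double-quoted string, as in the snippets above; extractUuid is a hypothetical helper name and the sample inputs are made up.

```javascript
// Pull the UUID string out of bundle source text, minified or formatted.
// Returns null when nothing matches.
function extractUuid(source) {
  const match = /UUID:\s*"([^"]*)"/.exec(source);
  return match ? match[1] : null;
}

console.log(extractUuid('APP:{NAME:"x",UUID:"1234-abcd"}')); // 1234-abcd
console.log(extractUuid('UUID: "1234-abcd",')); // 1234-abcd
console.log(extractUuid("no match here")); // null
```

In practice the source argument would come from reading app.js, for example with fs.readFileSync(path, "utf8").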

A Better License Check?

It’s necessary to consider both the threat model and the economics of implementing a solution before deciding to iterate and improve the license check. Investing resources into building up sufficient protections may not always make rational sense.

The threat model with licensing revolves primarily around piracy. As implemented now, it is trivial to create a license key generator, as well as a “nag remover” that patches out the validation code. Another threat involves sharing of license keys. In this threat model, the attacker is the end user, who controls the extension, the VS Code installation, and the operating system as a whole.
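
To illustrate why the key generator and the “nag remover” are separate threats, here is a minimal sketch of one commonly discussed direction: having the vendor sign the email with a private key, so the extension only needs the public key to verify. This is not what the extension does and not a recommendation for it; the issueLicense and isValidLicense names are hypothetical, and the key pair is generated on the fly purely for the demo.

```javascript
const { generateKeyPairSync, sign, verify } = require("node:crypto");

// Vendor side: the private key never ships with the extension.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

function issueLicense(email) {
  // For Ed25519 the digest algorithm argument must be null.
  return sign(null, Buffer.from(email, "utf8"), privateKey).toString("base64");
}

// Client side: only the public key is embedded in the extension.
function isValidLicense(email, license) {
  if (!email || !license) return false;
  const signature = Buffer.from(license, "base64");
  return verify(null, Buffer.from(email, "utf8"), publicKey, signature);
}

const license = issueLicense("my-test-email@example.com");
console.log(isValidLicense("my-test-email@example.com", license)); // true
console.log(isValidLicense("someone-else@example.com", license)); // false
```

A scheme like this would make a standalone key generator impractical without the private key, but it does nothing against patching out the check or sharing a valid key, since the attacker still controls the client.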

The economics are less obvious to me since I don’t have profit and revenue numbers. There are certainly solutions available that might further limit the likelihood of piracy, but at the risk of upsetting legitimate users (through increasingly complicated steps required to use the extension even if paid for) and of spending resources on a solution that may be disproportionate to the potential additional revenue.

While at first I was going to list a few ideas and their shortcomings, I think there are lots of approaches with varying degrees of complexity to try, none foolproof, all with trade-offs that I can’t analyze with the partial information I have.

It’s also true that folks who don’t want to pay for this particular extension have lots of alternatives. Some may value this particular extension and be fine paying for it, and some may not. I do not think pirated copies equal lost sales, but there is an emotional aspect to anti-piracy efforts, not just a rational/economic one, and I can’t speak to these developers’ state of mind on that topic. Maybe they already considered everything I’ve written about here, and this is the result.

It’s rather common to assume, from the outside, that imperfect implementations are the result of ignorance of security principles, when sometimes they are in fact the result of a deliberate risk assessment, with its trade-offs.