Writeups for Dennis Yurichev's Reverse Engineering Challenges (#23-#35)

This is the third set of solutions for my self-imposed challenge of completing at least fifty of the exercises on Dennis Yurichev's challenges.re by the end of the year. The previous set is available here.

Challenge #23

The problem is prefaced with, "[t]his is another implementation of a well-known library function, works only in a 64-bit environment." Translating the disassembly directly to C reveals unrolled loops, but the intent isn't too difficult to figure out.

int f(char *a)
{
        int i;

        if (a[0] == '\0') {
                return 0;
        }

        if (a[1] == (char) 0xff) {
                return 1;
        }

        if (a[2] == (char) 0xff) {
                return 2;
        }

        if (a[3] == (char) 0xff) {
                return 3;
        }

        if (a[4] == (char) 0xff) {
                return 4;
        }

        if (a[5] == (char) 0xff) {
                return 5;
        }

        i = 0;

        while (a[6] != (char) 0xff) {
                if (a[7] == (char) 0xff) {
                        return i + 7;
                }

                i += 8;
                a += 8;

                if (a[1] == (char) 0xff) {
                        return i;
                }

                if (a[1] == (char) 0xff) {
                        return i + 1;
                }

                if (a[2] == (char) 0xff) {
                        return i + 2;
                }

                if (a[3] == (char) 0xff) {
                        return i + 3;
                }

                if (a[4] == (char) 0xff) {
                        return i + 4;
                }

                if (a[5] == (char) 0xff) {
                        return i + 5;
                }
        }

        return i + 6;
}

f returns the index of the first occurrence of 0xff in a. In addition to asking for the purpose of the code, the challenge poses a few additional questions.

First: "The code may crash under some specific circumstances. Which are…?" f will crash in the case that there isn't a 0xff character in the string.

Second: "The code can be easily optimized using SSEx. How?" movq can be used to dereference the characters of a, and the location of the 0xff character can be found using pcmpeqb. Actually implementing this is left as an exercise to the reader. And I'm not saying that just because writing SIMD by hand makes me want to break down and cry… or anything like that…

Finally: "The code will not work correctly on big-endian architectures. How to fix it?" In the disassembly, the LSB of rdx (dl) is treated as the first character in the sliding window. On a big-endian system, dereferencing the window as an integer would mean that the LSB would correspond with the last character in the window. To fix this, you would need to change which parts of the register are being looked at. I realize that's a rather anemic answer, but the alternative would be going all-in and implementing f on a big-endian platform, which I don't really want to do right now.

Challenge #26

I decided to skip challenges #24 and #25 as they were listed as "Level 2" and "Level 3" respectively in terms of difficulty. Challenge #25 in particular seemed particularly demanding. Challenge #26, on the other hand, was a relatively straightforward bytecode reverse engineering task. Like Challenge #14, disassemblies for both .NET and the JVM are given, and as I'm more familiar with Java than C# (unfortunately), that's the disassembly I chose to work with.

public static byte f(byte a) {
    return (byte) ((((long) a * 8623620610L) & 1136090292240L) % 1023L);
}

Again, I'm not familiar with JVM bytecode, so I broke out my favorite JVM reference. Here are the instructions we're concerned with:

iload_0 // load an int value from local 0
i2l     // convert an int to a long
l2i     // convert a long to an int
i2b     // convert an int to a byte
ldc2_w  // push a constant onto the stack
lmul    // multiply two longs
land    // perform a bitwise and on two longs
lrem    // perform remainder division on two longs

Even if you don't know how the JVM works, I think the purpose of f is fairly clear as soon as you know what those few instructions do.

I've typically been using Matt Godbolt's amazing Compiler Explorer to check my solutions, but this time around I used Javabytes. The disassembly of my translation for f matches what was given for the challenge, so I'm quite confident in my answer. As for what it does: I began my analysis as I typically do, giving the function some test values and observing the output.

public static void main(String[] args) {
    for (int i = 0; i < 256; i++) {
        System.out.printf("%3i: b\n", i, f((byte) i));
    }
}

//   0: 0
//   1: -128
//   2: 64
//   3: -64
//   4: 32
// ...
// 253: 63
// 254: -65
// 255: 127

That isn't very telling, but the oscillating sign gives me an idea.

public static String toPaddedBinary(byte a) {
    String s = String.format("%8s", Integer.toBinaryString(a));
    s = s.replace(' ', '0');
    return s.substring(s.length() - 8, s.length());
}

public static void main(String[] args) {
    for (int i = 0; i < 256; i++) {
        System.out.printf("%s: %s\n", toPaddedBinary((byte) i), toPaddedBinary(f((byte) i)));
    }
}

// 00000000: 00000000
// 00000001: 10000000
// 00000010: 01000000
// ...
// 11111101: 10111111
// 11111110: 01111111
// 11111111: 11111111

So f reverses the bits of a.

Challenge #27

This challenge threw me for a bit of a loop, as it didn't give the usual amd64 output from GCC 4.9. Rather an i386 disassembly from MSVC 2010 was given, alongside an arm64 disassembly from GCC 4.9. I tried both, but had some significant trouble with the MSVC disassembly as it seemed to be dealing with 64-bit integers on a 32-bit architecture.

After quickly reviewing CDOT's AArch64 reference to get an idea of register widths, this is the translation came up with:

int f(int a)
{
    return (((int) (((long) a * 0xc64b2279) >> 32)) + a) >> 9 - (a >> 31);
}

I'm not particularly confident in this, however, as the behavior of f is to return $floor(a / 289)$. I suspect my poor understanding of the flexible second operand (i.e. in sub w0, w1, w0, asr 31) is what gave me me the most trouble. Perhaps this is a challenge I should return to when I properly learn ARM.

I tried a more direct translation to Python,

def test_f(a):
    result = a * 0xc64b2279
    upper = result & 0xffffffff00000000
    lower = result & 0xffffffff
    return ((upper + lower) >> 9) + \
        (((upper + lower) >> 9) >> 31)

fmt = lambda n: bin(n)[2:].rjust(32, '0')

for i in range(256):
    print("{}\n{}\n".format(fmt(i), fmt(test_f(i))))

which didn't yield any recognizable patterns.

Actually, before implementing it in Python, I implemented it in Emacs Lisp (I might have been waiting on Python to compile? I don't remember).

(defun test-f (a)
  (let* ((result (* a #xc64b2279))
         (upper (logand result #xffffffff00000000))
         (lower (logand result #xffffffff)))
    (+ (ash (+ upper lower) 9)
       (ash (ash (+ upper lower) 9) 31))))

Either way, this challenge wasn't fruitful.

Challenge #28

I suspect that this challenge was made a bit easier by GCC's optimizations. The amd64 disassembly includes two unused functions, f2 and my_memdup – they're used in some of the other disassemblies, but I chose to ignore them.

#include <string.h>
#include <stdlib.h>

int f1(int *a, int *b)
{
        return *a > *b ? 0 : -1;
}

int f_main(void *src, int n)
{
        int tmp;
        char *dst;

        dst = malloc(n * sizeof(int));
        memcpy(dst, src, n * sizeof(int));
        qsort(dst, n, sizeof(int), f1);

        if (n > 1) {
                tmp = dst[n >> 1] + \
                        dst[n >> 1 - 1];
                return (tmp + (tmp >> 31)) >> 1;
        }

        return dst[0];
}

I've started to see this (tmp + (tmp >> 31)) >> 1 idiom rather frequently, so I decided to finally look it up, coming across this Stack Overflow answer. I'm glad I did, because realizing that it carries out signed integer division by two makes this exercise far more clear.

#include <string.h>
#include <stdlib.h>

int f1(int *a, int *b)
{
        return *a > *b ? 0 : -1;
}

int f_main(void *src, int n)
{
        char *dst;

        dst = malloc(n * sizeof(int));
        memcpy(dst, src, n * sizeof(int));
        qsort(dst, n, sizeof(int), f1);

        if (n > 1) {
                return (dst[n / 2] + dst[n / 2 - 1]) / 2;
        }

        return dst[0];
}

f_main returns the median of a set of values.

Challenge #30

I have, once again, skipped another challenge that was being listed as "level 2," which brings us to the thirtieth challenge. This one is strikingly different from the other challenges I've covered here; rather than being asked to describe what a program does, the instruction read:

"This program requires a password. Try to find it.

As an additional exercise, try to change the password by patching the executable file. Also try using one with a different length. What is the shortest possible password here?

Also try to crash the program using only string input."

We're given several links to downloads. Binaries are provided for 32-bit Microsoft Windows, Mac OS X, and i386/mips Linux. I went with i386 Linux, as I'd be able to run the challenge natively.q

jakob@Epsilon /tmp $ sha256sum password1
96b8110208d61c7ac586910ebad22ef2e4bbeb867e6d6429967846698b9d02fc  password1

Being faced with a binary, I was tempted to use this as an opportunity to try out Ghidra, but while I waited for OpenJDK 11 to download, I peered inside with radare2 and decided that it wasn't worth the trouble. Here's the disassembly, according to radare:

[0x080484ed]> pdf
            ;-- eip:
┌ (fcn) main 149
│   main ();
│           ; var int local_4h @ esp+0x4
│           ; var int local_1ch @ esp+0x1c
│           ; var int local_9ch @ esp+0x9c
│           ; DATA XREF from 0x08048407 (entry0)
│           0x080484ed      55             pushl %ebp
│           0x080484ee      89e5           movl %esp, %ebp
│           0x080484f0      83e4f0         andl $0xfffffff0, %esp
│           0x080484f3      81eca0000000   subl $0xa0, %esp
│           0x080484f9      65a114000000   movl %gs:0x14, %eax         ; [0x14:4]=-1 ; 20
│           0x080484ff      8984249c0000.  movl %eax, local_9ch
│           0x08048506      31c0           xorl %eax, %eax
│           0x08048508      c70424208604.  movl $str.enter_password:, 0(%esp) ; [0x8048620:4]=0x65746e65 ; "enter password:"
│           0x0804850f      e89cfeffff     calll sym.imp.puts          ; int puts(const char *s)
│           0x08048514      8d44241c       leal local_1ch, %eax        ; 0x1c ; 28
│           0x08048518      89442404       movl %eax, local_4h
│           0x0804851c      c70424308604.  movl $0x8048630, 0(%esp)    ; [0x8048630:4]=0x6e007325
│           0x08048523      e8b8feffff     calll sym.imp.__isoc99_scanf
│           0x08048528      83f801         cmpl $1, %eax               ; 1
│       ┌─< 0x0804852b      740c           je 0x8048539
│       │   0x0804852d      c70424338604.  movl $str.no_password_supplied, 0(%esp) ; [0x8048633:4]=0x70206f6e ; "no password supplied"
│       │   0x08048534      e877feffff     calll sym.imp.puts          ; int puts(const char *s)
│       │   ; JMP XREF from 0x0804852b (main)
│       └─> 0x08048539      c74424044886.  movl $str.metallica, local_4h ; [0x8048648:4]=0x6174656d ; "metallica"
│           0x08048541      8d44241c       leal local_1ch, %eax        ; 0x1c ; 28
│           0x08048545      890424         movl %eax, 0(%esp)
│           0x08048548      e843feffff     calll sym.imp.strcmp        ; int strcmp(const char *s1, const char *s2)
│           0x0804854d      85c0           testl %eax, %eax
│       ┌─< 0x0804854f      750e           jne 0x804855f
│       │   0x08048551      c70424528604.  movl $str.password_is_correct, 0(%esp) ; [0x8048652:4]=0x73736170 ; "password is correct"
│       │   0x08048558      e853feffff     calll sym.imp.puts          ; int puts(const char *s)
│      ┌──< 0x0804855d      eb0c           jmp 0x804856b
│      ││   ; JMP XREF from 0x0804854f (main)
│      │└─> 0x0804855f      c70424668604.  movl $str.password_is_not_correct, 0(%esp) ; [0x8048666:4]=0x73736170 ; "password is not correct"
│      │    0x08048566      e845feffff     calll sym.imp.puts          ; int puts(const char *s)
│      │    ; JMP XREF from 0x0804855d (main)
│      └──> 0x0804856b      8b94249c0000.  movl local_9ch, %edx        ; [0x9c:4]=-1 ; 156
│           0x08048572      653315140000.  xorl %gs:0x14, %edx
│       ┌─< 0x08048579      7405           je 0x8048580
│       │   0x0804857b      e820feffff     calll sym.imp.__stack_chk_fail ; void __stack_chk_fail(void)
│       │   ; JMP XREF from 0x08048579 (main)
│       └─> 0x08048580      c9             leave
└           0x08048581      c3             retl

As you can see, this is just like any other "easy crackme." A simple string comparison. radare2 doesn't automatically decode 0x8048630 as a string, but it's trivial to obtain its value.

:> psz @ 0x8048630
%s

Translating it into C is similarly trivial.

#include <stdio.h>
#include <string.h>

int main(void)
{
        char buf[128];
        puts("enter password:");
        if (scanf("%s", buf) != 1) {
                puts("no password supplied");
        }
        if (strcmp(buf, "metallica") == 0) {
                puts("password is correct");
        } else {
                puts("password is not correct");
        }
}

I do have to complement Yurichev's choice of strong passwords. \m/

One may wonder where I pulled 128 from. Our stack layout looks something like this:

+-------------------------------------+
|%esp                                 |
|Scratch space for function arguments.|
+-------------------------------------+
|%esp + 0x1c                          |
|Buffer starts here                   |
|...                                  |
|Buffer ends here                     |
+-------------------------------------+
|%esp + 0x9c                          |
|Stack canary, perhaps?               |
+-------------------------------------+
|%esp + I DON'T CARE ANYMORE          |
|Here be dragons.                     |
+-------------------------------------+

radare2 is kind enough to automatically name local variables according to their position in the stack layout, so I was able to derive this from the names local_1ch and local_9ch. local_4h isn't really a local variable – it looks like one, but that's just how the compiler decided to set up arguments for the various function calls (dereferencing the stack pointer, as opposed to using push). Anyway, subtracting 0x9c from 0x1c gets you 128 – hence, the buffer size in my translation.

It's pretty easy to patch the password, since strcmp operates on C strings. Just patch the characters. No sort of length needs to be adjusted since they're null-terminated. The shortest possible password would be zero characters long, which would be achieved by patching in a null byte at the 'm' in "metallica". This can be done however you like, though radare makes it easy if you've opened the file in "write mode": just seek to the location of the 'm' and wx 00. Crashing the program is similarly easy, since there are no bounds checks on the call to scanf.

jakob@Epsilon /tmp $ python -c "print('a' * 256)" | ./test
enter password:
password is not correct
Segmentation fault

Challenge #31

Yowch. We're only given disassemblies from MSVC this time.

double f(double a, int b)
{
        double cur;
        cur = 1.0;
        while (((double) (((int) (cur - a)) - b)) <= 0.001)
                cur = (a + 1.0) * 0.5;
        return cur;
}

Once again, I deferred to float.exposed to decode the floating-point constant values. __real@3ff0000000000000 is 1.0, __real@3f50624dd2f1a9fc is approximately 0.001, and __real@3fe0000000000000 is 0.5. I also needed to look up most of the SIMD instructions. cvttsd2si converts a double to an int, cdq converts an int into a long, cvtdq2pd converts an int to a double, and comisd is comparable to cmp.

This converges for very few values. Which is a pain, since this translation gives me some very promising output in Compiler Explorer. But considering the value that the loop gets stuck on, I suspect that f averages a and b.

Challenge #32

We're given a hint that, "[t]his is a standard C library function. The source code is taken from MSVC 2010."

#include <stddef.h>

char *f(char *a, char *b)
{
        char *cur;
        char *a_cur;
        char *b_cur;

        cur = a;

        if (*b == '\0') {
                return a;
        }

        while (*cur != '\0') {
                a_cur = cur;
                b_cur = b;

                while (*a_cur != '\0' && *b_cur != '\0' && *a_cur == *b_cur) {
                        a_cur++;
                        b_cur++;
                }

                if (*b_cur == '\0') {
                        return cur;
                }

                cur++;
        }

        return NULL;
}

I think the translation makes the purpose of this function reasonably clear, but the hint means I can verify my work against C's tiny standard library. f is obviously one of the library's string functions. Can you guess which one?

(My answer is that f is an implementation of strstr.)

Challenge #33

What gave it away for me this time was the "crypto" tag. I stopped in my translation efforts about here,

void f(void *a, void *b, void *c)
{
        int mushroom; // _k0
        int bean;     // _k1
        int tomato;   // _k2
        int corn;     // _k3

        // eax = a[0]
        // ecx = a[1]

        mushroom = b[0];
        bean = b[1];

        // esi = b[3];
        // edx = 0;

        tomato = b[2];
        corn = b[3];

        // edi = 32;

        // LL8
        esi = ecx >> 5 + bean;
        ebx = ecx << 4 + mushroom;
        edx -= 0x61c88647;

        esi ^= ebx;
        ebx = ecx + edx;
        esi ^= ebx;

        eax += esi;

        esi = eax >> 5 + corn;
        ebx = eax << 4 + tomato;

        esi ^= ebx;
        ebx = eax + edx;
        esi ^= ebx;

        ecx += esi;
        edi--;

        // When edi == 0: c[0] = eax, c[1] = ecx
}

and decided to do a search for '0x61c88647 hash'. This yields a few interesting results, such as one describing the constant used in ThreadLocal.java's implementation Fibonacci hashing and another describing the constants used in the Tiny Encryption Algorithm.

This immediately set off bells for me. I read Bruce Schneier's Applied Cryptography some years back and was instantly reminded that TEA uses ARX with shifts of 5 and 4. If you pull up Wikipedia's reference code for TEA encryption, you'll be greeted with the following:

void encrypt (uint32_t v[2], uint32_t k[4]) {
    uint32_t v0=v[0], v1=v[1], sum=0, i;           /* set up */
    uint32_t delta=0x9E3779B9;                     /* a key schedule constant */
    uint32_t k0=k[0], k1=k[1], k2=k[2], k3=k[3];   /* cache key */
    for (i=0; i<32; i++) {                         /* basic cycle start */
        sum += delta;
        v0 += ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1);
        v1 += ((v0<<4) + k2) ^ (v0 + sum) ^ ((v0>>5) + k3);
    }                                              /* end cycle */
    v[0]=v0; v[1]=v1;
}

Armed with this, I can confidently say that f is an implementation of TEA encryption with a schedule constant of 0x61c88647.

Challenge #34

Another crypto challenge. This time, we're told that "[t]his is a well-known cryptographic algorithm from the past." The disassembly was simple enough that I thought to translate it into standard mathematical notation rather than C, but it turned out to be far less helpful than the equivalent C.

uint16_t f(uint16_t a)
{
        uint16_t tmp;

        tmp = a << 2;
        tmp ^= a;
        tmp <<= 1;
        tmp ^= a;
        tmp <<= 2;
        tmp ^= a;

        return ((tmp & 32) << 10) | (a >> 1);
}

That said, I'm not familiar with the particular algorithm. There's a clear pattern, but I'm not sure where to start looking. Is it a hash function? Some kind of bastardized XOR encryption? Who knows.

Challenge #35

This was a tough one. I'll give my initial translation to C and explain where I went wrong:

#include <strings.h>
#include <stdio.h>

int f(int x, int y)
{
        int a, b;

        if (x == 0) {
                return y;
        }

        if (y == 0) {
                return x;
        }

        a = x >> ffs(x);
        b = y >> ffs(y);

        while (a != b) {
                if (a < b) {
                        SWAP(a, b);
                }

                if (a == 1) {
                        break;
                }

                b = (b - a) >> ffs(b - a);
        }

        return a << ffs(x | y);
}

One thing worth remarking on in the disassembly is this:

xor esi, edx
xor edx, esi
xor esi, edx

This is the XOR swap algorithm. In an attempt to make the translation more clear, I replaced it with a (non-existent) SWAP macro. ffs is also a POSIX extension that corresponds nicely to the bsf instruction.

The issue? I've been reading these MSVC disassemblies wrong the whole time. Take this instruction, for example: mov edx, DWORD PTR _y$[esp+4]. I'd never actually done out the calculations. As it turns out, _rt$2[esp+8] aliases with y. I thought that _rt$2 was a distinct variable and that the compiler was storing to some local variable but never using it. This isn't the case, hence why the translation doesn't work as intended.

What I need to start doing for these MSVC disassemblies is translating them into something I can assemble.

global f
f:
    push ecx
    push esi
    mov esi, DWORD [esp+12]
    test esi, esi
    jne init
    mov eax, DWORD [esp+16]
    pop esi
    pop ecx
    ret
init:
    mov edx, DWORD [esp+16]
    mov eax, esi
    test edx, edx
    je exit
    or eax, edx
    push edi
    bsf edi, eax
    bsf eax, esi
    mov ecx, eax
    mov DWORD [esp+8], eax
    bsf eax, edx
    shr esi, cl
    mov ecx, eax
    shr edx, cl
    mov DWORD [esp+16], eax
    cmp esi, edx
    je return
lp:
    jbe skip
    xor esi, edx
    xor edx, esi
    xor esi, edx
skip:
    cmp esi, 1
    je return
    sub edx, esi
    bsf eax, edx
    mov ecx, eax
    shr edx, cl
    mov DWORD [esp+16], eax
    cmp esi, edx
    jne lp
return:
    mov ecx, edi
    shl esi, cl
    pop edi
    mov eax, esi
exit:
    pop esi
    pop ecx
    ret 0

Actually, I should be doing this for all of the challenges… Anyway, observing a few test values for f:

f(1, 1) = 1
f(1, 2) = 1
f(1, 3) = 1
f(1, 4) = 1
f(1, 5) = 1
f(1, 6) = 1
f(1, 7) = 1
f(1, 8) = 1
f(1, 9) = 1
...
f(1, 252) = 1
f(1, 253) = 1
f(1, 254) = 1
f(1, 255) = 1
f(2, 1) = 1
f(2, 2) = 2
f(2, 3) = 1
f(2, 4) = 2
f(2, 5) = 1
f(2, 6) = 2
f(2, 7) = 1
f(2, 8) = 2
f(2, 9) = 1
f(2, 10) = 2
f(2, 11) = 1
f(2, 12) = 2
...
f(9, 1) = 1
f(9, 2) = 1
f(9, 3) = 3
f(9, 4) = 1
f(9, 5) = 1
f(9, 6) = 3
f(9, 7) = 1
f(9, 8) = 1
f(9, 9) = 9
...
f(10, 1) = 1
f(10, 2) = 2
f(10, 3) = 1
f(10, 4) = 2
f(10, 5) = 5
f(10, 6) = 2
f(10, 7) = 1
f(10, 8) = 2
f(10, 9) = 1
f(10, 10) = 10
f(10, 11) = 1
f(10, 12) = 2
...

It took me a while, but I eventually noticed the pattern. f is the greatest common divisor function.