This is the second set of solutions for my self-imposed challenge of completing at least fifty of the exercises on Dennis Yurichev's challenges.re by the end of the year. The first set is available here.
Challenge #12
No hints are given for this challenge, but it is the first time a binary is available in addition to the disassembly. I didn't download the executable, I was able to gather from the tags that the target is amd64 Linux.
If later challenges also provide executables, I may use it as an opportunity to explore the NSA's newly-released Ghidra. At the time of writing this, Ghidra's source code has yet to be released, so I'll have to pass up the opportunity this time around.
#include <stdio.h> #include <stdlib.h> #include <sys/stat.h> #include <sys/types.h> #include <sys/types.h> #include <time.h> #include <unistd.h> #include <utime.h> int main(int argc, char **argv) { // push rbx // mov rbx, rsi // sub rsp, 160 int ret; struct stat sbuf; struct utimbuf tbuf; // cmp edi, 2 // je .L2 if (argc != 2) { // mov edi, OFFSET FLAT:.LC0 // call puts puts("Usage: "); } // .L2: // mov rsi, QWORD PTR [rbx+8] // lea rdx, [rsp+16] // mov edi, 1 // call __xstat ret = stat(argv[1], &sbuf); // test eax, eax // js .L10 if (ret < 0) { // .L10: // mov edi, OFFSET FLAT:.LC1 // call puts // xor edi, edi // call exit puts("error #1!"); exit(0); } // mov rax, QWORD PTR [rsp+88] // xor edi, edi // mov QWORD PTR [rsp], rax tbuf.actime = sbuf.st_atim.tv_sec; // call time /// ... // mov QWORD PTR [rsp+8], rax tbuf.modtime = time(NULL); // mov rdi, QWORD PTR [rbx+8] // mov rsi, rsp // call utime ret = utime(argv[1], &tbuf); // test eax, eax // js .L11 if (ret < 0) { // .L11: // mov edi, OFFSET FLAT:.LC2 // call puts // xor edi, edi // call exit puts("error #2!"); exit(0); } // add rsp, 160 // xor eax, eax // pop rbx // ret return 0; }
The first thing that may stand out in the disassembly listing is call __xstat
.
__xstat
isn't part of the C standard library or POSIX. My understanding of
functions like these is that sometimes an interface like stat
will result in a
call to some internal libc routine when compiled, but the only time I've seen
this before was with __printf_chk
. Regardless, search engines are your friend,
and you should have no trouble arriving at the relevant page in the Linux
Standard Base Specification. One interesting thing of note is the comment that
"ver
shall be 3
or the behavior of these functions is undefined," yet the
disassembly indicates that ver
is 1
. I'm doubtful that this is part of the
challenge, though.
I also just guessed that QWORD PTR [rsp+88]
was sbuf.st_atim.tv_sec
, given the
context that the value is used in. Going from assembly to the corresponding
fields of a struct is a pain without a tool, which is perhaps an indication that
I should've downloaded the binary and used Ghidra^h^h^h^h^h^hradare2 to prod at
it.
Either way, the code updates a file's modification time. It's comparable to
touch
, but doesn't create the file if it doesn't exist. The strings are
intentionally vague, so here's a cleaned up version:
#include <errno.h> #include <string.h> int main(int argc, char **argv) { struct stat sbuf; struct utimbuf tbuf; if (argc != 2) { printf("Usage: %s [path]\n", argv[0]); } if (stat(argv[1], &sbuf) < 0) { printf("%s: %s\n", argv[1], strerror(errno)); exit(0); } tbuf.actime = sbuf.st_atim.tv_sec; tbuf.modtime = time(NULL); if (utime(argv[1], &tbuf) < 0) { printf("utime: %s\n", strerror(errno)); exit(0); } return 0; }
Challenge #13
The question for this exercise is, "[w]hat does this SSE code do?" Uh oh. I don't know anything about SSE. Not the end of the world, though. I always appreciate an opportunity to learn. Here are some notes I took on chapter 25 of Yurichev's book:
Vectorization is the process of taking several arrays as input and producing a single array as output. SIMD (Single Instruction, Multiple Data) is a way of optimizing vectorization by doing certain array-level operations in parallel.
Intel's initial implementation of SIMD reused FPU register. SSE added 128-bit registers (%xmm*) that were separate from the FPU, and AVX added 256-bit registers that were separate from the FPU.
Well, that doesn't seem too complicated, and the exercise only uses two
instructions: movdqu
, which loads a 16-byte value from memory into an %xmm*
register, and pmaxub
, which calculates the maximum values between two %xmm*
registers.
void f(int *dest, int *a, int *b) { int i; // xor rax, rax // ... // add rax, 16 // cmp rax, 1024 // jne .L4 // ... // .L4: for (i = 0; i < 256; i++) { // movdqu xmm0, XMMWORD PTR [rsi+rax] // movdqu xmm1, XMMWORD PTR [rdx+rax] // pmaxub xmm0, xmm1 // movdqu XMMWORD PTR [rdi+rax], xmm0 dest[i] = a[i] > b[i] ? a[i] : b[i]; } // rep ret return; }
f
will fill an array, dest
, such that the element at each index contains the
greater value between a
and b
for that index.
Challenge #14
The challenge description explains that, "[n]ow that's easy," and gives both .NET and Java bytecode disassemblies. I am not familiar with either bytecode format, but I do know Java (unfortunately), so I went with that.
public class Challenge14 { public static boolean f(char a) { // 0: iload_1 // 1: bipush 97 // 3: if_icmplt 14 // 6: iload_1 // 7: bipush 122 // 9: if_icmpgt 14 if (a < 97 || a > 122) { // 14: iload_1 // 15: bipush 65 // 17: if_icmplt 28 // 20: iload_1 // 21: bipush 90 // 23: if_icmpgt 28 if (a < 65 || a > 90) { // 28: iconst_0 // 29: ireturn return false; } // 26: iconst_1 // 27: ireturn return true; } // 12: iconst_1 // 13: ireturn return true; } }
I'm not particularly confident in my translation – the above is the result of
skimming the Java bytecode and Java bytecode instruction listings pages on
Wikipedia – but that translation does appear to convey a meaningful operation:
telling whether or not a
is an ASCII letter.
Challenge #15
The challenge description explains that, "[n]ow that's really easy."
void f(char *dst) { int i; // xorps %xmm0, %xmm0 // movups %xmm0, 240(%rdi) // movups %xmm0, 224(%rdi) // movups %xmm0, 208(%rdi) // movups %xmm0, 192(%rdi) // movups %xmm0, 176(%rdi) // movups %xmm0, 160(%rdi) // movups %xmm0, 144(%rdi) // movups %xmm0, 128(%rdi) // movups %xmm0, 112(%rdi) // movups %xmm0, 96(%rdi) // movups %xmm0, 80(%rdi) // movups %xmm0, 64(%rdi) // movups %xmm0, 48(%rdi) // movups %xmm0, 32(%rdi) // movups %xmm0, 16(%rdi) // movups %xmm0, (%rdi) // ret for (i = 0; i < 256; i++) { dst[i] = '\0'; } }
I initially read the disassembly for this challenge as if it were Intel syntax,
but it's AT&T. The operation is simple: f
zeroes out a 256-byte buffer specified
by the first parameter.
Challenge #16
Only one disassembly is given for this challenge, and the description hints that it is from Clang: "[n]ow this is getting harder. Clang did a lot of optimization tricks and this code is heavily optimized for SSE2. Nevertheless, the original function is tiny and simple. What does it do?"
In all honesty, I don't think that a translation to C is helpful. As the problem
mentioned, there's heavy optimization for SSE2, and the assembly code only
tangentially corresponds to what (I believe) is going on. Instead, I'll attempt
to justify my partial conclusion that f
sums an array of integers.
Bytes from rdi
(indexed with rcx
) are interleaved into xmm0
and xmm2
with
pinsrw
, and continually added into xmm3
and xmm4
. Then, xmm0
and xmm2
are added,
and xmm1
is unpacked with punpckhqdq xmm1, xmm1
. The pseudocode for the
punpckhqdq
instruction is given as:
Destination[0..63] = Destination[64..127]; Destination[64..127] = Source[64..127];
So it's unusual to see xmm1
as both the "Source" and "Destination". This is
followed by a paddq xmm1, xmm0
.
I can't really confirm any of this because of the movdqa xmm1, xmmword ptr
[rip + .LCPI0_0]
instruction. Some sort of mask is being used in those pand
xmm0, xmm1
and pand xmm2, xmm1
instructions, and I suspect that it's one of many
tricks coming together so the function works for an array of integers, but we
aren't given .LCPI0_0
, so I can't tell for sure.
This also means that I can't assemble what's given. If anyone out there is experienced in SIMD and would to share some tips for making sense of this one, I'd really appreciate it.
Challenge #17
The description explains that "[t]his is a quite esoteric piece of code, but nevertheless, the task it does is very mundane and well-known to anyone. The function has 4 32-bit arguments and returns a 32-bit one."
int f(int a, int b, int c, int d) { int tmp1, tmp2; // sub edx, edi c -= a; // mov r8d, ecx // ... // sub r8d, esi tmp1 = d - b; // mov ecx, 63 d = 63; // mov eax, edx // sar eax, cl // and eax, edx tmp2 = (c >> (d & 0xff)) & c; // mov edx, r8d // sar edx, cl // ... // and edx, r8d c = (tmp1 >> (d & 0xff)) & tmp1; // add edi, eax a += tmp2; // add esi, edx // sub esi, edi b += c - a; // mov eax, esi // sar eax, cl // and eax, esi // add eax, edi // ret return ((b >> (d & 0xff)) & b) + a; }
The initial translation is quite messy, but observe that d
has a constant value
of 63
, and 63 & 0xff
is just 63
. Still, there are a number of snippets that look
like (c >> (d & 0xff)) & c
, and it isn't obvious what that does.
int black_box(int a) { return (a >> 63) & a; } int main(void) { int i; for (i = 0; i >= 0; i += 1) { if (black_box(i) != 0) { printf("black_box(%d) = %d\n", i, black_box(i)); } } for (i = 0; i <= 0; i -= 1) { if (black_box(i) != i) { printf("black_box(%d) = %d\n", i, black_box(i)); } } return 0; }
re.c: In function 'black_box': re.c:64:13: warning: right shift count >= width of type [-Wshift-count-overflow] return (a >> 63) & a; ^~
I'm not sure if this is the intended behavior, but on amd64, this acts as \(min(x, 0)\). A first step at simplification can be made.
#define MIN(a, b) (a < b ? a : b) int f(int a, int b, int c, int d) { a += MIN(c - a, 0); c = MIN(d - b, 0); b += c - a; return MIN(b, 0) + a; }
And this can be further cleaned up into a one-liner.
#define MIN(a, b) (a < b ? a : b) int f(int a, int b, int c, int d) { return a + \ MIN(b - a + MIN(d - b, 0) - MIN(c - a, 0), 0) + \ MIN(c - a, 0); }
And this happens to be an interesting implementation of \(min(a, b, c, d)\).
Challenge #18
For challenges with more complicated control flow, I've been drawing the basic blocks out on a sheet of paper and drawing arrows between them to identify which transitions represent loops, and which transitions represent conditionals. That didn't work particularly well for this challenge, though. The solution instead came to me instead by just staring at the disassembly for some time.
#include <ctype.h> #include <stdint.h> #include <stdlib.h> #include <string.h> int f3(char *a, uint64_t *b, uint64_t *c, uint64_t *d, uint64_t *e, uint64_t *f) { int i; char *cur; if (strlen(a) != 36) { return a; } cur = a; i = 0; while (i != 37) { if (i == 8 || i == 13 || i == 18 || i == 23) { if (*cur != '-') { return (char *) -1; } } else { if (i == 36 && *cur == '\0') { break; } if (!isxdigit(*cur)) { return (char *) -1; } } i++; cur++; } *b = strtoul(a, NULL, 16); *c = strtoul(a + 9, NULL, 16); *d = strtoul(a + 14, NULL, 16); *e = strtoul(a + 19, NULL, 16); *f = strtoull(a + 24, NULL, 16); return 0; }
The code appears to implement a UUID parser.
Challenge #19
This challenge was particularly difficult. I began by translating the individual
basic blocks to C, and connecting them with goto
.
#include <stdlib.h> #include <stdio.h> char *f2_bb(char *a, int b, int c, char *d) { // rax <- a // rbx <- a // QWORD PTR [rsp+24] <- d // QWORD PTR [rsp+16] <- b // QWORD PTR [rsp+8] <- c int i; int j; int k; int *ret; // f2: { // test rcx, rcx // jne .L21 if (c == 0) { // add rsp, 32 // pop rbx // ret return a; } goto BBL21; } // .L21: { BBL21: // lea rdi, [4+rcx*4] // ... // call malloc ret = malloc(c * 4 + 4); // ... // mov DWORD PTR [rax], -1 ret[0] = -1; // ... // mov DWORD PTR [rax+4], 0 ret[1] = 0; // mov r9d, 1 i = 1; // ... // cmp r9, rcx // je .L22 if (c == 1) { goto BBL22; } goto BBL8; } // .L8: { BBL8: // mov edi, DWORD PTR [rax+r9*4] // lea r8d, [rdi+1] // test r8d, r8d // mov DWORD PTR [rax+4+r9*4], r8d if ((ret[i + 1] = ret[i] + 1) <= 0) { // jle .L5 goto BBL5; } // movzx r10d, BYTE PTR [rdx+r9] // movsx r8, r8d // cmp r10b, BYTE PTR [rdx-1+r8] if (d[i] != d[ret[i + 1] - 1]) { // jne .L7 goto BBL7; } // jmp .L5 goto BBL5; } // .L7: { BBL7: // mov r8d, DWORD PTR [rax-4+r8*4] // add r8d, 1 // test r8d, r8d // mov DWORD PTR [rax+4+r9*4], r8d if ((ret[i + 1] = ret[ret[i + 1] - 1] + 1) > 0) { // jg .L23 goto BBL23; } goto BBL5; } // .L23: { BBL23: // movsx r8, r8d // cmp BYTE PTR [rdx-1+r8], r10b if (d[ret[i + 1] - 1] == ret[i]) { // je .L5 goto BBL5; } goto BBL7; } // .L5: { BBL5: // add r9, 1 // cmp r9, rcx if (++i != c) { // jne .L8 goto BBL8; } goto BBL22; } // .L22: { BBL22: // xor r8d, r8d // xor r10d, r10d // xor edi, edi i = j = k = 0; goto BBL9; } // .L9: { BBL9: // cmp rdi, rsi // jae .L24 if (j >= b) { goto BBL24; } goto BBL14; } // .L14: { BBL14: // test r8d, r8d if (i < 0) { // js .L10 goto BBL10; } // movsx r9, r8d // movzx r11d, BYTE PTR [rdx+r9] // cmp BYTE PTR [rbx+rdi], r11b if (d[i] == a[j]) { // je .L10 goto BBL10; } // cmp rdi, rsi // mov r8d, DWORD PTR [rax+r9*4] i = ret[i]; if (j < b) { // jb .L14 goto BBL14; } goto BBL24; } // .L10: { BBL10: // add r8d, 1 // add r10d, 1 // movsx rdi, r8d j = ++i; k++; // cmp rdi, rcx if (j == c) { // je .L18 goto BBL18; } // movsx rdi, r10d j = k; // jmp .L9 goto BBL9; } // .L18: { BBL18: // movsx r10, r10d // sub r10, rcx k -= c; // add rbx, r10 a += k; // jmp .L13 goto BBL13; } // .L24: { BBL24: // xor ebx, ebx a = NULL; goto BBL13; } // .L13: { BBL13: // mov rdi, rax // call free free(ret); // add rsp, 32 // mov rax, rbx // pop rbx // ret return a; } }
char *f2(char *a, int b, int c, char *d) { int i; int j; int k; int *ret; if (c == 0) { return a; } ret = malloc((c + 1) * sizeof(int)); ret[0] = -1; ret[1] = 0; i = 1; do { if ((ret[i + 1] = ret[i] + 1) > 0 && d[i] != d[ret[i] - 1]) { while ((ret[i + 1] = ret[ret[i + 1] - 1] + 1) > 0) { if (d[ret[i + 1] - 1] == ret[i]) { break; } } } } while (++i < c); i = j = k = 0; while (j < b) { if (i < 0 || d[i] == a[j]) { j = ++i; k++; if (j == c) { free(ret); return a + k - c; } j = k; } i = ret[i]; } free(ret); return NULL; }
This challenge was nightmarishly difficult. I plan to come back to it near the end of the year, but for now, consider this challenge incomplete. I've been banging my head against a wall trying to make sense of it for a number of weeks now, and I still don't have a good answer for what it does.
Challenge #20
Another challenge described as "easy." This time, it really is.
#include <stdlib.h> float f4() { return rand() * ((float) 4.65661287307739257813e-10); }
I did defer to an ISA reference for cvtsi2ss
and mulss
as I'm not particularly
familiar with x86's floating point instructions. This challenge also gave me an
opportunity to use float.exposed to turn .long 805306368
into a floating point
constant, but \(4.65661287307739257813 \cdot 10^{-10}\) isn't any more
comprehensible. The purpose of f4
is clearer when observing the output.
... 0.086556 0.535690 0.176955 0.791683 0.575702 0.418118 0.952373 ...
f4
returns a random floating point number on the range \([0, 1]\).
Challenge #21
I was able to complete the translation for this challenge in under five minutes, which I'm quite proud of.
#include <string.h> int f1(char *a, char *b) { // rbp <- a // rbx <- b int offset; // push r12 // push rbp // mov rbp, rdi // push rbx // mov rbx, rsi // call strlen // mov rdi, rbx // mov r12, rax // call strlen // sub r12, rax offset = strlen(a) - strlen(b); // mov rsi, rbx // lea rdi, [rbp+0+r12] // call strcmp // pop rbx // test eax, eax // pop rbp // sete al // pop r12 // ret return strcmp(a + offset, b) ? 1 : 0; }
f
is a simple "ends with" predicate function. It returns 1
if a
ends with the
substring b
.
Challenge #22
I stopped when I got to the // ...
. I'd figured it out by then, and -Os
made the
assembly for this quite messy.
int f2(int *a, int b, int *c, int d) { // rcx <- a[b] // eax <- d + 1 // ebp <- c[0] int i; // r8 int j; int *cur; i = d + 1; j = 0; while (j < i){ while (c[j + 1] <= c[0] && j < d) j++; cur = &a[i - 1]; while (*cur-- > c[0]); if (j < i) { c[j + 1] ^= *(cur + 1); *cur ^= *(cur + 1); c[j + 1] ^= *cur; } } c[0] ^= *(cur + 1); // ... return 0; } void f1(int *a, int b, int *c, int d) { int ret; while (b < d) { ret = f2(a, b, c, d); f1(a, b, c, ret - 1); b = ret + 1; } }
The hint for this challenge is that "[t]his can be tricky, but the algorithm is
well known and heavily used almost everywhere," which gave it away once I got to
the mess of xor
instructions. This is the XOR swap algorithm, extended so that
it reverses the contents of a
and c
.
Comment form