
22 - Parsing Headers

Welcome to Day 22 of the Advent of Radare!

Today, we’re focusing on analyzing binary file formats, specifically by examining their headers and loading their data. Understanding the structure of a binary format provides valuable insights into its metadata, dependencies, and setup instructions, which can reveal important information about its behavior and compatibility.

Radare2’s i command provides basic information, while ih and iH allow us to explore parsed headers in detail. We can use the * suffix to create an r2 script to import all the header metadata into radare2 as flags (ih*) for easy navigation and referencing.

Additionally, we’ll explore rabin2, a powerful companion tool that enables binary manipulation, header modification, and even the creation of minimal executables with custom code. We’ll also learn how to use the ob command to manage multiple binary files in memory, which is particularly useful when analyzing complex programs with shared libraries or when comparing different versions of the same binary.

Generic vs Specific

We learned in previous posts that RBin aims to use generic concepts to describe all the information contained in executable binaries. This design decision makes scripts easier to reuse and reduces the learning curve for users when they start working with new file formats or switch between architectures.

But when referring to binary headers, we can’t be generic, because we want to understand the real structures with their custom attributes and fields.

We won’t usually need to mess with that, but when we need to, we will really love to be able to parse and manipulate them at our will, which may be handy for several applications:

Headers

Annotated Hexdump

In radare2, there are various ways to display labels, comments, and colorized regions in hexdumps, but we need to use specific commands to achieve this. Let’s start with manual annotations and then learn which commands can be used to import parsed metadata from headers.

NOTE: One of the key advantages of using radare2 for file format analysis is that you can modify any parser if you have the source code, or hook functions using Frida to generate logs in the form of r2 commands. These commands can add comments and label offsets, which can then be used for static analysis.

Let’s begin by trying the pxa command on a 32-byte buffer opened with malloc://32. The -f flag sets the blocksize equal to the file size, allowing us to display the entire contents with any print command without needing to specify the size.

$ r2 -f malloc://32
[0x00000000]> xa
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x00000000  0000 0000 0000 0000 0000 0000 0000 0000  ................
0x00000010  0000 0000 0000 0000 0000 0000 0000 0000  ................
[0x00000000]> q

At first, the output of pxa looks the same as px. But hey, did you notice that we typed xa? That works because x is a short alias for px, so you can save one character every time you heXdump.

Let’s add a comment and a couple of flags and see what happens:

[0x00000000]> f hello @ 4
[0x00000000]> f world @ 20
[0x00000000]> CC my comment @ 24
[0x00000000]> xa
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
                      /hello
0x00000000  0000 0000 0000 0000 0000 0000 0000 0000  ................
                      /world    ;my comment
0x00000010  0000 0000 0000 0000 0000 0000 0000 0000  ................ ; my comment
[0x00000000]>
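
To make the idea behind pxa more concrete, here is a minimal Python sketch that prints a label line above every hexdump row containing a flag. This is not how radare2 implements it: the annotated_hexdump helper and its simplified output layout are made up for this illustration.

```python
def annotated_hexdump(data, flags, width=16):
    """Return hexdump lines, with a label line above each row that holds flags.

    `flags` maps an offset to a name. The layout only mimics the idea of
    pxa, it does not reproduce radare2's exact column alignment.
    """
    lines = []
    for row in range(0, len(data), width):
        chunk = data[row:row + width]
        # Collect the flags whose offset falls inside this row.
        labels = [f"/{name}@{off:#x}" for off, name in sorted(flags.items())
                  if row <= off < row + len(chunk)]
        if labels:
            lines.append(" " * 12 + " ".join(labels))
        hexpart = " ".join(f"{byte:02x}" for byte in chunk)
        lines.append(f"0x{row:08x}  {hexpart}")
    return lines


if __name__ == "__main__":
    # Same buffer size and flag offsets as the session above.
    for line in annotated_hexdump(bytes(32), {4: "hello", 20: "world"}):
        print(line)
```

Keeping the flags in a plain offset-to-name dictionary is enough for a sketch like this; radare2 additionally tracks a size, a color, and a comment per flag.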

Now things look clearer, right? Another cool and little-known feature of flags is that you can assign them a color, as well as a size and a comment. With these attributes, the pxa output adjusts the bounds to colorize each flag properly.

[0x00000000]> f?~color,comment
| f name 12 33 [cmt]        same as above + optional comment
| fc[?][name] [color]       set color for given flag
| fC [name] [cmt]           set comment for given flag

Header Annotations

Now we will learn how to import all the flags (f), comments (CC), formats (pf), and data definitions (Cd) into radare2 to make the headers look better in both pxa and pd views.

You may have noticed that most tools assume everything you read is data unless you specify it as code. Radare takes a different approach: when disassembling with pd, it assumes everything is code unless specified as data. You can mark regions as data using the Cd and Cf commands. To remove metadata annotations, you can use the C- command.

To load struct definitions and formats (pf) from rbin, you can use the pfo command.

[0x00000000]> pfo
bios.r2
cdex.h
dex.h
dex.r2
elf32.r2
elf64.r2
elf_enums.r2
fatmacho.r2
jni.h
macho.r2
mz.r2
ntfs.r2
pe32.r2
trx.r2
zip.r2
[0x00000000]>

The libr/bin/d directory in the radare2 source code contains a bunch of structs defined using the pf command. Read the pf? and pf?? help messages to see the format characters you can use to describe how data should be interpreted.

[0x00000000]> pfo elf32.r2
[0x00000000]> pf.
pf.elf_header ?[2]E[2]E[4]ExxxxN2N2N2N2N2N2 (elf_ident)ident (elf_type)type (elf_machine)machine (elf_obj_version)version entry phoff shoff flags ehsize phentsize phnum shentsize shnum shstrndx
pf.elf_ident [4]z[1]E[1]E[1]E.:: magic (elf_class)class (elf_data)data (elf_hdr_version)version
pf.elf_phdr [4]Exxxxx[4]Ex (elf_p_type)type offset vaddr paddr filesz memsz (elf_p_flags)flags align
pf.elf_shdr x[4]E[4]Exxxxxxx name (elf_s_type)type (elf_s_flags_32)flags addr offset size link info addralign entsize
[0x00000000]>
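
If you are more used to scripting languages, it may help to see what one of these definitions amounts to. The following Python sketch mirrors pf.elf_phdr with the standard struct module. The field names are copied from the definition above; the Elf32Phdr and parse_elf32_phdr names are invented here, and the sketch assumes a little-endian ELF32 where every field is a 32-bit word.

```python
import struct
from collections import namedtuple

# Field names copied from the pf.elf_phdr definition in elf32.r2.
Elf32Phdr = namedtuple(
    "Elf32Phdr", "type offset vaddr paddr filesz memsz flags align")

# Eight little-endian 32-bit words (assumption: an LSB ELF32 file).
PHDR_FORMAT = "<8I"


def parse_elf32_phdr(buf, offset=0):
    """Decode one ELF32 program header found at `offset` inside `buf`."""
    return Elf32Phdr._make(struct.unpack_from(PHDR_FORMAT, buf, offset))


if __name__ == "__main__":
    # Hand-made example entry, not taken from a real binary.
    sample = struct.pack(PHDR_FORMAT, 1, 0, 0x8048000, 0x8048000,
                         0x1000, 0x1000, 5, 0x1000)
    print(parse_elf32_phdr(sample))
```

The enum fields (type and flags) are left as raw integers here, whereas the pf definition resolves them through elf_p_type and elf_p_flags.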

We have loaded the struct definitions, but we don’t know yet where those types must be applied in memory. To do that we will need to use the .ih* command that will set the flags and use Cf to bind them into specific offsets.

Header Information

Let’s take a look at the output of ih for an ELF file.

$ r2 test/bins/elf/ls
[0x00005ae0]> ih
0x00000000 0x00000000 0x464c457f ELF
0x00000010 0x00000010 0x00000003 Type
0x00000012 0x00000012 0x0000003e Machine
0x00000014 0x00000014 0x00000001 Version
0x00000018 0x00000018 0x00005ae0 EntryPoint
0x00000020 0x00000020 0x00000040 PhOff
0x00000028 0x00000028 0x00021368 ShOff
0x00000030 0x00000030 0x00000000 Flags
0x00000034 0x00000034 0x00000040 EhSize
0x00000036 0x00000036 0x00000038 PhentSize
0x00000038 0x00000038 0x0000000b PhNum
0x0000003a 0x0000003a 0x00000040 ShentSize
0x0000003c 0x0000003c 0x00000019 ShNum
0x0000003e 0x0000003e 0x00000018 ShrStrndx
[0x00005ae0]> iH
0x00000000  ELF64       0x464c457f
0x00000010  Type        0x0003
0x00000012  Machine     0x003e
0x00000014  Version     0x00000001
0x00000018  Entrypoint  0x00005ae0
0x00000020  PhOff       0x00000040
0x00000028  ShOff       0x00021368
0x00000030  Flags       0x00000000
0x00000034  EhSize      64
0x00000036  PhentSize   56
0x00000038  PhNum       11
0x0000003a  ShentSize   64
0x0000003c  ShNum       25
0x0000003e  ShrStrndx   24
[0x00005ae0]>
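
The offsets in the first column follow directly from the ELF64 header layout. As a cross-check, here is a small Python sketch that decodes the same fields with the struct module. The parse_elf64_header name is invented for this example, and the code only handles little-endian ELF64 files like the one shown above.

```python
import struct

# 16-byte ident, then the fields in the order reported by ih/iH.
ELF64_HEADER = "<16sHHIQQQIHHHHHH"
FIELDS = ("ident", "type", "machine", "version", "entry", "phoff", "shoff",
          "flags", "ehsize", "phentsize", "phnum", "shentsize", "shnum",
          "shstrndx")


def parse_elf64_header(buf):
    """Return the ELF64 header fields of `buf` as a dictionary."""
    if buf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if buf[4] != 2 or buf[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    return dict(zip(FIELDS, struct.unpack_from(ELF64_HEADER, buf, 0)))


if __name__ == "__main__":
    # Header rebuilt from the values in the ih output above.
    ident = b"\x7fELF\x02\x01\x01" + bytes(9)
    header = struct.pack(ELF64_HEADER, ident, 3, 0x3e, 1, 0x5ae0, 0x40,
                         0x21368, 0, 64, 56, 11, 64, 25, 24)
    for name, value in parse_elf64_header(header).items():
        if name != "ident":
            print(f"{name:10} {value:#x}")
```

To try it on a real file, read the first 64 bytes with open(path, "rb").read(64) and pass them to the function.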

Comparing that with the output for a Mach-O file, we can see how different it is:

[0x100003a58]> ih
0x100000000 0x00000000 0x00000000 header; mach0_header
0x100000020 0x00000020 0x00000001 load_command_0_LC_SEGMENT_64; mach0_segment64
0x100000068 0x00000068 0x00000001 load_command_1_LC_SEGMENT_64; mach0_segment64
0x100000240 0x00000240 0x00000001 load_command_2_LC_SEGMENT_64; mach0_segment64
0x100000378 0x00000378 0x00000001 load_command_3_LC_SEGMENT_64; mach0_segment64
0x1000004b0 0x000004b0 0x00000001 load_command_4_LC_SEGMENT_64; mach0_segment64
0x100000518 0x00000518 0x00000001 load_command_7_LC_SYMTAB; mach0_symtab_command
0x100000530 0x00000530 0x00000001 load_command_8_LC_DYSYMTAB; mach0_dysymtab_command
0x100000580 0x00000580 0x00000001 load_command_9_LC_LOAD_DYLINKER; mach0_load_dylinker_command
...
[0x100003a58]> iH
pf.mach0_header @ 0x100000000
0x100000000  Magic       0xfeedfacf
0x100000004  CpuType     0x100000c
0x100000008  CpuSubType  0x80000002
0x10000000c  FileType    0x2
0x100000010  nCmds       20
0x100000014  sizeOfCmds  1712
...

If you’re familiar with these file formats, you’ll notice that the output is missing some fields and details. This occurs partly because certain details are automatically inferred from the pf types when loaded through .ih*. Additionally, radare2’s commands and features continuously evolve based on user requirements. If you’re interested in extending, learning about, or improving support for any of these file formats, I encourage you to examine them and contribute enhancements.

Let’s analyze the output of the script generated by ih* before we execute it.

[0x100003a58]> ih*
'fs+header
'f header.header 1 0x100000000
CCu base64:bWFjaDBfaGVhZGVy @ 0x100000000
Cf 1 mach0_header @ 0x100000000
'f header.load_command_0_LC_SEGMENT_64 1 0x100000020
'f header.load_command_0_LC_SEGMENT_64.value 1 0x00000001
CCu base64:bWFjaDBfc2VnbWVudDY0 @ 0x100000020
Cf -1 mach0_segment64 @ 0x100000020
'f header.load_command_1_LC_SEGMENT_64 1 0x100000068
'f header.load_command_1_LC_SEGMENT_64.value 1 0x00000001
CCu base64:bWFjaDBfc2VnbWVudDY0 @ 0x100000068
Cf -1 mach0_segment64 @ 0x100000068
'f header.load_command_2_LC_SEGMENT_64 1 0x100000240
'f header.load_command_2_LC_SEGMENT_64.value 1 0x00000001
...
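
The CCu lines carry their comment text encoded in base64, so it is not obvious at first glance what they say. This short Python sketch decodes them; decode_ccu_comments is a name made up for this post, and it assumes every comment line has the exact "CCu base64:... @ address" shape shown above.

```python
import base64

PREFIX = "CCu base64:"


def decode_ccu_comments(script):
    """Map address -> decoded text for every `CCu base64:...` line."""
    comments = {}
    for line in script.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        # Split "payload @ 0xaddr" into its two halves.
        payload, _, addr = line[len(PREFIX):].partition(" @ ")
        comments[int(addr, 16)] = base64.b64decode(payload).decode()
    return comments


if __name__ == "__main__":
    # Lines copied from the ih* output above.
    script = """\
CCu base64:bWFjaDBfaGVhZGVy @ 0x100000000
Cf 1 mach0_header @ 0x100000000
CCu base64:bWFjaDBfc2VnbWVudDY0 @ 0x100000020
"""
    for addr, text in decode_ccu_comments(script).items():
        print(f"{addr:#x}: {text}")
```

The decoded comments turn out to be the names of the pf formats that the following Cf lines bind to the same addresses.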

Binding Types

When loading binaries in r2, the bin plugin specifies a default base address depending on the file type. This base address can be overridden with the -B command line flag and is also accessible through the bin.baddr eval variable.

In some situations, runtime linkers manipulate program headers or don’t map them in memory, requiring us to switch back to physical addressing mode (-e io.va=false) and seek to address zero to read them properly.

Alternatively, we can load the file using the -n command line flag to prevent rbin from parsing headers or loading any information from them.

Another interesting feature of radare2 is its ability to parse binaries from memory at any address. This means we can analyze a binary by specifying its location in a debugger process, firmware, or memory dump without having to extract it to a separate file on disk.

This is accomplished using the oba $$ command:

$ r2 -n radare2/test/bins/elf/ls
[0x00000000]> ie              # no entrypoints, rbin data not loaded
[0x00000000]> ob              # we confirm that no bin files opened
[0x00000000]> oba $$
[0x00000000]> ob
* 0 3 x86-64 ba:0x00000000 sz:136038 ../radare2/test/bins/elf/ls
[0x00000000]> ie
paddr      vaddr      phaddr     vhaddr     type
――――――――――――――――――――――――――――――――――――――――――――――――
0x00005ae0 0x00005ae0 0x00000018 0x00000018 program
[0x00000000]>
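
When the binary is buried inside a firmware image or a memory dump, the first step is finding where it starts. The following Python sketch scans a blob for the ELF magic and reports candidate offsets, the kind of address you would seek to before running oba $$. The find_elf_candidates helper is hypothetical, and checking only the magic and the class byte is a deliberately weak heuristic.

```python
ELF_MAGIC = b"\x7fELF"


def find_elf_candidates(blob):
    """Yield (offset, bits) for each position that looks like an ELF header."""
    pos = blob.find(ELF_MAGIC)
    while pos != -1:
        # The byte after the magic is EI_CLASS: 1 is ELF32, 2 is ELF64.
        ei_class = blob[pos + 4] if pos + 4 < len(blob) else 0
        if ei_class in (1, 2):
            yield pos, 32 * ei_class
        pos = blob.find(ELF_MAGIC, pos + 1)


if __name__ == "__main__":
    # Synthetic dump: padding, an ELF64 ident, more padding, an ELF32 ident.
    dump = (bytes(256) + b"\x7fELF\x02\x01\x01" + bytes(57)
            + b"\x7fELF\x01")
    for offset, bits in find_elf_candidates(dump):
        print(f"possible ELF{bits} at {offset:#x}")
```

A real scanner would validate more header fields before trusting a hit, since the four magic bytes can also appear in unrelated data.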

With all the types in place, we can view those values and modify them using wv4 or wv8 to change the entrypoint, the version information, the number of program headers, and so on:

$ r2 bins/elf/ls
[0x00000000]> e io.va=0
[0x00000000]> pfo elf64.r2
[0x00000000]> .ih*
[0x00000014]> pd 4 @ header.Version
            ;-- header.Version:
            0x00000014      .dword 0x00000001
            ;-- header.EntryPoint:
            ;-- header.ShrStrndx.value:
            0x00000018      .qword 0x0000000000005ae0 ; entry0 ; rip ; header.EntryPoint.value
            ;-- header.PhOff:
            0x00000020      .qword 0x0000000000000040
[0x00000014]>
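
For comparison, this is roughly what an 8-byte write at header.EntryPoint amounts to, sketched in Python on an in-memory copy of the file. The patch_entry function is invented for this example and assumes a little-endian ELF64, where the entrypoint field lives at offset 0x18 as shown above.

```python
import struct

E_ENTRY_OFFSET = 0x18  # header.EntryPoint in the ELF64 listing above


def patch_entry(image, new_entry):
    """Overwrite the 64-bit entrypoint in place and return the old value.

    `image` must be a mutable buffer such as a bytearray.
    """
    (old,) = struct.unpack_from("<Q", image, E_ENTRY_OFFSET)
    struct.pack_into("<Q", image, E_ENTRY_OFFSET, new_entry)
    return old


if __name__ == "__main__":
    # Fake 64-byte header holding only the entrypoint from the example.
    image = bytearray(64)
    struct.pack_into("<Q", image, E_ENTRY_OFFSET, 0x5ae0)
    previous = patch_entry(image, 0x5b00)
    print(f"entry changed from {previous:#x} to 0x5b00")
```

Returning the previous value makes it easy to undo the patch, which is handy when experimenting on a copy of a binary.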

PaVa mode

The pava mode is another lesser-known feature of radare2 that can be particularly useful for users familiar with tools like IDA. This mode displays offsets in disassembly/hexdump views using virtual addresses while navigating through linear physical addresses.

To illustrate: if you seek to offset 0 in a binary that’s mapped to address 0x800000, the display will show it as if you were at address 0x800000.

This functionality is valuable because it provides a linear listing following the file’s physical data while rendering it with virtual addressing. This makes the disassembly more accurate and helps avoid inconsistencies or complications that might arise from virtual addressing and IO mappings.

-e io.pava=true
-e io.va=false
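
The translation that pava mode performs for display purposes can be pictured with a few lines of Python. This is a simplified model, not radare2's code: paddr_to_vaddr and the (paddr, vaddr, size) tuples are assumptions made for this sketch, and offsets that fall outside every map are simply returned unchanged.

```python
def paddr_to_vaddr(paddr, maps):
    """Translate a file offset into the virtual address used for display.

    `maps` is a list of (paddr, vaddr, size) tuples describing which
    file ranges are mapped where.
    """
    for map_paddr, map_vaddr, size in maps:
        if map_paddr <= paddr < map_paddr + size:
            return map_vaddr + (paddr - map_paddr)
    # Assumption of this sketch: unmapped offsets keep their value.
    return paddr


if __name__ == "__main__":
    # The example from the text: offset 0 mapped at 0x800000.
    maps = [(0, 0x800000, 0x1000)]
    for offset in (0x0, 0x10, 0x2000):
        print(f"{offset:#x} -> {paddr_to_vaddr(offset, maps):#x}")
```

The navigation itself stays linear over the file; only the address printed next to each row changes.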

Challenge

Today’s challenge involves manipulating a binary to change its behavior by modifying the binary file headers. Practice using the new commands learned in this post and open the file using the r2 -nw command-line flags to patch the headers.

Spend some time identifying the entrypoint address specified in the binary header using the ih, ih*, and iH commands. Note that the way the entrypoint is specified differs between ELF, PE, and Mach-O files: each file format has its own way to tell the runtime linker how to execute the program. Familiarize yourself with these details and then use wv4 or wv8 to apply the changes.

Here’s a practical example:

Final Words

You have probably thought about many different ways to improve these workflows or spotted some bugs or inconsistencies in the output, depending on the version of radare you are using.

Having such feedback and submitting patches to improve support for various use cases in radare is a great way to learn and share those improvements with others.

Stay tuned for tomorrow’s Radare2 challenge as we continue our journey through advanced reverse engineering techniques!