Welcome to Day 22 of the Advent of Radare!
Today, we’re focusing on analyzing binary file formats, specifically by examining their headers and loading their data. Understanding the structure of a binary format provides valuable insights into its metadata, dependencies, and setup instructions, which can reveal important information about its behavior and compatibility.
Radare2’s i command provides basic information, while ih and iH allow us to explore parsed headers in detail. We can use the * suffix to create an r2 script to import all the header metadata into radare2 as flags (ih*) for easy navigation and referencing.
Additionally, we’ll explore rabin2, a powerful companion tool that enables binary manipulation, header modification, and even the creation of minimal executables with custom code. We’ll also learn how to use the ob command to manage multiple binary files in memory, which is particularly useful when analyzing complex programs with shared libraries or when comparing different versions of the same binary.
We learned in previous posts that RBin aims to use generic concepts to describe all the information contained in executable binaries. This is a design decision to ease the reusability of scripts and reduce the learning curve for users when they start working on new file formats or switch between different architectures.
But when referring to binary headers, we can’t be generic, because we want to understand the real structures with their custom attributes and fields.
We won’t usually need to mess with that, but when we do, being able to parse and manipulate them at will is handy for several applications.
To analyze the file’s headers in detail, we’ll go deeper with ih and iH, which provide a breakdown of each header and its specific fields.
Parsing Detailed Header Information with ih and iH
The ih command displays parsed header fields in a structured format, including addresses, offsets, and types. This is particularly useful when working with complex binary formats, such as Mach-O or ELF, where headers contain detailed setup instructions for loading the binary.
In radare2, there are various ways to display labels, comments, and colorized regions in hexdumps, but we need to use specific commands to achieve this. Let’s start with manual annotations and then learn which commands can be used to import parsed metadata from headers.
NOTE: One of the key advantages of using radare2 for file format analysis is that you can modify any parser if you have the source code, or hook functions using Frida to generate logs in the form of r2 commands. These commands can add comments and label offsets, which can then be used for static analysis.
Let’s begin by trying the pxa command on a malloc://32 buffer. The -f flag sets the blocksize equal to the file size, allowing us to display the entire contents with any print command without needing to specify the size.
$ r2 -f malloc://32
[0x00000000]> xa
- offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0x00000000 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x00000010 0000 0000 0000 0000 0000 0000 0000 0000 ................
[0x00000000]> q
At first, the output of pxa looks the same as px. But hey, did you notice that we typed xa? This is because x is the short alias of px, so for practical reasons you can save one char every time you heXdump.
Let’s add a comment and a couple of flags and see what happens:
[0x00000000]> f hello @ 4
[0x00000000]> f world @ 20
[0x00000000]> CC my comment @ 24
[0x00000000]> xa
- offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
/hello
0x00000000 0000 0000 0000 0000 0000 0000 0000 0000 ................
/world ;my comment
0x00000010 0000 0000 0000 0000 0000 0000 0000 0000 ................ ; my comment
[0x00000000]>
Now things look clearer, right? Another cool and barely known feature of flags is that you can assign a color to them, as well as a size and a comment. With these attributes, the pxa output adjusts the bounds to colorize each flag properly.
[0x00000000]> f?~color,comment
| f name 12 33 [cmt] same as above + optional comment
| fc[?][name] [color] set color for given flag
| fC [name] [cmt] set comment for given flag
Now we will learn how to import all the flags (f), comments (CC), formats (pf), and data definitions (Cd) into radare2 to make the headers look better in both pxa and pd views.
You may have noticed that most tools assume everything you read is data unless you specify it as code. Radare takes a different approach: when disassembling with pd, it assumes everything is code unless specified as data. You can mark regions as data using the Cd and Cf commands. To remove metadata annotations, you can use the C- command.
To load struct definitions and formats (pf) from rbin, you can use the pfo command.
[0x00000000]> pfo
bios.r2
cdex.h
dex.h
dex.r2
elf32.r2
elf64.r2
elf_enums.r2
fatmacho.r2
jni.h
macho.r2
mz.r2
ntfs.r2
pe32.r2
trx.r2
zip.r2
[0x00000000]>
The libr/bin/d directory from the radare2 source code contains a bunch of structs defined using the pf command. Read the pf? and pf?? help messages to see the format strings you can use to interpret data as code.
[0x00000000]> pfo elf32.r2
[0x00000000]> pf.
pf.elf_header ?[2]E[2]E[4]ExxxxN2N2N2N2N2N2 (elf_ident)ident (elf_type)type (elf_machine)machine (elf_obj_version)version entry phoff shoff flags ehsize phentsize phnum shentsize shnum shstrndx
pf.elf_ident [4]z[1]E[1]E[1]E.:: magic (elf_class)class (elf_data)data (elf_hdr_version)version
pf.elf_phdr [4]Exxxxx[4]Ex (elf_p_type)type offset vaddr paddr filesz memsz (elf_p_flags)flags align
pf.elf_shdr x[4]E[4]Exxxxxxx name (elf_s_type)type (elf_s_flags_32)flags addr offset size link info addralign entsize
[0x00000000]>
We have loaded the struct definitions, but we don’t know yet where those types must be applied in memory. To do that, we need to use the .ih* command, which sets the flags and uses Cf to bind them to specific offsets.
Let’s take a look at the output of ih for an ELF file.
$ r2 test/bins/elf/ls
[0x00005ae0]> ih
0x00000000 0x00000000 0x464c457f ELF
0x00000010 0x00000010 0x00000003 Type
0x00000012 0x00000012 0x0000003e Machine
0x00000014 0x00000014 0x00000001 Version
0x00000018 0x00000018 0x00005ae0 EntryPoint
0x00000020 0x00000020 0x00000040 PhOff
0x00000028 0x00000028 0x00021368 ShOff
0x00000030 0x00000030 0x00000000 Flags
0x00000034 0x00000034 0x00000040 EhSize
0x00000036 0x00000036 0x00000038 PhentSize
0x00000038 0x00000038 0x0000000b PhNum
0x0000003a 0x0000003a 0x00000040 ShentSize
0x0000003c 0x0000003c 0x00000019 ShNum
0x0000003e 0x0000003e 0x00000018 ShrStrndx
[0x00005ae0]> iH
0x00000000 ELF64 0x464c457f
0x00000010 Type 0x0003
0x00000012 Machine 0x003e
0x00000014 Version 0x00000001
0x00000018 Entrypoint 0x00005ae0
0x00000020 PhOff 0x00000040
0x00000028 ShOff 0x00021368
0x00000030 Flags 0x00000000
0x00000034 EhSize 64
0x00000036 PhentSize 56
0x00000038 PhNum 11
0x0000003a ShentSize 64
0x0000003c ShNum 25
0x0000003e ShrStrndx 24
[0x00005ae0]>
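To make the offsets in the ih listing above less abstract, here is a minimal Python sketch that unpacks the same fields with the standard struct module. It assumes a little-endian ELF64 header, and the sample bytes are a synthetic header rebuilt from the values printed above, not the real test/bins/elf/ls file. The names ELF64_HEADER, FIELD_NAMES and parse_elf64_header are illustrative and are not part of radare2.

```python
import struct

# ELF64 file header layout (little-endian), 64 bytes in total.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
FIELD_NAMES = (
    "Ident", "Type", "Machine", "Version", "EntryPoint", "PhOff", "ShOff",
    "Flags", "EhSize", "PhentSize", "PhNum", "ShentSize", "ShNum", "ShStrndx",
)


def parse_elf64_header(data: bytes) -> dict:
    """Unpack the first 64 bytes of a little-endian ELF64 file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    return dict(zip(FIELD_NAMES, ELF64_HEADER.unpack_from(data, 0)))


# Synthetic header rebuilt from the values printed by ih above
# (hypothetical sample, not the real test/bins/elf/ls file).
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
sample = ELF64_HEADER.pack(
    ident, 0x3, 0x3E, 0x1, 0x5AE0, 0x40, 0x21368, 0x0, 64, 56, 11, 64, 25, 24
)

header = parse_elf64_header(sample)
for name in FIELD_NAMES[1:]:
    print(f"{name:<10} {header[name]:#x}")
```

The format string mirrors the field order of the listing: a 16-byte ident block, then Type at 0x10, Machine at 0x12, Version at 0x14, and the 8-byte EntryPoint at 0x18.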
Comparing the ELF output to the output for a Mach-O file, we can see how different it is:
[0x100003a58]> ih
0x100000000 0x00000000 0x00000000 header; mach0_header
0x100000020 0x00000020 0x00000001 load_command_0_LC_SEGMENT_64; mach0_segment64
0x100000068 0x00000068 0x00000001 load_command_1_LC_SEGMENT_64; mach0_segment64
0x100000240 0x00000240 0x00000001 load_command_2_LC_SEGMENT_64; mach0_segment64
0x100000378 0x00000378 0x00000001 load_command_3_LC_SEGMENT_64; mach0_segment64
0x1000004b0 0x000004b0 0x00000001 load_command_4_LC_SEGMENT_64; mach0_segment64
0x100000518 0x00000518 0x00000001 load_command_7_LC_SYMTAB; mach0_symtab_command
0x100000530 0x00000530 0x00000001 load_command_8_LC_DYSYMTAB; mach0_dysymtab_command
0x100000580 0x00000580 0x00000001 load_command_9_LC_LOAD_DYLINKER; mach0_load_dylinker_command
...
[0x100003a58]> iH
pf.mach0_header @ 0x100000000
0x100000000 Magic 0xfeedfacf
0x100000004 CpuType 0x100000c
0x100000008 CpuSubType 0x80000002
0x10000000c FileType 0x2
0x100000010 nCmds 20
0x100000014 sizeOfCmds 1712
...
If you’re familiar with these file formats, you’ll notice that the output is missing some fields and details. This occurs partly because certain details are automatically inferred from the pf types when loaded through .ih*. Additionally, radare2’s commands and features continuously evolve based on user requirements. If you’re interested in extending, learning about, or improving support for any of these file formats, I encourage you to examine them and contribute enhancements.
Let’s analyze the output of the script generated by ih* before we execute it.
[0x100003a58]> ih*
'fs+header
'f header.header 1 0x100000000
CCu base64:bWFjaDBfaGVhZGVy @ 0x100000000
Cf 1 mach0_header @ 0x100000000
'f header.load_command_0_LC_SEGMENT_64 1 0x100000020
'f header.load_command_0_LC_SEGMENT_64.value 1 0x00000001
CCu base64:bWFjaDBfc2VnbWVudDY0 @ 0x100000020
Cf -1 mach0_segment64 @ 0x100000020
'f header.load_command_1_LC_SEGMENT_64 1 0x100000068
'f header.load_command_1_LC_SEGMENT_64.value 1 0x00000001
CCu base64:bWFjaDBfc2VnbWVudDY0 @ 0x100000068
Cf -1 mach0_segment64 @ 0x100000068
'f header.load_command_2_LC_SEGMENT_64 1 0x100000240
'f header.load_command_2_LC_SEGMENT_64.value 1 0x00000001
...
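The CCu lines in the script above carry their comment text base64-encoded. As a small illustration, the following Python sketch decodes those payloads from a few lines copied from the listing. The helper name decode_comments is made up for this example, and the parsing assumes the exact "CCu base64:<payload> @ <address>" shape shown above.

```python
import base64

# A few lines copied from the ih* output above.
script = """\
'f header.header 1 0x100000000
CCu base64:bWFjaDBfaGVhZGVy @ 0x100000000
Cf 1 mach0_header @ 0x100000000
CCu base64:bWFjaDBfc2VnbWVudDY0 @ 0x100000020
"""


def decode_comments(r2_script: str) -> dict:
    """Map each address to the decoded text of its CCu base64 comment."""
    prefix = "CCu base64:"
    comments = {}
    for line in r2_script.splitlines():
        if not line.startswith(prefix):
            continue  # skip flag (f) and format (Cf) lines
        payload, _, addr = line[len(prefix):].partition(" @ ")
        comments[int(addr, 16)] = base64.b64decode(payload).decode("utf-8")
    return comments


comments = decode_comments(script)
for addr, text in comments.items():
    print(f"{addr:#x}: {text}")
```

The decoded comments are the names of the pf formats that the neighbouring Cf lines bind to the same addresses.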
When loading binaries in r2, the bin plugin specifies a default base address depending on the file type. This base address can be overridden with the -B command line flag and is also accessible through the bin.baddr eval variable.
In some situations, runtime linkers manipulate program headers or don’t map them in memory, requiring us to switch back to physical addressing mode (-e io.va=false) and seek to address zero to read them properly.
Alternatively, we can load the file using the -n command line flag to prevent rbin from parsing headers or loading any information from them.
Another interesting feature of radare2 is its ability to parse binaries from memory at any address. This means we can analyze a binary by specifying its location in a debugger process, firmware, or memory dump without having to extract it to a separate file on disk.
This is accomplished using the oba $$ command:
$ r2 -n radare2/test/bins/elf/ls
[0x00000000]> ie # no entrypoints, rbin data not loaded
[0x00000000]> ob # we confirm that no bin files opened
[0x00000000]> oba $$
[0x00000000]> ob
* 0 3 x86-64 ba:0x00000000 sz:136038 ../radare2/test/bins/elf/ls
[0x00000000]> ie
paddr vaddr phaddr vhaddr type
――――――――――――――――――――――――――――――――――――――――――――――――
0x00005ae0 0x00005ae0 0x00000018 0x00000018 program
[0x00000000]>
With all the types loaded, we can view those values and manipulate them using wv4 or wv8 to change the entrypoint, version information, number of elements in the program headers, etc.
$ r2 bins/elf/ls
[0x00000000]> e io.va=0
[0x00000000]> pfo elf64.r2
[0x00000000]> .ih*
[0x00000014]> pd 4 @ header.Version
;-- header.Version:
0x00000014 .dword 0x00000001
;-- header.EntryPoint:
;-- header.ShrStrndx.value:
0x00000018 .qword 0x0000000000005ae0 ; entry0 ; rip ; header.EntryPoint.value
;-- header.PhOff:
0x00000020 .qword 0x0000000000000040
[0x00000014]>
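Conceptually, writing an 8-byte value at header.EntryPoint overwrites the bytes at offset 0x18 of the file. The Python sketch below reproduces that effect on an in-memory buffer, assuming a little-endian ELF64 header. The buffer, the helper names and the replacement address 0x6000 are all hypothetical and only serve to show the byte-level change.

```python
import struct

E_ENTRY_OFFSET = 0x18  # header.EntryPoint in the ELF64 listing above


def read_entry(image: bytearray) -> int:
    """Read the 8-byte little-endian entry point from the header."""
    return struct.unpack_from("<Q", image, E_ENTRY_OFFSET)[0]


def patch_entry(image: bytearray, new_entry: int) -> None:
    """Overwrite the 8-byte little-endian entry point in place."""
    struct.pack_into("<Q", image, E_ENTRY_OFFSET, new_entry)


# Hypothetical 64-byte buffer standing in for the start of an ELF64 file.
image = bytearray(64)
image[:4] = b"\x7fELF"
patch_entry(image, 0x5AE0)
old_entry = read_entry(image)

patch_entry(image, 0x6000)  # 0x6000 is an arbitrary illustrative address
new_entry = read_entry(image)
print(f"entry: {old_entry:#x} -> {new_entry:#x}")
```

Only the eight bytes at 0x18 change; the surrounding header fields, such as PhOff at 0x20, are left untouched.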
The pava mode is another lesser-known feature of radare2 that can be particularly useful for users familiar with tools like IDA. This mode displays offsets in disassembly/hexdump views using virtual addresses while navigating through linear physical addresses.
To illustrate: if you seek to offset 0 in a binary that’s mapped to address 0x800000, the display will show it as if you were at address 0x800000.
This functionality is valuable because it provides a linear listing following the file’s physical data while rendering it with virtual addressing. This makes the disassembly more accurate and helps avoid inconsistencies or complications that might arise from virtual addressing and IO mappings.
-e io.pava=true
-e io.va=false
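The address arithmetic behind the example above can be sketched in a few lines of Python. This is not how radare2 implements io.pava; it only shows the translation from a physical offset to the virtual address that would be displayed, using a hypothetical Mapping type and a single made-up map at 0x800000.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical description of one file region mapped into memory."""
    paddr: int  # offset of the region in the file
    vaddr: int  # address the region is mapped at
    size: int   # length of the region in bytes


def displayed_address(offset: int, maps: list) -> int:
    """Return the virtual address to show for a physical file offset."""
    for m in maps:
        if m.paddr <= offset < m.paddr + m.size:
            return offset - m.paddr + m.vaddr
    return offset  # unmapped offsets are shown unchanged in this sketch


maps = [Mapping(paddr=0x0, vaddr=0x800000, size=0x1000)]
print(hex(displayed_address(0x0, maps)))
print(hex(displayed_address(0x10, maps)))
```

Navigation still walks the file linearly by physical offset; only the address shown next to each line is translated.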
Today’s challenge involves manipulating a binary to change its behavior by modifying the binary file headers. Practice using the new commands learned in this post and open the file using the r2 -nw command-line flags to patch the headers.
-n: don’t parse the headers
-w: open in write mode
Spend some time identifying the entrypoint address specified in the binary header using the ih, ih*, and iH commands. Note that specifying the entrypoint differs between ELF, PE, and Mach-O files: each file format has its own way to tell the runtime linker how to execute the program. Familiarize yourself with these details and then use wv4 or wv8 to apply the changes.
Here’s a practical example:
Open the /bin/ls binary
oo+ to reopen the file in read-write
e io.va=0 to use physical addressing, then .ih*
wv8 fcnaddr to patch the entrypoint
You have probably thought about many different ways to improve these workflows or spotted some bugs or inconsistencies in the output, depending on the version of radare you are using.
Sharing such feedback and submitting patches to improve support for various use cases in radare is a great way to learn and share those improvements with others.
Stay tuned for tomorrow’s Radare2 challenge as we continue our journey through advanced reverse engineering techniques!