Welcome to Day 23 of the Advent of Radare!
Today, we’ll explore how memory is configured across different systems and how operating systems manage memory mapping for executables and libraries, covering everything from static to dynamic setups.
Radare2 features a sophisticated and powerful IO layer that supports many setups: static analysis, debugger backends, emulation of memory-mapped devices when loading raw firmware, and even the simulation of complex environments.
Understanding memory architecture is essential: distinguishing between code and data segments, identifying constant resources, locating stack and heap regions, and analyzing how programs interact with memory during execution. These concepts are fundamental for debugging applications, reverse engineering firmware, and exploit development.
The IO layer also works with remote and virtualized environments through specific IO connections like windbg, gdbremote or r2frida. It can also handle esoteric configurations such as non-8-bit bytes, cyclic memory for non-standard data buses, memory banks, segmented memory registers, and more.
To be clear, you can’t find any alternative out there capable of any of this.
To begin, we must understand how binaries are mapped in memory and the difference between segments and sections. Radare2 unifies the terminology across different binary formats to represent the following elements:
Sections (iS): These provide a finer organization of the binary’s contents, primarily used for debugging purposes. In ELF files, these are defined by Section Headers (SHDR). While they contain useful labels like .text or .data, their permissions and names aren’t required for program execution.
Segments (iSS): These are memory regions that the runtime linker uses to load the program into memory. In ELF files, these are defined by Program Headers (PHDR) and contain essential information like memory permissions and load addresses.
Radare2’s abstractions allow consistent analysis across different file formats, with commands to examine both segments and sections regardless of the binary type or the target OS, so we don’t need to learn new concepts every time we start analyzing a binary from a different operating system.
As explained above, we have the iS and iSS commands. Both take subcommands that change the output format to JSON, r2 commands, a CSV table, and so on. The same information is also available via rabin2 -S and rabin2 -SS.
This is the example output for our favourite ls variant on macOS.
$ r2 /bin/ls
[0x100003a58]> iSS
nth paddr size vaddr vsize perm flags type name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0 0x00010000 0x8000 0x100000000 0x8000 -r-x 0x0 MAP __TEXT
1 0x00018000 0x4000 0x100008000 0x4000 -rw- 0x0 MAP __DATA_CONST
2 0x0001c000 0x4000 0x10000c000 0x4000 -rw- 0x0 MAP __DATA
3 0x00020000 0x8000 0x100010000 0x8000 -r-- 0x0 MAP __LINKEDIT
[0x100003a58]> iS
nth paddr size vaddr vsize perm flags type name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0 0x000037f8 0x3bd0 0x1000037f8 0x3bd0 -r-x 0x0 REGULAR 0.__TEXT.__text
1 0x000073c8 0x540 0x1000073c8 0x540 -r-x 0x0 SYMBOL_STUBS 1.__TEXT.__auth_stubs
2 0x00007908 0x10d 0x100007908 0x10d -r-x 0x0 REGULAR 2.__TEXT.__const
3 0x00007a15 0x506 0x100007a15 0x506 -r-x 0x0 CSTRINGS 3.__TEXT.__cstring
4 0x00007f1c 0xe0 0x100007f1c 0xe0 -r-x 0x0 REGULAR 4.__TEXT.__unwind_info
5 0x00008000 0x2a0 0x100008000 0x2a0 -rw- 0x0 NONLAZY_POINTERS 5.__DATA_CONST.__auth_got
6 0x000082a0 0x30 0x1000082a0 0x30 -rw- 0x0 NONLAZY_POINTERS 6.__DATA_CONST.__got
7 0x000082d0 0x268 0x1000082d0 0x268 -rw- 0x0 REGULAR 7.__DATA_CONST.__const
8 0x0000c000 0x20 0x10000c000 0x20 -rw- 0x0 REGULAR 8.__DATA.__data
9 0x00000000 0x0 0x10000c020 0xb0 -rw- 0x0 ZEROFILL 9.__DATA.__common
10 0x00000000 0x0 0x10000c0d0 0x158 -rw- 0x0 ZEROFILL 10.__DATA.__bss
[0x100003a58]>
The iSS command displays each segment, showing attributes such as address, size, and permissions. This helps identify which regions are allocated for code, data, or resources.
The permissions of a memory region can also help us identify its purpose.
By comparing iS and iSS, you can determine which sections belong to which segments and understand the organization of the binary’s memory layout. It’s important to note that sections are primarily useful for analysis, while the runtime linker only requires segment information to map the binary. A binary can still execute properly even with stripped section headers, though this makes analysis significantly more challenging.
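The comparison between iS and iSS can be sketched in a few lines of Python. This is only an illustration: the rows are copied by hand from the output above (a subset of the sections), and the helper names owning_segment and group_sections are made up for this example; they are not part of radare2.

```python
# (name, vaddr, vsize) rows copied from the iSS output above
SEGMENTS = [
    ("__TEXT", 0x100000000, 0x8000),
    ("__DATA_CONST", 0x100008000, 0x4000),
    ("__DATA", 0x10000c000, 0x4000),
    ("__LINKEDIT", 0x100010000, 0x8000),
]

# A subset of the (name, vaddr, vsize) rows from the iS output above
SECTIONS = [
    ("0.__TEXT.__text", 0x1000037f8, 0x3bd0),
    ("5.__DATA_CONST.__auth_got", 0x100008000, 0x2a0),
    ("8.__DATA.__data", 0x10000c000, 0x20),
    ("10.__DATA.__bss", 0x10000c0d0, 0x158),
]


def owning_segment(vaddr, vsize, segments=SEGMENTS):
    """Return the name of the segment that fully contains the range, or None."""
    for name, seg_vaddr, seg_vsize in segments:
        if seg_vaddr <= vaddr and vaddr + vsize <= seg_vaddr + seg_vsize:
            return name
    return None


def group_sections(sections=SECTIONS, segments=SEGMENTS):
    """Map each segment name to the list of section names it contains."""
    layout = {name: [] for name, _, _ in segments}
    for sec_name, vaddr, vsize in sections:
        seg = owning_segment(vaddr, vsize, segments)
        if seg is not None:
            layout[seg].append(sec_name)
    return layout


if __name__ == "__main__":
    for seg, secs in group_sections().items():
        print(seg, secs)
```

Note that __LINKEDIT ends up with no sections at all, which matches the listing: it is a segment the loader maps, but no section describes its contents.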
Be aware that some binaries may intentionally manipulate section information to complicate analysis. To avoid being misled, analysts may need to take additional steps such as removing flags, specifying different string regions to scan, and other countermeasures.
Fortunately, Radare2 provides all the necessary tools to handle these situations without requiring binary modification.
The om command lists the mapped memory regions for a binary. RBin parsers use om internally to specify which regions of the binary should be loaded from physical offsets into virtual memory addresses (only when -e io.va=true is set), along with their corresponding permissions.
To view the memory map, simply type om without arguments:
$ r2 /bin/ls
[0x100003a58]> om
* 5 fd: 3 +0x00010000 0x100000000 - 0x100007fff r-x fmap.__TEXT
- 4 fd: 3 +0x00018000 0x100008000 - 0x10000bfff r-- fmap.__DATA_CONST
- 3 fd: 3 +0x0001c000 0x10000c000 - 0x10000ffff r-- fmap.__DATA
- 2 fd: 3 +0x00020000 0x100010000 - 0x100015bff r-- fmap.__LINKEDIT
- 1 fd: 4 +0x00000000 0x100015c00 - 0x100017fff r-- mmap.__LINKEDIT
[0x100003a58]> o
3 * r-x 0x00025c00 /bin/ls
4 - r-- 0x00002400 null://9216
[0x100003a58]>
The first columns show two numbers: the “map id” and the “file descriptor” associated with the given map. This matters because, as we can see here, the RBin library uses the null URI handler to fill the gaps in regions whose virtual size is larger than their physical size.
The fields can be read like this:
* : selected/prioritized memory map; when two maps overlap, this one goes first
5, 4, 3.. : map IDs
fd: X : file descriptor id (see the o output to find out which file it is associated with)
+0x00010000 : starting physical offset within the fd
0x100000000 : initial virtual address where the data is mapped
0x100007fff : last virtual address (using a closed interval representation)
r-x : this page has read and execute permissions
fmap.__TEXT : name of the map
See the output from om* to find out how we can reproduce the same setup using commands:
[0x100003a58]> om*
omu 4 0x100015c00 0x00002400 0x00000000 r-- mmap.__LINKEDIT
omu 3 0x100010000 0x00005c00 0x00020000 r-- fmap.__LINKEDIT
omu 3 0x10000c000 0x00004000 0x0001c000 r-- fmap.__DATA
omu 3 0x100008000 0x00004000 0x00018000 r-- fmap.__DATA_CONST
omu 3 0x100000000 0x00008000 0x00010000 r-x fmap.__TEXT
[0x100003a58]>
Let’s check the help to understand the arguments passed.
[0x100003a58]> om?~create
| om fd vaddr [size] [paddr] [rwx] [name] create new io map
[0x100003a58]> omu?
| omu fd va sz pa rwx name same as `om` but checks for existance (u stands for uniq)
[0x100003a58]>
NOTE that any numeric argument in radare2 commands is parsed using the RNum API, which means you can use mathematical operations, reference flags, and use numeric variables. Be careful not to use spaces, as they serve as argument delimiters.
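To make the meaning of the map fields concrete, here is a minimal Python model of how a virtual address could be resolved to a file descriptor and a physical offset. It is a sketch, not radare2's implementation: the IOMap type and the translate function are hypothetical names, the values are copied from the om and om* output above, and "first match in list order wins" is a simplifying assumption about map priority.

```python
from typing import NamedTuple, Optional, Tuple


class IOMap(NamedTuple):
    map_id: int
    fd: int
    paddr: int  # starting physical offset inside the fd
    vaddr: int  # first virtual address of the map
    size: int
    perm: str
    name: str

    @property
    def last(self) -> int:
        # Closed interval: last valid virtual address of the map
        return self.vaddr + self.size - 1


# Values copied from the om / om* output above
MAPS = [
    IOMap(5, 3, 0x00010000, 0x100000000, 0x8000, "r-x", "fmap.__TEXT"),
    IOMap(4, 3, 0x00018000, 0x100008000, 0x4000, "r--", "fmap.__DATA_CONST"),
    IOMap(3, 3, 0x0001c000, 0x10000c000, 0x4000, "r--", "fmap.__DATA"),
    IOMap(2, 3, 0x00020000, 0x100010000, 0x5c00, "r--", "fmap.__LINKEDIT"),
    IOMap(1, 4, 0x00000000, 0x100015c00, 0x2400, "r--", "mmap.__LINKEDIT"),
]


def translate(vaddr: int, maps=MAPS) -> Optional[Tuple[int, int]]:
    """Return (fd, physical offset) for a virtual address, or None if unmapped."""
    for m in maps:  # assumption: the first matching map wins
        if m.vaddr <= vaddr <= m.last:
            return m.fd, m.paddr + (vaddr - m.vaddr)
    return None


if __name__ == "__main__":
    hit = translate(0x100003a58)  # the seek address shown in the prompt
    print(hit[0], hex(hit[1]))
```

The last map shows the gap filling described earlier: the tail of __LINKEDIT resolves to fd 4, the null:// descriptor, starting at offset 0.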
When emulating, we usually need to define a range for the stack, or at least some RAM with read and write permissions. The aeim command does all the black magic for us. But if you want to understand how it works and what it does behind the scenes, keep reading!
The aeim command has some options to define the size, the initial address, and the pattern used to fill the memory. After initializing the memory, it changes the stack pointer register to point to the middle of it.
[0x00000000]> e~esil.stack
esil.stack.addr = 0x00100000
esil.stack.depth = 256
esil.stack.pattern = 0
esil.stack.size = 0x000f0000
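As a rough illustration of what "the middle of it" means with these defaults, the arithmetic can be written down in Python. The fake_stack helper is a hypothetical name for this example, it only models a zero-filled region, and it does not reproduce the fill patterns that aeim supports.

```python
def fake_stack(addr=0x00100000, size=0x000f0000):
    """Return (memory, first, last, sp) for a zero-filled fake stack.

    The defaults mirror esil.stack.addr and esil.stack.size shown above.
    """
    memory = bytearray(size)   # zero-initialized backing buffer
    first = addr
    last = addr + size - 1     # closed interval, as in the om listing
    sp = addr + size // 2      # stack pointer placed in the middle
    return memory, first, last, sp


if __name__ == "__main__":
    _, first, last, sp = fake_stack()
    print(hex(first), hex(last), hex(sp))
```

Placing the stack pointer in the middle leaves room on both sides, so pushes (which move it down) and reads of caller frames (above it) both stay inside the mapped region.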
First, we need to open a block of memory large enough for the region we need. Let’s say we need 1MB of writable memory initialized to zeroes:
[0x100003a58]> onn malloc://1M
Note the following details:
onn instead of o will open the file without creating any map for it or parsing rbin info
1M is a valid number for RNum expressions; it’s an alias for 1024K, which is 0x100000
We will now observe that there’s a new file descriptor but no map created for it:
[0x100003a58]> o
3 * r-x 0x00025c00 /bin/ls
4 - r-- 0x00002400 null://9216
5 - rw- 0x00100000 malloc://1M
[0x100003a58]>
Let’s create a named map and put it in the right place. For example, we want the stack to be located at 0x600000:
[0x100003a58]> om 5 0x600000 1M 0 rw- stack
[0x100003a58]> om~stack
- 6 fd: 5 +0x00000000 0x00600000 - 0x006fffff rw- stack
[0x100003a58]>
Before moving forward, let’s flag all those maps with om**:
[0x100003a58]> om** # check the flag names
f iomap.stack=0x600000
f iomap.fmap.__TEXT=0x100000000
f iomap.fmap.__DATA_CONST=0x100008000
f iomap.fmap.__DATA=0x10000c000
f iomap.fmap.__LINKEDIT=0x100010000
f iomap.mmap.__LINKEDIT=0x100015c00
[0x100003a58]> .om** # run the script
Now we just need to change the stack pointer register:
[0x100003a58]> ar SP=iomap.stack+0x1000
That memory is now filled with zeroes. When analyzing firmware, however, we can sometimes fill it with the contents of the real RAM, or with the stack contents of a process previously dumped to a file. Use wff to write from a file.
[0x100003a58]> wff stack.bin @ iomap.stack
As mentioned previously, the IO maps are only used when working with virtual addressing. If we disable it, we will just be reading the data stored in the underlying physical files.
This can be a problem when we have multiple files loaded, because address 0 will have a different meaning depending on which file was loaded last. This is where prioritizing file descriptors comes into play.
[0x100003a58]> e io.va=0
[0x100003a58]> o
3 - r-x 0x00025c00 /bin/ls
4 - r-- 0x00002400 null://9216
5 * rw- 0x00100000 malloc://1M
[0x100003a58]> s 0
[0x00000000]> e io.va=false
[0x00000000]> op 5
[0x00000000]> x 32
- offset - 5859 5A5B 5C5D 5E5F 6061 6263 6465 6667 89ABCDEF01234567
0x00000000 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x00000000 0000 0000 0000 0000 0000 0000 0000 0000 ................
[0x00000000]> op 4
[0x00000000]> x 32
- offset - 5859 5A5B 5C5D 5E5F 6061 6263 6465 6667 89ABCDEF01234567
0x00000000 ffff ffff ffff ffff ffff ffff ffff ffff ................
0x00000000 ffff ffff ffff ffff ffff ffff ffff ffff ................
[0x00000000]> op 3
[0x00000000]> x 32
- offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0x00000000 cafe babe 0000 0002 0100 0007 0000 0003 ................
0x00000010 0000 4000 0000 bbf0 0000 000e 0100 000c ..@.............
[0x00000000]>
When working with io.va, prioritizing a different file descriptor will also change the selected bin object (see the ob command), which allows us to work with different binaries mapped at the same address.
Some architectures, like v850 or old s390 CPUs, have cyclic memory access, which means that only some of the addressing bits of the bus are used to point to physical memory.
So, imagine you can use 32-bit immediates, but only the lower 24 bits are connected. This means that every 2^24 bytes (16MB) we will be addressing the exact same memory. In the case of v850, the higher bits are used to determine whether the memory is executable or writable. Some compilers also use this as an optimization, reading from higher addresses using relative pointers with negative offsets.
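The wrap-around can be modelled with a simple bit mask. This is a minimal Python sketch of the idea, assuming the 24 connected bits from the example above; CyclicMemory and physical are illustrative names and say nothing about how radare2 implements it.

```python
ADDR_BITS = 24                      # only the lower 24 bits are connected
ADDR_MASK = (1 << ADDR_BITS) - 1    # 0xffffff


def physical(addr: int) -> int:
    """Drop the unconnected upper bits of a 32-bit address."""
    return addr & ADDR_MASK


class CyclicMemory:
    """Sparse byte memory where addresses repeat every 2**bits bytes."""

    def __init__(self, bits: int = ADDR_BITS):
        self.mask = (1 << bits) - 1
        self.cells = {}  # physical address -> byte value

    def write8(self, addr: int, value: int) -> None:
        self.cells[addr & self.mask] = value & 0xFF

    def read8(self, addr: int) -> int:
        return self.cells.get(addr & self.mask, 0)


if __name__ == "__main__":
    mem = CyclicMemory()
    mem.write8(0x00001234, 0x41)
    # The same byte is visible through every alias of the address
    print(hex(mem.read8(0xFF001234)), hex(mem.read8(0x01001234)))
    # A negative offset from 0, taken as a 32-bit value, lands at the top
    print(hex(physical((0 - 0x10) & 0xFFFFFFFF)))
```

The last line shows why negative relative offsets work: -0x10 as a 32-bit value is 0xfffffff0, which aliases to 0xfffff0, near the top of the 16MB window.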
Radare2 has two main features to support cyclic memory:
Some architectures solved the limited addressing problem caused by small data buses by using memory banks. A memory bank is a way to determine which
As an example of an architecture using memory banks, we have the GameBoy (TM), which used this technique to load larger ROMs and sometimes even to extend the RAM. Other old consoles like the NES or the Master System used this trick too.
In the case of the GB, it was able to address 8KB of RAM and 32KB of ROM, but with bank swapping it could use 32KB of RAM and 1MB of ROM.
In r2land we have the concept of io banks, which group a set of maps so that they can be swapped together.
This feature can also be applied to other use cases, like simulating context switches between userland and kernel: when emulating a syscall, we can replace the userland process memory layout with another io bank that contains the kernel memory and code.
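The idea of swapping groups of maps can be sketched in Python as follows. This is a toy model under stated assumptions: BankedMemory and its methods are hypothetical names, each bank is just a list of (base, data) pairs, and reads outside any map return None, which is a choice made for this example only.

```python
class BankedMemory:
    """Toy model: several banks of maps, only one bank visible at a time."""

    def __init__(self):
        self.banks = {}      # bank name -> list of (base address, bytearray)
        self.current = None  # name of the selected bank

    def add_bank(self, name):
        self.banks[name] = []
        if self.current is None:
            self.current = name  # the first bank created is selected

    def add_map(self, bank, base, data):
        self.banks[bank].append((base, bytearray(data)))

    def switch(self, name):
        if name not in self.banks:
            raise KeyError(name)
        self.current = name

    def read8(self, addr):
        """Read one byte through the maps of the selected bank only."""
        for base, data in self.banks[self.current]:
            if base <= addr < base + len(data):
                return data[addr - base]
        return None  # unmapped in this bank


if __name__ == "__main__":
    mem = BankedMemory()
    mem.add_bank("user")
    mem.add_bank("kernel")
    mem.add_map("user", 0x4000, b"\x11\x22")
    mem.add_map("kernel", 0x4000, b"\xaa\xbb")
    print(hex(mem.read8(0x4001)))  # seen through the "user" bank
    mem.switch("kernel")
    print(hex(mem.read8(0x4001)))  # same address, different contents
```

The same address yields different bytes depending on the selected bank, which is what makes the userland/kernel swap described above possible.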
This is the help message of the io banks:
[0x00000000]> omb?
Usage: omb[+-adgq] [fd] Operate on memory banks
| omb list all memory banks
| omb [id] switch to use a different bank
| omb=[name] same as 'omb id' but using its name
| omb+ [name] create a new bank with given name
| omba [id] adds a map to the bank
| ombd [id] delete a map from the bank
| omb-* delete all banks
| omb- [mapid] delete the bank with given id
| ombg associate all maps to the current bank
| ombq show current bankid
[0x00000000]>
We’ve seen that it is possible to name a map when creating it, but we can also set or change the name at any time.
This can be achieved with the omn command, which takes the name as an argument (the map id is optional because it picks the map located at the current offset by default).
[0x00000000]> om?~name
| omn[?] ([mapid]) [name] manage map names
To analyze a binary’s memory in a running state, we can start the process with Radare2’s debugger. Using the dm command to enumerate the memory maps, we can see how the runtime linker has configured the memory layout. During debugging sessions, we will observe that for Radare2, the whole address space is covered by a single map, but it’s the underlying I/O handler that serves its contents according to what it reads from the debugger interface APIs.
We will also observe that in some systems, the addresses will be different on every execution due to ASLR (Address Space Layout Randomization). However, in static analysis, binaries will always load at the same address. To simulate this behavior in static analysis, we can use the bin.aslr variable, which is set to false by default.
$ r2 -qcs -e bin.aslr=true /bin/ls
0x13003a58
$ r2 -qcs -e bin.aslr=true /bin/ls
0x5003a58
$ r2 -qcs -e bin.aslr=true /bin/ls
0x12003a58
$
On systems with ASLR (Address Space Layout Randomization), memory addresses are randomized each time a process is launched. As a result, we may not be able to use breakpoints, flags, or other absolute addresses because they will all depend on the base address of the module. This is used as a technique to not only annoy reverse engineers but, more importantly, to help prevent certain types of attacks.
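A common way to cope with this is to store addresses as offsets from the module base and rebase them on every run. The sketch below shows that arithmetic in Python. The rebase and module_offset helpers are hypothetical names, and the randomized base addresses are inferred from the entry points printed above, on the assumption that the offset from the start of __TEXT stays the same.

```python
STATIC_BASE = 0x100000000   # vaddr of __TEXT in the static listing
STATIC_ENTRY = 0x100003a58  # entrypoint shown in the static prompt


def module_offset(addr: int, base: int) -> int:
    """Offset of an address relative to the base of its module."""
    return addr - base


def rebase(addr: int, old_base: int, new_base: int) -> int:
    """Move an absolute address from one module base to another."""
    return new_base + module_offset(addr, old_base)


if __name__ == "__main__":
    # Bases inferred from the randomized entrypoints printed above
    for new_base in (0x13000000, 0x5000000, 0x12000000):
        print(hex(rebase(STATIC_ENTRY, STATIC_BASE, new_base)))
```

Breakpoints and flags recorded as base-relative offsets survive a relaunch this way, as long as the new base is read from the memory maps first.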
Some operating systems allow disabling this feature, which can be helpful for debugging purposes. In radare2, we can use the rarun2 tool by passing the aslr=false option as an argument. For more details, refer to the manual page.
$ man rarun2
Today’s challenge is about experimenting with different memory layouts. Open a binary with oon (open without loading maps), remove all the maps, and recreate them using the om commands.
After that, try to create a fake stack similar to what aeim does, but by following the manual steps.
You will have succeeded if function calls and instructions that write data to the stack are emulated properly when stepping through them. Remember to check the register values before taking the first step!
Understanding memory regions in Radare2 involves examining segments, sections, and mapped memory. Commands like iSS, iS, om, dm, and dmm provide insights into how a binary’s code and data are organized, both statically and dynamically. By analyzing permissions and runtime mappings, you can identify code regions, constants, and data sections accurately, which enhances your analysis of the binary.
To learn more about how the IO layer works, I encourage you to watch this training from the latest #r2con2024.
Stay tuned for tomorrow’s Radare2 post!