Title : Binary Mangling with Radare
Author : pancake
==Phrack Inc.==
Volume 0x0d, Issue 0x42, Phile #0x0E of 0x11
|=-----------------------------------------------------------------------=|
|=-----------------=[ manual binary mangling with radare ]=--------------=|
|=-----------------------------------------------------------------------=|
|=--------------------=[ by pancake <at> nopcode.org> ]=-----------------=|
|=-----------------------------------------------------------------------=|
1 - Introduction
1.1 - The framework
1.2 - First steps
1.3 - Base conversions
1.4 - The target
2 - Injecting code in ELF
2.1 - Resolving register based branches
2.2 - Resizing data section
2.3 - Basics on code injection
2.4 - Mmap trampoline
2.4.1 - Call trampoline
2.4.2 - Extending trampolines
3 - Protections and manipulations
3.1 - Trashing the ELF header
3.2 - Source level watermarks
3.3 - Ciphering .data section
3.4 - Finding differences in binaries
3.5 - Removing library dependencies
3.6 - Syscall obfuscation
3.7 - Replacing library symbols
3.8 - Checksumming
4 - Playing with code references
4.1 - Finding xrefs
4.2 - Blind code references
4.3 - Graphing xrefs
4.4 - Randomizing xrefs
5 - Conclusion
6 - Future work
7 - References
8 - Greetings
--[ 1 - Introduction
Reverse engineering is something usually related to w32 environments where
there is lot of non-free software and where the use of protections is more
extended to enforce evaluation time periods or protect intellectual (?)
property, using binary packing and code obfuscation techniques.
These kind of protections are also used by viruses and worms to evade
anti-virus engines in order to detect sandboxes. This makes reverse
engineering a double-edged sword.
The way to do low level analysis does usually involve the development and
use of a lot of small specific utilities to gather information about the
contents of firmware or a disk image to carve for keywords and dump its
blocks.
Actually we have a wide variety of software and hardware platforms and
reverse engineering tools should reflect this.
Obviously, reverse engineering, cracking and such are not only related to
legal tasks. Crackmes, hacking challenges and learning for the heck of it
are also good reasons to play with binaries. For us, there is a simple
reason behind them all: fun.
--[ 1.1 - The framework
At this point, we can imagine a framework that combines some of the basics
of *nix, offering a set of actions over an abstracted IO layer.
Following these premises I started to write a block based hexadecimal
editor. It allows to seamlessly open a disc device, file, socket, serial
or process. You can then both automatize processing actions by scripts and
perform manual interactive analysis.
The framework is composed of various small utilities that can be easily
integrated between them by using pipes, files or an API:
radare: the entrypoint for everything :)
rahash: block based hashing utility
radiff: multiple binary diffing algorithms
rabin: extract information from binaries (ELF,MZ,CLASS,MACH0..)
rasc: shellcode construction helper
rasm: commandline assembler/disassembler
rax: inline multiple base converter
xrefs: blind search for relative code references
The magic of radare resides in the ortogonality of commands and metadata
which makes easy to implement automatizations.
Radare comes with a native debugger that works on many os/arches and offers
many interesting commands to ease the analysis of binaries without having
the source.
Another interesting feature is supporting high level scripting languages
like python, ruby or lua. Thanks to a remote protocol that abstracts the
IO, it is allowed the use of immunity debugger, IDA, gdb, bochs, vmware and
many other debugging backends only writing a few lines of script or even
using the remote GDB protocol.
In this article I will introduce various topics. It is impossible to
explain everything in a single article.
If you want to learn more about it I recommend you to pull the latest hg
tip and read the online documentation at:
http://www.radare.org [0]
I have also written a book [1] in pdf and html available at:
http://www.radare.org/get/radare.pdf
--[ 1.2 - First steps
To ease the reading of the article I will first introduce the use of radare
with some of its basics.
Before running radare I recommend to setup the ~/.radarerc with this:
e scr.color=1 ; enable color screen
e file.id=1 ; identify files when opened
e file.flag=1 ; automatically add bookmarks (flags) to syms, strings..
e file.analyze=1 ; perform basic code analysis
Here's a list of points that will help us to better understand how radare
commands are built.
- each command is identified by a single letter
- subcommands are just concatenated characters
- e command stands for eval and is used to define configuration variables
- comments are defined with a ";" char
This command syntax offers a flexible input CLI that allows to perform
temporal seeks, repeat commands multiple times, iterate over file flags,
use real system pipes and much more.
Here are some examples of commands:
pdf @@ sym. ; run "pdf" (print disassembled function) over all
; flags matching "sym."
wx cc @@.file ; write 0xCC at every offset specified in file
. script ; interpret file as radare commands
b 128 ; set block size to 128 bytes
b 1M ; any math expression is valid here (hex, dec, flags, ..)
x @ eip ; dump "block size" bytes (see b command) at eip
pd | grep call ; disassemble blocksize bytes and pipe to system's grep
pd~call ; same as previous but using internal grep
pd~call[0] ; get column 0 (offset) after grepping calls in disasm
3!step ; run 3 steps (system is handled by the wrapped IO)
!!ls ; run system command
f* > flags.txt ; dump all flags as radare commands into a file
--[ 1.3 - Base conversions
rax is an utility that converts values between bases. Given an hexadecimal
value it returns the decimal conversion and viceversa.
It is also possible to convert raw streams or hexpair strings into
printable C-like strings using the -s parameter:
$ rax -h
Usage: rax [-] | [-s] [-e] [int|0x|Fx|.f|.o] [...]
int -> hex ; rax 10
hex -> int ; rax 0xa
-int -> hex ; rax -77
-hex -> int ; rax 0xffffffb3
float -> hex ; rax 3.33f
hex -> float ; rax Fx40551ed8
oct -> hex ; rax 035
hex -> oct ; rax Ox12 (O is a letter)
bin -> hex ; rax 1100011b
hex -> bin ; rax Bx63
-e swap endianness ; rax -e 0x33
-s swap hex to bin ; rax -s 43 4a 50
- read data from stdin until eof
With it we can convert a raw file into a list of hexpairs:
$ cat /bin/true | rax -
There is an inverse operation for every input format, so we can do
things like this:
$ cat /bin/true | rax - | rax -s - > /tmp/true
$ md5sum /bin/true /tmp/true
20c1598b2f8a9dc44c07de2f21bcc5a6 /bin/true
20c1598b2f8a9dc44c07de2f21bcc5a6 /tmp/true
rax(1) can be also used in interactive mode by reading lines from stdin.
--[ 1.4 - The target
This article aims to explain some practical cases where an scriptable
hexadecimal editor can trivially solve and provide new perspectives to
solve reverse engineering quests.
We will take a x86 [2] GNU/Linux ELF [3] binary as a main target. Radare
supports many architectures (ppc, arm, mips, java, msil..) and operating
systems (osx, gnu/ linux, solaris, w32, wine, bsd), You should take in mind
that commands and debugger features explained in this article can be
applied to any other platform than gnu/linux-x86.
In this article we will focus on the most widely-known platform
(gnu/linux-x86) to ease the reading.
Our victim will suffer various manipulation stages to make reverse
engineering harder keeping functionality intact.
The assembly snippets are written in AT&T syntax [4], which I think it's
more coherent than Intel so you will have to use GAS, which is a nice multi
architecture assembler.
Radare2 has a full API with pluggable backends to assemble and disassemble
for many architectures supporting labels and some basic assembler
directives. This is done in 'rasm2' which is a reimplementation of rasm
using libr.
However, this article covers the more stable radare1 and will only use the
'rasm' command to assemble simple instructions instead of complete
snippets.
The article exposes multiple situations where low level analysis and
metadata processing will help us to transform part of the program or simply
obtain information from a binary blob.
Usually the best program to start learning radare is the GNU 'true', which
can be completely replaced with two asm instructions, leaving the rest of
the code to play with. However it is not a valid target for this article,
because we are interested in looking for complex constructions to hide
control flow with call trampolines, checksumming artifacts, ciphered code,
relocations, syscall obfuscation, etc.
--[ 2 - Injecting code in ELF
The first question that pops up in our minds is: How the hell we will add
more code in an already compiled program?
What real packers do is to load the entire program structures in memory for
in-memory manipulation, handling all the code relocations and generating a
completely brand new executable.
Program transformation is not an easy task. It requires a complete analysis
which sometimes is impossible to do automatically, due to the need to mix
the results of static, dynamic and manual code analysis.
Instead of this approach, we will do it in a bottom to top way, taking low
level code structures as units and doing transformations without the need
to understand the whole program.
This enables ensuring that transformations won't break the program in any
way because their local and internal dependencies will be easy identified.
Because the target program is generated by a compiler we can make some
assumptions. For instance, there will be a place to inject code, the
functions will be sequentially defined, access to variables will be
easier to track and calling conventions will be the same along all the
program.
Our first action will be to find and identify all the parts of the text
section with unused or noppable code.
'noppable' code is defined as code that is never executed, or that can be
replaced by a smaller implementation, taking benefit of the non-overwritten
bytes.
Compiler-related stubs can be sometimes noppable or replaced, and more of
this is found in non-C native-compiled programs (haskell, C++, ...)
The spaces between functions where no code is executed are called 'function
paddings'. They exist because the compiler tried to optimize the position
of the functions at aligned addresses to ease to job on the cpu.
Some other compiler-related optimizations will inline the same piece of
code multiple times. By identifying such patterns, we can patch the
spaghetti code into a single loop with an end branch and use the rest for
our own purposes.
Following paragraphs will verbosely explain those situations.
Never executed code:
We can find this code by doing many execution traces of the program and
finding the uncovered ranges. This option will probably require human
interaction.
Using static code analysis we would not be able to follow runtime pointer
calculations or call tables. Therefore, this method will also require some
human interaction and, in the event we get to identify a pattern, we can
write a script to automatize this task and add N code xrefs from a single
source address.
We can also emulate some parts to resolve register values before reaching
a register based call/branch instruction.
Inlined code and unrolled loops:
Programmers and compilers usually prefer to inline (duplicate the code of
a external function in the middle of another one) to reduce the execution
cost of a 'call' instruction.
Loops are also unrolled and this produces longer, but more optimal code
because the cpu doesn't need to compute unnecessary branches all the
time.
These are common speedup practices, and they are a good target to place
our code. This process is done in four steps:
- Find/identify inline code or unrolled loops (repeated similar blocks)
- Patch to uninline or re-roll the loop
- ...
- Profit!
Radare provides different commands to help to find/identify such kind of
code blocks. The related commands are not going to do all the job
automatically. They require some automatization, scripting or manual
interaction.
The basic approach to identify such kind of code is by finding repeated
sequences of bytes allowing a percentage of similarity. Most inline code
will come from small functions and unrolled loops will be probably small
too.
There are two basic search commands that can help on this:
[0xB7F13810]> /?
...
/p len ; search pattern of length 'len'
/P count ; search patterns of size $$b matching at >N bytes of curblock
...
The first command (/p) will find repeated bytes of a specified length.
The other one (/P) will find a block that matches at least N bytes of the
current blocksize compared to the current one.
These pattern search must be restricted to certain ranges, like the .text
section or the function boundaries:
The search should be restricted to the text section:
> e search.from = section._text
> e search.to = section._text_end
All search commands (/) can be configured by the 'search.' eval
variables.
The 'section.' flag names are defined by 'rabin'. Calling rabin with
the -r flag will dump the binary information to stdout as radare commands.
This is done at radare startup when file.id and file.flag are enabled,
but this information can be reimported from the shell by calling:
> .!!rabin -rS ${FILE}
This command can be understood as a dot command '.', which interprets the
output of the following command '!' as radare commands. In the example
below there are two admiration marks (!!). This is because the single '!'
will run the command through the io subsystem of radare, falling in the
debugger layer if trying to run a program in the shell which has a name
that equals a debugger command's.
The double '!!' is used to directly escape from the io subsystem and run
the program directly in the system.
This command can be used to externalize some scripting facilities by
running other programs, processing data, and feeding radare's core back
with dynamically generated commands.
Once all the inlined code is identified, search hits should be analyzed,
discarding false positives and making all those blocks work as subroutines
using a call/ret pattern, nopping the rest of the bytes. Unused bytes can
now be filled with any code.
Another way to identify inlined code can be done by splitting a function
code analysis into basic blocks without splitting nodes (e
graph.split=false) and finding similar basic blocks or repeated.
Unrolled loops are based on repeated sequences of code with small
variations based on the iterator variable which must change along the
repeated blocks in a progressive way (incremental, decremental, etc).
Language bindings are recommended for implementing such kind of tasks,
not only because the higher language representation, but also thanks to
the high level APIs for code analysis that are distributed with radare
for python and other scripting languages (perl, ruby, lua).
The '/' command is used to perform searches. A radare command can be
defined to be executed every time a hit is found, or we can later use
the '@@' foreach iterator can be used over all the flags prefixed with
'hit0' later.
See '/?' for help on search-related commands but, in short, there are
commands to search in plain text, hexadecimal, apply binary masks to
keywords, launch multiple keywords at a time, define keywords in a file,
search+replace, and more.
One of the search algorithms accessible through the '/' command is called
'/p', which performs a pattern search. The command accepts a numeric value
which determines the minimum size for a pattern to be resolved.
Patterns are identified by a series of consecutive bytes, with a minimum
length, that appear more than once. The output will return a pattern id,
the pattern bytes and a list of offsets where the pattern is found.
This kind of search algorithm can be useful for analyzing huge binary
files like firmwares, disc images or uncompressed pictures.
In order to get a global overview of a big file the 'pO' command can be
used, which will display the full file represented in 'blocksize' bytes
in hexadecimal. Each byte will represent the number of printable
characters in the block, the entropy calculation, the number of
functions identified there, the number of executed opcodes, or whatever
is set at the 'zoom.byte' eval variable.
zoom view real file
+-----------+ +----------+
| 0 |--------------| 0 |
| 1 | '-.__ | ... |
| 2 | '-.__ | |
| 3 | '-.| 128 | (filesize/blocksize)
| ... | +----------+
The number of bytes of the real file represented in a byte of the
zoom view is the quotient between the filesize and the blocksize.
Function padding and preludes:
Compilers usually pad functions with nop instructions to align the
following one. They are usually small, but sometimes they are big enough
to store trampolines that will help obfuscate the execution flow.
In the 'extending trampolines' chapter I will explain a technique to hide
function preludes inside a 'call trampoline' which runs the initial code
of each function and then jumps back to the next instruction.
By using call tables loaded in .data or memory it will be harder to follow
for the reverser, because the program can lose all function preludes and
footers so that it'll be read as a single blob.
Compiler stubs:
Function preludes and calling conventions can be used as watermarks to
identify the compiler used. But when compiler optimizations are enabled
the generated code will be heavily transformed and those preludes will
probably be lost.
Some new versions of gcc are using a new function prelude aligning the
stack before entering the code to make memory access by the cpu on local
variables. But we can replace them with a shorter hand made one adapted
to the concrete function needs and this way ripping some more bytes for
our own fun.
The entrypoint embeds a system dependent loader that acts as a launcher
for the constructor/main/destructor. We can analyze the constructor and
the destructor to check if they are void.
Then, we can drop all this code by just adding a branch to the main
pointer, getting about 190 bytes for a C program. You will probably find
much more unused code in programs written in D, C++, haskell or any other
high level language.
This is how a common gcc's entrypoint looks like:
0x08049a00, / xor ebp, ebp
0x08049a02 | pop esi
0x08049a03 | mov ecx, esp
0x08049a05 | and esp, 0xf0
0x08049a08, | push eax
0x08049a09 | push esp
0x08049a0a | push edx
0x08049a0b | push dword 0x805b630 ; sym.fini
0x08049a10, | push dword 0x805b640 ; sym.init
0x08049a15 | push ecx
0x08049a16 | push esi
0x08049a17 | push dword 0x804f4e0 ; sym.main
0x08049a1c, | call 0x80495b4 ; 1 = imp.__libc_start_main
0x08049a21 \ hlt
By pushing 0's instead of fini/init pointers we can ignore the
constructor and destructor functions of the program, most of the programs
will work with no problems without them.
By analyzing code in both (sym.init and sym.fini) pointers we can
determine the number of bytes:
[0x080482C0]> af @ sym.__libc_csu_fini~size
size = 4
[0x080482C0]> af @ sym.__libc_csu_init~size
size = 89
Tracing
Execution traces are wonderful utilities to easily cleanup the view
of a binary by identifying parts of it as tasks or areas. We can for
example make a fast identification of the GUI code by starting a
trace and graphically clicking on all the menus and visual options
of the target application.
The most common problem to tracing is the low performance of a !stepall
operation, which mainly gets stupidly slow on loops, reps and tracing
already traced code blocks. To speedup this process, radare implements
a "touchtrace" method (available under the !tt command) which implements
an idea of MrGadix (thanks!)
This method consist in keeping a swapped copy of the targeted process
memory space on the debugger side and replacing it with
software breakpoint instructions. On intel we would use CC (aka int3).
Once the program runs, it raises an exception because of the trap
instruction, which is handled by the debugger. Then the debugger checks
if the instruction at %eip has been swapped and replaces the userspace
one, storing information about this step (like timestamp, number of times
it has been executed and step counter).
Once the instruction has been replaced, the debugger instructs the
process to continue execution. Each time an unswapped instruction is
executed the debugger replaces it.
At the end (giving an end-breakpoint) the debugger have a buffer with
enough information to determine the executed ranges. Using this easy
method we can easily skip all the reps, loops and already analyzed
code blocks, speeding up the binary analysis.
Using the "at" command (which stands for analyze traces) it is possible
to take statistics or information about the traces and, for example..
serialize the program and unroll loops.
We can also use !trace, giving a certain debug level as numeric argument,
to log register changes, executed instructions, buffers, stack changes,
etc..
After these steps we can subtract the resulting ranges against the .text
section and get how many bytes we can use to inject code.
If the resulting space is not enough for our purposes we will have to face
section resizing to put our code. This is explained later.
Radare offers the "ar" (analyze ranges) command to manage range lists.
It can perform addition, subtraction and boolean operations.
> ar+ 0x1000 0x1040 ; manual range addition
> ar+ 0x1020 0x1080 ; manual add with overlap
> ari ; import traces from other places
.....
> .atr* ; import execution traces information
> .CF* ; import function ranges
> .gr* ; import code analysis graph ranges
.....
> ar ; display non overlapped ranges
> ; show booleaned ranges inside text section
> arb section._text section._text_end
We can also import range information from external files or programs. One
of the nicer characteristics of *nix is the ability to communicate
applications using pipes by just making one program 'talk' in the other
program language. So if we write a perl script that parses an IDA database
(or IDC), we can extract the information we need and print radare commands
('ar' in this case) to stdout, and then parsing the results with the '.'
command.
This time ar gives us information about the number of bytes that are
theoretically not going to be executed and its ranges.
We can now place some breakpoints on these locations to ensure we are not
missing anything. Manual revision of the automated code analysis and
tracing results is recommended.
--[ 2.1 - Resolving register based branchs
The 'av' command provides subcommands to analyze opcodes inside
the native virtual machine.
The basic commands are: 'av-' to restart the vm state, 'avr' to manage
registers, 'ave' to evaluate VM expressions and 'avx' that will execute N
instructions from the current seek (will set the eip VM register to the
offset value).
Internally, the virtual machine engine translates instructions into
evaluable strings that are used to manipulate memory and change registers.
This simple concept allows to easily implement support for new
instructions or architectures.
Here is an example. The code analysis has reached a call instruction
addressing with a register. The code looks like:
/* --- */
$ cat callreg.S
.global main
.data
.long foo
.text
foo:
ret
main:
xor %eax, %eax
mov $foo, %ebx
add %eax, %ebx
call *%ebx
ret
$ gcc callreg.S
$ radare -d ./a.out
( ... )
[0xB7FC6810]> !cont sym.main
Continue until (sym.main) = 0x08048375
[0x08048375]> pd 5 @ sym.main
0x08048375 eip,sym.main:
0x08048375 / xor eax, eax
0x08048377 | mov ebx, 0x8048374 ; sym.foo
0x0804837c,| add ebx, eax
0x0804837e | call ebx
0x08048380, \ ret
[0x08048375]> avx 4
Emulating 4 opcodes
MMU: cached
Importing register values
0x08048375 eax ^= eax
; eax ^= eax
0x08048377 ebx = 0x8048374 ; sym.foo
; ebx = 0x8048374
0x0804837c, ebx += eax
; ebx += eax
0x0804837e call ebx
; esp=esp-4
; [esp]=eip+2
;==> [0xbf9be388] = 8048382 ((esp]))
; write 8048382 @ 0xbf9be388
; eip=ebx
[0x08048375]> avr eip
eip = 0x08048374
At this point we can continue the code analysis where the 'eip'
register of the virtual machine points.
[0x08048375]> .afr @ `avr eip~[2]`
--[ 2.2 - Resizing data section
There is no simple way to resize a section in radare1. This process is
completely manual and tricky.
In order to solve this limitation, radare2 provides a set of libraries
(named libr) that reimplement all the functionalities of radare 1.x
following a modular design instead of the old monolithic approach.
Both branches of development are being done in parallel.
Here is where radare2 api (libr) comes in action:
$ cat rs.c
#include <stdio.h>
#include <r_bin.h>
int main(int argc, char *argv[])
{
struct r_bin_t bin;
if (argc != 4) {
printf("Usage: %s <ELF file> <section> <size>\n", argv[0]);
} else {
r_bin_init(&bin);
if (!r_bin_open(&bin, argv[1], 1, NULL)) {
fprintf(stderr, "Cannot open '%s'.\n", argv[1]);
} else
if (!r_bin_resize_section(&bin, argv[2], r_num_math(NULL,argv[3]))) {
fprintf(stderr, "Cannot resize section.\n");
} else {
r_bin_close(&bin);
return 0;
}
}
return 1;
}
$ gcc -I/usr/include/libr -lr_bin rs.c -o rs
$ rabin -S my-target-elf | grep .data
idx=24 address=0x0805d0c0 offset=0x000150c0 size=00000484 \
align=0x00000020 privileges=-rw- name=.data
$ ./rs my-target-elf .data 0x9400
$ rabin -S my-target-elf | grep .data
idx=24 address=0x0805d0c0 offset=0x000150c0 size=00009400 \
align=0x00000020 privileges=-rw- name=.data
Our 'rs' program will open a binary file (ELF 32/64, PE32/32+, ...) it will
resolve a section named '.data' to change its size and will shift the rest
of sections to keep the elf information as is.
The reason for resizing the .data section is because this one is commonly
linked after the text section. There will be no need to reanalyze the
program code to rebase all the data accesses to the new address.
The target program should keep working after the section resizing and we
already know which places in the program can be used to inject our
trampoline.
+-------+ +-------+
| .text | | .text |
| | | | <-- inject trampoline
|-------| -> |-------|
| .data | | .data |
+-------+ | | <-- inject new code or data
| |
+-------+
Next chapters will explain multiple trampoline methods that will use our
resized data section to store pointer tables or new code.
--[ 2.3 - Basics on code injection
Self modifying code is a nice thing, but usually a dangerous task too. Also
patches/tools like SElinux will not let us to write to an already
executable section.
To avoid this problem we can change in the ELF headers the .text perms to
writable but not executable, and include our decryption function in a new
executable section. When the decryption process is done, it will have to
change text perms to executable and then transfer control to it.
In order to add multiple nested encryption layers the task will be harder
mostly because of the relocation problems, and if the program is not
PIC-ed, rebasing an entire application on the fly by allocating and copying
the program multiple times in memory can be quite complicated.
The simplest implementation consists in these steps:
- Define from/to ranges
- Define cipher key and algorithm
- Inject deciphering code in the entrypoint
- We will need to use mprotect before looping
- Statically cipher the section
The from/to addresses can be hardcoded in the assembly snippet or written
in .data or somewhere for later. We can also ask the user for a key and
make the binary work only with a password.
If the encryption key is not found in the binary, the unpacking process
becomes harder, and we will have to implement a dynamic method to find the
correct one. This will be explained later.
Another way to define ranges is by using watermarks and let our injected
code walk over the memory looking for these marks.
The last step is probably the easier one. We already know the key and the
algorithm, its time to rewrite the inverse one using radare commands and
let the script run over these ranges.
This is an example:
The 'wo' command is a subcommand of the 'w' (write) which offers arithmetic
write operations over the current block.
[0x00000000]> wo?
Usage: wo[xrlasmd] [hexpairs]
Example: wox 90 ; xor cur block with 90
Example: woa 02 03 ; add 2, 3 to all bytes of cur block
Supported operations:
woa addition +=
wos subtraction -=
wom multiply *=
wod divide /=
wox xor ^=
woo or |=
woA and &=
wor shift right >>=
wol shift left <<=
It uses cyclic hexpair byte arrays as arguments. Just by seeking on the
"from" address and setting blocksize as "to-from" we will be ready to type
the 'wo' ones.
Implementing this process in macros will make the code more maintainable
and easy to manage. The understanding of functional programming is good
for writing radare macros, split the code in small functions doing
minimal tasks and make them interact with each other.
Macros are defined like in lisp, but syntax is really closer to a mix
between perl and brainfuck. Don't get scary at first instance..it doesn't
hurt as much as you can think and the resulting code is funnier than
expected ;)
(cipher from to key
s $0
b $1-$0
woa $2
wox $2)
We can put it in a single line (by replacing newlines with commas ',')
and put it in our ~/.radarerc:
(cipher from to key,s $0,b $1-$0,woa $2,wox $2,)
Now we can call this macro at any time by just prepending a dot before the
parenthesis and feeding it with the arguments.
f from @ 0x1200
f to @ 0x9000
.(cipher from to 2970)
To ease the code injection we can write a macro like this:
(inject hookaddr file addr
wa call $2 @ $0
wf $1 @ $2)
Feeding this macro with the proper values will assemble a call opcode
branching to the place where we inject our code (addr or $2) stored in the
'file' or $1.
Radare has the expanded AES key searching algorithm developed by Victor
Mu~noz. It can be helpful while reversing a program that uses this kind of
ciphering algorithm. As the IO is abstracted it is possible to launch this
search against files, program memory, ram dumps, etc..
For example:
$ radare -u /dev/mem
[0x00000000]> /a
...
WARNING: Most current GNU/Linux distributions comes with the kernel
compiled with the CONFIG_STRICT_DEVMEM option which restricts the access
of /dev/mem to the only first 1.1M.
All write operations are stored in a linked list of backups with the
removed contents. This make possible to undo and redo all changes done
directly on file. This allows you to try different modifications on the
binary without having to restore manually the original one.
These changes can be diffed later with the "radiff -r" command. The command
performs a deltified binary diff between two files, The '-r' displays the
result of the operation as radare commands that transforming the first
binary into the second one.
The output can be piped to a file and used later from radare to patch it
at runtime or statically.
For example:
$ cp /bin/true true
$ radare -w true
[0x08048AE0]> wa xor eax,eax;inc eax;xor ebx,ebx;int 0x80 @ entrypoint
[0x08048AE0]> u
03 + 2 00000ae0: 31 ed => 31 c0
02 + 1 00000ae2: 5e => 40
01 + 2 00000ae3: 89 e1 => 31 db
00 + 2 00000ae5: 83 e4 => cd 80
[0x08048AE0]> u* | tee true.patch
wx 31 c0 @ 0x00000ae0
wx 40 @ 0x00000ae2
wx 31 db @ 0x00000ae3
wx cd 80 @ 0x00000ae5
The 'u' command is used to 'undo' write operations. Appending the '*' we
force the 'u' command to list the write changes as radare commands, and we
pipe the output to the 'tee' unix program.
Let's generate the diff:
$ radiff -r /bin/true true
e file.insertblock=false
e file.insert=false
wx c0 @ 0xae1
wx 40 @ 0xae2
wx 31 @ 0xae3
wx db @ 0xae4
wx cd @ 0xae5
wx 80 @ 0xae6
This output can be piped into radare through stdin or using the '.'
command to interpret the contents of a file or the output of a command.
[0x08048AE0]> . radiff.patch
or
[0x08048AE0]> .!!radiff -r /bin/true $FILE
The '$FILE' environment is exported to the shell from inside radare as well
as other variables like $BLOCK, $BYTES, $ENDIAN, $SIZE, $ARCH, $BADDR, ...
Note that in radare there's not really much difference between memory
addresses and on-disk ones because the virtual addresses are emulated by
the IO layer thanks to the io.vaddr and io.paddr variables which are used
to define the virtual and physical addresses for a section or for the whole
file.
The 'S' command is used to configure a fine-grained setup of virtual and
physical addressing rules for ranged sections of offsets.
Do not care too much about it because 'rabin' will configure these
variables at startup when the file.id eval variable is true.
--[ 2.4 - Mmap trampoline
The problem faced when resizing the .text section is that .data section is
shifted, and the program will try to access it absolutely or relatively.
This situation forces us to rebase all the text section and adapt the rest of
sections.
The problem is that we need a deep code analysis to rebase and this is
something that we will probably do wrong if we try to do only a static code
analysis. This situation usually require a complex dynamic, and sometimes
manual, analysis to fix all the possible pointers.
To skip this issue we can just resize the .data section which is after the
.text one and just write a trampoline in the .text section loading code
from .data and putting it in memory using mmap.
The decision to use mmap is because is the only way to get executable pages
with write permissions in memory even with SELinux enabled.
The C code looks like this:
---------------------8<--------------------------
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
int run_code(const char *buf, int len)
{
unsigned char *ptr;
int (*fun)();
ptr = mmap(NULL, len,
PROT_EXEC | PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
if (ptr == NULL)
return -1;
fun = (int(*)(void))ptr;
memcpy(ptr, buf, len);
mprotect(ptr, len, PROT_READ | PROT_EXEC);
return fun();
}
int main()
{
unsigned char trap = 0xcc;
return run_code(&trap, 1);
}
---------------------8<--------------------------
The manual rewrite in assembly become:
---------------------8<--------------------------
.global main
.global run_code
.data
retaddr: .long 0
shellcode: .byte 0xcc
hello: .string "Hello World\n"
.text
tailcall:
push %ebp
mov %esp, %ebp
push $hello
call printf
addl $4, %esp
pop %ebp
ret
run_code:
pop (retaddr)
/* lcall *0 */
call tailcall
push (retaddr)
/* our snippet */
push %ebp
mov %esp, %ebp
pusha
push %ebp
xor %ebp, %ebp /* 0 */
mov $-1, %edi /* -1 */
mov $0x22, %esi /* MAP_ANONYMOUS | MAP_PRIVATE */
mov $7, %edx /* PROT_EXEC | PROT_READ | PROT_WRITE */
mov $4096, %ecx /* len = 4096 */
xor %ebx, %ebx /* 0 */
mov $192, %eax /* mmap2 */
int $0x80
pop %ebp
mov %eax, %edi /* dest pointer */
mov 0x8(%ebp), %esi /* get buf (arg0) */
mov $128, %ecx
cld
rep movsb
mov %eax, %edi
mov $5, %edx /* R-X */
mov $4096, %ecx /* len */
mov %edi, %ebx /* addr */
mov $125, %eax /* mprotect */
int $0x80
call *%edi
popa
pop %ebp
ret
main:
push $shellcode
call run_code
add $4, %esp
ret
---------------------8<--------------------------
Running this little program will result on a trap instruction executed on the
mmaped area. I decided to do not checks on return values to make the code
smaller.
$ gcc rc.S
$ ./a.out
Hello World
Trace/breakpoint trap
The size in bytes of this code can be measured with this line in the shell:
$ echo $((`rabin -o d/s a.out |grep run_code|cut -d ' ' -f 2 |wc -c`/2))
76
This graph explains the layout of the patch required
+----+ / push dword csu_fini
.---------------| EP | < push dword csu_init
| +----+ \ push dword main
V v
+------------+ +--------+
| csu_init | | main | / push 0x80483490
|------------| |--------| < call getopt
| mmap | .-| ... | \ mov [esp+30], eax
| trampoline | | | |
+------------+ | +--------+ .-----------------.
||'|| | | this code needs |
| |- - - - |- - - - - - - - - - - - - -. < the mmap tramp |
: __|___ +-----------+ . | to be executed |
. / hook \ ---> | mmap code | . `-----------------'
. \getopt/ +-----------+ .
- - - - - - - - - - - - - -|- - - - - - '
|
+--------+ / fall to the getopt flow
| getopt | <-----'
+--------+
|
(ret)
The injection process consists on the following steps:
* Resize the .data section to get enought space for our code
f newsize @ section._data_end - section._data + 4096
!rabin2 -o r/.data/`?v newsize` $FILE
q!
* Write the code at the end of the unresized .data section
!!gcc our-code.c
"!!rabin a.out -o d/s a.out | grep ^main | cut -d ' ' -f 2 > inj.bytes"
wF inj.bytes @ section._data_end
* Null the __libc_csu_init pointer at entrypoint (2nd push dword)
e asm.profile=simple
s entrypoint
wa push 0 @ `pd 20~push dword[0]#1`
* Inject the mmap loader at symbol __libc_csu_init
# generate mmap.bytes with the mmap trampoline
!!gcc mmap-tramp.S
"!!rabin a.out -o d/s a.out | grep ^main | cut -d ' ' -f 2 > mmap.bytes"
# write hexpair bytes contained in mmap.bytes file at csu_init
wF mmap.bytes @ sym.__libc_csu_init
* Hook a call instruction to bridge the execution of our injected code
and continue the original call workflow.
- Determine the address of the call to the target symbol to be hooked.
For example. Having 'ping' program, this script will determine an
xref to the getopt import PLT entry.
s entrypoint
# seek to the third push dword in the entry point (main)
s `pd 20~push dword[3]#2`
# Analyze function: the '.' will interpret the output of the command
# 'af*' that prints radare commands registering xrefs, variable
# accesses, stack frame changes, etc.
.af*
# generate a file named 'xrefs' containing the code references
# to the getopt symbol.
Cx~`?v imp.getopt`[1] > xrefs
# create an eNumerate flag at every offset specified in 'xrefs' file
fN calladdr @@.xrefs
- Create a proxy hook to bridge execution without destroying the stack.
This is explained in the following lines.
- Change the call of the hook to our proxy function
# write assembly code 'call <addr>'
# quotes will replace the inner string with the result of the
# command ?v which prints the hexadecimal value of the given expr.
wa call `?v dstaddr`
- Profit
Here is an example assembly code that demonstrates the functionality of the
trampoline for the patched call. This snippet of code can be injected after
the mmap trampoline.
----------8<-----------
.global main
trampoline:
push $hookstr
call puts
add $4, %esp
jmp puts
main:
push $hello
call trampoline
add $4, %esp
ret
.data
hello:
.string "hello world"
hookstr:
.string "hook!"
----------8<-----------
The previous code cannot be injected directly in the binary, and one of the
main conceptual reasons is that the destination address of the trampoline
will be defined at runtime, when the mmapped region of the data section is
ready with executable permissions, the destination address must be stored
somewhere in the data section.
--[ 2.4.1 - Call trampoline
Once the holes have been identified we can now modify some parts of the
program to obfuscate the control flow, symbol access, function calls and
inlined syscalls in different ways.
To walk through the program code blocks we can use iterators or nested
iterators to run from simple to complex scripted code analysis engines and
retrieve or manipulate the program by changing instructions or moving code.
The control flow of the program can be obfuscated by adding a calltable
trampoline for calling functions. We will have to patch the calls and store
the jump addresses in a table in the data section.
The same hack can be done for long jumps by placing a different trampoline
to drop the stacked eip and use jumps instead of calls.
Iterators in radare can be implemented as macros appending the macro call
after the @@ foreach mark.
For example: "x @ 0x200"
will temporally seek to 0x200 and run the "x" command.
Using: "x @@ .(iterate-functions)"
will run the iterator macro at every address returned by the macro.
To return values from a macro we can use the null macro body and append the
resulting offset. If no value is given, null is returned and the iterator
will stop.
Heres the implementation for the function-iterator:
"(iterate-functions,()CF~#$@,)"
We can also use the search engine with cmd.hit eval variable to run a
command when a search hit is found. Obviously, the command can be a macro
or a script file in any language.
e cmd.hit=ar- $$ $$+5
/x xx yy zz
But we can also do it after processing the search by just removing the
false positive hits and running a foreach command.
# configure and perform search
e search.from=section._text
e search.to=section._text_end
e cmd.search=
/x xx yy zz
# filtering hit results
fs search ; enter into the search flagspace
f ; list flags in search space
f -hit0_3 ; remove hit #3
# write on every filtered hit
wx 909090 @@ hit*
Flagspaces are groups of flags. Some of them are automatically created by
rabin while identifying strings, symbols, sections, etc, and others are
updated at runtime like by commands like 'regs' (registers) or 'search'
(search results).
You can switch to or create another flagspace using the 'fs' command. When
a flagspace is selected the 'f' command will only show the flags in the
current flagspace. 'fs *' is used to enable all flagspaces at the same time
(global view). See fs? for extended help.
The internal metadata database can be managed with the C command and stores
information about code and data xrefs, comments, function definitions,
type, data conversions and structure definitions to be used by the
disassembler engine, the code analysis module and user defined scripts.
Data and code xrefs are very useful reference points while reverse
engineering. It is interesting to obfuscate them all as much as possible
to fool code analysis engines, forcing the reverser to use runtime or
emulated environments to determine when a string or a buffer is used.
To analyze a function in radare is preferably to go into Visual mode with
command 'V', and then press the keys "d" and "f".
This action will make a code analysis from the current seek or where the
cursor points (if in cursor mode, toggleable with the lowercase "c" key in
Visual mode). This code analysis will define multiple things like argument,
local and global variable accesses, feeding the database with xref
information that can be later used to determine which points of the program
are using a certain pointer.
By defining few eval vars in your ~/.radarerc most of this basic code
analysis will be done by default.
$ cat ~/.radarerc
e file.id=true ; identify file type and set arch/baddr...
e file.analyze=true ; perform basic code analysis at startup
e file.flag=true ; add basic flags
e scr.color=true ; use color screen
NOTE: By passing the '-n' parameter to radare the startup process will skip
the interpretation of .radarerc.
All this information can be cached, manipulated and restored using project
files (with the -P program flag at startup or enabling this feature in
runtime with the 'P' command).
After this parenthesis we continue obfuscating the xrefs to make reversers
life funnier.
One simple way to do this would be to create an initialization code,
copying some pointers required by the program to run at random mmaped
addresses and defining a way to transform these accesses into a sequence
jigsaw. This can be achieved turning each pointer "random" based on some
rules like a shifted array accessed by a mod.
# calltable initializer
(ct-init tramp ctable ctable_end
f trampoline @ $0
f calltable @ $1
f calltable_ptr @ calltable
wf trampoline.bin @ trampoline
b $2-$1
wb 00 ; fill calltable with nops
)
; calltable adder
(ct-add
; check if addr is a long call
? [1:$$]==0xeb
?!()
; check if already patched
? [4:$$+1]==trampoline
??()
wv $$+5 @ calltable_ptr
yt 4 calltable_ptr+4 @ $$+1
f calltable_ptr @ calltable_ptr+8
wv trampoline @ $$+1
)
We can either create a third helper macro that automatically iterates over
every long call instruction in .text and running ct-add on it.
(ct-run from to tramp ctable ctable_end
.(ct-init tramp ctable ctable_end)
s from
loop:
? [1:$$]==0xeb
?? .(ct-add)
s +$$$
? $$ < to
??.loop:
)
Or we can also write a more fine-grained macro to walk over the function
e asm.profile=simple
pD~call[0] > jmps
.(ct-init 0x8048000 0x8048200)
.(ct-add)@@.jmps
The trampoline.bin file will be something like this:
---------------------------------------------------------------
/*
* Calltable trampoline example // --pancake
* ---------------------------------------------------
* $ gcc calltrampoline.S -DCTT=0x8048000 -DCTT_SZ 100
*/
#ifndef CTT_SZ
#define CTT_SZ 100
#endif
#ifndef CTT
#define CTT call_trampoline_table
#endif
.text
.global call_trampoline
.global main
main:
/* */
call ct_init
call ct_test
/* */
pushl $hello
call puts
add $4, %esp
ret
call_trampoline:
pop %esi
leal CTT, %edi
movl %esi, 0(%edi) /* store ret addr in ctable[0] */
0:
add $8, %edi
cmpl 0(%edi), %esi /* check ret addr against ctable*/
jnz 0b
movl 4(%edi), %edi /* get real pointer */
call *%edi /* call real pointer */
jmp *CTT /* back to caller */
/* initialize calltable */
ct_init:
leal CTT+8, %edi
movl $retaddr, 0(%edi)
movl $puts, 4(%edi)
ret
/* hello world using the calltable */
ct_test:
push $hello
call call_trampoline
retaddr:
add $4, %esp
ret
.data
hello:
.string "Hello World"
call_trampoline_table:
.fill 8*(CTT_SZ+1), 1, 0x00
---------------------------------------------------------------
Note that this trampoline implementation is not thread safe. If two threads
are use the same trampoline call table at the same time to temporally store
the return address, the application will crash.
To fix this issue you should use the %gs segment explained in the 'syscall
obfuscation' chapter. But to avoid complexity no thread support will be
added at this moment.
To inject the shellcode into the binary we will have to make some space in
the .data segment for the calltable. We can do this with gcc -D:
[0x8048400]> !!gcc calltrampoline.S -DCTT=0x8048000 -DCTT_SZ=100
> !! rabin -o d/s a.out|grep call_trampoline|cut -d : -f 2 > tramp.hex
> wF tramp.hex @ trampoline_addr
--[ 2.4.2 - Extending trampolines
We can take some real code from the target program and move it into the
.data section encrypted with a random key. Then we can extend the call
trampoline to run an initialization function before running the function.
Our call trampoline extension will cover three points:
- Function prelude obfuscation - Function mmaping (loading code from data)
- Unciphering and checksumming
At the same time we can re-encrypt the function again after calling the end
point address. This will be good for our protector because the reverser
will be forced to manually step into the code because no software
breakpoints will be able to be used on this code because they will be
re-encrypted and the checksumming will fail.
To do this we need to extend our call trampoline table or just add another
field in the call trampoline table specifying the status of the destination
code (if it is encrypted or not) and optionally the checksum.
The article aims to explain guidelines and possibilities when playing with
binaries. But we are not going to implement such kind of extensions to
avoid enlarge too much this article explaining these advanced techniques.
The 'p' command is used to print the bytes in multiple formats, the more
powerful print format is 'pm' which permits to describe memory contents
using a format-string like string and name each of the fields.
When the format-string starts with a numeric value it is handled as an
array of structures. The 'am' command manages a list of named 'pm' commands
to ease the re-use of structure signatures along the execution of radare.
We will start introducing this command describing the format of an entry of
our trampoline call table:
[0x08049AD0]> pm XX
0x00001ad0 = 0x895eed31
0x00001ad4 = 0xf0e483e1
Now adding the name of fields:
[0x08049AD0]> pm XX from to
from : 0x00001ad0 = 0x895eed31
to : 0x00001ad4 = 0xf0e483e1
And now printing a readable array of structures:
[0x08049AD0]> pm 3XX from to
0x00001ad0 [0] {
from : 0x00001ad0 = 0x895eed31
to : 0x00001ad4 = 0xf0e483e1
}
0x00001ad8 [1] {
from : 0x00001ad8 = 0x68525450
to : 0x00001adc = 0x0805b6a0
}
0x00001ae0 [2] {
from : 0x00001ae0 = 0x05b6b068
to : 0x00001ae4 = 0x68565108
}
There is another command named 'ad' that stands for 'analyze data'. It
automatically analyzes the contents of the memory (starting from the
current seek) looking for recognized pointers (following the configured
endianness), strings, pointers to strings, and more.
The easiest way to see how this command works is by starting a program in
debugger mode (f.ex: radare -d ls) and running 'ad@esp'.
The command will walk through the stack contents identifying pointers to
environment variables, function pointers, etc..
Internal grep '~' syntax sugar can be used to retrieve specific fields of
structures for scripting, displaying or analysis.
# register a 'pm' in the 'am' command
[0x08048000]> am foo 3XX from to
# grep for the rows matching 'from'
[0x08048000]> am foo~from
from : 0x00001ad0 = 0x895eed31
from : 0x00001ad8 = 0x68525450
from : 0x00001ae0 = 0x05b6b068
# grep for the 4th column
[0x08048000]> am foo~from[4]
0x895eed31
0x68525450
0x05b6b068
# grep for the first row
[0x08048000]> am foo~from[4]#0
0x895eed31
--[ 3 - Protections and manipulations
This chapter will face different manipulations that can be done on binaries
to add some layers of protections to make reverse engineering on the target
binaries a more complicated.
--[ 3.1 - Trashing the ELF header
Corrupting the ELF header is another possible target of a packer for making
the reverser's life little harder.
Just modifying one byte in the ELF header becomes a problem for tools like
gdb, ltrace, objdump, ... The reason is that they depend on section headers
to get information about sections, symbols, library dependencies, etc...
and if they are not able to parse them they just stop with an error. At the
moment of writing, only IDA and radare are able to bypass this issue.
This can be easily done in this way:
$ echo wx 99 @ 0x21 | radare -nw a.out
A reverser can use the fix-shoff.rs (included in the scripts) directory in
the radare sources, to reconstruct the 'e_sh_off' dword in the ELF header.
$ cat scripts/fix-shoff.rs
; radare script to fix elf.shoff
(fix-shoff
s 0 ; seek to offset 0
s/ .text ; seek to first occurrence of ".text" string
loop:
s/x 00 ; go next word in array of strings
? [1:$$+1] ; check if this is the last one
?!.loop: ; loop if buf[1] is == 0
s +4-$$%4 ; align seek pointer to 4
f nsht ; store current seek as 'nsht'
wv nsht @ 0x20 ; write the new section header table at 0x20 (elf.sh_off)
)
The script is used in this way:
$ echo ".(fix-shoff) && q" | radare -i scripts/fix-shoff.rs -nw "target-elf"
This script works on any generic bin with a trashed shoff. It will not work
on binaries with stripped section headers (after 'sstrip' f.ex.)
This is possible because GCC stores the section header table immediately
after the string table index, and this section is easily identified because
the .text section name will appear on any standard binary file.
It is not hard to think on different ways to bypass this patch. By just
stripping all the rest of section after setting shoff to 0 will work pretty
well and the reverser will have to regenerate the section headers
completely. This can be done by using sstrip(1) (double 's' is not a typo)
from the elf-kickers package. This way we eliminate all unnecessary
sections to run the program.
The nice thing of this header bug is that the e_shoff value is directly
passed to lseek(2) as second argument, and we can either try with values
like 0 or -1 to get different errors instead of just giving an invalid
address. This way it is possible to bypass wrong ELF parsers in other ways.
Similar bugs are found in MACH-O parsers from GNU software like setting
number of sections to -1 in the SEGMENT_COMMAND header. These kind of bugs
don't hit kernels because they usually use minimum input from the program
headers to map it in memory, and the dynamic linker at user space do the
rest. But debuggers and file analyzers usually fail when try to extract
corrupted information from headers.
--[ 3.2 - Source level watermarks
If we have the source of the target application there will be more ways to
play with it. For instance, using defines as volatile asm inline macros,
the developer can add marks or paddings usable for the binary packer.
Some commercial packers offer an SDK which is nothing more than a set of
include files offering helper functions and watermarks to make the packers
life easier. And giving to the developer more tools to protect their code.
$ cat watermark.h
#define WATERMARK __asm__ __volatile__ (".byte 3,1,3,3,7,3,1,3,3,7");
We will use this CPP macro to watermark the generated binary for later
post-processing with radare. We can specify multiple different watermarks
for various purposes. We must be sure that these watermarks are not already
in the generated binary.
We can even have larger watermarks to pad our program with noisy bytes in
the .text section, so we can later replace these watermarks with our
assembly snippets.
Radare can ease the manual process of finding the watermarks and injecting
code, by using the following command. It will generate a file named
'water.list' containing one offset per line.
[0x00000000]> /x 03010303070301030307
[0x00000000]> f~hit[0] > water.list
[0x00000000]> !!cat water.list
0x102
0x13c
0x1a2
0x219
If we want to run a command at every offset specified in the file:
[0x00000000]> .(fill-water-marks) @@. water.list
Inside every fill-water-mark macro we can analyze the watermark type
depending on the 'hit' flag number (we can search more than one keyword at
a time) or by checking the bytes there and perform the proper action.
If those watermarks have a fixed size (or the size is embedded into the
watermark) we can then write a radare script that write a branch
instruction at to skip the padding. (This will avoid program crash because
it will not execute the dummy bytes of our watermark).
Here's an ascii-art explanation:
|<-- watermark ------------------------------------------>|
|<-- unused bytes -->|
+---------------------------------------------------------+
| jmp $$+size | .. .. .. .. .. .. .. .. .. .. .. .. .. .. | program
+---------------------------------------------------------+
| ^
`------------------------------------------------------'
At this point we have some more controlled holes to put our code.
Our variable-sized watermark will be defined in this way:
.long 0,1,2,3,4,5,6,7,4*20,9,10,11,12,13,14,15,16,17,18,19,20
And the watermark filling macro:
(fill-water-marks,wa jmp $${io.vaddr}+$$+[$$+8],)
# run the macro to patch the watermarks with jumps
.(fill-water-marks) @@. water.list
The io.vaddr specified the virtual base address of the program in memory.
--[ 3.3 - Ciphering .data section
A really common protection in binaries consists on ciphering the data
section of a program to hide the strings and make the static code analysis
harder.
The technique presented here aims to protect the data section. This section
generally does not contain program strings (because they are usually
statically defined in the rodata section), but will help to understand some
characteristics of the ELF loader and how programs work with the rw
sections.
The PT_LOAD program header type specifies where, which and how part of the
binary should be mapped in memory. By default, the kernel maps the program
at a base address. More over it will map aligned size (asize) of the
program starting at 'physical' on 'virtual' address with 'flags'
permissions.
$ rabin -I /bin/ls | grep baddr
baddr=0x08048000
$ rabin -H /bin/ls | grep LOAD
virtual=0x08060000 physical=0x00018000 asize=0x1000 size=0x0fe0 \
flags=0x06 type=LOAD name=phdr_0
In this case, 4K is the aligned size of 0xfe0 bytes of the binary will be
loaded at address 0x8060000 starting at address 0x18000 of the file.
By running '.!rabin -rIH $FILE' from radare all this information will be
imported into radare and it will emulate such situation.
The problem we face here, is that the size and address of the mapped area
doesn't match with the offset of the .data section.
Some operations are required to get the correct offsets to statically
modify the program to cipher such range of bytes and inject a snippet
of assembly to dynamically decipher such bytes.
The script looks like:
-------8<------------
# flag from virtual address (the base address where the binary is mapped)
# this will make all new flags be created at current seek + <addr>
# we need to do this because we are calculating virtual addresses.
ff $${io.vaddr}
# Display the ranges of the .data section
?e Data section ranges : `?v section._data`- `?v section._data_end`
# set flag space to 'segments'. This way 'f' command will only display the
# flags contained in this namespace.
fs segments
# Create flags for 'virtual from' and 'virtual to' addresses.
# We grep for the first row '#0' and the first word '[0]' to retrieve
# the offset where the PT_LOAD segment is loaded in memory
f vfrom @ `f~vaddr[0]#0`
# 'virtual to' flag points to vfrom+ptload segment size
f vto @ vfrom+`f~vaddr[1]#0`
# Display this information
?e Range of data decipher : `?v vfrom`- `?v vto`= `?v vto-vfrom`bytes
# Create two flags to store the position of the physical address of the
# PT_LOAD segment.
f pfrom @ `f~paddr[0]#0`
f pto @ pfrom+`f~paddr[1]#0`
# Adjust deltas against data section
# ----------------------------------
# pdelta flag is not an address, we set flag from to 0
# pdelta = (address of .data section)-(physical address of PTLOAD segment)
ff 0 && f pdelta @ section._data-pfrom
?e Delta is `?v pdelta`
# reset flag from again, we are calculating virtual and physical addresses
ff $${io.vaddr}
f pfrom @ pfrom+pdelta
f vfrom @ vfrom+pdelta
ff 0
?e Range of data to cipher: `?v pfrom`- `?v pto`= `?v pto-pfrom`bytes
# Calculate the new physical size of bytes to be ciphered.
# we dont want to overwrite bytes not related to the data section
f psize @ section._data_end - section._data
?e PSIZE = `?v psize`
# 'wox' stands for 'write operation xor' which accepts an hexpair list as
# a cyclic XOR key to be applied to at 'pfrom' for 'psize' bytes.
wox a8 @ pfrom:psize
# Inject shellcode
#------------------
# Setup the simple profile for the disassembler to simplify the parsing
# of the code with the internal grep
e asm.profile=simple
# Seek at entrypoint
s entrypoint
# Seek at address (fourth word '[3]') at line'#1' line of the disassembly
# matching the string 'push dword' which stands for the '__libc_csu_init'
# symbol. This is the constructor of the program, and we can modify it
# without many consequences for the target program (it is not widely used).
s `pd 20~push dword[3]#1`
# Compile the 'uxor.S' specifying the virtual from and virtual to addresses
!gcc uxor.S -DFROM=`?v vfrom` -DTO=`?v vfrom+psize`
# Extract the symbol 'main' of the previously compiled program and write it
# in current seek (libc_csu_init)
wx `!!rabin -o d/s a.out|grep ^main|cut -d ' ' -f 2`
# Cleanup!
?e Uncipher code: `?v $$+$${io.vaddr}`
!!rm -f a.out
q!
-------8<------------
The unciphering snippet of code looks like:
-------8<------------
.global main
main:
mov $FROM, %esi
mov $TO, %edi
0:
inc %esi
xorb $0xa8, 0(%esi)
cmp %edi, %esi
jbe 0b
xor %edi, %edi
ret
-------8<------------
Finally we run the code:
$ cat hello.c
#include <stdio.h>
char message[128] = "Hello World";
int main()
{
message[1]='a';
if (message[0]!='H')
printf("Oops\n");
else
printf("%s\n", message);
return 0;
}
$ gcc hello.c -o hello
$ strings hello | grep Hello
Hello World
$ ./hello
Hallo World
$ radare -w -i xordata.rs hello
$ strings hello | grep Hello
$ ./hello
Hallo World
The 'Hello World' string is no longer in plain text. The ciphering
algorithm is really stupid. The reconstructed data section can be
extracted from a running process using the debugger:
$ cat unpack-xordata.rs
e asm.profile=simple
s entrypoint
!cont `pd 20~push dword[3]#2`
f delta @ section._data- segment.phdr_0.rw_.paddr-section.
s segment.phdr_0.rw_.vaddr+delta
b section._data_end - section._data
wt dump
q!
$ radare -i unpack-xordata.rs -d ./hello
$ strings dump
Hello World
Or statically:
e asm.profile=simple
wa ret @ `pd 20~push dword[3]#1`
wox a8 @ section._data:section._data_end-section.data
This is a plain example of how complicated can be to add a protection and
how easy is to bypass it.
As a final touch for the unpacker, just write a 'ret' instruction at symbol
'__libc_csu_init' where the deciphering loop lives:
wa ret @ sym.__libc_csu_init
Or just push a null pointer instead of the csu_init pointer in the entrypoint.
e asm.profile=simple
s entrypoint
wa push 0 @ `pd 20~push dword[0]#1`
--[ 3.4 - Finding differences in binaries
Lets draw a scenario where a modified binary (detected by a checksum file
analyzer) has been found on a server, and the administrator or the forensic
analyst wants to understand whats going on that file.
To simplify the explanation the explanation will be based on the previous
chapter example.
At first point, checksum analysis will determine if there's any collision
in the checksum algorithm, and if the files are really different:
$ rahash -a all hello
par: 0
xor: 00
hamdist: 01
xorpair: 2121
entropy: 4.46
mod255: 84
crc16: 2654
crc32: a8a3f265
md4: cbfc07c65d377596dd27e64e0dcac6cd
md5: 2b3b691d92e34ad04b1affea0ad2e5ab
sha1: 8ac3f9165456e3ac1219745031426f54c6997781
sha256: 749e6a1b06d0cf56f178181c608fc6bbd7452e361b1e8bfbcfac1310662d4775
sha384: c00abb14d483d25d552c3f881334ba60d77559ef2cb01ff0739121a178b05...
sha512: 099b42a52b106efea0dc73791a12209854bc02474d312e6d3c3ebf2c272ac...
$ rahash -a all hello.orig
par: 0
xor: de
hamdist: 01
xorpair: a876
entropy: 4.29
mod255: 7b
crc16: 2f1f
crc32: b8f63e24
md4: bc9907d433d9b6de7c66730a09eed6f4
md5: 6085b754b02ec0fc371a0bb7f3d2f34e
sha1: 6b0d34489c5ad9f3443aadfaf7d8afca6d3eb101
sha256: 8d58d685cf3c2f55030e06e3a00b011c7177869058a1dcd063ad1e0312fa1de1
sha384: 6e23826cc1ee9f2a3e5f650dde52c5da44dc395268152fd5c98694ec9afa0...
sha512: 09b7394c1e03decde83327fdfd678de97696901e6880e779c640ae6b4f3bf...
There is the possibility to generate two different files matching until 3
different checksum algorithms (and maybe more in the future). If any of
them differs we can determine that the file is different.
$ du -bs hello hello.orig
4913 hello
4913 hello.orig
As both files have the same size it will be worth using a byte-level
diffing, because it is more probable easy to read the differences than
trying to understand them as deltified ones (which is the default diffing
algorithm).
$ radiff -b hello.orig hello
------
000003ed 00
000003f0 55 | 60
000003f1 89 | be
000003f2 e5 | c0
...
00000406 fe | 61
00000407 ff | c3
00000408 ff | 90
00000409 8d | 90
0000040a bb | 90
0000040b 18 | 90
0000040c ff
------
000005bd 00
000005c0 00 | a8
000005c1 00 | a8
000005c2 00 | a8
...
000005df 00 | a8
000005e0 48 | e0
000005e1 65 | cd
000005e2 6c | c4
000005e3 6c | c4
000005e4 6f | c7
0000065f 00 | a8
00000660 00
Radiff detects two blocks of changes determined by the '------' lines.
The first block starts at 0x3f0 and the second one (which is bigger) begins
at address 0x5c0.
While watching the massive 00->a8 conversion it is possible to blindly
determine that the ciphering algorithm is XOR and the key is 0xa8. But
let's get a look on the first block of changes and disassemble the code.
$ BYTES=$(radiff -f 0x3f0 -t 0x40b -b hello.orig hello | awk '{print $4}')
$ rasm -d "$(echo ${BYTES})"
pushad
mov esi, 0x80495c0
mov edi, 0x8049660
inc esi
xor byte [esi], 0xa8
cmp esi, edi
jbe 0x38
popad
ret
...
Here it is! this code of assembler is looping from 0x80495c0 to 0x8049660
and XORing each byte with 0xa8.
radiff can also dump the results of the diff in radare commands. The
execution of the script will result on a binary patch script to convert one
file to another.
$ radiff -rb hello hello.orig > binpatch.rs
$ cat binpatch.rs | radare -w hello
$ md5sum hello hello.orig
6085b754b02ec0fc371a0bb7f3d2f34e hello
6085b754b02ec0fc371a0bb7f3d2f34e hello.orig
--[ 3.5 - Removing library dependencies
In some gnu/linux distributions, all the ELF binaries are linked against
libselinux, which is an undesired dependency if we want to make our hello
world portable. SElinux is not available in all gnu/linux distros.
This is a simple example, but we can extend the issue to bad pkg-config
configuration files that can make a binary depend on more libraries than
the minimum required.
Perhaps most users will not care and will probably install those
dependencies, recompile the kernel or whatever else. It is a pity to make
steps in backward instead of telling gcc to not link against those
libraries.
But we can avoid these dependencies easily: all the library names are
stored as zero-terminated strings in the ELF header, and the loader (at
least the Solaris, Linux and BSD ones) will ignore dependency entries if
the library name is zero length.
The patch consists on finding the non-desired library name (or the prefix,
if there are more than one libraries with similar names) and then write
0x00 in each search result.
$ radare -nw hello
[0x00000000]> / libselinux
... (results) ...
[0x00000000]> wx 00 @@ hit
[0x00000000]> q!
Using -nw radare will run without interpreting the ~/.radarerc and it will
not detect symbols or analyze code (we only need to use the basic
hexadecimal capabilities). The -w flag will open the file in read-write.
Each search result will be stored in the 'search' flagspace with names
predefined by the 'search.flagname' eval variable that will be hit%d_%d by
default. (This is keyword_index and hit_index).
The last command will run 'wx 00' at every '@@' flag matching 'hit' in the
current flagspace.
Use 'q!' if you don't want to be prompted to restore the changes done
before exiting.
--[ 3.6 - Syscall obfuscation
Looking for raw int 0x80 instructions on standard binaries it's going
probably to be a hard task unless these are statically compiled. This is
because they usually live in libraries which are embedded in the program.
Libraries like libc have loads of symbols that are directly mapped to
syscalls, and the reverser could use this information to identify code in a
statically compiled program.
To obfuscate symbols we can embed syscalls directly on the host program,
after trashing the stack to break standard backtraces.
If you go deep enough on the new syscalling system of Linux you will notice
that the int 0x80 syscall mechanism is no longer used in the current x86
platforms because performance reasons.
The new syscall mechanism consists in calling a trampoline function
installed by the kernel as a 'fake' ELF shared object mapped on the running
process memory. This trampoline function implements the most suitable
syscall mechanism for the current platform. This shared object is called
VDSO.
In Linux only two segment registers are used: cs and gs. This is for
simplicity reasons. The first one allows to access to the whole program
memory maps, and the second one is used to store Thread related
information.
Here is a simple segment dumper to get the contents from %gs.
----------------------8<----------------------------
$ cat segdump.S
#define FROM 0 /* 0x8048000 */
#define LEN 0x940
#define TIMES 1
#define SEGMENT %gs /* %ds */
.global main
.data
buffer:
.fill LEN, TIMES, 0x00
.text
main:
mov $TIMES, %ecx
loopme:
push %ecx
sub $TIMES, %ecx
movl $LEN, %eax
imul %eax, %ecx
movl %ecx, %esi /* esi = ($TIMES-%ecx)*$LEN */
addl $FROM, %esi
lea buffer, %edi
movl $LEN, %ecx
rep movsb SEGMENT:(%esi),%es:(%edi)
movl $4, %eax
movl $1, %ebx
lea buffer, %ecx
movl $LEN, %edx
int $0x80
pop %ecx
xor %eax, %eax
cmp %eax, %ecx
dec %ecx
jnz loopme
ret
----------------------8<----------------------------
At this point we disable the random va space to analyze the contents of the
VDSO map always at the same address during executions.
$ sudo sysctl -w kernel.randomize_va_space=0
Compile and dump the segment to a file:
$ gcc segdump.S
$ ./a.out > gs.segment
And now let's do some little analysis in the debugger:
$ radare -d ls ; debug 'ls' program
...
; Inside the debugger we write the contents of the gs.segment file.
; In the initial state, the current seek points to 'eip'.
[0x4A13B8C0]> wf gs.segment
Poke 100 bytes from gs.segment 1 times? (y/N) y
file poked.
offset 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0x4a13b8c0, c006 e8b7 580b e8b7 c006 e8b7 0000 0000 ....X...........
0x4a13b8d0, 1414 feb7 ec06 0a93 b305 6beb 0000 0000 ..........k.....
0x4a13b8e0, 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x4a13b8f0, 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x4a13b900, 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x4a13b910, 0000 0000 0000 0000 0000 0000 0000 0000 ................
; Once the file is poked in process memory we have different ways to
; analyze the contents of the current data block. One possible option
; is to use the 'pm' command which prints memory contents following the
; format string we use as argument, it is also possible to specify the
; element names as in a structure.
;
; Another option is the 'ad' command which 'analyzes data' reading the
; memory as 32bit primitives following pointers and identifying
; references to code, data, strings or null pointers.
[0x4A13B8C0]> pm XXXXXXXXX
0x4a13b8c0 = 0xb7fe46c0
0x4a13b8c4 = 0xb7fe4ac8
0x4a13b8c8 = 0xb7fe46c0
0x4a13b8cc = 0x00000000 ; eax
0x4a13b8d0 = 0xb7fff400 ; maps._vdso__0+0x400
0x4a13b8d4 = 0xff0a0000
0x4a13b8d8 = 0x856003b1
0x4a13b8dc = 0x00000000 ; eax
0x4a13b8e0 = 0x00000000 ; eax
[0x4A13B8C0]> ad
0x4a13b8c0, int be=0xc046feb7 le=0xb7fe46c0
0x4a13b8c4, int be=0xc84afeb7 le=0xb7fe4ac8
0x4a13b8c8, int be=0xc046feb7 le=0xb7fe46c0
0x4a13b8cc, (NULL)
0x4a13b8d0, int be=0x00f4ffb7 le=0xb7fff400 maps._vdso__0+0x400
0xb7fff400, string "QRU"
0xb7fff404, int be=0xe50f3490 le=0x90340fe5
; Both analysis result in a pointer to the vdso map at 0x4a13b8d0
; Disassembling these contents with 'pd' (print disassembly) we get that
; the code uses sysenter.
[0x4A13B8C0]> pd 17 @ [+0x10]
_vdso__0:0xb7fff400, 51 push ecx
_vdso__0:0xb7fff401 52 push edx
_vdso__0:0xb7fff402 55 push ebp
.-> _vdso__0:0xb7fff403 89e5 mov ebp, esp
| _vdso__0:0xb7fff405 0f34 sysenter
| _vdso__0:0xb7fff407 90 nop
| _vdso__0:0xb7fff408, 90 nop
| _vdso__0:0xb7fff409 90 nop
| _vdso__0:0xb7fff40a 90 nop
| _vdso__0:0xb7fff40b 90 nop
| _vdso__0:0xb7fff40c, 90 nop
| _vdso__0:0xb7fff40d 90 nop
`=< _vdso__0:0xb7fff40e ebf3 jmp 0xb7fff403
_vdso__0:0xb7fff410, 5d pop ebp
_vdso__0:0xb7fff411 5a pop edx
_vdso__0:0xb7fff412 59 pop ecx
_vdso__0:0xb7fff413 c3 ret
After this little analysis we can use a new way to call syscalls and
transform the "int 0x80" instructions into a bunch of operations.
new_syscall:
push %edi
push %esi
movl $0x10, %esi
movl %gs:(%esi), %edi
call *%edi
pop %esi
pop %edi
ret
Using this 'big' primitive we are able to add more trash code and make
static analysis harder.
Using dynamic analysis it is possible to bypass this protection by writing
a simple script that dumps the backtrace every time a syscall is executed.
(syscall-bt
e scr.tee=log.txt
label:
!contsc
!bt
.label:
)
Or with a shell single liner:
$ echo '(syscall-bt,e scr.tee=log.txt,label:,!contsc,!bt,.label:,) \
&&.(syscall-bt)' | radare -d ./a.out
A way to make the analysis harder is to trash the stack contents with a
random offset every time we run a syscall. By decreasing the stack pointer
and later incrementing it we will make the backtrace fail.
To trash the stack we can just increment the stack before the syscall and
decrement it after calling it. The analyst will have to patch these inc/dec
opcodes for every syscall trampolines to get proper backtrace and resolve
the xrefs for each entry.
Radare implements a "!contsc" command that makes the debugger stop before
and after a syscall. This can be scripted together with "!bt" and "!st" to
get a handmade version of the strace utility.
Another trick that we can exploit for writing our homebrew packer is to use
instructions needed to be patched for proper analysis as cipher keys.
Making the program crash when trying to decipher a section using invalid
keys for example.
--[ 3.7 - Replacing library symbols
Statically linked binaries are full of unused bytes, we can take use of
this to inject our code.
Lot of library symbols (for example from libc) can be directly replaced by
system calls at the symbol or directly in-place where it is called. For
example:
call exit
can be replaced these (also 5 byte length) instructions:
xor eax,eax
inc eax
int 0x80
$ rasm 'xor eax,eax;inc eax;int 0x80'
31 c0 40 cd 80
This way we can just go into the statified libc's exit implementation and
overwrite it with our own code.
If the binary is statically compiled with stripped symbols we will have to
look for signatures or directly find the 'int 0x80' instruction and then
analyze code backward looking for the beginning of the implementation and
then find for xrefs to this point.
Most compilers append libraries at the end of the binary, so we can easily
'draw' an imaginary line where the program ends and libraries start.
There's another tool that may help to identify which libraries and versions
are embedded in our target program. It is called 'rsc gokolu'. The name
stands for 'g**gle kode lurker' and it will look for strings in the given
program in the famous online code browser and give approximated results of
the source of the string.
When we have debugging or symbolic information the process becomes much
easier and we can directly overwrite the first bytes of the embedded glibc
code with the small implementation in raw assembly using syscalls.
rabin is the utility in radare which extracts information from the headers
of a given program. This is like a multi-architecture multi-platform and
multi-format 'readelf' or 'nm' replacement which supports ELF32/64,
PE32/32+, CLASS and MACH0.
The use of this tool is quite simple and the output is quite clean to ease
the parsing with sed and other shell tools.
$ rabin -vi a.out
[Imports]
Memory address File offset Name
0x08049034 0x00001034 __gmpz_mul_ui
0x08049044 0x00001044 __cxa_atexit
0x08049054 0x00001054 memcmp
0x08049064 0x00001064 pcap_close
0x08049074 0x00001074 __gmpz_set_ui
...
$ rabin -ve a.out
[Entrypoint]
Memory address: 0x080494d0
$ rabin -vs a.out
[Symbols]
Memory address File offset Name
0x0804d86c 0x0000586c __CTOR_LIST__
0x0804d874 0x00005874 __DTOR_LIST__
0x0804d87c 0x0000587c __JCR_LIST__
0x08049500 0x00001500 __do_global_dtors_aux
0x0804dae8 0x00005ae8 completed.5703
0x0804daec 0x00005aec dtor_idx.5705
0x08049560 0x00001560 frame_dummy
...
$ rabin -h
rabin [options] [bin-file]
-e shows entrypoints one per line
-i imports (symbols imported from libraries)
-s symbols (exports)
-c header checksum
-S show sections
-l linked libraries
-H header information
-L [lib] dlopen library and show address
-z search for strings in elf non-executable sections
-x show xrefs of symbols (-s/-i/-z required)
-I show binary info
-r output in radare commands
-o [str] operation action (str=help for help)
-v be verbose
See manpage for more information.
The same flags can be used for any type of binary for any architecture.
Using the -r flag the output will be formatted in radare commands, this way
you can import the binary information. This command imports the symbols,
sections, imports and binary information.
[0x00000000]> .!!rabin -rsSziI $FILE
This is done automatically when loading a file if you have file.id=true, if
you want to disable the analysis run 'radare' with '-n' (because this way
it will not interpret the ~/.radarerc).
Programs with symbols or debug information are mostly analyzed by the code
analysis of 'radare' thanks to the hints given by 'rabin'. So,
theoretically we will be able to automatically resolve the crossed
references to functions. But if we are not that lucky we can just setup a
simple disassembly output format (e asm.profile=simple) and disassemble the
whole program code grepping for 'call 0x???????' to resolve calls to our
desired owned symbol.
[0x08049A00]> e scr.color=false
[0x08049A00]> e asm.profile=simple
[0x08049A00]> s section._text
[0x08049A00]> b section._text_end-section._text
[0x08049A00]> pd~call 0x80499e4
0x08049fd9 call 0x80499e4 ; imp.exit
0x0804fa89 call 0x80499e4 ; imp.exit
0x0805047b call 0x80499e4 ; imp.exit
[0x08049A00]>
If we are not lucky and do not have symbols we will have to identify which
parts of the binary are part of our target binary.
$ rsc syms-dump /lib/libc.so.6 24 > symbols
--[ 3.8 - Checksumming
The most common procedure to check for undesired runtime or static program
patches is done by using checksumming operations over sections, functions,
code or data blocks.
These kind of protections are usually easy to patch, but allow the end user
to get a grateful error message instead of a raw Segmentation Fault.
The error message string can be a simple entrypoint for the reverser to
catch the position of the checksumming code.
Radare offers a hashing command under the "h" char and allows you to
calculate block-based checksums while debugging or statically by opening
the file from disk.
Heres an usage example:
hmd5
It is also possible to use these hashing instructions to find pieces of the
program matching a certain checksum value, or just recalculate the new sum
to not patch the hash check.
The 'h' command can be also performed from the shell by using the "rahash"
program which permits to calculate multiple checksums for a single file
based on blocks of bytes. This is commonly used on forensics, when you want
a checksum to not only give you a boolean reply (if the program has been
modified), because this way you can reduce the problem on a smaller piece
of the binary because you have multiple checksums at different offsets for
the same file.
This action can be done over hard disks, device firmwares or memory dumps
of the same size.
If you are dealing similar file with deltified contents or you require a
more fine grained result you should try "radiff" which implements multiple
bindiffing algorithms that goes from byte level detecting deltas to code
level diffing using information of basic blocks, variables accessed and
functions called.
We will also have to decide a protection method for the checksumming
algorithm, because this will be the first patch required to enable software
breakpoints and user defined patches to be done.
To protect the hasher we just put it distributed along the program by
mixing its code with the rest of the program making more difficult to
identify the its position. We can also choose to put this code into a
ciphered area in .data together with other necessary code to make the
program run.
One of the main headaches for reversers is the layering of protections that
can be represented by lot of entrypoints to the same code making more
difficult to locate and patch in a proper way.
The checksumming of a section can be used as a deciphering key for other
ciphered sections, to avoid easy patching, the program should check for the
consistence of the deciphered code. This can be done by checking for
another checksum or by comparing few bytes at the beginning of the
deciphered and mmap code.
There is another possible design to implement the checksumming algorithm in
a more flexible way which consists on checksumming the code in runtime
instead of hardcoding the resulting value somewhere.
This method allows us to 'protect' multiple functions without having to
statically calculate the checksum value of the region. The problem with
this method is that it doesn't provide us protection against static
modifications on the binary.
This snippet can be compiled with DEBUG=0 to get a fully working
checksumming function which returns 0 when the checksum matches.
---------------------8<--------------------------
#define CKSUM 0x68
#define FROM 0x8048000
#define SIZE 1024
str:
.asciz "Checksum: 0x%08x\n"
.global cksum
cksum:
pusha
movl $FROM, %esi
movl $SIZE, %ecx
xor %eax, %eax
1:
xorb 0(%esi), %al
add $1, %esi
loopnz 1b
#if DEBUG
pushl %eax
pushl $str
call printf
addl $8, %esp
#else
sub $CKSUM, %eax
jnz 2f
#endif
popa
ret
#if DEBUG
.global main
main:
call cksum;
ret
#else
2:
mov $20, %eax
int $0x80
mov $37, %ebx
mov $9, %ecx
xchg %eax, %ebx
int $0x80
#endif
---------------------8<--------------------------
To get the bytes of the function to be pushed we can use an 'rsc' script that
dumps the hexpairs contained managed by a symbol in a binary:
$ gcc cksum.S -DDEBUG
$ ./a.out
Checksum: 0x69
$ gcc -c cksum.S
$ rsc syms-dump cksum.o | grep cksum
cksum: 60 be 00 80 04 08 b9 00 04 00 00 31 c0 32 06 83 c6 01 e0 f9 2d d2 04 \
00 00 75 02 61 c3 b8 14 00 00 00 cd 80 bb 25 00 00 00 b9 09 00 00 00 93 cd 80
The offsets to patch with our ranged values are:
+2: original address (4 bytes in little endian)
+7: size of buffer (4 bytes in little endian)
+21: checksum byte (1 byte)
We can use this hexdump from a radare macro that automatically patches the
correct bytes while inserting the code in the target program.
$ rsc syms-dump cksum.o | grep cksum | cut -d : -f 2 > cksum.hex
$ cat cksum.macro
(cksum-init ckaddr size,f ckaddr @ $0,f cksize @ $1,)
(cksum hexfile size
# kidnap some 4 opcodes from the function
f kidnap @ `aos 4`
yt kidnap ckaddr+cksize
wa jmp $$+kidnap @ ckaddr+cksize+kidnap
wa call `?v ckaddr`
wF $0 @ ckaddr
f cksize @ $$w
f ckaddr_new @ ckaddr+cksize+50
wv $$ @ ckaddr+2
wv $1 @ ckaddr+7
wx `hxor $1` @ ckaddr+21
f ckaddr @ ckaddr_new
)
$ cat cksum.hooks
0x8048300
0x8048400
0x8048800
$ radare -w target
[0x8048923]> . cksum.macro
[0x8048923]> .(cksum-init 0x8050300 30)
[0x8048923]> .(cksum cksum.hex 100) @@. cksum.hooks
--[ 4 - Playing with code references
The relationship between parts of code in the program are a good source of
information for understanding the dependencies between code blocks,
functions.
Following sub-chapters will explain how to generate, retrieve and use the
code references for retrieving information from a program in memory or
statically on-disk.
--[ 4.1 - Finding xrefs
One of the most interesting information we need when understanding how a
program works or what it does is the flow graph between functions. This
give us the big picture of the program and allows us to find faster the
points of interest.
There are different ways to obtain this information in radare, but you will
finally manage it with the 'Cx' and 'CX' commands (x=code xref, X=data
xref). This means that you can even write your own program to extract this
information and import it into radare by just interpreting the output of
the program or a file.
[0x8048000]> !!my-xrefs-analysis a.out > a.out.xrefs
[0x8048000]> . a.out.xrefs
or
[0x8048000]> .!my-xrefs-analysis a.out
At the time of writing this, radare implements a basic code analysis
algorithm that identifies function boundaries, splits basic blocks, finds
local variables and arguments, tracks their access and also identifies code
and data references of individual instructions.
When 'file.analyze' is 'true', this code analysis is done recursively over
the entrypoint and all the symbols identified by rabin. This means that it
will not be able to automatically resolve code references following pointer
tables, register branches and so on and lot of code will remain unanalyzed.
We can manually force to analyze a function at a certain offset by using
the '.afr@addr' command which will interpret ('.') the output of the
command 'afr' that stands for 'analyze function recursively' at (@) a
certain address (addr).
When a 'call' instruction is identified, the code analysis will register a
code xref from the instruction to the endpoint. This is because compilers
uses these instructions to call into subroutines or symbols.
As we did previously, we can disassemble the whole text section looking for
a 'call' instruction and register code xrefs between the source and
destination address with these two lines:
[0x000074E0]> e asm.profile=simple
[0x000074E0]> "(xref,Cx `pd 1~[2]`)"
[0x000074E0]> .(xref) @@=`pd 100~call[0]`
We use the simple asm.profile to make the disassembly text parsing easier.
The first line will register a macro named 'xref' that creates a code xref
at address pointed by the 3rd word of the disassembled instruction.
In the second line it will run the previous macro at every offset specified
by the output of the "pd 100~call[0]" instruction. This is the offset of
every call instruction of the following 100 instructions.
When disassembling code (using the 'pd' command (print disasm)) xrefs will
be displayed if asm.xrefs is enabled. Optionally, we can set asm.xrefsto to
visualize the xref information at both end-points of the code reference.
--[ 4.2 - Blind code references
The xrefs(1) program that comes with radare implements a blind code
reference search algorithm based on identifying sequences of bytes matching
a certain pattern.
The pattern is calculated depending on the offset. This way the program
calculates the relative delta between the current offset and the
destination address and tries to find relative branches from one point to
another of the program.
The current implementation supports x86, arm and powerpc, but allows to
define templates of branch instructions for other architectures by using
the option flags to determine the instruction size, endianness, the way to
calculate the delta, etc.
The main purpose of this program was to identify code xrefs on big files
(like firmwares), where the code analysis takes too long or requires many
manual interaction, it is not designed to identify absolute calls or jumps.
Here's an usage example:
$ xrefs -a arm bigarmblob.bin 0x6a390
0x80490
0x812c4
--[ 4.3 - Graphing xrefs
Radare ships some commands to manage and create interactive graphs to
display code analysis based on basic blocks, function blocks or whatever we
want by using the 'gu' command (graph user)
The most simple way to use the graphing capabilities is to use the 'ag'
command that will directly popup a gtk window with a graphed code analysis
from the current seek, splitting the code by basic blocks. (use the eval
graph.jmpblocks and graph.callblocks to split blocks by jumps or calls to
analyze basic blocks or function blocks).
The graph is stored internally, but if we want to create our own graphs
we can use the 'gu' command which provides subcommands to reset a graph,
create nodes, edges and display them.
[0x08049A00]> gu?
Usage: gu[dnerv] [args]
gur user graph reset
gun $$ $$b pd add node
gue $$F $$t add edge
gud display graph in dot format
guv visualize user defined graph
This is a radare script that will generate a graph between analyzed
function by using the xrefs metadata information.
After configuring the graph we can display it using the internal viewer
(guv) or using graphviz's dot.
--------8<---------------
"(xrefs-to from,f XT @ `Cx~$0[3]`,)"
"(graph-xrefs-init,gur,)"
"(graph-xrefs,.(xrefs-to `?v $$`),gue $$F XT,)"
"(graph-viz,gud > dot,!!dot -Tpng -o graph.png dot,!!gqview graph.png,)"
.(graph-xrefs-init)
.(graph-xrefs) @@= `Cx~[1]`
.(graph-viz)
--------8<---------------
Here's the explanation of the previous script:
Note that all commands are quoted '"' because this way the inner special
characters are ignored ('>', '~' ...)
"(xrefs-to from,f XT @ `Cx~$0[3]`,)"
Register a macro named 'xrefs-to' that accepts one argument named 'from'
and aliased as $0 inside the macro body.
The macro executes the 'f XT' (create a flag named XT) at ('@') the
address where the code xref (Cx) points (3rd column of the row
matching $0 (the first argument of the macro (from)).
"(graph-xrefs-init,gur,)"
The 'graph-xrefs-init' macro resets the graph user metadata.
"(graph-xrefs,.(xrefs-to `?v $$`),gue $$F XT,)"
This macro named 'graphs-xrefs' that run commands separated by commas:
.(xrefs-to `?v $$`)
execute the macro 'xrefs-to' at current seek ($$), the value is
concatenated to the output of executing '?v $$' which evaluates
the '$$' special variable and prints the value in hexadecimal.
gue XF XT
Adds an 'graph-user' edge between the beginning of the function at
current seek and XT.
"(graph-viz,gud > dot,!!dot -Tpng -o graph.png dot,!!rsc view graph.png,)"
These commands will generate a dot file, render it in png and display it.
.(graph-xrefs-init)
.(graph-xrefs) @@= `Cx~[1]`
.(graph-viz)
Then, we bundle all those macros and the xrefs graph will be displayed
--[ 4.4 - Randomizing xrefs
Another annoying low level manipulation we can do is to repeat functions at
random places with random modifications to make the program code more
verbose and hard to analyze because of complex control flow paths.
By doing a cross-references (xrefs) analysis between the symbols we will be
able to determine which symbols are referenced more from than one single
point.
The internal grep syntax can be tricky to retrieve lists of offsets for
xrefs at a given position.
[0x0000c830]> Cx
Cx 0x00000c59 @ 0x00000790
Cx 0x00000c44 @ 0x000007b0
Cx 0x00000c35 @ 0x00000810
Cx 0x00000c26 @ 0x000008a0
...
These numbers specify the caller address and the called one. That is:
0xc59 -(calls)-> 0x790
To disassemble the opcodes referencing code to other positions we can use the
'pd' command with a foreach statement:
[0x0000c830]> pd 1 @@= `C*~Cx[1]`
This command will run 'pd 1' on every address specified in the 1st column
of the output of the 'C*~Cx[1]'
--[ 5 - Conclussion
This article has covered many aspects of reverse engineering, low level
program manipulation, binary patching/diffing and more, despite being a
concrete task it has tried to expose a variety of scenarios where a command
line hexadecimal editor becomes an essential tool.
Several external scripting languages can be used, don't care about the
cryptic scripting of the radare commands, there are APIs for perl, ruby and
python for controlling radare and analyze code or connect to remote
radares.
Being used in batch mode enables the ability to automatize many analysis
tasks, generate graphs of code, determine if it is packed or infected, find
vulnerabilities, etc.
Maybe the more important characteristic of this software is the abstraction
of the IO which allows you to open hard disk images, serial connections,
process memory, haret wince devices, gdb-enabled applications like bochs,
vmware, qemu, shared memory areas, winedbg and more.
There are lot of things we have not covered in this article but the main
concepts have been exposed while explaining multiple protection techniques
for binaries.
Using scriptable tools for implementing binary protection techniques eases
the development and permits to recycle and accelerate the analysis and
manipulation of files.
--[ 6 - Future work
The project was born in 2006, and it has grown pretty fast.
At the moment of writing this paper the development is done on two
different branches: radare and radare2.
The second one is a complete refactoring of the original code aiming to
reimplement all the features of the radare framework into multiple
libraries.
This design permits to bypass some limitations of the design of r1 enabling
full scripting support for specific or composed set of libraries. Each
module extends its functionalities into plugins.
While using parrot as a virtual machine, multiple scripting languages can
be used together. Being scripting language agnostic enables the possibility
to mix multiple frameworks together (like metasploit [5], ERESI [6], etc)
The modular design based on libraries, plugins and scripts will open new
ways to extend the framework by third party developers and additionally
make the code much more maintainable and bugfree.
New applications will appear on top of the r2 set of libraries, there are
some projects aiming to integrate it from web frontends and graphical
frontends.
Plugins extend the modules adding support for new architectures, binary
formats, code analysis modules, debugging backends, remote protocols,
scripting facilities and more.
At the moment of writing this, radare2 (aka libr) is actually under
development but implements some test programs that try to reimplement the
applications distributed with radare.
--[ 7 - References
[0] radare homepage
http://www.radare.org
[1] radare book (pdf)
http://www.radare.org/get/radare.pdf
[2] x86-32/64 architecture
http://www.intel.com/Assets/PDF/manual/253666.pdf
[3] ELF file format
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
[4] AT&T syntax
http://sig9.com/articles/att-syntax
[5] Metasploit, ruby based exploit development framework
http://www.metasploit.com/
[6] ERESI
http://www.eresi-project.org/
--[ 8 - Greetings
Thanks to JV for his support during the write-up of this article, and to
the Phrack staff to make the publication possible.
I would also like to greet some people for their support while the
development of radare and this article:
SexyPandas .. for the great quals, ctfs and moments
nopcode .. for those great cons, feedback and beers
nibble .. for those pingpong development sessions and support
killabyte .. for the funny abstract situations we faced everyday
esteve .. for the talks, knowledge and the arm/wince reversing tips
wzzx .. for the motivators, feedback and luls
lia .. my gf which supported me all those nightly hacking hours :)
..
Certainly...
Thanks to all the people who managed to read the whole text and also to the
rest for being part of that ecosystem named 'underground' and computer
security which bring us lot of funny hours.
There is still lot of work to be done, and ideas or contributions are
always welcome. Feel free to send me comments.
--pancake
--------[ EOF