[Image: Angelina Jolie doing ‘sexy movie hacking’ in ‘Hackers’]
UPDATE: Since my initial post I’ve updated the code a couple of times to better support 64-bit Linux binaries and also objdumps of Windows binaries. Let me know if you run into some sort of disassembly output from objdump that comes back from the script unaltered, and I’ll see if I can make it work.
UPDATE 2: I’ve made a variant of this script for disassemblies from llvm-objdump. Read all about it.
UPDATE 3: Fixed some missing instructions and instruction synonyms. I’ve also added a description of the given jump instruction at the end of each line, having realised that trying to keep line lengths within 80 columns is a lost cause with this script ;)
I’m currently following a course on “Proactive Computer Security”, which, if you accept the fact that computers are the closest equivalent we’ve got to magic, equates to a “Defence Against The Dark Arts” class. It is based on the same credo: one should familiarize oneself with how to attack in order to better defend.
Hacking of computer programs typically involves feeding that program an input that it wasn’t designed to handle properly, resulting in the attacker getting control or gaining information that she wasn’t supposed to.
So in this particular course we’ve been doing a lot of reverse engineering on disassembled programs in order to find weaknesses in their input handling.
A function handling an input of unknown length n bytes will typically have a loop that keeps running until a certain condition is met, as in the following example, where the instruction at 0x08048725 keeps jumping back to the instruction at 0x080486db as long as the next character of the input is not a newline. Notice also the jump at 0x08048711, which skips the test for newline once the end of the input has been reached.
080486cb
80486cb: 55 push ebp
80486cc: 89 e5 mov ebp,esp
80486ce: 53 push ebx
80486cf: 83 ec 10 sub esp,0x10
80486d2: c7 45 f8 00 00 00 00 mov [ebp-0x8],0x0
,---80486d9: eb 29 jmp 8048704
|,->80486db: 8b 45 f8 mov eax,[ebp-0x8]
|| 80486de: 8b 55 08 mov edx,[ebp+0x8]
|| 80486e1: 8d 0c 02 lea ecx,[edx+eax*1]
|| 80486e4: 8b 1d 4c a0 04 08 mov ebx,ds:0x804a04c
|| 80486ea: a1 54 a0 04 08 mov eax,ds:0x804a054
|| 80486ef: 89 c2 mov edx,eax
|| 80486f1: 01 da add edx,ebx
|| 80486f3: 0f b6 12 movzx edx,BYTE PTR [edx]
|| 80486f6: 88 11 mov BYTE PTR [ecx],dl
|| 80486f8: 83 45 f8 01 add [ebp-0x8],0x1
|| 80486fc: 83 c0 01 add eax,0x1
|| 80486ff: a3 54 a0 04 08 mov ds:0x804a054,eax
'|->8048704: 8b 15 54 a0 04 08 mov edx,ds:0x804a054
| 804870a: a1 50 a0 04 08 mov eax,ds:0x804a050
| 804870f: 39 c2 cmp edx,eax
,-|--8048711: 7d 14 jge 8048727
| | 8048713: 8b 15 4c a0 04 08 mov edx,ds:0x804a04c
| | 8048719: a1 54 a0 04 08 mov eax,ds:0x804a054
| | 804871e: 01 d0 add eax,edx
| | 8048720: 0f b6 00 movzx eax,BYTE PTR [eax]
| | 8048723: 3c 0a cmp al,0xa
| '--8048725: 75 b4 jne 80486db
'--->8048727: a1 54 a0 04 08 mov eax,ds:0x804a054
804872c: 83 c0 01 add eax,0x1
804872f: a3 54 a0 04 08 mov ds:0x804a054,eax
8048734: 8b 45 f8 mov eax,[ebp-0x8]
8048737: 83 c4 10 add esp,0x10
804873a: 5b pop ebx
804873b: 5d pop ebp
804873c: c3 ret
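For readers who find the listing hard to follow, the loop can be read roughly as the Python below. This is only a sketch: the names `InputState`, `read_line`, `buf`, `length` and `pos` are made up for illustration, and the roles assigned to the three globals at 0x804a04c, 0x804a050 and 0x804a054 are inferred from the listing, not taken from any source code.

```python
# Hypothetical names throughout: the disassembly only shows addresses.
class InputState:
    """Stands in for the three globals the function reads and writes."""

    def __init__(self, data):
        self.buf = data          # ds:0x804a04c, assumed pointer to the input
        self.length = len(data)  # ds:0x804a050, assumed end of the input
        self.pos = 0             # ds:0x804a054, assumed read position


def read_line(dst, state):
    """Copy bytes into dst until a newline or the end of the input."""
    i = 0                                  # [ebp-0x8]
    while state.pos < state.length:        # cmp edx,eax / jge 8048727
        if state.buf[state.pos] == 0x0A:   # cmp al,0xa / jne 80486db
            break
        dst[i] = state.buf[state.pos]      # movzx edx,[edx] / mov [ecx],dl
        i += 1                             # add [ebp-0x8],0x1
        state.pos += 1                     # add eax,0x1 / mov ds:0x804a054,eax
    state.pos += 1                         # step past the newline
    return i                               # mov eax,[ebp-0x8]
```

Note that the listing never compares the copy index against the size of the destination. Python raises an `IndexError` when `dst` is too small, whereas the machine code would simply keep writing past the end of the buffer, which is exactly the kind of weakness this sort of reverse engineering is meant to uncover.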
Now imagine following the above control flow without the arrows, which objdump, my disassembler of choice, does not provide for some reason.
[Image: IDA Pro control flow view]
Having these arrows in the assembly makes it both easier to identify sections of the code with a lot of control flow and, perhaps more importantly, faster to deduce the logic in those sections.
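The same idea can be shown in miniature: once the jumps are collected as (source, target) pairs, a jump to a lower address stands out as a likely loop. The sketch below is not the script from this post. The names `JUMP_RE`, `jump_edges` and `backward_jumps` are hypothetical, the pattern covers only a handful of mnemonics, and the sample lines assume the tab-separated layout printed by `objdump -M intel -d` with a made-up symbol name `fn`.

```python
import re

# Assumed line shape: "<spaces><hex address>:\t<bytes>\t<mnemonic> <target>".
# Only a few jump mnemonics are listed to keep the example short.
JUMP_RE = re.compile(
    r"\s*([0-9a-f]+):\t.*\b(jmp|je|jne|jge|jg|jle|jl)\s+([0-9a-f]+)")


def jump_edges(lines):
    """Return (source, target, mnemonic) tuples with integer addresses."""
    edges = []
    for line in lines:
        m = JUMP_RE.match(line)
        if m:
            edges.append((int(m.group(1), 16), int(m.group(3), 16), m.group(2)))
    return edges


def backward_jumps(edges):
    """A jump to a lower address is a typical sign of a loop."""
    return [e for e in edges if e[1] < e[0]]


sample = [
    " 80486d9:\teb 29                \tjmp    8048704 <fn+0x39>\n",
    " 8048711:\t7d 14                \tjge    8048727 <fn+0x5c>\n",
    " 8048723:\t3c 0a                \tcmp    al,0xa\n",
    " 8048725:\t75 b4                \tjne    80486db <fn+0x10>\n",
]
edges = jump_edges(sample)
loops = backward_jumps(edges)
```

With the three jumps from the example function, only the `jne` at 0x8048725 ends up in `loops`, matching the arrow that points back up in the annotated listing.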
Both the free reverse engineering framework radare and the rather costly IDA Pro suite do this already with a more graph-like layout, but I believe my approach has qualities that they lack: it is unobtrusive and simple, and it has no requirements other than a Python installation and the objdump program.
My arrow-annotating Python code follows below, but people who expect to run it should probably download the file directly, as the syntax highlighter tends to mangle the code enough to confuse the Python interpreter.
#!/usr/bin/env python
# encoding: utf-8
"""
asm_jmps.py
Created by Daniel Fairchild on 2013-06-10.
Usage (in shell):
objdump -M intel -d target.bin | ./asm_jmps.py
License:
I'd appreciate a comment at: http://blog.fairchild.dk/?p=633
if you find the following useful. That'll be all.
"""
import sys
import re

JMPS = { #define jumps, synonyms on same line
    'ja':'if above', 'jnbe':'if not below or equal',
    'jae':'if above or equal','jnb':'if not below','jnc':'if not carry',
    'jb':'if below', 'jnae':'if not above or equal', 'jc':'if carry',
    'jbe':'if below or equal', 'jna':'if not above',
    'jcxz':'if cx register is 0', 'jecxz':'if ecx register is 0',
    'je':'if equal', 'jz':'if zero',
    'jg':'if greater', 'jnle':'if not less or equal',
    'jge':'if greater or equal',
    'jl':'if less', 'jnge':'if not greater or equal',
    'jle':'if less or equal', 'jnl':'if not less',
    'jmp':'unconditional',
    'jne':'if not equal', 'jnz':'if not zero',
    'jng':'if not greater',
    'jno':'if not overflow',
    'jnp':'if not parity', 'jpo':'if parity odd',
    'jns':'if not sign',
    'jo':'if overflow',
    'jp':'if parity', 'jpe':'if parity even',
    'js':'if sign'}

#matches the address at the start of a disassembly line
fcl = re.compile(r" +([\da-f]+):")
#matches a jump line: address, jump mnemonic and target address
fjre = re.compile("".join([
    r" +([\da-f]+):\t.*(",
    "|".join(JMPS),
    r")\s+\*?0?x?([\da-f]+)"]))

def j_line(ln, ljmps):
    #build the arrow prefix and jump description for line number ln
    jl = len(ljmps)
    outl = [" "]*(jl+2)
    jdesc = ""
    for i in range(jl):
        if ljmps[i][0] == ln: #jmp from
            outl[-(jl-i+2):] = ["-"]*(jl-i+2)
            outl[i] = "," if ljmps[i][0] < ljmps[i][1] else "'"
            jdesc = "; jump %s" % ljmps[i][2]
        if ljmps[i][1] == ln: #jmp to
            outl[-(jl-i+1):] = ["-"]*(jl-i+1)
            outl[-1] = ">"
            outl[i] = "," if ljmps[i][0] > ljmps[i][1] else "'"
        if ljmps[i][0] < ln and ljmps[i][1] > ln:
            outl[i] = "|"
        elif ljmps[i][0] > ln and ljmps[i][1] < ln:
            outl[i] = "|"
    return ("".join(outl), jdesc)

def drw_jmps(all_lines, fun_lines):
    #collect (from line, to line, description) for jumps within the function
    ljmps = []
    for cl in fun_lines:
        m = fjre.match(all_lines[fun_lines[cl]])
        if m is not None:
            if m.group(3) in fun_lines:
                ljmps.append((fun_lines[cl], fun_lines[m.group(3)], JMPS[m.group(2)]))
    #the following sorting bands same endpoints together
    ljmps = sorted(ljmps, key=lambda x: -x[1])
    for cl in sorted(fun_lines, key=lambda x: int(x,16)):
        jlr = j_line(fun_lines[cl], ljmps)
        all_lines[fun_lines[cl]] = "".join([
            jlr[0],
            all_lines[fun_lines[cl]][:-1].lstrip(), jlr[1],
            "\n"])

if __name__ == "__main__":
    #read lines from stdin
    nasml = sys.stdin.readlines()
    #make a dictionary of asm lines
    asm_lines = {}
    for i in range(len(nasml)):
        m = fcl.match(nasml[i])
        if m is not None:
            asm_lines[m.group(1)] = i
        #a blank line or a ret ends the current function
        if nasml[i] == "\n" or "\tret " in nasml[i]:
            drw_jmps(nasml, asm_lines)
            asm_lines = {}
    print("".join(nasml))
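If the script hands your disassembly back unaltered, a quick first check is whether its address pattern recognises any of the lines at all. The helper below is a hedged sketch, not part of the script: `ADDR_RE` restates the script's address pattern so the snippet runs on its own, and `count_recognised` and the sample lines are hypothetical.

```python
import re

# Assumption: same shape as the script's address pattern, i.e. at least one
# leading space, a lowercase hex address and a colon.
ADDR_RE = re.compile(r" +([\da-f]+):")


def count_recognised(lines):
    """Count lines that look like disassembly lines to the pattern above."""
    return sum(1 for line in lines if ADDR_RE.match(line))


sample = [
    " 80486cb:\t55                   \tpush   ebp\n",  # leading space: matches
    "80486cc: 89 e5 mov ebp,esp\n",                     # no leading space: skipped
    "\n",
]
recognised = count_recognised(sample)
```

A count of zero for a whole file suggests the disassembler's line layout differs from what the patterns expect, for example missing leading spaces or uppercase hex digits.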