IDA Tricks - Dealing with inlined data
published 2018-06-04
Intro
When analyzing position-independent code (e.g. shellcode or malicious code snippets), you'll frequently see something like the following:
The second call is actually not a subroutine call but a disguised push for the following data.
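To make the trick concrete, here is a minimal sketch in plain Python (no IDA needed) of what such a call does: it pushes its return address, which is the address of the inlined data, and continues right behind that data. The sample bytes and the helper name split_call_over_data are made up for illustration, and the sketch only handles the 5-byte E8 rel32 form of call.

```python
import struct

def split_call_over_data(code, call_off):
    """Return (inlined_data, target_off) for an E8 rel32 call at call_off."""
    if code[call_off:call_off + 1] != b"\xe8":
        raise ValueError("not a near call (opcode E8)")
    # rel32 is relative to the end of the 5-byte call instruction,
    # which is also where the inlined data starts
    (rel32,) = struct.unpack_from("<i", code, call_off + 1)
    data_off = call_off + 5
    target_off = data_off + rel32
    return code[data_off:target_off], target_off

# Hypothetical sample: call over 9 bytes ; "kernel32\0" ; pop eax
sample = b"\xe8\x09\x00\x00\x00" + b"kernel32\x00" + b"\x58"
data, target = split_call_over_data(sample, 0)
print(data, hex(sample[target]))
```

The bytes between the end of the call and its target are the data, and the instruction at the target is the pop that picks up the pushed address.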
We can manually fix this by undefining the sub (otherwise IDA's auto-analysis will override our judgement), making it code again, jumping to the location after the call, turning the data there into a string, and fixing up the code that follows:
There you can see it's just a call-pop pair, effectively loading the string's address into [ebp-20h].
Now while this is better (although tedious to fix), switching to graph view will fail. Or rather turning the whole code back into a function will fail because of undefined instructions, and thus we also won't get our graph view back.
So not exactly a solution.
Modify the code
We know this is a push in disguise. We can also freely modify the binary in IDA. So why not just make it a proper push?
To do that, we will perform the following steps:
- Add a segment with the same size as the current one
- Copy the inlined data to the same offset in the new segment
- Turn the call into a push of the copied data's address, followed by a jump to the call's original target
- Some cosmetic stuff
All these can be done in a small script.
The reason for creating a same-sized segment is that we do not have to do any bookkeeping. We just copy the inlined data to the same offset in the new storage segment instead. Not a nice solution for big targets, but it works just fine for anything else, and not keeping state makes the script easy to use.
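The same-offset idea boils down to one subtraction and one addition. A minimal sketch in plain Python, where the function name storage_address and the example addresses are hypothetical:

```python
def storage_address(data_ea, seg_start, seg_end, storage_start):
    """Map an address in the code segment to the same offset in storage."""
    if not seg_start <= data_ea < seg_end:
        raise ValueError("address is outside the source segment")
    # same offset, different base: no table of moved items is needed
    return storage_start + (data_ea - seg_start)

# Hypothetical layout: code at 0x401000-0x402000, storage at 0x500000
print(hex(storage_address(0x401234, 0x401000, 0x402000, 0x500000)))
```

Because the mapping is a pure function of the address, running the script twice on the same call yields the same destination, which is what makes the stateless approach work.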
The script
The script assumes two things:
- The cursor is on the call instruction, so you can quickly use the script bound to a hotkey
- We have manually created the storage segment. This could be done in the script, of course.
The proof-of-concept script then is:
import idaapi
import ida_idp
from idc import *


def get_storage_segment():
    """Return the start address of the 'storage' segment, or None."""
    seg = get_first_seg()
    while seg != BADADDR and get_segm_name(seg) != "storage":
        seg = get_next_seg(seg)
    if seg != BADADDR:
        return seg
    else:
        return None


def fix_call():
    ea = get_screen_ea()
    if print_insn_mnem(ea) != 'call':
        print("Not a call instruction!")
        return
    # address of the trailing 'pop' instruction
    call_target = get_operand_value(ea, 0)
    # the inlined data starts right behind the call; next_head() would
    # skip over bytes that are currently undefined
    data_start = ea + get_item_size(ea)
    data_len = call_target - data_start
    storage_addr = get_storage_segment()
    if not storage_addr:
        print("Error: Segment 'storage' not found")
        return
    # get offset in this segment
    offset = data_start - get_segm_attr(data_start, SEGATTR_START)
    copy_dest = storage_addr + offset
    # copy the inlined data to the same offset in the storage segment
    for i in range(data_len):
        patch_byte(copy_dest + i, get_wide_byte(data_start + i))
    # replace the call with a push of the copied data's address ...
    ida_idp.assemble(ea, 0, ea, True, "push 0%08xh" % copy_dest)
    ea += get_item_size(ea)
    # ... followed by a jmp to the call's original target
    ida_idp.assemble(ea, 0, ea, True, "jmp 0%08xh" % call_target)
    ea += get_item_size(ea)
    # Undefine the inlined data to clean up the disassembly
    del_items(ea, DELIT_SIMPLE, call_target - ea)
    # Add a name to the copied data
    set_name(copy_dest, "inlined_%08x" % data_start)


idaapi.add_hotkey("2", fix_call)
It does just what was outlined above: it calculates the length and offset of the inlined data and the addresses for the push and jmp instructions we are going to patch in, copies the data, patches the instructions and performs a bit of cleanup.
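For reference, the two patched-in instructions have simple encodings in 32-bit x86: push imm32 is opcode 68 followed by the little-endian address, and jmp rel32 is opcode E9 followed by a displacement relative to the end of the jmp. The following plain-Python sketch builds these bytes by hand. The helper names and addresses are hypothetical, and IDA's assembler may well pick a different encoding (for example a 2-byte short jmp), so treat this as an illustration of the arithmetic rather than what the script emits.

```python
import struct

def push_imm32(value):
    """Encode 'push imm32' (opcode 68, little-endian immediate)."""
    return b"\x68" + struct.pack("<I", value)

def jmp_rel32(src_ea, dst_ea):
    """Encode 'jmp rel32' placed at src_ea and jumping to dst_ea."""
    # the displacement is relative to the end of the 5-byte jmp
    return b"\xe9" + struct.pack("<i", dst_ea - (src_ea + 5))

# Hypothetical addresses: call at 0x401000, 9 bytes of inlined data
call_ea = 0x401000
data_start = call_ea + 5        # the push replaces the 5-byte call
call_target = data_start + 9    # the pop behind the inlined data
copy_dest = 0x500005            # same offset in the storage segment

patch = push_imm32(copy_dest) + jmp_rel32(data_start, call_target)
print(patch.hex())
```

Note that the jmp lands on top of the first bytes of the inlined data, which is fine because the data has already been copied. With this 5-byte form, the inlined data must be at least five bytes long, or the jmp would overwrite the instruction it jumps to.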
Note that this may have issues with segmentation. I think I had some odd configuration where some API calls returned a full address (with respect to the segment address) and some did not, but I couldn't figure out what that configuration was when writing this article.
Always, always back up your .idb before using database-modifying scripts like this one. Even if the code performs as intended, you will find edge cases where it fails and ruins your database.
If we run the above script on the example and do minimal manual intervention (tell IDA that the push is using an offset, and that the bytes following the push are also code) we get this:
And eventually, we can turn this into a subroutine and switch to graph view:
Much better!