5. Determining Interesting Functions

Clearly without source code, we can't possibly hope to understand all of sections of an entire program. So we have to use various methods and guess work to narrow down our search to a couple of key functions.

5.1. Reconstructing function & control information

The problem is that first, we must determine what portions of the code are actually functions. This can be difficult without debugging dymbols. Fortunately, there are a couple of utilities that make our lives easier.

5.1.2. disasm.pl

Steve Barker wrote a neat little perl script that makes objdump much more legible in the event that symbols are not included. The script has since been extended and improved by myself and Nasko Oskov. It now makes 3 passes through the output. The first pass builds a symbol table of called and jumped-to locations. The second pass finds areas between two rets, and inserts them into the symbol table as "unused" functions. The third pass prints out the nicely labeled output, and prints out a function call tree. Usage:

./disasm /path/to/binary > binary.asminfo

There are/will be few command line options to the utility. Now --graph is supported. It will generate a file called call_graph that contains defitinition that can be used with a program called dot to generate visual representation of the call graph.

Note: Unused functions just mean that that function wasn't called DIRECTLY. It is still possible that a function was called through a function pointer (ie, main is called this way)

5.2. Consider the objective

Ok, so now we're getting ready to get really down and dirty. The first step to finding what you are looking for is to know what you are looking for. Which functions are 'interesting' is entirely dependent on your point of view. Are you looking for copy protection? How do you suspect it is done. When in the program execution does it show up? Are you looking to do a security audit of the program? Is there any sloppy string usage? Which functions use strcmp, sprintf, etc? Which use malloc? Is there a possibility of improper memory allocation?

5.3. Finding key functions

If we can narrow down our search to just a few functions that are relevant to our objective, our lives should be much easier.

5.3.1. Finding main()

Regardless of our objective, it is almost always helpful to know where main() lies. Unforuntely, when debugging symbols are removed, this is not always easy.

In Linux, program execution actually begins at the location defined by the _start symbol, which is provided by gcc in the crt0 libraries (check gcc -v for location). Execution then continues to __libc_start_main(), which calls _init() for each library in the program space. Each _init() then calls any global constructors you may have in that particular library. Global constructors can be created by making global instances of C++ classes with a constructor, or by specifying __attribute__((constructor)) after a function prototype. After this, execution is finally transferred to main.

The easiest technique is to try to use our friends ltrace and gdb together with our disassembled output. Checking the return address of the first few functions of ltrace -i, and cross refrencing that to our assembly output and function call tree should give us a pretty good idea where main is. We may have to try to trick the program into exiting early, or printout out an error message before it gets too deep into its call stack.

Other techniques exist. For example, we can LD_PRELOAD a .c file with a constructor function in it. We can then set a breakpoint to a libc function that it calls that is also in the main executable, and finish and stepi until we are satisfied that we have found main.

Even better, we could just set a breakpoint in the function __libc_start_main (which is a libc function, and thus we will always have a symbol for it), and do the same technique of finishing and stepiing until we reach what looks like main to us.

At worst, even without a frame pointer, we should be able to get the address of a function early enough in the execution chain for us to consider it to be main.

5.4. Plotting out program flow

Plot out execution paths into a tree from main, especially to your function(s) of interest. You can use disasm.pl to generate call graphs with the --graph option. Using it enables the script to generate file called call_graph. It contains definition of the call graph in a format used by a popular graphing tool called dot. Feeding this definition file in dot will give you a nice (probably pretty huge) graphics file with visual representation of the call graph. It is pretty amazing. Definitely try it with some small program.

Further analysis will have to hold off until we understand some assembly.