Clearly without source code, we can't possibly hope to understand all of sections of an entire program. So we have to use various methods and guess work to narrow down our search to a couple of key functions.
The problem is that first, we must determine what portions of the code are actually functions. This can be difficult without debugging dymbols. Fortunately, there are a couple of utilities that make our lives easier.
Objdump's most useful purpose is to disassemble a program with the -d switch. Lacking symbols, this output is a bit more cryptic. The -j option is used to specify a segment to disassemble. Most likely we will want .text, which is where all the program code lies.
Note that the leftmost column of objdump contains a hex number. This is in fact the actual address in memory where that instruction is located. Its binary value is given in the next column, followed by its mnemonic.
objdump -T will give us a listing of all library functions this program calls.
Steve Barker wrote a neat little perl script that makes objdump much more legible in the event that symbols are not included. The script has since been extended and improved by myself and Nasko Oskov. It now makes 3 passes through the output. The first pass builds a symbol table of called and jumped-to locations. The second pass finds areas between two rets, and inserts them into the symbol table as "unused" functions. The third pass prints out the nicely labeled output, and prints out a function call tree. Usage:
./disasm /path/to/binary > binary.asminfo
There are/will be few command line options to the utility. Now --graph is supported. It will generate a file called call_graph that contains defitinition that can be used with a program called dot to generate visual representation of the call graph.
Note: Unused functions just mean that that function wasn't called DIRECTLY. It is still possible that a function was called through a function pointer (ie, main is called this way)
Ok, so now we're getting ready to get really down and dirty. The first step to finding what you are looking for is to know what you are looking for. Which functions are 'interesting' is entirely dependent on your point of view. Are you looking for copy protection? How do you suspect it is done. When in the program execution does it show up? Are you looking to do a security audit of the program? Is there any sloppy string usage? Which functions use strcmp, sprintf, etc? Which use malloc? Is there a possibility of improper memory allocation?
If we can narrow down our search to just a few functions that are relevant to our objective, our lives should be much easier.
Regardless of our objective, it is almost always helpful to know where main() lies. Unforuntely, when debugging symbols are removed, this is not always easy.
In Linux, program execution actually begins at the location defined by the _start symbol, which is provided by gcc in the crt0 libraries (check gcc -v for location). Execution then continues to __libc_start_main(), which calls _init() for each library in the program space. Each _init() then calls any global constructors you may have in that particular library. Global constructors can be created by making global instances of C++ classes with a constructor, or by specifying __attribute__((constructor)) after a function prototype. After this, execution is finally transferred to main.
The easiest technique is to try to use our friends ltrace and gdb together with our disassembled output. Checking the return address of the first few functions of ltrace -i, and cross refrencing that to our assembly output and function call tree should give us a pretty good idea where main is. We may have to try to trick the program into exiting early, or printout out an error message before it gets too deep into its call stack.
Other techniques exist. For example, we can LD_PRELOAD a .c file with a constructor function in it. We can then set a breakpoint to a libc function that it calls that is also in the main executable, and finish and stepi until we are satisfied that we have found main.
Even better, we could just set a breakpoint in the function __libc_start_main (which is a libc function, and thus we will always have a symbol for it), and do the same technique of finishing and stepiing until we reach what looks like main to us.
At worst, even without a frame pointer, we should be able to get the address of a function early enough in the execution chain for us to consider it to be main.
Its probably a good idea to make a list of all functions that call exit. These may be of use to us. Other techniques for tracking down interesting functions include:
Checking for which functions call obscure gui construction widgets used in a dialog box asking for a product serial number
Checking the string references to find out which functions reference strings that we are interested in. For example, if a program outputs the text "Already registered." knowing what function outputs this string is helpful in figuring out the protection this particular program uses.
Running a program in gdb, then hitting control C when it begins to perform some interesting operation. using stepi N should slow things down and allow you to be more accurate. Sometimes this is too slow however. Find a commonly called function, set a breakpoint, and try doing cont N.
Checking which functions call functions in the BSD socket layer
Plot out execution paths into a tree from main, especially to your function(s) of interest. You can use disasm.pl to generate call graphs with the --graph option. Using it enables the script to generate file called call_graph. It contains definition of the call graph in a format used by a popular graphing tool called dot. Feeding this definition file in dot will give you a nice (probably pretty huge) graphics file with visual representation of the call graph. It is pretty amazing. Definitely try it with some small program.
Further analysis will have to hold off until we understand some assembly.