2. The Linux Compilation Process

2.1. Intro

Compilation is split into roughly five stages: preprocessing, parsing, translation, assembling, and linking. On UNIX all five stages are driven by a single program, namely cc, or in our case, gcc. The general order of things goes gcc (the driver) -> gcc -E -> gcc -S -> as -> ld.

2.2. gcc

gcc is the C compiler of choice for most UNIX systems. The program gcc itself is actually just a front end (a driver) that executes other programs corresponding to each stage in the compilation process. To get it to print the commands it executes at each step, use gcc -v.

2.3. gcc -E (Preprocessor Stage)

gcc -E runs only the preprocessor stage. This copies the contents of every included file into your .c file and expands all macros in place. The result is written to standard output unless you name a file with -o.
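To see the preprocessor's effect on a small input, something like the following can be tried. The header square.h, the SQUARE macro, and the area function are hypothetical names invented for this sketch.

```shell
# Scratch directory and all file/macro names are illustrative only.
workdir=$(mktemp -d)
cd "$workdir"

cat > square.h <<'EOF'
#define SQUARE(x) ((x) * (x))
EOF

cat > area.c <<'EOF'
#include "square.h"

int area(int side)
{
    return SQUARE(side + 1);
}
EOF

# -P drops the "# <line> <file>" markers so the output is easier to read.
gcc -E -P area.c -o area.i
cat area.i
```

In area.i the #include line is gone and the macro call has been replaced by its expansion, ((side + 1) * (side + 1)).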

2.4. gcc -S (Parsing+Translation Stages)

gcc -S takes .c files as input and outputs .s assembly files, which on x86 are in AT&T syntax.

gcc can be called with various optimization options that do interesting things to the generated assembly code. The general optimization classes are specified with -ON, where N runs from 0 to 3. 0 is no optimization (the default), and 3 is the most aggressive. Larger values of N are accepted but treated the same as -O3. There is also -Os, which optimizes for size, and newer releases add -Og and -Ofast.
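A simple way to get a feel for the -O levels is to compile one function at each level and compare the assembly. The sketch below assumes a small summing loop in a file called sum.c, which is an invented example.

```shell
# Scratch directory and file names are illustrative only.
workdir=$(mktemp -d)
cd "$workdir"

cat > sum.c <<'EOF'
int sum(const int *v, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}
EOF

# Emit one assembly file per optimization level.
for level in 0 1 2 3; do
    gcc -S -O"$level" sum.c -o "sum-O$level.s"
done

# Line counts give a crude first impression; diff shows the details.
wc -l sum-O*.s
diff sum-O0.s sum-O2.s
```

The exact instructions depend on the GCC version and the target CPU, so the output is best read side by side rather than compared against any fixed listing.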

There are also several fine-grained optimization options that are specified with the -f flag. The most interesting are -funroll-loops, -finline-functions, and -fomit-frame-pointer.

Loop unrolling means to expand a loop out so that there are n copies of the code for n iterations of the loop (ie no jmp statements to the top of the loop). On modern processors, the gain from this optimization is usually negligible.

Inlining functions means to treat the functions in a file as if they were macros, placing copies of their code directly in line in the calling function (like the C++ inline keyword). The compiler decides which functions are worth inlining. This only applies to functions called in the same C file as their definition. It is also a relatively small optimization.

Omitting the frame pointer (aka the base pointer) frees up an extra register for use in your program. If you have more than 4 heavily used local variables, this may be a rather large advantage, otherwise it is just a nuisance (and makes debugging much more difficult).
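The effect of -fomit-frame-pointer can be checked by compiling the same function twice and diffing the result. In this sketch the functions twice and helper are hypothetical; helper is only declared so that twice has to make real calls.

```shell
# Scratch directory, file names and function names are illustrative only.
workdir=$(mktemp -d)
cd "$workdir"

cat > twice.c <<'EOF'
int helper(int x);   /* defined elsewhere; keeps twice() from being a leaf */

int twice(int x)
{
    return helper(x) + helper(x + 1);
}
EOF

# Same source, same -O level, only the frame pointer setting differs.
gcc -S -O1 -fno-omit-frame-pointer twice.c -o with-fp.s
gcc -S -O1 -fomit-frame-pointer    twice.c -o without-fp.s

diff with-fp.s without-fp.s
```

On x86 one would expect with-fp.s to set up the frame pointer in the function prologue, while without-fp.s is free to use that register for ordinary values. Other targets may show a different pattern.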

Since some of these get turned on by default in the higher optimization classes, it is useful to know that most of the -f options have -fno- equivalents: the negative form of -ffoo is -fno-foo. So -fno-inline-functions turns off -finline-functions, regardless of the -O option. (It is on by default at -O3, and in more recent GCC releases at -O2 as well.) To suppress inlining altogether, use -fno-inline.

2.5. as (Assembly Stage)

as is the GNU assembler. It takes AT&T syntax assembly files as input and generates a .o object file.

2.6. ld/collect2 (Linking Stage)

ld is the GNU linker. It generates a valid executable file. If you link against shared libraries, you will want to use what gcc itself calls, which is collect2, a wrapper that in turn runs ld. Watch the output of gcc -v for the flags it passes.