3. The magic of the Elf

 

Any sufficiently advanced technology is indistinguishable from magic.

 Arthur C. Clarke

Building executables from C source code is a complex task. An innocent-looking call of gcc(1) will invoke a pre-processor, a multi-pass compiler, an assembler and finally a linker. Using all these tools to plant virus code into another executable makes the result either prohibitively large, or very dependent on the completeness of the target installation.

Real viruses approach the problem from the other end. They are aggressively optimized for code size and do only what's absolutely necessary. Basically they just copy one chunk of code and patch a few addresses at hard-coded offsets.

However, this has drastic effects:

There are ways to circumvent these limitations. But they are complicated and make the virus more likely to fail.

3.1. Executable and linkable format

Another natural limitation of viruses is their rigid dependency on the file format of target executables. These formats differ a lot, even on the same hardware architecture and under the same operating system. Furthermore, executables are not designed with post-link-time modifications in mind. It's rare for a virus to support more than one infection method. This document is about the format used on recent versions of Linux, FreeBSD and Solaris.

3.1.1. ELF specification

This format is well documented. Some public resources:

  • Source code of Linux and FreeBSD. Admittedly not for the faint of heart.

  • /usr/include/elf.h [1]

  • Portable Formats Specification, Version 1.1. [2]

  • Creating Really Teensy ELF Executables for Linux [3]

A quote from the Portable Formats Specification:

The Executable and Linking Format was originally developed and published by UNIX System Laboratories (USL) as part of the Application Binary Interface (ABI). The Tool Interface Standards committee (TIS) has selected the evolving ELF standard as a portable object file format that works on 32-bit Intel Architecture environments for a variety of operating systems.

3.1.2. Assembly language documentation

Actually, ELF is defined for a variety of both 32-bit and 64-bit architectures. Obviously you need to handle assembly language for each platform. A good starting point is "Linux Assembly" [4] and "Assembly Language Related Web Sites". [5]

i386 specific sites:

  • Assembly-HOWTO. [6] Description of tools and sites for Linux.

  • FAQ of comp.lang.asm.x86 [7]

  • "Robin Miyagi's Linux Programming" [8] features a tutorial and interesting links.

  • "Assembly resources" [9] covers advanced topics.

  • IA-32 Intel Architecture Software Developer's Manual [10]

  • "The Place on the Net to Learn Assembly Language Programming" [11]

  • The Art of Assembly Language. 32-bit Linux Edition Featuring HLA. [12]

  • X86 Architecture, low-level programming, freeware [13]

  • Dr. Dobb's Microprocessor Resources [14]

sparc specific sites:

  • SPARC Assembly Language Reference Manual [15]

  • A Laboratory Manual for the SPARC [16]

3.2. Portability

This document tries to cover multiple platforms through conditional compilation. There is a configure.pl that determines the host type and sets up a Makefile. The Makefile uses individual sub-directories for each platform and exports the names of these directories (and some other platform-specific values) as environment variables. Most of the shell scripts invoked by make(1) are shown here. The following table should help in understanding them.

Table 1. Environment variables exported by Makefile

Variable      Value on this platform
${ARCH}       i386
${CFLAGS}     -Wall -O2 -march=i586
${ELF_BASE}   08048000
${ELF_ALIGN}  1000
${OUT}        out/redhat-linux-i386
${TMP}        tmp/redhat-linux-i386

3.3. In the language of mortals

For the first example I'll present the simplest piece of code that still gives sufficient feedback. Our aim is to implant it into /bin/sh. On practically every recent installation of Linux/i386 the following code will emit three magic letters instead of just dumping core.

Source: out/redhat-linux-i386/magic_elf/magic_elf.c
#include <unistd.h>
int main() { write(1, (void*)0x08048001, 3); return 0; }

It is not an error that a file called magic_elf.c is located in a directory called out/redhat-linux-i386. The Makefile building this document did trivial pre-processing on the original source file. ELF is used on many architectures, and each has a different magic value.

Command: src/magic_elf/cc.sh
#!/bin/sh
gcc ${CFLAGS} ${OUT}/magic_elf/magic_elf.c \
	-o ${TMP}/magic_elf/magic_elf \
&& ${TMP}/magic_elf/magic_elf

Output: out/redhat-linux-i386/magic_elf/magic_elf
ELF

3.4. How it works

3.4.1. Digested answer

The three letters are part of the signature of ELF files. Executables created by ld(1) are always mapped into the same memory region. That's why the program can find its own header at a predictable virtual address.
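
The signature test itself is tiny. The following is a minimal sketch in C, not code from the original example; the function name has_elf_magic is made up for illustration. It checks a buffer for the four signature bytes, of which the example program prints the last three.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the original example.
 * Returns 1 if buf starts with the 4-byte ELF signature
 * 0x7f 'E' 'L' 'F', and 0 otherwise. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}
```

On the platform described here, a non-relocated executable could apply the check to its own image with has_elf_magic((const unsigned char*)0x08048000, 4). That only works while the base address assumption holds.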

3.4.2. Short answer

RTFM. [17] Just read all of the ELF specification.

3.4.3. Sort of an answer

0x8048000 is not a natural constant, but happens to be the default base address of ELF executables produced by ld(1). As of version 2.11 of binutils it should be possible to change that with options -Ttext ORG and --section-start SECTIONNAME=ORG, but I couldn't get it to work. Anyway, the layout of executables produced by ld(1) is straightforward.

  1. One ELF header - Elf32_Ehdr

  2. Program headers - Elf32_Phdr

  3. Program interpreter (not if statically linked)

  4. Code

  5. Data

  6. Section headers - Elf32_Shdr

Everything from the start of the file to the last byte of code is mapped into one segment (named "code" or "text") that begins at the base address. There is a whole chapter called readelf describing a command to view all these details. In the meantime I will show fancy ways to get by without.
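
To make the first item of that layout concrete, here is a rough sketch that decodes a few Elf32_Ehdr fields from raw bytes. It assumes a 32-bit little-endian file, as on i386. The field offsets follow the Elf32_Ehdr layout in the specification; the struct ehdr_summary and the function names are invented for this illustration. On Linux one would normally use the definitions from /usr/include/elf.h instead. Decoding byte by byte keeps the sketch independent of the byte order of the machine running it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of Elf32_Ehdr; field names follow elf.h. */
struct ehdr_summary {
    uint16_t e_type;     /* 2 = ET_EXEC */
    uint16_t e_machine;  /* 3 = EM_386 */
    uint32_t e_entry;    /* virtual address of the entry point */
    uint32_t e_phoff;    /* file offset of the program headers */
    uint32_t e_shoff;    /* file offset of the section headers */
    uint16_t e_phnum;    /* number of program headers */
    uint16_t e_shnum;    /* number of section headers */
};

uint16_t rd16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if buf is too short or is not a
 * 32-bit little-endian ELF header. */
int parse_ehdr32le(const unsigned char *buf, size_t len,
                   struct ehdr_summary *out)
{
    if (len < 52)                       /* sizeof(Elf32_Ehdr) */
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;
    if (buf[4] != 1 || buf[5] != 1)     /* ELFCLASS32, ELFDATA2LSB */
        return -1;

    out->e_type    = rd16le(buf + 16);
    out->e_machine = rd16le(buf + 18);
    out->e_entry   = rd32le(buf + 24);
    out->e_phoff   = rd32le(buf + 28);
    out->e_shoff   = rd32le(buf + 32);
    out->e_phnum   = rd16le(buf + 44);
    out->e_shnum   = rd16le(buf + 48);
    return 0;
}
```

Since the header sits at the very start of the file, e_phoff usually equals the header size (0x34 for Elf32_Ehdr), which matches item 2 following item 1 directly.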

3.5. Strings and dumps

What would you do if you knew nothing about ELF and just asked yourself how that example works? How can you be sure that the executable file really contains those three letters?

A good start for finding text in binary files is strings(1).

Command: src/magic_elf/strings.sh
#!/bin/sh
# without "-a -n 3" we don't get any output
strings -a -n 3 ${TMP}/magic_elf/magic_elf | grep -n ELF

Output: out/redhat-linux-i386/magic_elf/strings
1:ELF

The leading 1: is written by grep(1) and tells us that our three-letter word is the first string found. This gives some help on where to find it in a hex dump. It is difficult to search for strings in such a dump because of line breaks. Interactive tools like hexedit(1) might be useful.
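
The need for "-n 3" is easier to see with a stripped-down model of what strings(1) does. The sketch below is an assumption-laden simplification, not the real implementation: it treats only bytes 0x20 to 0x7e as printable, and the name first_printable_run is made up. It returns the offset of the first run of printable bytes that reaches a minimum length.

```c
#include <stddef.h>

/* Hypothetical, simplified model of strings(1).
 * Finds the first run of at least min_len printable ASCII bytes.
 * Returns the offset of that run and stores its length in *run_len,
 * or returns -1 if no run is long enough. */
long first_printable_run(const unsigned char *buf, size_t len,
                         size_t min_len, size_t *run_len)
{
    size_t start = 0;
    size_t i;

    for (i = 0; i <= len; i++) {
        int printable = i < len && buf[i] >= 0x20 && buf[i] <= 0x7e;

        if (printable)
            continue;
        /* A run (possibly empty) ends at i; report it if long enough. */
        if (i > start && i - start >= min_len) {
            *run_len = i - start;
            return (long)start;
        }
        start = i + 1;
    }
    return -1;
}
```

Applied to the first 16 bytes of the executable with a minimum length of 3, this finds a run of length 3 at offset 1. With the default minimum of strings(1), which is 4, the three letters are too short to be reported.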

The traditional tool for dumps is od(1). Its name is probably an abbreviation for "octal dump". GNU od provides -t x1 for byte-wise hexadecimal output. We stick to the standard options.

Command: src/magic_elf/od.sh
#!/bin/sh
od -N 16 -c ${TMP}/magic_elf/magic_elf | head -1
od -N 16 -a ${TMP}/magic_elf/magic_elf | head -1

Output: out/redhat-linux-i386/magic_elf/od
0000000 177   E   L   F 001 001 001  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000000 del   E   L   F soh soh soh nul nul nul nul nul nul nul nul nul

hexdump(1) is part of util-linux but also available on FreeBSD. It features user defined formats.

Source: src/format.hex
"%04.4_ax  " 8/1 "%02x " "  " 8/1 "%02x "
"  " 16/1 "%_p" "\n"

Command: src/magic_elf/hexdump.sh
#!/bin/sh
hexdump -f src/format.hex \
< ${TMP}/magic_elf/magic_elf \
| head

Output: out/redhat-linux-i386/magic_elf/hexdump
0000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  .ELF............
0010  02 00 03 00 01 00 00 00  60 83 04 08 34 00 00 00  ........`...4...
0020  d0 29 00 00 00 00 00 00  34 00 20 00 06 00 28 00  Ð)......4. ...(.
0030  1e 00 1b 00 06 00 00 00  34 00 00 00 34 80 04 08  ........4...4...
0040  34 80 04 08 c0 00 00 00  c0 00 00 00 05 00 00 00  4...À...À.......
0050  04 00 00 00 03 00 00 00  f4 00 00 00 f4 80 04 08  ........ô...ô...
0060  f4 80 04 08 13 00 00 00  13 00 00 00 04 00 00 00  ô...............
0070  01 00 00 00 01 00 00 00  00 00 00 00 00 80 04 08  ................
0080  00 80 04 08 e8 04 00 00  e8 04 00 00 05 00 00 00  ....è...è.......
0090  00 10 00 00 01 00 00 00  e8 04 00 00 e8 94 04 08  ........è...è...
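
For readers without hexdump(1), the layout defined by src/format.hex can be approximated in a few lines of C. This is only a sketch of one output line, under the assumption that every line holds a full 16 bytes; the function name format_dump_line is hypothetical. Bytes outside 0x20 to 0x7e are shown as dots here, whereas the dump above prints some of them as accented letters.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical re-implementation of one line of the dump format:
 * 4-digit hex offset, two groups of eight hex bytes, then the same
 * 16 bytes as text. Returns the value of snprintf(), that is the
 * length of the complete line (72 for offsets up to 0xffff). */
int format_dump_line(unsigned long offset, const unsigned char *bytes,
                     char *out, size_t out_size)
{
    char hex[2 * 25 + 1];   /* two groups of (1 + 8 * 3) chars plus NUL */
    char text[17];
    size_t pos = 0;
    int i;

    for (i = 0; i < 16; i++) {
        if (i % 8 == 0)
            hex[pos++] = ' ';   /* extra gap before each group of eight */
        pos += (size_t)sprintf(hex + pos, " %02x", (unsigned)bytes[i]);
        text[i] = (bytes[i] >= 0x20 && bytes[i] <= 0x7e)
                ? (char)bytes[i] : '.';
    }
    text[16] = '\0';
    return snprintf(out, out_size, "%04lx%s  %s", offset, hex, text);
}
```

Calling it in a loop over a buffer, with offset advancing by 16, gives output in the style of the listing above.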

At this point we can guess that the match between file offset 1 and address 0x8048000 + 1 is not coincidental. A test program might help.
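
The relation can also be read straight from the dump. The third program header starts at file offset 0x74 and has type 1 (PT_LOAD), p_offset 0, p_vaddr 0x08048000 and p_filesz 0x4e8. The following sketch turns that into arithmetic; the struct load_segment and the function vaddr_to_offset are illustrative names, not part of any ELF header file.

```c
#include <stdint.h>

/* Hypothetical subset of a PT_LOAD Elf32_Phdr; names follow elf.h. */
struct load_segment {
    uint32_t p_offset;  /* where the segment starts in the file */
    uint32_t p_vaddr;   /* where it is mapped in memory */
    uint32_t p_filesz;  /* how many bytes come from the file */
};

/* Translates a virtual address to a file offset.
 * Returns -1 if vaddr is not backed by file contents of seg. */
int64_t vaddr_to_offset(const struct load_segment *seg, uint32_t vaddr)
{
    if (vaddr < seg->p_vaddr || vaddr - seg->p_vaddr >= seg->p_filesz)
        return -1;
    return (int64_t)seg->p_offset + (vaddr - seg->p_vaddr);
}
```

With the values from the dump, address 0x08048001 translates to file offset 1, which is exactly where the three letters sit.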

3.6. The address of main

Source: out/redhat-linux-i386/magic_elf/addr_of_main.c
#include <stdio.h>

int main()
{
  printf("# 08048000=%#02x\n", *(unsigned char*)0x08048000);
  printf("# 08048001=%.3s\n", (char*)0x08048001);
  printf("main=%p\n", (void*)main);
  printf("ofs=%lu\n", (unsigned long)main - 0x08048000);
  return 0;
}

Output: out/redhat-linux-i386/magic_elf/addr_of_main
# 08048000=0x7f
# 08048001=ELF
main=0x8048460
ofs=1120

Looks good. The byte at address 0x8048000 + 0 is equal to that at file offset 0. And 0x8048460 is a plausible address of function main.

3.7. Other roads to ELF

Source: out/redhat-linux-i386/magic_elf/other_perl.pl
#!/usr/bin/perl -w
syscall 4, 1, 0x08048001, 3

Output: out/redhat-linux-i386/magic_elf/other_perl
ELF

Command: out/redhat-linux-i386/magic_elf/other_mem.sh
#!/bin/sh
skip=$( echo "ibase=16; 08048001" | bc )
dd if=/proc/self/mem bs=1 skip=${skip} count=3 2>/dev/null

Output: out/redhat-linux-i386/magic_elf/other_mem
ELF

Command: src/magic_elf/other_exe.sh
#!/bin/sh
dd if=/proc/self/exe bs=1 skip=1 count=3 2>/dev/null

Output: out/redhat-linux-i386/magic_elf/other_exe
ELF

Notes

[1]

Present on Linux (part of glibc) and FreeBSD.

[2]

Canonical Postscript document: ftp://tsx.mit.edu/pub/linux/packages/GCC/ELF.doc.tar.gz
A flat-text version: http://www.muppetlabs.com/~breadbox/software/ELF.txt

[3]

http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

[4]

http://linuxassembly.org/

[5]

http://www2.dgsys.com/~raymoon/asmlinks.html

[6]

http://www.tldp.org/HOWTO/Assembly-HOWTO

[7]

http://www2.dgsys.com/~raymoon/x86faqs.html

[8]

http://www.geocities.com/SiliconValley/Ridge/2544/

[9]

http://www.agner.org/assem/

[10]

http://developer.intel.com/design/pentium4/manuals/245470.htm

[11]

http://webster.cs.ucr.edu/index.html

[12]

http://webster.cs.ucr.edu/Page_AoALinux/0_AoAHLA.html

[13]

http://www.goosee.com/x86/

[14]

http://www.x86.org/

[15]

http://docs.sun.com/?p=/doc/816-1681

[16]

http://www.cs.unm.edu/~maccabe/classes/341/labman/labman.html

[17]

http://www.tuxedo.org/~esr/jargon/html/entry/RTFM.html