5. readelf

 

Outside of a dog, a book is a man's best friend.

Inside a dog it's too dark to read.

 Groucho Marx

Let's get a bit more serious and examine the assembly program from "In the language of evil" with readelf(1), part of the binutils package. This tool has a huge number of options. Here we use only one, -l, which lists the program headers.

Command: src/readelf/evil_magic.sh
#!/bin/sh
strip ${TMP}/evil_magic/nasm
ls -l ${TMP}/evil_magic/nasm
readelf -l ${TMP}/evil_magic/nasm

Output: out/redhat-linux-i386/readelf/evil_magic
-rwxr-xr-x    1 alba     anonymou      476 Jun 30 00:06 tmp/redhat-linux-i386/evil_magic/nasm

Elf file type is EXEC (Executable file)
Entry point 0x8048080
There are 1 program headers, starting at offset 52

Program Header:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x08048000 0x08048000 0x00097 0x00097 R E 0x1000

 Section to Segment mapping:
  Segment Sections...
   00     .text 

Nice to see the entry point we retrieved through od(1) again. The program layout is a simplified variation of the one in "Sort of an answer". The value of FileSiz includes the ELF header and the program header. The size of this overhead is:

overhead = Entry point - VirtAddr = 0x8048080 - 0x8048000 = 0x80 = 128 bytes

So effective code size is:

code size = FileSiz - overhead = 0x97 - 0x80 = 0x17 = 23 bytes

This matches the disassembly listing. However, the ratio of file size to effective code deserves the title "Bloat", with a capital B.

code size / file size = 23 / 476 = 0.048

Only 5 percent of the file actually does something useful!
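The arithmetic above can be reproduced with plain shell arithmetic. This is only a minimal sketch: the function names overhead, code_size and percent are made up for illustration, and the numbers are copied from the readelf and ls output shown earlier.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of any tool.

# Bytes between the start of the LOAD segment and the entry point.
overhead() { echo $(( $1 - $2 )); }      # args: entry point, VirtAddr

# Bytes of the segment that remain after the headers.
code_size() { echo $(( $1 - $2 )); }     # args: FileSiz, overhead

# Integer percentage, rounded to the nearest whole number.
percent() { echo $(( ($1 * 100 + $2 / 2) / $2 )); }   # args: part, whole

o=$(overhead 0x8048080 0x8048000)
c=$(code_size 0x97 "$o")
echo "overhead=$o code=$c ratio=$(percent "$c" 476)%"
# prints: overhead=128 code=23 ratio=5%
```

The shell accepts the hexadecimal values exactly as readelf prints them, so no manual conversion is needed.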

Anyway, we see that even for trivial examples the code is surrounded by lots of other stuff. Let's zoom in on our target.

5.1. Bashful glance

Command: src/readelf/segments.sh
#!/bin/sh
ls -l /bin/bash
readelf -l /bin/bash

Output: out/redhat-linux-i386/readelf/segments
-rwxr-xr-x    1 root     root       519964 Jul  9  2001 /bin/bash

Elf file type is EXEC (Executable file)
Entry point 0x8059380
There are 6 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x000c0 0x000c0 R E 0x4
  INTERP         0x0000f4 0x080480f4 0x080480f4 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x79273 0x79273 R E 0x1000
  LOAD           0x079280 0x080c2280 0x080c2280 0x057e0 0x09bd0 RW  0x1000
  DYNAMIC        0x07e980 0x080c7980 0x080c7980 0x000e0 0x000e0 RW  0x4
  NOTE           0x000108 0x08048108 0x08048108 0x00020 0x00020 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.got .rel.bss .rel.plt .init .plt .text .fini .rodata 
   03     .data .eh_frame .ctors .dtors .got .dynamic .bss 
   04     .dynamic 
   05     .note.ABI-tag 

Looks intimidating. But then the ELF specification says that only segments of type "LOAD" are considered for execution. Since the flags of the first one are R E, meaning "read & execute", we know that it must be the code segment. The other one has RW, meaning "read & write", so it must be the data segment.
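The Flg column is a rendering of the numeric p_flags field of each program header. The ELF specification defines the bits as PF_X = 1, PF_W = 2 and PF_R = 4. The following sketch shows how such a number maps to the letters printed by readelf; the function name flags_to_string is an assumption made for this example.

```shell
#!/bin/sh
# Bit values of p_flags as defined by the ELF specification.
PF_X=1; PF_W=2; PF_R=4

# Hypothetical helper: render numeric p_flags like the Flg column.
flags_to_string() {
  f=$1
  r=' '; w=' '; x=' '
  if [ $(( f & PF_R )) -ne 0 ]; then r=R; fi
  if [ $(( f & PF_W )) -ne 0 ]; then w=W; fi
  if [ $(( f & PF_X )) -ne 0 ]; then x=E; fi
  echo "$r$w$x"
}

flags_to_string 5   # read + execute: the code segment
flags_to_string 6   # read + write: the data segment
```

A segment with value 5 is therefore shown as "R E", and one with value 6 as "RW" followed by a blank.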

MemSiz is larger than FileSiz in the data segment. Just like with mmap(2), excess bytes are defined to be initialized with 0. The linker takes advantage of that by grouping all variables that should be initialized to zero at the end. Note that the last section of segment 3 (counting starts with 0) is called .bss, the traditional name for this kind of area.
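How many bytes exist only in memory follows directly from the two columns. The sketch below uses the values of the second LOAD line; the helper name zero_fill is hypothetical. The result is the zero-initialized tail of the segment, which should be roughly the size of .bss.

```shell
#!/bin/sh
# Hypothetical helper: bytes present in memory but not in the file.
zero_fill() { echo $(( $1 - $2 )); }   # args: MemSiz, FileSiz

z=$(zero_fill 0x09bd0 0x057e0)
printf 'zero-filled: %d bytes (0x%x)\n' "$z" "$z"
# prints: zero-filled: 17392 bytes (0x43f0)
```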

The mapping for segment 2 looks even more complex. But I would guess that .rodata means "read-only data" and .text contains productive code, as opposed to the administrative stuff in the other sections. LSB [1] has a good overview of section names. [2]

5.2. Turn the pages

The distance between the two LOAD segments is interesting:

VirtAddr[2] - VirtAddr[1] - FileSiz[1] = 0x80c2280 - 0x8048000 - 0x79273 = 0x100d = 4109 bytes

In the file, only 13 bytes (0xd) of padding separate the two LOAD segments (0x79280 - 0x79273). In memory, however, at least one complete page lies between code segment and data segment, for whatever reason. This would be an easy target for a tiny virus. So let's check whether this is a unique phenomenon.
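The same subtraction can be applied to the Offset column and to the VirtAddr column, which makes the difference between the layout in the file and the layout in memory visible. A minimal sketch, with a made-up helper name gap and the values of the two LOAD lines above:

```shell
#!/bin/sh
# Hypothetical helper: distance from the end of one segment
# to the start of the next one.
gap() { echo $(( $3 - ($1 + $2) )); }   # args: start1, FileSiz1, start2

# Offset column: distance inside the file.
printf 'file gap:   0x%x\n' "$(gap 0x000000 0x79273 0x079280)"

# VirtAddr column: distance in the address space of the process.
printf 'memory gap: 0x%x\n' "$(gap 0x08048000 0x79273 0x080c2280)"
# prints:
# file gap:   0xd
# memory gap: 0x100d
```

The two results differ by exactly 0x1000, one page.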

The following Perl script reads a list of file names from stdin, one file name per line. readelf -l is called for every file, and the output is parsed for lines starting with the word LOAD. The end of the segment is calculated and stored for the next line. If the distance from the start of a LOAD segment to the end of the previous one is less than 0x1000, a message is written.

Source: src/scanner/dist.pl
#!/usr/bin/perl -w
use strict;

my $min = 0xFFFFFFFF;
my $max = 0;
my $detected = 0;
LOOP: while(my $filename = <>)
{
  chomp $filename; $filename =~ s/^\s*//; 
  next LOOP if ( ! -e $filename );
  open(ELF, "readelf -l $filename 2>&1 |") || die "$! ($filename)";

  my $nrLoad = 0;
  my $end = 0;
  LINE: while(my $line = <ELF>)
  {
    chomp $line;
    next LINE if (!($line =~ m/^\s*LOAD\s*/));

    $nrLoad++;
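    # The line starts with blanks, so field 0 is empty and field 1 is "LOAD".
    my @number = split / +/, $line;
    my $virtaddr = hex($number[3]);   # VirtAddr column
    my $filesiz = hex($number[5]);    # FileSiz column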
    if ($end != 0)
    {
      my $dist = $virtaddr - $end;
      if ($dist < 0x1000)
      {
        printf "%-46s virtaddr=0x%08x dist=0x%08x\n",
          $filename, $virtaddr, $dist;
        $detected++;
      }
      $max = $dist if ($dist > $max);
      $min = $dist if ($dist < $min);
    }
    $end = $virtaddr + $filesiz;
  }
  close ELF;
  printf("%-46s has %d LOAD segments.\n", $filename, $nrLoad)
    if ($nrLoad != 2);
}
printf "%4d files; %4d detected; min=0x%08x; max=0x%08x\n",
  $., $detected, $min, $max;

You will now experience a time warp. I introduced the above script as a means to verify the existence of an exploitable peculiarity. But this also makes it a tool to check whether the peculiarity is already exploited. In other words, it is a scanner, for viruses we have yet to develop.

Of course the naive implementation through parsing the output of readelf(1) significantly limits performance. But using file(1) as a fast file-type filter reduces noise ("has 0 LOAD segments") and run time to acceptable levels.

The first test is with typical places like /bin. Only the last line of output is shown.

Command: src/scanner/big.sh
#!/bin/sh
dst=$1
scanner=${2:-src/scanner/dist.pl}
find /bin /sbin /usr/bin /usr/sbin /usr/lib -type f -print0 \
| xargs -r0 file \
| sed -ne 's/: *ELF 32-bit [LM]SB executable,.*//p' \
| ( ${scanner} 2>&1 ) \
| tee ${dst}.full \
| tail -1 \
> ${dst}

Output: out/redhat-linux-i386/scanner/dist_big
1475 files;   20 detected; min=0x00000020; max=0x0000101f

Strange result. On my original installation of RedHat 7.2 all files could be infected through this method. Since the second quarter of 2002 some (but not all) executables of updated packages are immune. On a similar installation of RedHat 7.3 this scanner detects a few hundred files. Either someone is actively closing a gap, or I caught an infection …

Anyway, there are still enough interesting files left. So on to all infected executables created from the sources of this document.

Command: src/scanner/small.sh
#!/bin/sh
dst=$1
scanner=${2:-src/scanner/dist.pl}
find ${TMP} -type f -name '*_infected' -print0 \
| xargs -r0 file \
| sed -ne 's/: *ELF 32-bit [LM]SB executable,.*//p' \
| ( ${scanner} 2>&1 ) \
| tee ${dst}.full \
| tail -1 \
> ${dst}

Again only the last line of output is shown. It's enough to see that all infected files are detected.

Output: out/redhat-linux-i386/scanner/dist_small
  35 files;   35 detected; min=0xf7f385a0; max=0x00001010
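The minimum printed here is larger than the maximum, which looks odd. One plausible reading (an assumption on my part, not something the output itself states) is that the smallest distance was negative and %08x printed it as a 32-bit unsigned value. The sketch below reinterprets such a value as a signed number; the helper name to_signed32 is hypothetical, and the shell is assumed to calculate with more than 32 bits.

```shell
#!/bin/sh
# Hypothetical helper: reinterpret a 32-bit unsigned value as signed.
# Assumes shell arithmetic is wider than 32 bits (true on 64-bit systems).
to_signed32() {
  v=$(( $1 ))
  if [ "$v" -ge 2147483648 ]; then
    v=$(( v - 4294967296 ))
  fi
  echo "$v"
}

to_signed32 0xf7f385a0   # the reported minimum
to_signed32 0x00001010   # the reported maximum
# prints:
# -135035488
# 4112
```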

5.3. The plan

You may have heard that Linux is a difficult target for malware [3] because there are so many different distributions. Well, they all use basically the same compiler, producing the same idiosyncrasies. This allows us to cheat in grand style.

  1. Insert our code between code segment and data segment.

  2. Modify inserted code to jump to original entry point afterwards.

  3. Change entry point to start of our code.

  4. Modify program header.

    1. Include increased amount of code in entry of code segment.

    2. Move all following entries down the file.

  5. Modify section header.

    1. Include trailing code in last section of code segment (should be .rodata).

    2. Move all following sections down the file.

This setup has two big problems, however.

5.4. Paranoid android

Since all executables in /bin and /usr/bin follow the same layout, a heuristic scanner can easily spot deviations. A "perfect infection", resulting in an executable indistinguishable from the real thing, is nowhere in sight. But then there are bigger issues an innocent virus seeking a warm nest in the wild would face.

For example RPM-based distributions maintain a checksum database. Verifying a single file, a complete package, or even all installed packages takes just one command.

If you know what you are looking for:

rpm --verify -f /bin/sh

For dedicated people with enough time to read the output:

/bin/nice -n 19 rpm --verify --all

A possible counterattack is to patch the database after infection. This is distribution dependent and requires root permissions. And it won't help against people who keep the checksums offline. That type of precaution is commonly called "Intrusion Detection System". There is a FAQ [4] hosted by the SANS Institute. [5] A brief introduction to products is "Talisker's Intrusion Detection System List". [6] tripwire [7] has made a big name for itself, but there is other free software: snort, [8] aide, [9] and lids. [10]

Another possibility is to hide the original (uninfected) executable on the file system, and patch the kernel via an inserted module to fake calculation of the checksum. A common name for this concept is "Loadable Kernel Module (LKM) Rootkits". [11] And if the kernel is compiled without module-support, there is still direct access to /dev/kmem to install a kernel-patch…

On this road lies madness.

Notes

[1] http://www.linuxbase.org/

[2] http://www.linuxbase.org//spec/gLSB/gLSB/specialsections.html

[3] http://www.malware.org/malware.htm

[4] http://www.sans.org/newlook/resources/IDFAQ/ID_FAQ.htm

[5] http://www.sans.org

[6] http://www.networkintrusion.co.uk/ids.htm

[7] http://www.tripwire.org/

[8] http://www.snort.org/

[9] http://www.cs.tut.fi/~rammer/aide.html

[10] http://lids.planetmirror.com/

[11] http://la-samhna.de/library/lkm.html