
Writing A Program That Parses The Headers Of A Portable Executable

  • Writer: Owner
  • Jul 3, 2020
  • 4 min read

Updated: Nov 12, 2022

By the end of this post, you will understand enough to write a lightweight parser, in C++, for the headers of the PE file format.

(Image: example output of the finished parser.)

What is the PE file format?


PE stands for "Portable Executable". It is the file format for executables on Windows: .exe files and DLLs are both portable executables.

In its barest definition, it is a data structure that holds all the information the OS loader needs to load the file into memory, in other words to run it.

What Are "Headers"?


The headers of a portable executable are the parts of the PE structure that hold the information the OS loader needs to map the file into memory and run it. Reverse engineers also use them to gather information about an executable or to identify an unknown file. In a PE file the headers sit at the very start of the file, and once the file is loaded they sit at the image base of the module. (The image base is the address at which the module is mapped into the process's address space.)


Structure of the PE file format


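As mentioned above, the PE header is nothing but a data structure. In file order, it consists of the DOS MZ header, the DOS stub, the PE signature, the COFF (file) header, the optional header with its data directories, and the section table.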


Since this is an article about writing a parser, I won't go through the contents of each individual header, as that would leave us with a lot of ground to cover. I will still point out the key information each one contains.


If you want a VERY detailed and well documented look into the data structure I'd recommend reading this.







Dos MZ Header

The first header of a PE file is the DOS header. If you load a PE file into a hex viewer, you will notice that the first two bytes always read 4D 5A. (The bytes always appear in this order in the file; read as a little-endian 16-bit value they give 0x5A4D.) Converted from hex to ASCII they read "MZ", the initials of Mark Zbikowski, one of the developers of MS-DOS and the designer of its executable file format. This sequence of bytes is also referred to as the "magic number".
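As a rough illustration of the magic number check, here is a minimal sketch that works on a file already read into a byte buffer rather than on a loaded module. The helper names `hasDosMagic` and `readU16LE` are made up for this example and are not part of any Windows API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if the buffer starts with the bytes 4D 5A ("MZ").
bool hasDosMagic(const std::vector<std::uint8_t>& file) {
    return file.size() >= 2 && file[0] == 0x4D && file[1] == 0x5A;
}

// Hypothetical helper: read a 16-bit little-endian value at the given offset.
// Applied at offset 0 of a PE file this yields 0x5A4D, the value that
// winnt.h names IMAGE_DOS_SIGNATURE.
std::uint16_t readU16LE(const std::vector<std::uint8_t>& file, std::size_t offset) {
    assert(offset + 2 <= file.size());  // caller must stay inside the buffer
    return static_cast<std::uint16_t>(file[offset] | (file[offset + 1] << 8));
}
```

Reading the value byte by byte keeps the result independent of the endianness of the machine running the parser.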

A few bytes later you will come across a sequence of bytes that reads "This program cannot be run in DOS mode". Its purpose is to fail gracefully: when the executable is run under DOS, a small DOS program embedded in the file prints this message and exits. (This portion is called the DOS stub.)


This structure contains a very important field, e_lfanew.

This field holds the offset, from the start of the file, of the next header structure, the PE file header. Because the headers are mapped unchanged at the image base, it can be used like an RVA once the file is loaded.

(RVA stands for "Relative Virtual Address": an offset from the base address of a loaded module, used because that base address is not known at compile time. Hence in our case the pointer to the next header structure will be => requiredPointer = ImageBase + e_lfanew)
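The arithmetic above can be sketched on a plain byte buffer, with the start of the buffer standing in for ImageBase. This is only a sketch under that assumption; `kLfanewOffset`, `readLfanew` and `peHeaderAddress` are names invented for this example. The offset 0x3C follows from the IMAGE_DOS_HEADER layout, where thirty 2-byte fields come before e_lfanew.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// e_lfanew is preceded by 30 WORD-sized fields, so it lives at byte 60 (0x3C).
constexpr std::size_t kLfanewOffset = 0x3C;

// Hypothetical helper: read e_lfanew as a 32-bit little-endian value.
// (The real field is a signed LONG; it is treated as unsigned here.)
std::uint32_t readLfanew(const std::vector<std::uint8_t>& image) {
    assert(image.size() >= kLfanewOffset + 4);
    return  static_cast<std::uint32_t>(image[kLfanewOffset])
         | (static_cast<std::uint32_t>(image[kLfanewOffset + 1]) << 8)
         | (static_cast<std::uint32_t>(image[kLfanewOffset + 2]) << 16)
         | (static_cast<std::uint32_t>(image[kLfanewOffset + 3]) << 24);
}

// requiredPointer = ImageBase + e_lfanew, with image.data() playing ImageBase.
const std::uint8_t* peHeaderAddress(const std::vector<std::uint8_t>& image) {
    const std::uint32_t lfanew = readLfanew(image);
    assert(lfanew <= image.size());  // reject offsets that point past the buffer
    return image.data() + lfanew;
}
```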


PE File Header


If you follow the e_lfanew pointer you will arrive at the PE header. It begins with the PE signature, a sequence of four bytes that reads "PE\0\0".

This confirms that the file is a PE file.

The signature is followed by the "COFF header", sometimes also called the "File Header", which contains general information about the executable such as the target machine and the number of sections.


The COFF header is followed by the "Optional Header", which ends with the "Data Directories"; both also contain essential information.
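To make the layout concrete, here is a hedged sketch that pulls three fields out of the 20-byte COFF header, which starts 4 bytes after the beginning of the PE signature. `CoffSummary`, `u16le` and `readCoffSummary` are hypothetical names for this example; the field names mirror those of IMAGE_FILE_HEADER, and `peOffset` is assumed to be the value of e_lfanew.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative subset of the COFF header fields.
struct CoffSummary {
    std::uint16_t machine;               // target CPU, e.g. 0x8664 for x64
    std::uint16_t numberOfSections;
    std::uint16_t sizeOfOptionalHeader;
};

// Read a 16-bit little-endian value from the buffer.
static std::uint16_t u16le(const std::vector<std::uint8_t>& b, std::size_t off) {
    assert(off + 2 <= b.size());
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}

// Returns false if the buffer is too short or the signature is not "PE\0\0".
bool readCoffSummary(const std::vector<std::uint8_t>& file,
                     std::size_t peOffset, CoffSummary& out) {
    // 4 bytes of signature plus 20 bytes of COFF header must fit.
    if (peOffset > file.size() || file.size() - peOffset < 24) return false;
    if (file[peOffset] != 'P' || file[peOffset + 1] != 'E' ||
        file[peOffset + 2] != 0 || file[peOffset + 3] != 0) return false;

    const std::size_t coff = peOffset + 4;
    out.machine              = u16le(file, coff + 0);
    out.numberOfSections     = u16le(file, coff + 2);
    out.sizeOfOptionalHeader = u16le(file, coff + 16);
    return true;
}
```

The optional header begins right after these 20 bytes, and `sizeOfOptionalHeader` says how long it is.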


For our convenience, these structures and their fields are already declared in the Windows SDK headers (winnt.h, which Windows.h pulls in).


Programming The Parser


The first step is to get the address of the image base, using the Win API function GetModuleHandleA. This function returns the handle of the specified module. For demonstration purposes I will pass NULL as the parameter, which returns a handle to the executable of the current process.

HMODULE hmodule = GetModuleHandleA(NULL);

Here, `hmodule` holds the address of the image base, so we just cast it to a pointer to struct IMAGE_DOS_HEADER (the Win API structure for the DOS header).


PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)hmodule;

Now you can use that pointer to print out the contents of the DOS header.


This is the struct -


typedef struct _IMAGE_DOS_HEADER {      // DOS .EXE header
    WORD   e_magic;                     // Magic number
    WORD   e_cblp;                      // Bytes on last page of file
    WORD   e_cp;                        // Pages in file
    WORD   e_crlc;                      // Relocations
    WORD   e_cparhdr;                   // Size of header in paragraphs
    WORD   e_minalloc;                  // Minimum extra paragraphs needed
    WORD   e_maxalloc;                  // Maximum extra paragraphs needed
    WORD   e_ss;                        // Initial (relative) SS value
    WORD   e_sp;                        // Initial SP value
    WORD   e_csum;                      // Checksum
    WORD   e_ip;                        // Initial IP value
    WORD   e_cs;                        // Initial (relative) CS value
    WORD   e_lfarlc;                    // File address of relocation table
    WORD   e_ovno;                      // Overlay number
    WORD   e_res[4];                    // Reserved words
    WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
    WORD   e_oeminfo;                   // OEM information; e_oemid specific
    WORD   e_res2[10];                  // Reserved words
    LONG   e_lfanew;                    // File address of new exe header
  } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

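Now, to get to the PE header and read the PE signature (a 4-byte DWORD), you need to add e_lfanew from the above struct to the address of the image base, as mentioned above.

You could cast that address straight to a pointer to the next header struct and dereference it, and MSVC will allow that, but strictly speaking type punning through pointer aliasing is undefined behavior in C++. Stepping through memory with a char pointer is allowed, so a more appropriate way to verify the PE signature is to compute the address as a char pointer and copy the signature out with memcpy() -

// Requires <cstring> and <iostream>. A char pointer may legally address any object's bytes.
const char* base = reinterpret_cast<const char*>(pDosHeader);
DWORD signature = 0;
// Copy the four signature bytes found at ImageBase + e_lfanew.
std::memcpy(&signature, base + pDosHeader->e_lfanew, sizeof(signature));
// IMAGE_NT_SIGNATURE is 0x00004550, i.e. "PE\0\0" read as a little-endian DWORD.
if (signature == IMAGE_NT_SIGNATURE) { std::cout << "PE file signature confirmed."; }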

And now you can keep following the chain of pointers to get to all the information in every header!
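As one possible next link in that chain, the sketch below walks from the DOS header to the first field of the optional header, its magic number, which distinguishes 32-bit (PE32, 0x10B) from 64-bit (PE32+, 0x20B) images. It works on a file read into a byte buffer instead of a loaded module, and `readLE` and `classifyImage` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Read an n-byte (n <= 4) little-endian value from the buffer.
static std::uint32_t readLE(const std::vector<std::uint8_t>& b,
                            std::size_t off, std::size_t n) {
    assert(n <= 4 && off + n <= b.size());
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < n; ++i)
        v |= static_cast<std::uint32_t>(b[off + i]) << (8 * i);
    return v;
}

// Follow DOS header -> PE signature -> optional header magic.
std::string classifyImage(const std::vector<std::uint8_t>& file) {
    // The DOS header is 64 bytes long and starts with "MZ" (0x5A4D).
    if (file.size() < 0x40 || readLE(file, 0, 2) != 0x5A4D) return "not MZ";

    const std::size_t pe = readLE(file, 0x3C, 4);  // e_lfanew
    // Need signature (4) + COFF header (20) + optional header magic (2).
    if (pe > file.size() || file.size() - pe < 26) return "truncated";
    if (readLE(file, pe, 4) != 0x00004550) return "not PE";  // "PE\0\0"

    const std::uint32_t magic = readLE(file, pe + 24, 2);
    if (magic == 0x10B) return "PE32";
    if (magic == 0x20B) return "PE32+";
    return "unknown optional header";
}
```

Each step validates what it found before trusting the next offset, which matters as soon as the parser is pointed at files it did not produce itself.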

If you want the complete source code for my project you can check out my GitHub


Microsoft has detailed documentation on the PE file format, click here to get to it.

