x86 Debugger for Windows and Wine

2022-08-31

I wanted to play Heroes of Might and Magic 3 on a tablet, a game from 1999 you can play with only a mouse. Since the game requires two buttons, and you're left with only one on a tablet, I wrote a win32 program that uses SetWindowsHookEx to intercept mouse events and turn a long press on the screen into a right click.

The game has two modes, one where you move your hero around an adventure map, and one where you do combat on a chess-like board. Long press to right click works ok in both modes but being able to adjust the mouse based on which mode the player is in makes for a better experience. I won't go into the differences between adventure vs combat here, instead I'll show you how to build a custom debugger in C to trigger code at specific memory locations, to get notified when a program runs code that (for instance) sends you into combat.

And don't worry, you don't have to buy an old game to follow this guide.

Since the game is 32-bit, that's what I'll be focusing on, though it's easy to get the code to support 64-bit.

There will be lots of talk about C along the way, with the assumption that you're not used to the language. If you just want the code and not my ramblings, go to the end of the article when you've checked the prerequisites below.

Prerequisites

You need Windows or Wine and Tiny C Compiler 0.9.27-win32 with all Windows headers. Go to http://download.savannah.gnu.org/releases/tinycc/ and download tcc-0.9.27-win32-bin.zip and winapi-full-for-0.9.27.zip. Extract tcc, then extract winapi into your tcc dir and click yes to overwrite files when asked. Add tcc to your PATH.

Function Hooks

Since this isn't something I've dealt with before, I started out by searching for function hooks:

"Function Hooking is a programming technique that lets you to intercept and redirect function calls in a running application" – Kyle Halladay

If you need to change a program's behavior, this approach enables you to do so, but it's rather advanced. You change the program at runtime, overriding the jump to an original function with your hook, which quickly turns into a problem: A C function signature is required if you need to execute the original function after your hook is invoked, and to get that signature, you read through assembly and turn it into C by hand. There are likely ways to get around this, perhaps a good C decompiler can help, but at this point you're already dealing with quite a lot of code, and why bother if you're not going to change the return value or use the input arguments?

Function hooks look super interesting for unit testing though.
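To make the signature problem concrete, here is a simplified analogue. It doesn't patch machine code like a real hook does; it swaps a function pointer instead, and every name in it (damage_fn, original_damage and so on) is made up for the illustration. The point is that the hook can only call the original because it knows the exact signature:

```c
#include <assert.h>

// The signature the hook has to know in order to call the original
typedef int (*damage_fn)(int attack, int defense);

static int original_damage(int attack, int defense)
{
    return attack > defense ? attack - defense : 0;
}

// Every call goes through this pointer, which is what gets "hooked"
static damage_fn damage = original_damage;
static int hook_calls = 0;

static int hooked_damage(int attack, int defense)
{
    // Observe the call ...
    hook_calls++;

    // ... then run the original with the same arguments
    return original_damage(attack, defense);
}

void install_hook(void)
{
    assert(damage == original_damage);
    damage = hooked_damage;
}

int call_damage(int attack, int defense)
{
    return damage(attack, defense);
}

int hook_call_count(void)
{
    return hook_calls;
}
```

In a real program you don't get damage_fn for free. You'd have to reconstruct it from assembly, which is the work the debugger approach avoids.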

Not Heroes of Might and Magic 3

Instead of figuring out function signatures we'll build a custom debugger, so let's create something to attach it to. Create myprogram.c somewhere and paste the following:

/* Mem. addr. */  // Compile: tcc myprogram.c
/*            */
/*            */  #include <stdio.h>
/*            */
/* 0x00401000 */  void hello()
/*            */  {
/* 0x00401010 */      printf("B: Hello from b!\n");
/*            */  }
/*            */  
/* 0x0040101A */  int main()
/*            */  {
/*            */      while (1)
/*            */      {
/* 0x00401024 */          int cmd = getchar();
/* 0x00401032 */          if (cmd == 'a')
/* 0x0040103E */              printf("A: Nice!\n");
/* 0x00401051 */          else if (cmd == 'b')
/* 0x00401057 */              hello();
/* 0x00401067 */          else if (cmd == 'x')
/* 0x0040106D */              break;
/*            */      }
/* 0x0040107A */      return 0;
/*            */  }

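You find these memory addresses by opening myprogram.exe in x64dbg (use its 32-bit build, x32dbg) or a similar debugger. It's also important that you compile with tcc, unless you want to find the addresses yourself, because another compiler will place the code at different addresses.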

Breakpoints

Debugging in a broad sense means somehow tracing down bugs in an application, but in this article, it's about setting and reacting to breakpoints.

There are two ways to set breakpoints: One is at the hardware level using cpu debug registers. This method doesn't require altering application code, but the number of breakpoints one can set is limited by the cpu. x86 has four debug registers that can hold an address (DR0 to DR3), so that's a hard limit of four breakpoints.
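Hardware breakpoints aren't used in the rest of the article, but as a rough sketch of what "using debug registers" means: on x86 the addresses go into DR0 to DR3 and a control register, DR7, decides which of the four slots are active. The helper below only computes the DR7 value for an execute breakpoint. Its name and the HW_BREAKPOINT_SLOTS constant are my own, and actually loading the value into a thread (on Windows presumably through the Dr0..Dr3 and Dr7 fields of CONTEXT) is left out and untested here:

```c
#include <assert.h>
#include <stdint.h>

// x86 has four address registers: DR0, DR1, DR2 and DR3
#define HW_BREAKPOINT_SLOTS 4

// Return dr7 with an execute breakpoint enabled in the given slot.
// Bit 2*slot is the local enable bit. The four bits starting at
// 16 + 4*slot hold type and length, and all zero means
// "break when the instruction at this address is executed".
uint32_t dr7_enable_exec(uint32_t dr7, unsigned slot)
{
    assert(slot < HW_BREAKPOINT_SLOTS);

    dr7 |= 1u << (2 * slot);
    dr7 &= ~(0xFu << (16 + 4 * slot));

    return dr7;
}
```

With only four slots there is nothing to iterate over beyond 0 to 3, which is the hard limit mentioned above.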

The approach used in this article is software breakpoints. You overwrite the first byte of an instruction at a specific memory address in the debugged application with the software breakpoint instruction 0xCC, aka INT3. When the cpu instruction pointer (EIP for x86, RIP for x64) gets to that address, the debugger is notified. But since INT3 overwrote part of an instruction, the original byte must be put back and the instruction pointer moved one byte back, so the program can continue with the original instruction.
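Before involving Windows at all, the byte juggling can be tried on a plain array. This is only a sketch: the array stands in for the debugged program's memory, and plant_breakpoint and resume_offset are hypothetical names, not functions used later in the article:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Software breakpoint instruction
#define INT3 0xCC

// Overwrite one byte of code with INT3 and return the byte it replaced
uint8_t plant_breakpoint(uint8_t *code, size_t offset)
{
    uint8_t original = code[offset];

    code[offset] = INT3;

    return original;
}

// Called after the trap: put the saved byte back and return the offset
// the instruction pointer must be moved to. INT3 is one byte long, so
// that is one byte before where the cpu stopped.
size_t resume_offset(uint8_t *code, size_t ip_after_trap, uint8_t original)
{
    size_t offset;

    assert(ip_after_trap > 0);

    offset = ip_after_trap - 1;
    code[offset] = original;

    return offset;
}
```

For example, with the bytes 0x55, 0x89, 0xE5, 0xC3 (push ebp; mov ebp, esp; ret), planting at offset 1 saves 0x89, and after the trap at offset 2 the program resumes at offset 1 with 0x89 back in place.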

wmain

I'll start with wmain to give you an overview of what's going to happen.

Read the following, then put it in a file called debugger.c:

#include <windows.h>
#include <shlwapi.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

// Software breakpoint instruction
#define INT3 0xCC

#define CASE_A 0x0040103E
#define CASE_B 0x00401057

struct breakpoint
{
    char origbyte;
    void *address;
};

/*                                                     *
 *   Insert upcoming functions from the article here   *
 *                                                     */

int wmain(int argv, wchar_t *args[])
{
    uint32_t addresses[] = {CASE_A, CASE_B};
    PROCESS_INFORMATION procinfo = {0};
    size_t breakpoints_size = ARRAYSIZE(addresses);
    struct breakpoint *breakpoints[breakpoints_size];
    uint32_t address;

    if (argv < 2)
        return error(L"Usage: %s path_to_exe\n", args[0]);

    if (!create_process(args[1], &procinfo))
        return error(L"Error: Failed to create process for %s. "
                      "Could it be an invalid path?\n", args[1]);
    
    for (size_t i = 0; i < breakpoints_size; i++)
        breakpoints[i] = set_breakpoint(procinfo.hProcess, addresses[i]);

    while (get_breakpoints(&procinfo, breakpoints, breakpoints_size, &address))
    {
        switch (address)
        {
        case CASE_A:
            printf("Case A triggered\n");
            break;
        case CASE_B:
            printf("Case B triggered\n");
            break;
        }
    }

    return 0;
}

It won't compile because error, create_process, set_breakpoint and get_breakpoints aren't implemented yet, but I hope it gives you an idea about what's going on. If not, here's the summary:

The program passed as args[1] is started,
breakpoints (0xCC/INT3) are set and then
get_breakpoints will return true with an address whenever a breakpoint is hit.

In case you're not used to C (or win32), here are a few notes on the code:

main vs wmain

Why wmain and not main? wmain is a Microsoft-specific thing where you get the arguments as wchar_t (UTF-16) as opposed to main where you get char. Required when you want to support Unicode characters.

PROCESS_INFORMATION procinfo = {0};

PROCESS_INFORMATION has some IDs and handles necessary for altering the application at runtime, but the interesting part here is {0}. When initializing the struct this way, every member of the struct is set to zero. The alternative is to declare the variable like PROCESS_INFORMATION procinfo; and then call ZeroMemory(&procinfo, sizeof(procinfo)); or memset(&procinfo, 0, sizeof(procinfo)); which does the same thing (ZeroMemory is win32-specific).

Nasty bugs can occur if you forget to zero-initialize. One issue that happened to me was that a struct I forgot to initialize caused my pc to go to sleep every second time the function was called.
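Here are the two ways side by side, as a small portable sketch. struct handles is a made-up stand-in for a Win32 struct like PROCESS_INFORMATION, so the example doesn't need windows.h:

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical stand-in for a struct such as PROCESS_INFORMATION
struct handles
{
    void *process;
    void *thread;
    unsigned long process_id;
    unsigned long thread_id;
};

// Zero-initialize at the declaration
struct handles handles_init(void)
{
    struct handles h = {0};

    return h;
}

// Declare first, clear afterwards
struct handles handles_memset(void)
{
    struct handles h;

    memset(&h, 0, sizeof(h));

    return h;
}

bool handles_is_zeroed(const struct handles *h)
{
    return h->process == NULL && h->thread == NULL
        && h->process_id == 0 && h->thread_id == 0;
}
```

Leave out both and the members hold whatever happened to be on the stack, which is how you end up with bugs like the one above.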

Code style

All declarations go to the top, besides variables in for loops. This is super old school, but I have come to like it when writing PL/pgSQL where there's no other way. Since C doesn't have great support for multiple return values, you also often have functions that set values through pointers sent in as arguments. The variable address is one such example in the code above. Then you can either have the variable at the top or a dangling "uint32_t address;" somewhere. I think the latter is less aesthetic and having the variables at the top can act as a kind of table of contents.

80-character limit per line, which helps me think (way too long) about writing more concise code. Long variable names will cause the code to span multiple lines, making it look ugly and hard to read, while short names can put even the most complex code on one line, but it'll be hard to understand. 80 characters per line makes it stand out if something is either a bit complicated or overcomplicated.

Lowercase types are preferred. BOOL, TRUE, FALSE, DWORD and WCHAR are defined in windows.h, but I prefer their lowercase counterparts. stdbool.h provides bool, true and false and from stdint.h we get uint32_t that matches both the pointer size needed for breakpoints in 32-bit programs and DWORD. This is again for aesthetic reasons.

snake_case everywhere except for Windows functions. This makes it easy to see if something is application code or a call to Windows in a namespace-free language, because Windows functions are all PascalCase.

struct variables are declared with "struct [struct_name] [variable_name];" like struct breakpoint *breakpoints[breakpoints_size];. No modern language uses such syntax, but I like that I can have a struct named breakpoint and declare a variable with the same name, making "struct breakpoint *breakpoint;" possible. Microsoft uses the modern way by typedef'ing the structs like "typedef struct _PROCESS_INFORMATION { ...variables... } PROCESS_INFORMATION" which is why there's no struct keyword in front of those variables. If you look at Linus Torvalds' C, you'll find that he doesn't typedef his structs.
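A small sketch of the two styles next to each other. struct breakpoint is the one from this article, while WATCHPOINT is a made-up type that only exists to show the typedef style:

```c
#include <stddef.h>

// Plain struct: the tag "breakpoint" lives in its own namespace
struct breakpoint
{
    char origbyte;
    void *address;
};

// Typedef'ed struct, the way windows.h declares its types
typedef struct watchpoint_tag
{
    void *address;
} WATCHPOINT;

// The parameter can be named exactly like the struct tag
void *breakpoint_address(struct breakpoint *breakpoint)
{
    return breakpoint->address;
}

// No struct keyword needed, but the variable needs another name
void *watchpoint_address(WATCHPOINT *watchpoint)
{
    return watchpoint->address;
}
```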

Error handling

You're likely used to the comfort of exceptions, but that luxury doesn't exist in C. People have instead come up with various ways to deal with errors, and the most widespread seems to be that a function's return value is the error code, with the actual result written through a pointer passed as an argument. Go's solution to this is multiple return values val, err := myfn() which in C would be char val[10]; errno_t err = myfn(val);.
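As a sketch of that convention, here is a function that parses a hex address like the ones used for breakpoints. parse_address is a hypothetical helper that the debugger in this article doesn't use, and plain int with errno.h codes stands in for errno_t to keep it portable:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

// Returns 0 on success or an errno-style error code.
// The parsed value is written through the out pointer, and only
// when parsing succeeds.
int parse_address(const char *text, uint32_t *out)
{
    char *end = NULL;
    unsigned long value;

    errno = 0;
    value = strtoul(text, &end, 16);

    // Nothing parsed, or trailing characters after the number
    if (end == text || *end != '\0')
        return EINVAL;

    // Doesn't fit in a 32-bit address
    if (errno == ERANGE || value > 0xFFFFFFFFul)
        return ERANGE;

    *out = (uint32_t) value;

    return 0;
}
```

The caller then writes uint32_t address; int err = parse_address("0x0040103E", &address); and checks err before touching address.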

Your code can quickly explode in size if you check every call for errors, so I'm assuming the rest of the code works if create_process succeeds. This is how I code in languages with exceptions, where I only check for errors I can do something about, handling exceptions in a global exception handler. That could be a process supervisor for your C application.

The error function simply prints an error message and returns 1, so wmain can do return error(L"..."). It looks like this:

int error(const wchar_t *msg, ...)
{
    wchar_t buffer[500];
    va_list arglist;
    va_start(arglist, msg);
    vswprintf_s(buffer, ARRAYSIZE(buffer), msg, arglist);
    va_end(arglist);
    fwprintf(stderr, L"%s", buffer);
    return 1;
}

This is more code than if it had been part of wmain, but I like having it in a separate function and keeping wmain easy to scan.

Paste it into debugger.c.

Creating a process for debugging

This part is straightforward:

bool create_process(wchar_t *program_fullpath,
                    OUT PROCESS_INFORMATION *procinfo)
{
    // Path is duplicated because PathRemoveFileSpec mutates its argument
    wchar_t *program_dir = wcsdup(program_fullpath);
    // CreateProcess expects cb to hold the size of the struct
    STARTUPINFO unused_startinfo = {.cb = sizeof(unused_startinfo)};
    bool success = false;

    // PathRemoveFileSpec is deprecated since win8, but PathCchRemoveFileSpec
    // doesn't do anything on my machine (using tcc).
    PathRemoveFileSpec(program_dir);

    success = CreateProcess(program_fullpath, NULL, NULL, NULL, false,
                            DEBUG_ONLY_THIS_PROCESS, NULL, program_dir,
                            &unused_startinfo, procinfo);
    free(program_dir);
    return success;
}

Note the OUT in the function signature. This is defined by windows.h and is removed by the preprocessor. So why put it there? If you've worked with stored procedures, you have likely seen IN and OUT and INOUT, and this is (somewhat) similar. IN is the default, an argument that is passed to the function that's not mutated, but when OUT is used the argument is meant to be read outside the function – the value it's pointing to will be mutated. IN OUT would then mean that the argument is expected to hold a valid value that is read by the function, but also mutated (probably wise to use sparingly). It's only there to convey information to the reader of your code.

Current directory is set to program_dir by removing the file name from the full path, but it isn't always necessary. It was needed for Heroes of Might and Magic 3 HD mod to start correctly.
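To see OUT and the "remove the file name" step without Windows, here's a portable sketch. directory_of is a hypothetical helper, not the Win32 PathRemoveFileSpec, and it won't behave identically in every edge case (drive roots, for instance). The OUT macro is defined by hand since windows.h isn't included:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef OUT
#define OUT // Expands to nothing, it's only a hint for the reader
#endif

// Copy the directory part of fullpath into dir.
// Returns false if fullpath has no separator or dir is too small.
bool directory_of(const char *fullpath, OUT char *dir, size_t dir_size)
{
    const char *last = strrchr(fullpath, '\\');
    const char *forward = strrchr(fullpath, '/');
    size_t length;

    // Use whichever separator comes last in the path
    if (last == NULL || (forward != NULL && forward > last))
        last = forward;

    if (last == NULL)
        return false;

    length = (size_t) (last - fullpath);

    // Room for the directory plus the terminating zero
    if (length + 1 > dir_size)
        return false;

    memcpy(dir, fullpath, length);
    dir[length] = '\0';

    return true;
}
```

Unlike PathRemoveFileSpec it writes to a separate buffer, so the caller doesn't have to duplicate the path first.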

I don't know why it surprised me, but Windows has built-in functions for debugging, and DEBUG_ONLY_THIS_PROCESS makes the OS send debug events to your debugger. The alternative is to call DebugActiveProcess, either after starting the process yourself or to attach to a process that's already running.

Setting breakpoints

void write_memory_byte(HANDLE proc, void *address, char byte)
{
    WriteProcessMemory(proc, address, &byte, 1, NULL);
    FlushInstructionCache(proc, address, 1);
}

struct breakpoint *set_breakpoint(HANDLE proc, uint32_t address)
{
    // Allow hex vals like 0x.. to be passed without having to cast to void *
    void *addr = (void *) address;
    struct breakpoint *breakpoint = malloc(sizeof(struct breakpoint));

    breakpoint->address = addr;

    // Read original byte at the address into breakpoint for later use and ...
    ReadProcessMemory(proc, addr, &(breakpoint->origbyte), 1, NULL);

    // ... replace it with INT3
    write_memory_byte(proc, addr, INT3);

    return breakpoint;
}

As mentioned earlier, the original byte is read and saved before being replaced by INT3, so we can look it up by address and put it back once the breakpoint is triggered.

void * is a pointer to anything and is used here as a pointer to the address where the breakpoint is set. We don't know and don't care about the actual type. Casting from uint32_t to void * is safe because the pointer size in 32-bit programs is the same as uint32_t.
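If you want the same code to build as both 32-bit and 64-bit, one option (my suggestion, the article's code sticks to uint32_t) is uintptr_t from stdint.h, an integer type meant to be wide enough for a pointer. A minimal sketch with hypothetical helper names:

```c
#include <stdint.h>

// Turn a numeric address, e.g. 0x0040103E, into a pointer.
// In the debugger the pointer refers to memory in another process,
// so it is only passed along to Read/WriteProcessMemory and never
// dereferenced locally.
void *address_to_pointer(uintptr_t address)
{
    return (void *) address;
}

// And back again, e.g. for use in a switch statement
uintptr_t pointer_to_address(void *pointer)
{
    return (uintptr_t) pointer;
}
```

With uint32_t the cast is only safe while the debugger itself is a 32-bit program, which it is when built with tcc win32.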

Getting breakpoints

Now for the interesting stuff:

bool get_breakpoints(PROCESS_INFORMATION *procinfo,
                     struct breakpoint *breakpoints[],
                     size_t breakpoints_size,
                     OUT uint32_t *breakpoint_address)
{
    DEBUG_EVENT event = {0};
    void *address = NULL;
    CONTEXT context = {.ContextFlags = CONTEXT_ALL};
    struct breakpoint *breakpoint = NULL;

    *breakpoint_address = 0;
    
    while (WaitForDebugEvent(&event, INFINITE))
    {
        switch (event.dwDebugEventCode)
        {
        case EXIT_PROCESS_DEBUG_EVENT:
            return false;
        case EXCEPTION_DEBUG_EVENT:
            switch (event.u.Exception.ExceptionRecord.ExceptionCode)
            {
            case STATUS_BREAKPOINT:
                address = event.u.Exception.ExceptionRecord.ExceptionAddress;
                
                for (size_t i = 0; i < breakpoints_size; i++)
                    if (breakpoints[i]->address == address)
                        breakpoint = breakpoints[i];

                // Ignore initial STATUS_BREAKPOINT event triggered by Windows
                if (breakpoint == NULL)
                    break;

                // Replace INT3 with original byte
                write_memory_byte(procinfo->hProcess, address,
                                  breakpoint->origbyte);

                GetThreadContext(procinfo->hThread, &context);

                // Decrement instruction pointer by one, to rerun with original
                // byte. Happens after ContinueDebugEvent.
                context.Eip--;

                // Enable single step: Triggers EXCEPTION_SINGLE_STEP on the
                // next (and only the next) instruction where we can insert the
                // breakpoint again
                context.EFlags |= 0x100;

                // Apply changes to thread context
                SetThreadContext(procinfo->hThread, &context);

                break;
            case EXCEPTION_SINGLE_STEP:
                // Insert breakpoint again
                write_memory_byte(procinfo->hProcess, breakpoint->address,
                                  INT3);

                *breakpoint_address = (uint32_t) breakpoint->address;

                break;
            }
        }

        ContinueDebugEvent(event.dwProcessId, event.dwThreadId, DBG_CONTINUE);

        if (*breakpoint_address != 0)
            return true;
    }

    return false;
}

Before you continue, recall this code from wmain:

while (get_breakpoints(&procinfo, breakpoints, breakpoints_size, &address))
{
    switch (address)
    {
        ...
    }
}

return 0;

What happens is this:

Wait for debugging events from Windows.
If the event is EXIT_PROCESS_DEBUG_EVENT it means that the program being debugged has exited, so get_breakpoints returns false, which ends the while (get_breakpoints(...)) loop in wmain and exits the debugger.
If the event is EXCEPTION_DEBUG_EVENT it's either
1. A breakpoint, in which case the original byte must be put back and the instruction pointer (EIP) is decremented by one. The breakpoint is kept in the breakpoint variable for the next event, which will be...
2. A single step, enabled by the breakpoint event via context.EFlags |= 0x100; (the trap flag). Here the breakpoint instruction is reinserted, so it triggers again the next time the debugged application reaches that address. Single stepping is disabled after each step unless context.EFlags is set again. Then *breakpoint_address is set, which makes get_breakpoints return true after ContinueDebugEvent. The call to ContinueDebugEvent has to come first, because wmain's while loop calls get_breakpoints, and thereby WaitForDebugEvent, again after running the switch (address) {...} code block.
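The breakpoint and single step dance can be simulated without Windows. In the sketch below a byte array plays the debugged program, every "instruction" is one byte long, and toy_run plays the part of the cpu and WaitForDebugEvent. All names are made up and a real cpu obviously does a lot more, but the order of operations is the same as in get_breakpoints:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3 0xCC
#define TRAP_FLAG 0x100

enum event { EVENT_BREAKPOINT, EVENT_SINGLE_STEP, EVENT_END };

struct toy_cpu
{
    uint8_t *code;
    size_t size;
    size_t ip;
    uint32_t eflags;
};

// Run until something happens that a debugger would be told about
enum event toy_run(struct toy_cpu *cpu)
{
    while (cpu->ip < cpu->size)
    {
        uint8_t op = cpu->code[cpu->ip++];

        // ip now points one byte past the INT3
        if (op == INT3)
            return EVENT_BREAKPOINT;

        // The trap flag only lasts for one instruction
        if (cpu->eflags & TRAP_FLAG)
        {
            cpu->eflags &= ~TRAP_FLAG;
            return EVENT_SINGLE_STEP;
        }
    }

    return EVENT_END;
}

// Run the code `passes` times with a breakpoint at offset and
// return how many times it was hit
size_t toy_debug(uint8_t *code, size_t size, size_t offset, size_t passes)
{
    struct toy_cpu cpu = {code, size, 0, 0};
    uint8_t origbyte;
    size_t hits = 0;

    assert(offset < size);

    origbyte = code[offset];
    code[offset] = INT3;

    while (passes > 0)
    {
        switch (toy_run(&cpu))
        {
        case EVENT_BREAKPOINT:
            code[offset] = origbyte;   // Replace INT3 with original byte
            cpu.ip--;                  // Rerun the original instruction
            cpu.eflags |= TRAP_FLAG;   // Ask for exactly one single step
            break;
        case EVENT_SINGLE_STEP:
            code[offset] = INT3;       // Insert breakpoint again
            hits++;
            break;
        case EVENT_END:
            cpu.ip = 0;                // Start the program over
            passes--;
            break;
        }
    }

    code[offset] = origbyte;

    return hits;
}
```

Running three NOPs (0x90) three times with a breakpoint at offset 1 gives three hits. If you comment out the line that inserts the breakpoint again, the INT3 is gone after the first pass and it's never hit a second time.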

Compile the code

tcc -impdef kernelbase.dll
tcc -DUNICODE debugger.c kernelbase.def

Then debug myprogram.exe by executing debugger "[full path to]\myprogram.exe". Try entering a, then press Enter, then try the same with b.

Known issues

*breakpoint_address in get_breakpoints: This value is initially set to 0, so if you have breakpoints at that memory address, you'll have to change the code.
It doesn't seem like the debugger works for all addresses. Sometimes it just hangs for a particular address. Incrementing the address by 1 usually does the trick.
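For the first issue, one possible change is to stop using the address itself as the "was something hit" signal and carry that in a separate bool. The sketch below applies the idea to the lookup loop only. lookup_breakpoint is a hypothetical helper, and the same pattern would have to be applied to the check at the end of get_breakpoints:

```c
#include <stdbool.h>
#include <stddef.h>

#ifndef OUT
#define OUT // Expands to nothing, as in windows.h
#endif

struct breakpoint
{
    char origbyte;
    void *address;
};

// Returns true if a breakpoint is registered at address and stores
// it in match. The return value carries "found or not", so no
// address has to be reserved as a special value.
bool lookup_breakpoint(struct breakpoint *breakpoints[], size_t count,
                       void *address, OUT struct breakpoint **match)
{
    for (size_t i = 0; i < count; i++)
    {
        if (breakpoints[i]->address == address)
        {
            *match = breakpoints[i];
            return true;
        }
    }

    return false;
}
```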

Full program

#include <windows.h>
#include <shlwapi.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

// Software breakpoint instruction
#define INT3 0xCC

#define CASE_A 0x0040103E
#define CASE_B 0x00401057

struct breakpoint
{
    char origbyte;
    void *address;
};

void write_memory_byte(HANDLE proc, void *address, char byte)
{
    WriteProcessMemory(proc, address, &byte, 1, NULL);
    FlushInstructionCache(proc, address, 1);
}

struct breakpoint *set_breakpoint(HANDLE proc, uint32_t address)
{
    // Allow hex vals like 0x.. to be passed without having to cast to void *
    void *addr = (void *) address;
    struct breakpoint *breakpoint = malloc(sizeof(struct breakpoint));

    breakpoint->address = addr;

    // Read original byte at the address into breakpoint for later use and ...
    ReadProcessMemory(proc, addr, &(breakpoint->origbyte), 1, NULL);

    // ... replace it with INT3
    write_memory_byte(proc, addr, INT3);

    return breakpoint;
}

bool get_breakpoints(PROCESS_INFORMATION *procinfo,
                     struct breakpoint *breakpoints[],
                     size_t breakpoints_size,
                     OUT uint32_t *breakpoint_address)
{
    DEBUG_EVENT event = {0};
    void *address = NULL;
    CONTEXT context = {.ContextFlags = CONTEXT_ALL};
    struct breakpoint *breakpoint = NULL;

    *breakpoint_address = 0;
    
    while (WaitForDebugEvent(&event, INFINITE))
    {
        switch (event.dwDebugEventCode)
        {
        case EXIT_PROCESS_DEBUG_EVENT:
            return false;
        case EXCEPTION_DEBUG_EVENT:
            switch (event.u.Exception.ExceptionRecord.ExceptionCode)
            {
            case STATUS_BREAKPOINT:
                address = event.u.Exception.ExceptionRecord.ExceptionAddress;
                
                for (size_t i = 0; i < breakpoints_size; i++)
                    if (breakpoints[i]->address == address)
                        breakpoint = breakpoints[i];

                // Ignore initial STATUS_BREAKPOINT event triggered by Windows
                if (breakpoint == NULL)
                    break;

                // Replace INT3 with original byte
                write_memory_byte(procinfo->hProcess, address,
                                  breakpoint->origbyte);

                GetThreadContext(procinfo->hThread, &context);

                // Decrement instruction pointer by one, to rerun with original
                // byte. Happens after ContinueDebugEvent.
                context.Eip--;

                // Enable single step: Triggers EXCEPTION_SINGLE_STEP on the
                // next (and only the next) instruction where we can insert the
                // breakpoint again
                context.EFlags |= 0x100;

                // Apply changes to thread context
                SetThreadContext(procinfo->hThread, &context);

                break;
            case EXCEPTION_SINGLE_STEP:
                // Insert breakpoint again
                write_memory_byte(procinfo->hProcess, breakpoint->address,
                                  INT3);

                *breakpoint_address = (uint32_t) breakpoint->address;

                break;
            }
        }

        ContinueDebugEvent(event.dwProcessId, event.dwThreadId, DBG_CONTINUE);

        if (*breakpoint_address != 0)
            return true;
    }

    return false;
}

bool create_process(wchar_t *program_fullpath,
                    OUT PROCESS_INFORMATION *procinfo)
{
    // Path is duplicated because PathRemoveFileSpec mutates its argument
    wchar_t *program_dir = wcsdup(program_fullpath);
    // CreateProcess expects cb to hold the size of the struct
    STARTUPINFO unused_startinfo = {.cb = sizeof(unused_startinfo)};
    bool success = false;

    // PathRemoveFileSpec is deprecated since win8, but PathCchRemoveFileSpec
    // doesn't do anything on my machine (using tcc).
    PathRemoveFileSpec(program_dir);

    success = CreateProcess(program_fullpath, NULL, NULL, NULL, false,
                            DEBUG_ONLY_THIS_PROCESS, NULL, program_dir,
                            &unused_startinfo, procinfo);
    free(program_dir);
    return success;
}

int error(const wchar_t *msg, ...)
{
    wchar_t buffer[500];
    va_list arglist;
    va_start(arglist, msg);
    vswprintf_s(buffer, ARRAYSIZE(buffer), msg, arglist);
    va_end(arglist);
    fwprintf(stderr, L"%s", buffer);
    return 1;
}

int wmain(int argv, wchar_t *args[])
{
    uint32_t addresses[] = {CASE_A, CASE_B};
    PROCESS_INFORMATION procinfo = {0};
    size_t breakpoints_size = ARRAYSIZE(addresses);
    struct breakpoint *breakpoints[breakpoints_size];
    uint32_t address;

    if (argv < 2)
        return error(L"Usage: %s path_to_exe\n", args[0]);

    if (!create_process(args[1], &procinfo))
        return error(L"Error: Failed to create process for %s. "
                      "Could it be an invalid path?\n", args[1]);
    
    for (size_t i = 0; i < breakpoints_size; i++)
        breakpoints[i] = set_breakpoint(procinfo.hProcess, addresses[i]);

    while (get_breakpoints(&procinfo, breakpoints, breakpoints_size, &address))
    {
        switch (address)
        {
        case CASE_A:
            printf("Case A triggered\n");
            break;
        case CASE_B:
            printf("Case B triggered\n");
            break;
        }
    }

    return 0;
}
