Chasing Symbols (work-in-progress)

Note: I started writing this a while ago, and I'm posting it before it's finished simply because I feel it already has enough useful information to be worth it. That said, I'd prefer to finish the pending discussions here rather than spread this over multiple posts. I'll write a quick follow-up post referencing this one when it's all cleaned up; in the meantime, enjoy!

Your program crashes or misbehaves, and it's time to look at the thing under a debugger, but you need to make sense of what's running and what's in memory. If you can't get proper symbols, your work will be orders of magnitude more difficult.

Let's make sure you've got the deck stacked for you.

Yes, we're talking about debug symbols here, not literary symbols, sex symbols or typographical symbols (nor orchestra cymbals, for that matter).

What are symbols?

Debug symbols, or just 'symbols' this time around, are the pieces of data that a debugger like Visual Studio, GDB or LLDB might use to show the names of functions and variables, as well as understand their associated types. They might also include additional information like source location (file name, line and column), information about lifetime scope, or whatever else the compiler and debugger figure out is useful for debugging.

Debug symbols typically have to be created and maintained throughout the build process, so you can work with your program in the form the computer likes to run it, but understand it in terms of the way that you wrote it.

Compilation symbols

Let's assume a C++ compilation pipeline: there's a source file, and then it gets compiled into an object file, and the object file gets linked with other objects and libraries to produce an executable.

Let's see what we can find, starting from the following file, symtest.cpp:

#include <stdio.h>

void do_something() {
  printf("hi mom!\n");
}

int main() {
  do_something();
  return 0;
}

I can build this with Visual C++ by running cl symtest.cpp /Zi (the /Zi switch asks for debug information, stored in a separate file), and that will produce a number of files:

If you run cl -c symtest.cpp /Zi instead, you'll only get symtest.obj and vc140.pdb. You can also run with /Z7 instead of /Zi, and then all of the debug information ends up in symtest.obj (you will see the .debug$T section with type debug information grow).

Now, we can take a look at what's in the object file with the dumpbin utility, running dumpbin symtest.obj /SYMBOLS, which shows us something like this.

Dump of file symtest.obj

File Type: COFF OBJECT

COFF SYMBOL TABLE
000 01057555 ABS    notype       Static       | @comp.id
001 80010191 ABS    notype       Static       | @feat.00
...
011 00000000 SECT8  notype       Static       | .debug$S
    Section length  118, #relocs    5, #linenums    0, checksum        0, selection    5 (pick associative Section 0x7)
...
017 00000000 SECT5  notype ()    External     | ___local_stdio_printf_options
018 00000000 UNDEF  notype ()    External     | ___acrt_iob_func
019 00000000 UNDEF  notype ()    External     | ___stdio_common_vfprintf
01A 00000000 SECT7  notype ()    External     | __vfprintf_l
01B 00000000 SECT9  notype ()    External     | _printf
01C 00000000 SECT4  notype ()    External     | ?do_something@@YAXXZ (void __cdecl do_something(void))
01D 00000020 SECT4  notype ()    External     | _main

Here you can see a number of symbols that we've used. These include two functions with C linkage (printf and main, shown as _printf and _main). They also include a C++ function that's externally visible: do_something, whose name is mangled to include its signature. C++ needs this because overloaded functions share the same source-level name.
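To see why the signature has to be part of the name, here's symtest.cpp's function next to a hypothetical do_something(int) overload, which is not in the original file. Both share a source-level name, so only the decorated names let the linker tell them apart. The MSVC names in the comments follow the same scheme as the dumpbin output above. The Itanium C++ ABI names are what GCC and Clang produce on Linux; you'll see _Z12do_somethingv in the nm output below.

```cpp
#include <cassert>
#include <cstdio>

// The original function from symtest.cpp.
//   MSVC decorated name:      ?do_something@@YAXXZ
//   GCC/Clang (Itanium ABI):  _Z12do_somethingv
void do_something() { std::printf("hi mom!\n"); }

// Hypothetical overload, added only for illustration.
//   MSVC decorated name:      ?do_something@@YAXH@Z
//   GCC/Clang (Itanium ABI):  _Z12do_somethingi
void do_something(int times) {
  for (int i = 0; i < times; ++i) do_something();
}
```

The compiler picks an overload by signature (for example, when taking a function pointer of a given type) and then references that overload's decorated name.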

OK, let's try this one more time, with clang, compiling for Linux. If you're running on Windows, I've previously posted instructions on setting up gcc and clang with Docker, which might come in handy.

With the same file, let's run clang -c -g symtest.cpp. Here -g specifies that we want debug information, and -c asks that we only compile rather than compile and link.

This produces a single file, symtest.o. Instead of dumpbin, we'll use objdump or nm.

root@b35e59218db2:/usr/src/myapp# nm symtest.o
0000000000000000 T _Z12do_somethingv
0000000000000030 T main
                 U printf
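The _Z12do_somethingv name is do_something() mangled according to the Itanium C++ ABI, which GCC and Clang use on Linux. nm -C (--demangle) will decode such names for you. If you need to do the same in your own tooling, the C++ runtimes that ship with GCC and Clang expose abi::__cxa_demangle in <cxxabi.h>. Here's a minimal sketch, assuming a GCC or Clang toolchain (MSVC doesn't provide this header):

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Demangles an Itanium C++ ABI name (as produced by GCC/Clang). If the input
// isn't a valid mangled name (e.g. C symbols like "main"), it is returned as-is.
std::string demangle(const std::string& name) {
  int status = 0;
  // __cxa_demangle allocates its result with malloc, so release it with std::free.
  std::unique_ptr<char, void (*)(void*)> result(
      abi::__cxa_demangle(name.c_str(), nullptr, nullptr, &status), std::free);
  return (status == 0 && result) ? std::string(result.get()) : name;
}
```

With this, demangle("_Z12do_somethingv") gives "do_something()", while names with C linkage like main and printf come back unchanged.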

Other useful switches to nm are --extern-only (only external symbols), --defined-only (only defined symbols), and --debug-syms (also show debugger-only symbols).

root@b35e59218db2:/usr/src/myapp# nm --debug-syms symtest.o
0000000000000000 N .debug_abbrev
0000000000000000 N .debug_info
0000000000000000 N .debug_line
0000000000000000 N .debug_str
0000000000000000 r .rodata.str1.1
0000000000000000 t .text
0000000000000000 T _Z12do_somethingv
0000000000000030 T main
                 U printf
0000000000000000 a symtest.cpp
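The letter in the second column encodes the symbol type. Here's a small C++ sketch that decodes a few of the common letters, based on the GNU nm manual page. It is deliberately partial, so check man nm for the full list and for the exceptions to the "lowercase means local" rule (weak and unique symbols, for example).

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Partial decoder for GNU nm symbol type letters.
std::string describe_nm_type(char type) {
  // Letters where case does not simply mean local vs. global.
  switch (type) {
    case 'U': return "undefined (resolved at link or load time)";
    case 'N': return "debugging symbol";
    case 'W':
    case 'w': return "weak symbol";
    default: break;
  }
  std::string kind;
  switch (std::toupper(static_cast<unsigned char>(type))) {
    case 'A': kind = "absolute value"; break;
    case 'B': kind = "uninitialized data (BSS)"; break;
    case 'D': kind = "initialized data"; break;
    case 'R': kind = "read-only data"; break;
    case 'T': kind = "code (text section)"; break;
    default: return "other (see man nm)";
  }
  // For these letters, lowercase means local to the object file and
  // uppercase means global (external).
  return kind + (std::islower(static_cast<unsigned char>(type)) ? ", local" : ", global");
}
```

Applied to the output above, T _Z12do_somethingv is global code, r .rodata.str1.1 is local read-only data, and the .debug_* entries are debugging symbols.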

Here's a cheat sheet for nm symbol types:

If using GCC, again, with the same file, we can run gcc -c -g symtest.cpp, which produces symtest.o, and the flow is essentially the same as with clang in this case.

TODO: SAME IN CLANG ON ANDROID

TODO: SAME IN CLANG ON MAC OS

Linking symbols

OK, so at link time, we have all the compilation symbols.

We'll cover dynamic libraries in a moment, but let's touch on static libraries first.

On Windows, the tool to build static libraries is lib, the Microsoft Library Manager. You can also use this to build import libraries and export files, and to extract members from a library.

Let's give this a try with cl -c symtest.cpp /Zi && lib symtest.obj (you can do both with cl, but it's instructive to see them run separately). Now you end up with a symtest.lib file, which, if you look at it with dumpbin, also contains the debug information.

You can verify that static libraries are simply containers for object files by running lib /list symtest.lib, which outputs the members (just symtest.obj in this case). You can also copy a member out again with lib symtest.lib /extract:symtest.obj /out:mycopy.obj.

So effectively all the debug information is carried around in the .obj files that live within the .lib files.
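Because a static library is just a container, its layout is simple. Both Windows .lib files and the .a archives built by ar on Linux start with the !<arch>\n signature. Each member that follows has a 60-byte text header (name, date, user and group IDs, mode, decimal size, and a two-byte terminator), and members start on even offsets. Below is a minimal C++ sketch that walks those headers to list member names, roughly what lib /list or ar t report. On a real .lib you'll also see the special / and // members (the linker's symbol index and the long-name table). Member names may also end in a slash or appear as /n references into the long-name table; this sketch doesn't resolve those.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Lists member names in an archive (.a from ar, or a Windows .lib).
// Sketch only: long names ("/123" references into the "//" member) are not
// resolved, and special members are returned as-is.
std::vector<std::string> list_archive_members(const std::string& data) {
  const std::string magic = "!<arch>\n";
  std::vector<std::string> names;
  if (data.compare(0, magic.size(), magic) != 0) return names;

  std::size_t pos = magic.size();
  while (pos + 60 <= data.size()) {                     // each header is 60 bytes
    if (data.compare(pos + 58, 2, "`\n") != 0) break;   // bytes 58-59: terminator
    std::string name = data.substr(pos, 16);            // bytes 0-15: name
    name.erase(name.find_last_not_of(' ') + 1);         // strip space padding
    names.push_back(name);
    std::size_t size = std::stoul(data.substr(pos + 48, 10));  // bytes 48-57: size
    pos += 60 + size;
    if (pos % 2 != 0) ++pos;                            // members start on even offsets
  }
  return names;
}
```

To try it on symtest.lib, read the whole file into a std::string in binary mode and pass it in.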

Dynamic linking symbols

What can you get from a .DLL or shared object?

To test this on Windows, build our sample with cl symtest.cpp /Zi /LD. The /LD switch specifies that you want a DLL (and defaults to the multithreaded static version of the CRT).

You can then examine the output with dumpbin symtest.dll /SYMBOLS, but you'll note there are no symbols shown. Instead, you should use dumpbin symtest.dll /IMPORTS, which will show a bunch of functions used by the static CRT.

As an aside, if you use dumpbin symtest.dll /EXPORTS, you will also see no symbols. If you change the function definition to read __declspec(dllexport) void do_something() {, the exports will include ?do_something@@YAXXZ, which is the C++-decorated name (remember we compiled a C++ file!). You'll also see the incremental linking thunk name and a few other values, like the export ordinal. The decoration is what supports things like overload resolution. To get a plain name instead, change the declaration to extern "C" __declspec(dllexport) void do_something() {, and at last you'll get the do_something export you might have expected in the first place.
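Here are the two exporting variants side by side, as a minimal sketch. SYMTEST_EXPORT is a hypothetical helper macro. It expands to __declspec(dllexport) on Windows and to GCC/Clang's default-visibility attribute elsewhere; on Linux, shared objects export symbols by default unless you build with -fvisibility=hidden. do_something_c is also a hypothetical name. C++ doesn't allow the same function to be declared with both C and C++ language linkage, so the extern "C" version needs its own name.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper macro: MSVC-style export on Windows, default ELF
// visibility with GCC/Clang elsewhere.
#if defined(_WIN32)
#define SYMTEST_EXPORT __declspec(dllexport)
#else
#define SYMTEST_EXPORT __attribute__((visibility("default")))
#endif

// C++ linkage: exported under its decorated name
// (?do_something@@YAXXZ with MSVC, _Z12do_somethingv with GCC/Clang).
SYMTEST_EXPORT void do_something() { std::printf("hi mom!\n"); }

// C linkage: exported under the plain name do_something_c.
extern "C" SYMTEST_EXPORT void do_something_c() { std::printf("hi mom!\n"); }
```

Built with cl /LD, dumpbin /EXPORTS should then list both ?do_something@@YAXXZ and do_something_c.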

That said, the debug symbols are not present in the .DLL itself. Instead, they are placed in symtest.pdb. Remember that this is different from static libraries, where the debug information is kept within the .obj files.

On Linux, you can build with clang -shared -g symtest.cpp, dropping the -c switch.

You can use objdump like we did before, or you can use ldd to get the shared library dependencies of the program (which is useful, but not really about symbols). ldd may actually load or even run the program, so if you don't trust the binary, the man page recommends objdump -p /path/to/program | grep NEEDED instead. Note, however, that objdump does not recurse into dependencies, while ldd does.
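If you'd rather script the objdump route than grep by hand, here's a minimal C++ sketch that pulls the NEEDED entries out of objdump -p output. It only parses text you hand it. Like objdump itself, it never runs the binary and doesn't follow dependencies of dependencies.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Extracts direct shared-library dependencies (NEEDED entries) from the text
// printed by `objdump -p`, equivalent to `grep NEEDED` plus taking the value.
std::vector<std::string> needed_libraries(const std::string& objdump_output) {
  std::vector<std::string> libs;
  std::istringstream lines(objdump_output);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string tag, value;
    // Dynamic section lines look like: "  NEEDED               libc.so.6"
    if (fields >> tag >> value && tag == "NEEDED") libs.push_back(value);
  }
  return libs;
}
```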

In this case, debug information will be embedded in the shared object file.

When using gdb on Linux or Android (or on Windows via MinGW), you can also separate the debug information into its own file. The instructions boil down to the following.

# move the debug information into symtest.debug
objcopy --only-keep-debug symtest symtest.debug

# remove the debug information from symtest
strip -g symtest

# add a link between the files
objcopy --add-gnu-debuglink=symtest.debug symtest

When using GDB, you can run set debug-file-directory with a list of directories to point it to wherever you keep the .debug files.
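The link that objcopy adds is a .gnu_debuglink section. According to the GDB manual, it holds four things in order: the debug file's name (without directories), a NUL byte, padding to a four-byte boundary, and a four-byte CRC-32 of the whole debug file stored in the binary's byte order. GDB uses the CRC to check that a .debug file it finds actually matches. Here's a minimal C++ sketch of how those bytes are laid out, assuming a little-endian target. objcopy --add-gnu-debuglink already does this for you, so this only shows what's in there.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Standard reflected CRC-32 (polynomial 0xEDB88320), computed bitwise.
std::uint32_t debuglink_crc32(const std::string& bytes, std::uint32_t crc = 0) {
  crc = ~crc;
  for (unsigned char byte : bytes) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return ~crc;
}

// Builds .gnu_debuglink contents: name, NUL, zero padding to a 4-byte
// boundary, then the CRC. Assumes a little-endian target.
std::vector<unsigned char> debuglink_section(const std::string& debug_file_name,
                                             const std::string& debug_file_contents) {
  std::vector<unsigned char> section(debug_file_name.begin(), debug_file_name.end());
  section.push_back(0);
  while (section.size() % 4 != 0) section.push_back(0);
  const std::uint32_t crc = debuglink_crc32(debug_file_contents);
  for (int i = 0; i < 4; ++i)
    section.push_back(static_cast<unsigned char>(crc >> (8 * i)));
  return section;
}
```

For example, linking to symtest.debug yields a 20-byte section: 13 name bytes, the NUL, 2 padding bytes, and the 4-byte CRC.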

Debug symbols

Now, once we've linked symtest.exe, symbols are no longer available from within the binary itself in MSVC; instead, we have them in the .PDB file.

One way to poke at the information available in the file is to run windbg -z symtest.exe. The -z switch indicates we want to open a dump file rather than run a process, but you can use any .dll or .exe file as the "dump file", and the debugger will simply map it into memory and allow you to query the symbols it resolves from the .pdb.

This lets you do things like the following:

0:000> x symtest!*main*
00466000          symtest!__scrt_native_dllmain_reason = 0xffffffff
00406ef0          symtest!main (void)
004072cd          symtest!__scrt_main_policy::set_app_type (void)
004074a1          symtest!__scrt_dllmain_uninitialize_c (void)
004072a0          symtest!invoke_main (void)
004072fd          symtest!mainCRTStartup (void *)
00407419          symtest!__scrt_dllmain_before_initialize_c (void)
00407450          symtest!__scrt_dllmain_crt_thread_detach (void)
0040742a          symtest!__scrt_dllmain_crt_thread_attach (void)
004070a9          symtest!__scrt_common_main_seh (void)
...
0:000> u symtest!main
symtest!main [C:\Users\mruiz\AppData\Local\Temp\symtest.cpp @ 7]:
00406ef0 55              push    ebp
00406ef1 8bec            mov     ebp,esp
00406ef3 e812c9ffff      call    symtest!ILT+10245(?do_somethingYAXXZ) (0040380a)
00406ef8 33c0            xor     eax,eax
00406efa 5d              pop     ebp
00406efb c3              ret
00406efc cc              int     3
00406efd cc              int     3

Note how we get a file name and line number.

If you want to write a utility to dump symbol information (I know I've done this a handful of times for various purposes), you can use the Microsoft DIA SDK. This can be handy for finding the largest functions, for example, or for checking for functions you do or don't expect to see.
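DIA is a Windows COM API. As a rough Linux-side analogue for the "largest functions" question (a different tool, not DIA), nm -S prints each defined symbol's size, and a few lines of C++ can rank the code symbols. Here's a minimal sketch, assuming the "address size type name" line layout that nm -S uses for symbols with a known size. Lines without a size, such as undefined symbols, are skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct SymbolSize {
  std::string name;
  std::uint64_t size;
};

// Parses `nm -S` output and returns the `count` largest code symbols
// (type T or t), biggest first.
std::vector<SymbolSize> largest_functions(const std::string& nm_output, std::size_t count) {
  std::vector<SymbolSize> functions;
  std::istringstream lines(nm_output);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string address, size_hex, type, name;
    if (!(fields >> address >> size_hex >> type)) continue;
    if (type != "T" && type != "t") continue;
    std::getline(fields >> std::ws, name);  // demangled (-C) names may contain spaces
    functions.push_back({name, std::stoull(size_hex, nullptr, 16)});
  }
  std::sort(functions.begin(), functions.end(),
            [](const SymbolSize& a, const SymbolSize& b) { return a.size > b.size; });
  if (functions.size() > count) functions.resize(count);
  return functions;
}
```

Feeding it the output of nm -S -C on a binary, with a count of 10, gives a rough top-ten list. nm --size-sort can also do the sorting for you.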

Packaging symbols

Symbol Server

On Windows, symbols typically end up stored in a .PDB file that is not necessary for running your program. If you want to make debug symbols available, you can either distribute the .PDB files alongside the program and install them next to corresponding binaries, or set up a symbol server.

A symbol server enables the debugger to automatically retrieve the correct symbol files from a symbol store (an indexed collection of symbol files) without the user needing to know product names, releases, or build numbers. This works across many debugging tools: Visual Studio, WinDbg, WPA, and many other ETW-based tools.
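Part of what makes this work is that the store is laid out by a key derived from the binary itself, so the debugger can compute the path without knowing anything about releases. For a PDB, the key is the PDB's GUID followed by its age. For an .exe or .dll, it's the PE header's TimeDateStamp followed by SizeOfImage. Here's a minimal C++ sketch of building those relative paths. It assumes you've already read the GUID, age, timestamp, and image size out of the binary; that parsing isn't shown, and PdbId is a hypothetical struct. Treat the exact hex formatting (case and zero padding) as an assumption based on commonly seen store layouts, and compare it against a store created by symstore before relying on it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical holder for a PDB's identity (GUID fields plus age).
struct PdbId {
  std::uint32_t data1;
  std::uint16_t data2;
  std::uint16_t data3;
  std::uint8_t data4[8];
  std::uint32_t age;
};

// <pdb name>/<GUID as 32 hex digits><age in hex>/<pdb name>
std::string pdb_store_path(const std::string& pdb_name, const PdbId& id) {
  char key[64];
  int n = std::snprintf(key, sizeof key, "%08X%04X%04X", static_cast<unsigned>(id.data1),
                        static_cast<unsigned>(id.data2), static_cast<unsigned>(id.data3));
  for (int i = 0; i < 8; ++i)
    n += std::snprintf(key + n, sizeof key - n, "%02X", static_cast<unsigned>(id.data4[i]));
  std::snprintf(key + n, sizeof key - n, "%X", static_cast<unsigned>(id.age));
  return pdb_name + "/" + key + "/" + pdb_name;
}

// <binary name>/<TimeDateStamp as 8 hex digits><SizeOfImage in hex>/<binary name>
std::string binary_store_path(const std::string& name, std::uint32_t time_date_stamp,
                              std::uint32_t size_of_image) {
  char key[32];
  std::snprintf(key, sizeof key, "%08X%x", static_cast<unsigned>(time_date_stamp),
                static_cast<unsigned>(size_of_image));
  return name + "/" + key + "/" + name;
}
```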

The Debugging Tools for Windows package includes all the tools you need to manage symbol servers: indexing, setting up stores, and removing private information (leaving only the public symbols available).

Symbol Server LLDB

I haven't found an equivalent of symbol servers for LLDB (that is, a system where symbols can be looked up and retrieved from a network location).

There are a couple of similar solutions I've run into so far.

Source Indexing with PDBs

Having access to the correct .PDB file via a symbol server is extremely handy, because you don't have to worry about various build systems and versions and such. You will typically have build servers creating and releasing binaries, and indexing the PDB files so consumers can debug properly. However, that doesn't get you the source files at debug time, because those are not available in the .PDB file.

The Microsoft solution to this is support for a source server. The name is a bit of a misnomer: there isn't one piece of server software that acts as a source server. Instead, binaries are source indexed during the build process, after the application has been built, by storing the information needed to retrieve each source file in the PDB files.

The Debugging Tools for Windows come with scripts to index from various common source control systems, but they all boil down to generating a string to be executed, either an HTTP URL or a command. The variable replacement support gives you a way to have shorter or more consistent strings, but if you decide to generate your own source server data blocks, you can do whatever you please. I remember doing indexing for GitHub sources with a commit hash at some point, for example, and with a bit of trial and error, this worked great.
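To illustrate what a hand-rolled data block can look like, here's a minimal C++ sketch that emits an HTTP-style block. It maps each source file to a raw-file URL at a given commit, similar in spirit to the GitHub indexing mentioned above. The section markers and the VERSION, VERCTRL and SRCSRVTRG variables follow the layout commonly shown for HTTP source indexing. Treat that layout as an assumption to check against the srcsrv documentation in the Debugging Tools for Windows. IndexedSource and the URL prefix are hypothetical. The resulting text is typically written into the PDB's srcsrv stream with the pdbstr tool.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record: the build-machine path recorded in the PDB, and the
// same file's path relative to the repository root (forward slashes).
struct IndexedSource {
  std::string local_path;
  std::string repo_relative_path;
};

// Emits a source server data block. Each source line is '*'-separated;
// %var1% is the first field (local path), %var2% the second, and so on.
std::string make_srcsrv_block(const std::string& raw_url_prefix, const std::string& commit,
                              const std::vector<IndexedSource>& files) {
  std::string out;
  out += "SRCSRV: ini ------------------------------------------------\n";
  out += "VERSION=2\n";
  out += "VERCTRL=http\n";
  out += "SRCSRV: variables ------------------------------------------\n";
  out += "SRCSRVTRG=" + raw_url_prefix + "/%var2%/%var3%\n";
  out += "SRCSRV: source files ---------------------------------------\n";
  for (const auto& f : files)
    out += f.local_path + "*" + commit + "*" + f.repo_relative_path + "\n";
  out += "SRCSRV: end ------------------------------------------------\n";
  return out;
}
```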

I should write more on this at some point, but until then I will point to this good write-up on source indexing with URLs.

Source Link is a source indexing alternative used in the .NET world, supported both in Windows PDB files (the ones discussed in general in this post) and in portable PDBs (cross-platform, .NET-specific files). However, native tools in general do not support it, and native code is what we're focusing on here.

Common errors

You can use nm -gC SO-NAME to list the external symbols in a shared object (.so), with C++ names demangled.

It looks like this (truncated for brevity):

                 U __cxa_throw_bad_array_new_length@@CXXABI_1.3.8

The "U" entries are undefined symbols. But they have a magic @@ suffix that hints at where they're expected to be found. So if you have "U" entries without "@@" hints, then you're in trouble.
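To apply that check across a whole library, here's a minimal C++ sketch that scans nm -gC output for undefined symbols with no version suffix. It implements only the heuristic above, not a real link check. It looks for any '@', so both @@ and single-@ suffixes count as versioned. It also keeps the rest of the line as the name, because demangled names can contain spaces.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Lists undefined ("U") symbols from `nm -gC` output with no '@' version suffix.
std::vector<std::string> unversioned_undefined(const std::string& nm_output) {
  std::vector<std::string> result;
  std::istringstream lines(nm_output);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string type;
    // Undefined symbols have no address column, so the type letter comes first.
    if (!(fields >> type) || type != "U") continue;
    std::string name;
    std::getline(fields >> std::ws, name);  // demangled names may contain spaces
    if (!name.empty() && name.find('@') == std::string::npos) result.push_back(name);
  }
  return result;
}
```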

Try with link.exe.

Try with Linux tools.

REF: https://begriffs.com/posts/2021-07-04-shared-libraries.html?hn=2

REF: The Linux Programming Interface, by Michael Kerrisk

Happy symbol chasing!
