Relative VTbl

A few months ago I was debugging something quite odd, and learned about relative vtables. Come join me on my adventure!

The problem

I was writing some code to do the following.

// Declaration.
class IFoo {
public:
  virtual void first(int arg) = 0;
  virtual void second(int arg) = 0;
};

void makeCall(IFoo* foo, int arg) {
  foo->second(arg); // crash!
}

One particular detail here is that the IFoo* instance was obtained by calling a C function in a shared library separate from the main program that was running makeCall. This will matter later, though the title might be giving some of it away.

The crash was occurring because the program was jumping into non-executable memory at the point of the ->second(arg); invocation.

Looking at the disassembly, I was expecting to see something like this.

ldr     x0, [sp, #8] ; load foo into x0 (first argument)
ldr     w1, [sp, #4] ; load arg into w1 (second argument)
ldr     x8, [x0]     ; load vtbl into x8
ldr     x8, [x8, #8] ; load from x8 vtbl the value at
                     ; 8-byte offset ie second pointer, into x8
blr     x8           ; jump to function in x8

However, this is what I saw.

ldr     x0, [sp, #8] ; load foo into x0
ldr     w1, [sp, #4] ; load arg into w1
ldr     x8, [x0]     ; load vtbl into x8
ldrsw   x9, [x8, #4] ; load the 4-byte value at offset 4 in the vtbl
                     ; (second entry), sign-extended, into x9
add     x8, x8, x9   ; add x9 into x8
blr     x8           ; jump to function in x8

Asking around, I eventually learned that the second pattern is what you'd expect to see for relative vtables, which makes sense: each entry is a 32-bit offset rather than a full pointer, so the extra instructions add that offset to the vtable address to compute the target before making the call.
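To convince myself of why the mismatch lands in data rather than code, here is a rough simulation of the two lookups. It is only a sketch: the "address space" is a byte array, and the addresses, the layout, and every name in it (Memory, resolveClassic, resolveRelative, buildDemoMemory) are made up for illustration. It assumes little-endian storage, as on AArch64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulated address space: an "address" is just an index into this array.
using Memory = std::vector<std::uint8_t>;

// Store the low `size` bytes of `value` at `addr`, little-endian.
void store(Memory& mem, std::uint64_t addr, std::uint64_t value, std::size_t size) {
  assert(addr + size <= mem.size());
  for (std::size_t i = 0; i < size; ++i)
    mem[addr + i] = static_cast<std::uint8_t>(value >> (8 * i));
}

// Load `size` little-endian bytes starting at `addr`.
std::uint64_t load(const Memory& mem, std::uint64_t addr, std::size_t size) {
  assert(addr + size <= mem.size());
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < size; ++i)
    value |= static_cast<std::uint64_t>(mem[addr + i]) << (8 * i);
  return value;
}

// Classic layout: slot N is an 8-byte absolute address.
// Mirrors: ldr x8, [x8, #8*N]
std::uint64_t resolveClassic(const Memory& mem, std::uint64_t vtbl, unsigned slot) {
  return load(mem, vtbl + 8 * slot, 8);
}

// Relative layout: slot N is a signed 4-byte offset from the vtable address.
// Mirrors: ldrsw x9, [x8, #4*N] ; add x8, x8, x9
std::uint64_t resolveRelative(const Memory& mem, std::uint64_t vtbl, unsigned slot) {
  auto raw = static_cast<std::uint32_t>(load(mem, vtbl + 4 * slot, 4));
  auto offset = static_cast<std::int64_t>(static_cast<std::int32_t>(raw));
  return vtbl + static_cast<std::uint64_t>(offset);
}

// Made-up addresses: two "functions" and one vtable of each flavor.
constexpr std::uint64_t kFirst = 0x1000;
constexpr std::uint64_t kSecond = 0x1040;
constexpr std::uint64_t kClassicVtbl = 0x100;
constexpr std::uint64_t kRelativeVtbl = 0x180;

Memory buildDemoMemory() {
  Memory mem(0x200, 0);
  store(mem, kClassicVtbl + 0, kFirst, 8);
  store(mem, kClassicVtbl + 8, kSecond, 8);
  store(mem, kRelativeVtbl + 0, kFirst - kRelativeVtbl, 4);
  store(mem, kRelativeVtbl + 4, kSecond - kRelativeVtbl, 4);
  return mem;
}
```

With this setup, each resolver finds kSecond in slot 1 of its own kind of vtable. Running resolveRelative against the classic vtable instead reads the upper half of the first pointer as an "offset" and ends up pointing back into the vtable itself, which is data, not code.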

I double-checked, and indeed one of the binaries was built with -fexperimental-relative-c++-abi-vtables and the other wasn't.

The solution

There are a couple of ways to fix this. Unfortunately, we cannot annotate an instance or a type to change the behavior here - we only have flags for the compiler and linker.

One is to use consistent flags across all translation units and objects that declare or use these types. This may be hard to do when you don't control some of the binaries or cannot rebuild them to your heart's content, or when the flag has to spread virally through so many dependencies that it gets out of control.

Another option is to avoid this altogether and use C APIs across boundaries, which is what Win32 has traditionally done. Even COM typically goes to the trouble of emitting C-accessible APIs that are unambiguous as to how virtual table layout should work.
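As a rough sketch of what the C-API route could look like for an interface like mine: the library hands out an opaque handle and plain functions, and the virtual calls stay inside the library. All of the names here (FooHandle, FooImpl, foo_create, foo_second, foo_last_arg, foo_destroy) are hypothetical, and foo_last_arg exists only so the effect of a call can be observed.

```cpp
#include <new>

// ---- What the shared header might expose: plain C, no vtables ----
extern "C" {
typedef struct FooHandle FooHandle;  // opaque to callers
FooHandle* foo_create(void);
void foo_second(FooHandle* foo, int arg);
int foo_last_arg(const FooHandle* foo);
void foo_destroy(FooHandle* foo);
}

// ---- Library side: the C++ class never crosses the boundary ----
class FooImpl {
public:
  virtual ~FooImpl() = default;
  virtual void first(int arg) { lastArg_ = arg; }
  virtual void second(int arg) { lastArg_ = arg; }
  int lastArg() const { return lastArg_; }

private:
  int lastArg_ = 0;
};

struct FooHandle {
  FooImpl impl;
};

// nothrow new, so no exception can escape through the C boundary.
extern "C" FooHandle* foo_create(void) {
  return new (std::nothrow) FooHandle();
}

// The virtual dispatch happens here, compiled with the library's own flags.
extern "C" void foo_second(FooHandle* foo, int arg) {
  foo->impl.second(arg);
}

extern "C" int foo_last_arg(const FooHandle* foo) {
  return foo->impl.lastArg();
}

extern "C" void foo_destroy(FooHandle* foo) {
  delete foo;
}
```

The caller only ever sees a pointer and some C functions, so its vtable flags no longer matter for this interface. The cost is a hand-written wrapper for every method you want to expose.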

Finally, you can try typecasting your problems away if you know the layout and conventions. This is a bit easier to do when the object you are calling uses the non-relative ABI, as there is less code, and simpler when you have reasonably shallow and streamlined object layouts rather than multiple inheritance and so on.

For example, we can fix the problem with something like this.

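void makeCall(IFoo* foo, int arg) {
  // Assumes foo uses a classic vtable of pointer-sized absolute entries.
  void ***obj = (void***)foo;           // the object starts with its vtbl pointer
  void **vtbl = *obj;
  using fn = void(IFoo* foo, int arg);  // `this` is passed as the first argument
  fn* second = (fn*)vtbl[1];            // slot 1 is second()
  second(foo, arg);
}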
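For the opposite direction, calling into an object that has a relative vtable from code built without the flag, the same trick needs a little more work. The following is only a sketch, assuming 32-bit signed entries measured from the vtable address (as in the disassembly above), a single-inheritance object whose first word is the vtable pointer, and `this` passed as a plain first argument. callRelativeSlot is a hypothetical helper, and the fake object and functions exist only so the sketch can be exercised without a real relative-vtable binary.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

using SlotFn = void(void* self, int arg);

// Hypothetical helper: call entry `slot` of a relative vtable by hand.
void callRelativeSlot(void* obj, unsigned slot, int arg) {
  // The first word of the object is the vtable address.
  std::uintptr_t vtbl;
  std::memcpy(&vtbl, obj, sizeof vtbl);
  // Each entry is a signed 32-bit offset from the vtable address.
  std::int32_t offset;
  std::memcpy(&offset, reinterpret_cast<const void*>(vtbl + 4u * slot), sizeof offset);
  std::uintptr_t target =
      vtbl + static_cast<std::uintptr_t>(static_cast<std::intptr_t>(offset));
  reinterpret_cast<SlotFn*>(target)(obj, arg);
}

// ---- Stand-ins for a real relative-vtable object (illustration only) ----
int g_lastArg = 0;
void fakeFirst(void*, int) {}
void fakeSecond(void*, int arg) { g_lastArg = arg; }

struct FakeObject {
  const std::int32_t* vptr;
};
std::int32_t g_fakeVtbl[2];

// Fills the fake vtable with offsets; returns false if a function is too
// far from the table for the offset to fit in 32 bits.
bool makeFakeObject(FakeObject& out) {
  SlotFn* fns[2] = {fakeFirst, fakeSecond};
  for (int i = 0; i < 2; ++i) {
    std::intptr_t delta = reinterpret_cast<std::intptr_t>(fns[i]) -
                          reinterpret_cast<std::intptr_t>(g_fakeVtbl);
    if (delta < std::numeric_limits<std::int32_t>::min() ||
        delta > std::numeric_limits<std::int32_t>::max())
      return false;
    g_fakeVtbl[i] = static_cast<std::int32_t>(delta);
  }
  out.vptr = g_fakeVtbl;
  return true;
}
```

Calling callRelativeSlot(&obj, 1, arg) on an object set up by makeFakeObject should reach fakeSecond. Against a real binary, I would check the computed target in the debugger before trusting any of this.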

Under the debugger, you can use something like ln ADDR in WinDbg or x/a ADDR in LLDB to look at the target addresses and see whether they point at the function you want.

Happy virtual table spelunking!

Tags:  debugging