Today's post is another one where we work together through problem solving, although this time edited to cut dead ends and be a more straightforward read.
My goal was to query data structures in debugging targets using SQL, so that I can cross-reference data across various subsystems.
For fun, this time we're going to do this in a Linux VM running within Windows.
First, let's make sure we have an Ubuntu VM set up. If you're on some other Linux, something along these lines should also work to set up the dependencies.
I start off with a straight WSL Ubuntu setup, then install updates and packages that we'll need.
sudo apt update
sudo apt install sqlite3
sudo apt install clang
sudo apt install lldb
Let's make sure we can run all the things we intend to.
This is the hello.c program we're going to use.
#include <stdio.h>

int main() {
  printf("Hello, world!\n");
  return 0;
}
Let's make sure that clang works fine.
$ mkdir -p ~/scratch && cd ~/scratch
$ vim hello.c # paste in hello.c
$ clang -g -o hello hello.c
$ ./hello
Hello, world!
If I try lldb right now, I'm going to run into trouble with this message: ModuleNotFoundError: No module named 'lldb.embedded_interpreter'
Here is the fix, brought to us via StackOverflow. Running ls /usr/lib/llvm-14/lib/python3.10/dist-packages/lldb shows that _lldb.cpython-310-x86_64-linux-gnu.so and lldb-argdumper are broken symlinks.
# where is this pointing?
$ readlink /usr/lib/llvm-14/lib/python3.10/dist-packages/lldb/_lldb.cpython-310-x86_64-linux-gnu.so
../../../../../lib/liblldb.so
# where can I find the missing files in the package?
$ find /usr/lib/llvm-14/lib | grep liblldb.so
/usr/lib/llvm-14/lib/liblldb.so.1
$ find /usr/lib/llvm-14 | grep lldb-argdumper
/usr/lib/llvm-14/lib/python3.10/dist-packages/lldb/lldb-argdumper
/usr/lib/llvm-14/bin/lldb-argdumper
# Let's fix it!
$ cd /usr/lib/llvm-14/lib/python3.10/dist-packages/lldb
$ sudo ln -sf /usr/lib/llvm-14/lib/liblldb.so.1 _lldb.cpython-310-x86_64-linux-gnu.so
$ sudo ln -sf /usr/lib/llvm-14/bin/lldb-argdumper lldb-argdumper
# And let's make this accessible.
$ export PYTHONPATH='/usr/lib/llvm-14/lib/python3.10/dist-packages'
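Hard-coding the dist-packages path works, but the directory changes with the LLVM and Python versions. As a rough sketch, assuming lldb is on PATH and that its -P flag prints the directory of its Python module, a script can ask lldb for the path instead. The helper name find_lldb_python_path is mine, not part of lldb.

```python
import shutil
import subprocess
import sys


def find_lldb_python_path():
    """Return the directory printed by `lldb -P`, or None if it cannot be found."""
    lldb_exe = shutil.which("lldb")
    if lldb_exe is None:
        return None
    try:
        done = subprocess.run(
            [lldb_exe, "-P"],
            capture_output=True,
            text=True,
            timeout=30,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    path = done.stdout.strip()
    return path or None


lldb_path = find_lldb_python_path()
if lldb_path and lldb_path not in sys.path:
    # Same effect as the PYTHONPATH export, but only for this process.
    sys.path.append(lldb_path)
print("lldb Python path:", lldb_path)
```

This only replaces the hard-coded path. With the broken package described above, the symlink fix is presumably still needed before import lldb succeeds.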
OK, let's do this.
$ lldb hello
(lldb) target create "hello"
Current executable set to '/home/mlrdev/scratch/hello' (x86_64).
(lldb) b main
Breakpoint 1: where = hello`main + 15 at hello.c:4:3, address = 0x000000000000114f
(lldb) r
Process 2197 launched: '/home/mlrdev/scratch/hello' (x86_64)
Process 2197 stopped
* thread #1, name = 'hello', stop reason = breakpoint 1.1
    frame #0: 0x000055555555514f hello`main at hello.c:4:3
   1    #include <stdio.h>
   2
   3    int main() {
-> 4      printf("Hello, world!\n");
   5      return 0;
   6    }
(lldb) c
Process 2197 resuming
Hello, world!
Process 2197 exited with status = 0 (0x00000000)
(lldb) exit
Let's see if lldb can use Python. Actually, we won't need this straight from Python, but it's useful to play around with. See http://lopezruiz.net/2022/07/27-bootstrapping-lldb-scripting.htm for more on scripting lldb.
$ python3
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> exit()
Now we're going to need APSW, so let's get that, including pip.
sudo apt install python3-venv python3-pip
python3 -m pip install apsw
And let's take that for a spin. Let's run python3 and go through this.
import apsw
import apsw.ext
connection = apsw.Connection(":memory:")
# Yield a row at a time
def table_range(start=1, stop=100, step=1):
    for i in range(start, stop + 1, step):
        yield (i,)
# set column names
table_range.columns = ("value",)
# set how to access what table_range returns
table_range.column_access = apsw.ext.VTColumnAccess.By_Index
# register it
apsw.ext.make_virtual_module(connection, "range", table_range)
# see it work. we can provide both positional and keyword
# arguments
query = "SELECT * FROM range(90) WHERE step=2"
print(apsw.ext.format_query_table(connection, query))
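The query passes 90 positionally and step by name. As I understand the APSW behavior, the positional argument binds to start and the WHERE clause binds to step, so the rows should match what you get by calling the generator directly. A quick cross-check in plain Python, with the generator repeated so the snippet runs on its own:

```python
# Same generator as above, repeated so this snippet is self-contained.
def table_range(start=1, stop=100, step=1):
    for i in range(start, stop + 1, step):
        yield (i,)


# Assumed equivalent of: SELECT * FROM range(90) WHERE step=2
rows = list(table_range(90, step=2))
print(rows)
```

If the table printed by format_query_table shows different values, the parameter mapping assumed here is wrong for your APSW version.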
And out come the results, so everything is in order. Let's write some proper code!
If you've run into trouble before this, you'll want to make sure you get things fixed before moving forward - nothing's going to get easier by integrating many things at the same time.
I often find myself working in processes with hundreds of modules. It would sure be nice to be able to filter them by name, order them by address or do other kinds of presentations.
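First, let's see how we can print these. The snippets below run in lldb's embedded Python interpreter (the script command), with the hello process launched and stopped.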
>>> d = lldb.debugger
>>> t = d.GetSelectedTarget()
>>> for m in t.modules:
...     print(m)
...
(x86_64) /home/mlrdev/scratch/hello
(x86_64) /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
(x86_64) [vdso](0x00007ffff7fc1000)
(x86_64) /lib/x86_64-linux-gnu/libc.so.6
OK, now let's wrap that in a table.
import apsw
import apsw.ext

d = lldb.debugger
t = d.GetSelectedTarget()

# Yield a row at a time: index, path, UUID and load address of each module
def sfd_table_modules():
    idx = 0
    for m in t.modules:
        yield (idx, m.file.fullpath, m.GetUUIDString(),
               hex(m.GetObjectFileHeaderAddress().load_addr))
        idx += 1
sfd_table_modules.columns = ("idx","file_name","uuid","addr")
sfd_table_modules.column_access = apsw.ext.VTColumnAccess.By_Index
# register it
connection = apsw.Connection(":memory:")
apsw.ext.make_virtual_module(connection, "sfd_modules", sfd_table_modules)
query = "SELECT * FROM sfd_modules order by file_name"
print(apsw.ext.format_query_table(connection, query))
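Filtering by name and ordering by address were the original motivation, so here is a sketch of that kind of query. One thing to watch: a hex string column is compared as text, which can mis-order addresses of different widths, so this version keeps the address as an integer and formats it only for display. To stay runnable outside lldb it uses the standard sqlite3 module and an ordinary table as a stand-in for the virtual table. The paths come from the module listing above, but the load addresses are made up for illustration.

```python
import sqlite3

# Stand-in rows for what a modules table could yield.
# The load addresses are invented for this example.
rows = [
    (0, "/home/mlrdev/scratch/hello", 0x555555554000),
    (1, "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2", 0x7FFFF7FC3000),
    (2, "/lib/x86_64-linux-gnu/libc.so.6", 0x7FFFF7C00000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE modules (idx INTEGER, file_name TEXT, addr INTEGER)")
conn.executemany("INSERT INTO modules VALUES (?, ?, ?)", rows)

# Keep only modules under a lib directory, lowest load address first.
query = """
    SELECT idx, file_name, addr
    FROM modules
    WHERE file_name LIKE '%/lib/%'
    ORDER BY addr
"""
result = conn.execute(query).fetchall()
for idx, file_name, addr in result:
    print(idx, hex(addr), file_name)
```

The same SELECT should work against sfd_modules if the addr column is yielded as an integer instead of a hex string.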
Let's update our hello.c program to be a bit more interesting.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

struct entry {
  char* name;
  int active;
};

struct entry entries[10];
int entry_count;

int main() {
  char buf[20];
  for (;;) {
    /* Stop on EOF or read error. */
    if (!fgets(buf, sizeof(buf), stdin)) {
      return 0;
    }
    /* Strip the trailing newline, if there is one. */
    buf[strcspn(buf, "\n")] = '\0';
    printf("[%s]\n", buf);
    /* "x" exits; also stop once the entries array is full. */
    if (0 == strcmp(buf, "x") || entry_count >= 10) {
      return 0;
    }
    entries[entry_count].name = strdup(buf);
    entries[entry_count].active = 1;
    entry_count++;
  }
  return 0;
}
Now for something app-specific: listing the entries in a global variable.
>>> d = lldb.debugger
>>> t = lldb.target
>>> v = t.CreateValueFromExpression("entries", "entries")
>>> print(v)
(entry[10]) entries = {
  [0] = (name = "hello", active = 1)
  [1] = (name = "", active = 1)
  [2] = (name = 0x0000000000000000, active = 0)
  [3] = (name = 0x0000000000000000, active = 0)
  [4] = (name = 0x0000000000000000, active = 0)
  [5] = (name = 0x0000000000000000, active = 0)
  [6] = (name = 0x0000000000000000, active = 0)
  [7] = (name = 0x0000000000000000, active = 0)
  [8] = (name = 0x0000000000000000, active = 0)
  [9] = (name = 0x0000000000000000, active = 0)
}
>>> print(v.num_children)
10
>>> e0=v.children[0]
>>> print(e0.path)
entries[0]
>>> print(e0.num_children)
2
>>> print(e0.GetChildMemberWithName('active'))
(int) active = 1
>>> e0a = e0.GetChildMemberWithName('active')
>>> print(e0a.value)
1
>>> e0n = e0.GetChildMemberWithName('name')
>>> print(e0n)
(char *) name = 0x0000555555559ac0 "hello"
>>> error = lldb.SBError()
>>> s = lldb.process.ReadCStringFromMemory(e0n.Dereference().load_addr, 100, error)
>>> print(s)
hello
OK, now let's turn that into a proper table read.
d = lldb.debugger
t = d.GetSelectedTarget()
def sfd_table_modules():
    idx = 0
    for m in t.modules:
        yield (idx, m.file.fullpath, m.GetUUIDString(),
               hex(m.GetObjectFileHeaderAddress().load_addr))
        idx += 1
sfd_table_modules.columns = ("idx","file_name","uuid","addr")
sfd_table_modules.column_access = apsw.ext.VTColumnAccess.By_Index
# register it
connection = apsw.Connection(":memory:")
apsw.ext.make_virtual_module(connection, "sfd_modules", sfd_table_modules)
query = "SELECT * FROM sfd_modules order by file_name"
print(apsw.ext.format_query_table(connection, query))
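For the entries global specifically, the same pattern might look like the sketch below. The names entry_rows, register_entries and sfd_entries are hypothetical, chosen to match the style above. It reads the name pointer with GetValueAsUnsigned rather than the Dereference().load_addr route used in the interactive session, and it trusts entry_count (clamped to the array size) to decide how many slots are in use.

```python
def entry_rows(target, process, max_len=100):
    """Yield (idx, name, active) for each used slot of the `entries` global."""
    import lldb  # imported lazily so the file can be loaded outside lldb

    entries = target.CreateValueFromExpression("entries", "entries")
    count_value = target.CreateValueFromExpression("entry_count", "entry_count")
    # Never walk past the end of the array, whatever entry_count claims.
    count = min(count_value.GetValueAsSigned(), entries.GetNumChildren())
    for idx in range(count):
        entry = entries.GetChildAtIndex(idx)
        active = entry.GetChildMemberWithName("active").GetValueAsSigned()
        name_ptr = entry.GetChildMemberWithName("name").GetValueAsUnsigned()
        name = None
        if name_ptr != 0:
            error = lldb.SBError()
            text = process.ReadCStringFromMemory(name_ptr, max_len, error)
            if error.Success():
                name = text
        yield (idx, name, active)


def register_entries(connection, target, process):
    """Expose entry_rows as a virtual table named sfd_entries (hypothetical name)."""
    import apsw.ext

    # make_virtual_module wants a callable with no required parameters,
    # so the target and process are captured in a closure.
    def sfd_table_entries():
        yield from entry_rows(target, process)

    sfd_table_entries.columns = ("idx", "name", "active")
    sfd_table_entries.column_access = apsw.ext.VTColumnAccess.By_Index
    apsw.ext.make_virtual_module(connection, "sfd_entries", sfd_table_entries)


# Inside lldb's script interpreter, usage would be along these lines:
#   connection = apsw.Connection(":memory:")
#   register_entries(connection, lldb.target, lldb.process)
#   print(apsw.ext.format_query_table(connection, "SELECT * FROM sfd_entries"))
```

Names longer than max_len bytes are cut short, which is fine for this program since its input buffer is only 20 bytes.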
And that's it! We can of course register additional virtual tables, and then start doing things like joining tables or aggregating data.
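As a small illustration of aggregating, here is a sketch that runs outside lldb. It uses the standard sqlite3 module and an ordinary table named entries as a stand-in for a virtual table, loaded with the two rows seen in the debugger session above. The table name and columns are my own choice for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (idx INTEGER, name TEXT, active INTEGER)")
# The two populated slots from the debugger session: "hello" and an empty line.
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [(0, "hello", 1), (1, "", 1)],
)

# Count entries and total name length per active flag.
summary = conn.execute(
    """
    SELECT active, COUNT(*) AS n, SUM(length(name)) AS name_chars
    FROM entries
    GROUP BY active
    """
).fetchall()
print(summary)
```

The GROUP BY query itself does not care whether the rows come from a real table or from a generator registered as a virtual table.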
Happy debugging!