python: printing a Python string in GDB without python-gdb.py

Sometimes I find myself debugging native code inside a Python program (i.e., an extension module) and want to see the Python stack trace. The python-gdb.py GDB extension is supposed to do this for you, but I’ve had problems getting it to work (or exist at all) in my Conda-based development environments. Luckily, you can finagle this yourself.

Here I’ll be using a Debian Linux 12 ‘bookworm’ system on x86_64 with Python supplied by conda-forge and GDB from Debian. We’ll debug this Python script, saved as test.py:

import ctypes

def my_native_code():
    intp = ctypes.POINTER(ctypes.c_int)
    bad_address = ctypes.cast(42, intp)
    print(bad_address.contents)

my_native_code()

If run, it’ll predictably crash:

$ python test.py
Segmentation fault

So let’s load it up into a debugger, and get the stack trace:

lidavidm@debian ~> gdb --args python test.py
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
(gdb) r
Starting program: /home/lidavidm/miniforge3/envs/adbc/bin/python test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b5b5ff in i_get (ptr=0x2a, size=4) at /usr/local/src/conda/python-3.11.7/Modules/_ctypes/cfield.c:645
645	/usr/local/src/conda/python-3.11.7/Modules/_ctypes/cfield.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7b5b5ff in i_get (ptr=0x2a, size=4) at /usr/local/src/conda/python-3.11.7/Modules/_ctypes/cfield.c:645
#1  0x00007ffff7b5fb3b in Simple_get_value (_unused_ignored=0x0, self=0x7ffff7601d00) at /usr/local/src/conda/python-3.11.7/Modules/_ctypes/_ctypes.c:4978
#2  Simple_repr (self=0x7ffff7601d00) at /usr/local/src/conda/python-3.11.7/Modules/_ctypes/_ctypes.c:5032
#3  0x000055555577618e in object_str (self=0x7ffff7601d00) at /usr/local/src/conda/python-3.11.7/Objects/typeobject.c:4613
#4  PyObject_Str (v=0x7ffff7601d00) at /usr/local/src/conda/python-3.11.7/Objects/object.c:492
#5  0x00005555558084cb in PyFile_WriteObject (v=0x7ffff7601d00, f=f@entry=0x7ffff7cc81e0, flags=flags@entry=1) at /usr/local/src/conda/python-3.11.7/Objects/fileobject.c:129
#6  0x0000555555807b05 in builtin_print_impl (module=<optimized out>, flush=0, file=0x7ffff7cc81e0, end=0x0, sep=0x0, args=0x7ffff7c92980)
    at /usr/local/src/conda/python-3.11.7/Python/bltinmodule.c:2039
#7  builtin_print (module=<optimized out>, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.7/Python/clinic/bltinmodule.c.h:838
#8  0x000055555574e6bf in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0x7ffff7c5d670, args=0x7ffff7fb80e0, nargsf=<optimized out>, kwnames=0x0)
    at /usr/local/src/conda/python-3.11.7/Include/cpython/methodobject.h:52
#9  0x000055555574e5ac in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=0x7ffff7c5d670,
    tstate=0x555555ad0218 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.7/Include/internal/pycore_call.h:92
#10 PyObject_Vectorcall (callable=0x7ffff7c5d670, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.7/Objects/call.c:299
#11 0x0000555555741a36 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0218 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7fb8020, throwflag=throwflag@entry=0)
    at /usr/local/src/conda/python-3.11.7/Python/ceval.c:4769
#12 0x00005555557f88bd in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fb8020, tstate=0x555555ad0218 <_PyRuntime+166328>)
    at /usr/local/src/conda/python-3.11.7/Include/internal/pycore_ceval.h:73
#13 _PyEval_Vector (tstate=tstate@entry=0x555555ad0218 <_PyRuntime+166328>, func=func@entry=0x7ffff7ca5f80, locals=locals@entry=0x7ffff7cc65c0, args=args@entry=0x0,
    argcount=argcount@entry=0, kwnames=kwnames@entry=0x0) at /usr/local/src/conda/python-3.11.7/Python/ceval.c:6434
#14 0x00005555557f7f4f in PyEval_EvalCode (co=co@entry=0x7ffff776ecd0, globals=globals@entry=0x7ffff7cc65c0, locals=locals@entry=0x7ffff7cc65c0)
    at /usr/local/src/conda/python-3.11.7/Python/ceval.c:1148
#15 0x0000555555816eaa in run_eval_code_obj (tstate=tstate@entry=0x555555ad0218 <_PyRuntime+166328>, co=co@entry=0x7ffff776ecd0, globals=globals@entry=0x7ffff7cc65c0,
    locals=locals@entry=0x7ffff7cc65c0) at /usr/local/src/conda/python-3.11.7/Python/pythonrun.c:1710
#16 0x0000555555812a23 in run_mod (mod=mod@entry=0x555555c0aa60, filename=filename@entry=0x7ffff7718580, globals=globals@entry=0x7ffff7cc65c0, locals=locals@entry=0x7ffff7cc65c0,
    flags=flags@entry=0x7fffffffc7e8, arena=arena@entry=0x7ffff7beee70) at /usr/local/src/conda/python-3.11.7/Python/pythonrun.c:1731
#17 0x0000555555827de0 in pyrun_file (fp=fp@entry=0x555555b39440, filename=filename@entry=0x7ffff7718580, start=start@entry=257, globals=globals@entry=0x7ffff7cc65c0,
    locals=locals@entry=0x7ffff7cc65c0, closeit=closeit@entry=1, flags=0x7fffffffc7e8) at /usr/local/src/conda/python-3.11.7/Python/pythonrun.c:1626
#18 0x000055555582777e in _PyRun_SimpleFileObject (fp=0x555555b39440, filename=0x7ffff7718580, closeit=1, flags=0x7fffffffc7e8) at /usr/local/src/conda/python-3.11.7/Python/pythonrun.c:440
#19 0x00005555558274a4 in _PyRun_AnyFileObject (fp=0x555555b39440, filename=filename@entry=0x7ffff7718580, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffc7e8)
    at /usr/local/src/conda/python-3.11.7/Python/pythonrun.c:79
#20 0x0000555555821b94 in pymain_run_file_obj (skip_source_first_line=0, filename=0x7ffff7718580, program_name=0x7ffff7c26a90) at /usr/local/src/conda/python-3.11.7/Modules/main.c:360
#21 pymain_run_file (config=0x555555ab6260 <_PyRuntime+59904>) at /usr/local/src/conda/python-3.11.7/Modules/main.c:379
#22 pymain_run_python (exitcode=0x7fffffffc7e4) at /usr/local/src/conda/python-3.11.7/Modules/main.c:601
#23 Py_RunMain () at /usr/local/src/conda/python-3.11.7/Modules/main.c:680
#24 0x00005555557e7f47 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/local/src/conda/python-3.11.7/Modules/main.c:734
#25 0x00007ffff7d031ca in __libc_start_call_main (main=main@entry=0x5555557e7ea0 <main>, argc=argc@entry=2, argv=argv@entry=0x7fffffffca38) at ../sysdeps/nptl/libc_start_call_main.h:58
#26 0x00007ffff7d03285 in __libc_start_main_impl (main=0x5555557e7ea0 <main>, argc=2, argv=0x7fffffffca38, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
    stack_end=0x7fffffffca28) at ../csu/libc-start.c:360
#27 0x00005555557e7ded in _start ()

We can see lots of frames from the CPython interpreter. We want to find the _PyInterpreterFrame object, which represents the Python stack frame. It’s passed around as an argument named frame, and here it can be found in the _PyEval_EvalFrame frame (frame 12 in the backtrace).

(gdb) f 12
#12 0x00005555557f88bd in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fb8020, tstate=0x555555ad0218 <_PyRuntime+166328>)
    at /usr/local/src/conda/python-3.11.7/Include/internal/pycore_ceval.h:73
73	/usr/local/src/conda/python-3.11.7/Include/internal/pycore_ceval.h: No such file or directory.
(gdb) p frame
$1 = (struct _PyInterpreterFrame *) 0x7ffff7fb8020
(gdb) p *frame
$2 = {f_func = 0x7ffff7ca5f80, f_globals = 0x7ffff7cc65c0, f_builtins = 0x7ffff7c64e40, f_locals = 0x7ffff7cc65c0, f_code = 0x7ffff776ecd0, frame_obj = 0x0, previous = 0x0,
  prev_instr = 0x7ffff776eda8, stacktop = 0, is_entry = true, owner = 0 '\000', localsplus = {0x0}}
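For orientation, the fields we are after here (f_code, and inside it the filename and function name) are the same ones Python exposes on its own frame objects. The sketch below is a Python-level analogue, not what GDB does: it assumes CPython's sys._getframe() and treats f_back as the rough counterpart of the previous pointer. The function name python_stack is made up for this example.

```python
import sys


def python_stack():
    """Return (filename, function name, line number) for each Python frame,
    innermost first, by following f_back from the current frame."""
    entries = []
    frame = sys._getframe()  # CPython-specific accessor for the current frame
    while frame is not None:
        code = frame.f_code  # the code object, like f_code in the GDB output
        entries.append((code.co_filename, code.co_name, frame.f_lineno))
        frame = frame.f_back  # roughly the role `previous` plays at the C level
    return entries


def my_native_code():
    return python_stack()


for filename, name, lineno in my_native_code():
    print(f"{filename}:{lineno} in {name}")
```

In the debugger we have to dig the same values out by hand, starting with co_filename.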

If we access f_code, it has a field co_filename that is a Python string, i.e. it’s a PyObject* whose ob_type points to PyUnicode_Type. But how do we get the actual string value out?

(gdb) p *frame->f_code
$3 = {ob_base = {ob_base = {ob_refcnt = 3, ob_type = 0x555555aa48c0 <PyCode_Type>}, ob_size = 20}, co_consts = 0x7ffff7669800, co_names = 0x7ffff766aa40,
  co_exceptiontable = 0x555555aa9bc8 <_PyRuntime+9064>, co_flags = 0, co_warmup = -7, _co_linearray_entry_size = 0, co_argcount = 0, co_posonlyargcount = 0, co_kwonlyargcount = 0,
  co_stacksize = 2, co_firstlineno = 1, co_nlocalsplus = 0, co_nlocals = 0, co_nplaincellvars = 0, co_ncellvars = 0, co_nfreevars = 0,
  co_localsplusnames = 0x555555ab5e78 <_PyRuntime+58904>, co_localspluskinds = 0x555555aa9bc8 <_PyRuntime+9064>, co_filename = 0x7ffff7718580, co_name = 0x555555aaccf0 <_PyRuntime+21648>,
  co_qualname = 0x555555aaccf0 <_PyRuntime+21648>, co_linetable = 0x7ffff763c870, co_weakreflist = 0x0, _co_code = 0x0, _co_linearray = 0x0, _co_firsttraceable = 0, co_extra = 0x0,
  co_code_adaptive = "\227"}
(gdb) p *frame->f_code->co_filename
$4 = {ob_refcnt = 5, ob_type = 0x555555a8d7e0 <PyUnicode_Type>}
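The same generic header can be observed from inside a running interpreter. This is a minimal sketch, assuming a standard CPython build (not a debug or free-threaded one), where id() returns the object's address and ob_type sits right after a pointer-sized ob_refcnt slot. The helper name type_address_of is hypothetical.

```python
import ctypes


def type_address_of(obj):
    """Read the ob_type pointer straight out of the object's header.

    Assumes a standard CPython build: id(obj) is the object's address and
    the header is one pointer-sized refcount slot followed by ob_type.
    """
    refcnt_slot = ctypes.sizeof(ctypes.c_ssize_t)  # size of the ob_refcnt slot
    return ctypes.c_void_p.from_address(id(obj) + refcnt_slot).value


filename = "/home/lidavidm/test.py"
# For a str, the pointer should be the address of the str type (PyUnicode_Type).
print(hex(type_address_of(filename)), type_address_of(filename) == id(str))
```

Like the GDB output, this tells us what the object is but nothing about its contents.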

We can look at the representation in unicodeobject.h. It turns out that a PyObject* is just a pointer to the “head” of the object, so to speak, and we need to cast it to the “actual” type. Per the header, that’s one of PyASCIIObject*, PyCompactUnicodeObject*, or PyUnicodeObject*. Furthermore, these structs “extend” each other (each one begins with the previous one), so we can first just cast to PyASCIIObject* and look at the flags within.

(gdb) p *(PyASCIIObject*)frame->f_code->co_filename
$5 = {ob_base = {ob_refcnt = 5, ob_type = 0x555555a8d7e0 <PyUnicode_Type>}, length = 22, hash = -1, state = {interned = 0, kind = 1, compact = 1, ascii = 1, ready = 1}, wstr = 0x0}

The flags in state (kind = 1, compact = 1, ascii = 1) show that this is a “compact ASCII” string, going by the reference in the header. For that form, the header notes that the “data starts just after the structure”. So in memory, things look like this:

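  • struct PyASCIIObject
    • PyObject ob_base (the PyObject_HEAD)
    • Py_ssize_t length
    • Py_hash_t hash
    • struct state (the bitfield flags)
    • wchar_t* wstr
  • char data (length + 1 bytes, including a terminating NUL; not a declared field)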
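To sanity-check where the character data begins, the layout above can be mirrored with ctypes. This is only a sketch of the CPython 3.11 layout as GDB printed it: the class name PyASCIIObject311 is made up, the PyObject header is flattened into its two fields, and the state bitfield is treated as one 32-bit integer. (The wstr field exists in 3.11 but was removed in 3.12, so the size differs on newer versions.)

```python
import ctypes


class PyASCIIObject311(ctypes.Structure):
    """Hypothetical ctypes mirror of CPython 3.11's PyASCIIObject."""

    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # PyObject header
        ("ob_type", ctypes.c_void_p),     # PyObject header
        ("length", ctypes.c_ssize_t),
        ("hash", ctypes.c_ssize_t),       # Py_hash_t is a Py_ssize_t
        ("state", ctypes.c_uint32),       # packed bitfield flags
        ("wstr", ctypes.c_void_p),        # 3.11 only; removed in 3.12
    ]


header_size = ctypes.sizeof(PyASCIIObject311)
print(header_size)  # 48 on x86_64 Linux

# The character data would start this many bytes past the object's address.
object_address = 0x7ffff7718580  # co_filename from the GDB session
print(hex(object_address + header_size))
```

On x86_64 the header comes to 48 (0x30) bytes, so the data for this string should sit at 0x7ffff77185b0.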

That means to get the actual string value, we need a pointer to just past the end of the PyASCIIObject, cast to a char*. Adding 1 to a PyASCIIObject* advances it by sizeof(PyASCIIObject) bytes, which is exactly that:

(gdb) p (char*)(((PyASCIIObject*)frame->f_code->co_filename) + 1)
$6 = 0x7ffff77185b0 "/home/lidavidm/test.py"
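The same pointer arithmetic can be tried from inside Python, without a debugger, to see that the bytes really do sit right after the header. This is a sketch under two assumptions: we are on CPython, so id() is the object's address, and sys.getsizeof() reports header + length + 1 for a compact ASCII string, which lets us infer the header size instead of hard-coding it. The helper name compact_ascii_bytes is hypothetical.

```python
import ctypes
import sys


def compact_ascii_bytes(s):
    """Return the raw bytes stored right after the header of a compact ASCII str.

    Assumes CPython, where id(s) is the object's address. The header size is
    inferred from sys.getsizeof() rather than hard-coded.
    """
    if type(s) is not str or not s.isascii():
        raise ValueError("expected an exact str containing only ASCII")
    header_size = sys.getsizeof(s) - (len(s) + 1)  # total minus data and NUL
    # Equivalent of (char*)((PyASCIIObject*)s + 1) in the GDB expression.
    return ctypes.string_at(id(s) + header_size, len(s))


print(compact_ascii_bytes("/home/lidavidm/test.py"))
```

Non-ASCII strings use the larger PyCompactUnicodeObject header and wider characters, so this shortcut (like the GDB cast) only applies to the compact ASCII case.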

And as expected, we get our string value!