`rudy` is tooling for Rust-specific debuginfo. In short: when you run `cargo build`, cargo by default invokes `rustc` with `-g`, which outputs debug information along with the binary (exactly where the debug information lives is platform-specific). On macOS and Linux, `rustc` emits debugging information in a format called DWARF. Debuggers can then parse the DWARF information to access things like function and type definitions, and the mapping between addresses and source code.

`rudy` makes it easy to work with DWARF information to power debugging features like pretty-printing of raw memory addresses, method calling, and source location lookups. It currently consists of two main crates: `rudy-dwarf`, for directly interacting with DWARF information, and `rudy-db`, a higher-level interface for common debugging functionality implemented on top of `rudy-dwarf`.
The project has two main goals:

- As an immediate benefit: make the Rust debugging experience with `lldb` better via the `rudy-lldb` extension.
- Build foundational tooling to help with implementing Rust-specific debugging tools.
## Rudy LLDB
Probably the most exciting and relevant feature for most folks: `rudy-lldb` is a small application that uses `rudy-db` to provide an extension for the `lldb` debugger.

Here's a short demo:
How this works:

- You add the Python `rudy-lldb` client as a script to `~/.lldbinit`, e.g. `command script import /Users/sam/work/rudy/rudy-lldb/python/rudy_lldb.py`.
- Whenever a rudy `rd` command is invoked, the client will attempt to connect to a locally-running `rudy-lldb-server`. The client and server communicate over TCP.
- The `rudy-lldb` server uses `rudy-db` to query for debugging information and forwards any memory accesses or evaluations back to `lldb` to handle.
The core idea is to extend `lldb` with Rust-specific functionality. `rudy-dwarf` (more on it below) has built-in parsers for common standard library types like `String`, `Vec`, and `HashMap`, and it also understands the memory layout of Rust types like enums and structs. Similarly, `rudy-db` is able to differentiate between functions, methods (i.e. functions in `impl` blocks), and trait implementations, and makes all of those available to be called, along with a few other quality-of-life features like indexing into `Vec`s and `HashMap`s.
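To get a feel for what "understanding enum layout" means, here's a toy example (not rudy code) of reading an enum's discriminant straight out of memory, the way a debugger reads process memory. It uses `#[repr(u8)]` so the layout is guaranteed; default-repr Rust enums make no such promise (including niche optimizations), which is exactly why a debugger has to learn the layout from DWARF instead:

```rust
// Toy illustration: with #[repr(u8)] the discriminant is guaranteed to be
// the first byte of the value (RFC 2195), so we can read it from raw
// memory. Default-repr enums have unspecified layout, which is what the
// DWARF variant_part entries describe for the debugger.
#[repr(u8)]
enum Status {
    Idle,                 // discriminant 0
    Running { pid: u32 }, // discriminant 1
    Exited(i32),          // discriminant 2
}

fn discriminant_byte(s: &Status) -> u8 {
    // Read the first byte of the enum's memory, debugger-style.
    unsafe { *(s as *const Status as *const u8) }
}

fn main() {
    assert_eq!(discriminant_byte(&Status::Idle), 0);
    assert_eq!(discriminant_byte(&Status::Running { pid: 42 }), 1);
    assert_eq!(discriminant_byte(&Status::Exited(-1)), 2);
    println!("discriminants read straight from memory");
}
```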
It's still a young project, but my hope is that `rudy-lldb` can immediately make `lldb` actually useful for debugging Rust programs.
## Rudy Dwarf
`rudy-dwarf` is a "medium-level" library for interacting with DWARF info. It sits above `gimli` (which is the real DWARF-parsing workhorse) but is low-level enough that it should be suitable for debuggers to use.

Most of the core functionality comes down to:

- Indexes for fast lookups (e.g. by address, by name, by module)
- Parser combinators for extracting data from debugging entries
- A visitor trait for walking entries
## Dwarf Parsers
The parser is probably the most interesting part. This is where we can start to extract the layout of Rust types via reusable building blocks.

The `Parser` trait is as simple as:

```rust
pub trait Parser<T> {
    fn parse(&self, db: &dyn DwarfDb, entry: Die) -> Result<T>;
    // ...
}
```

The `db` parameter is a `salsa` database which we're using for incremental computation (more on that below), and `Die` is an opaque wrapper around an offset into a specific debugging file.
For example, we might have this DWARF debugging entry for a struct field:

```text
0x00000037: DW_TAG_member
              DW_AT_name ("id")
              DW_AT_type (0x000002e7 "u64")
              DW_AT_alignment (8)
              DW_AT_data_member_location (0x18)
              DW_AT_accessibility (DW_ACCESS_private)
```

which tells us the name of the field (`"id"`), its type, and layout information like its byte alignment and offset within the struct.
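To make `DW_AT_data_member_location (0x18)` concrete: it says the field lives 24 bytes past the start of the struct, so its address is the struct's base address plus 0x18. Here's a small illustration with a hypothetical `#[repr(C)]` struct (not from the post) laid out so its `id` field lands at that same offset:

```rust
use std::mem::offset_of;

// Hypothetical struct, arranged so `id` sits at byte offset 0x18 like the
// DWARF entry above. #[repr(C)] makes the offsets predictable; default-repr
// structs may be reordered by the compiler, which is why debuggers read
// field offsets out of DWARF rather than recomputing them.
#[repr(C)]
struct Record {
    flags: u8,    // offset 0x00 (padded to 8 before the next field)
    created: u64, // offset 0x08
    updated: u64, // offset 0x10
    id: u64,      // offset 0x18
}

fn main() {
    assert_eq!(offset_of!(Record, id), 0x18);

    // A debugger computes the field's address as base + offset:
    let r = Record { flags: 0, created: 1, updated: 2, id: 99 };
    let base = &r as *const Record as usize;
    let addr = base + offset_of!(Record, id);
    assert_eq!(unsafe { *(addr as *const u64) }, 99);
    println!("id read from base + 0x18");
}
```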
Since finding the offset of a field like this is fairly common, `rudy-dwarf` provides a `DataOffset` parser for it:

```rust
impl Parser<usize> for DataOffset {
    fn parse(&self, db: &dyn DwarfDb, entry: Die) -> Result<usize> {
        Ok(entry.udata_attr(db, gimli::DW_AT_data_member_location)?)
    }
}
```

which can be constructed with `data_offset()`.
If we want to get back all of the relevant information, we'll use multiple parsers and a combinator:

```rust
all((
    attr::<String>(gimli::DW_AT_name),
    data_offset(),
    entry_type(),
))
```

This constructs three parsers to get (1) the name attribute as a `String`, (2) the data offset as above, and (3) a reference to the entry's type (i.e. that location 0x000002e7, which points to the `u64` type). The `all` combinator turns these three parsers into a single parser returning a tuple of results.
And so if we're iterating through the entries representing the fields of a struct we can call:

```rust
let (name, data_offset, entry_type) = all((
    attr::<String>(gimli::DW_AT_name),
    data_offset(),
    entry_type(),
)).parse(db, die)?;
```

to get back those three fields.
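To show why this style composes so nicely, here's a toy, self-contained version of an `all`-style combinator. It is deliberately simplified (no salsa database, an "entry" is just a map of attribute names to strings, and only a two-parser `All`); none of these types are rudy-dwarf's actual ones:

```rust
use std::collections::HashMap;

// Toy stand-ins: an "entry" is a map from attribute name to value, and a
// parser extracts a T from it (or fails with a message).
type Entry = HashMap<&'static str, String>;

trait Parser<T> {
    fn parse(&self, entry: &Entry) -> Result<T, String>;
}

// Reads a single attribute, like attr::<String>(...) in the post.
struct Attr(&'static str);
impl Parser<String> for Attr {
    fn parse(&self, entry: &Entry) -> Result<String, String> {
        entry
            .get(self.0)
            .cloned()
            .ok_or_else(|| format!("missing attribute {}", self.0))
    }
}

// An all()-style combinator: run both parsers on the same entry and bundle
// the results into a tuple, failing if either one fails.
struct All<A, B>(A, B);
impl<T1, T2, A: Parser<T1>, B: Parser<T2>> Parser<(T1, T2)> for All<A, B> {
    fn parse(&self, entry: &Entry) -> Result<(T1, T2), String> {
        Ok((self.0.parse(entry)?, self.1.parse(entry)?))
    }
}

fn main() {
    let entry: Entry = [
        ("DW_AT_name", "id".to_string()),
        ("DW_AT_type", "u64".to_string()),
    ]
    .into();

    let (name, ty) = All(Attr("DW_AT_name"), Attr("DW_AT_type"))
        .parse(&entry)
        .unwrap();
    assert_eq!((name.as_str(), ty.as_str()), ("id", "u64"));
    println!("parsed field {name}: {ty}");
}
```

Because `All` is itself a `Parser`, it nests: combinators built this way can wrap each other arbitrarily deeply, which is what makes the `Result<T, E>` parser below possible.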
This use of parser combinators is powerful because it lets us create reusable building blocks for common DWARF patterns. The above triple could easily be turned into a `StructField` parser. We have other combinators to apply a parser to all member entries, which means we can parse all struct fields easily. And so on.
Where this gets really cool is in seeing how it makes it possible to traverse complex Rust types and abstract away the details. The layout for a Rust enum is fairly arcane, but we can build a parser for the `Result<T, E>` type with:

```rust
let (name, size, (discriminant, (ok, err))) = all((
    // get the result name, e.g. `Result<String, Error>`
    attr::<String>(gimli::DW_AT_name),
    // the size of the `Result` enum
    attr::<usize>(gimli::DW_AT_byte_size),
    // find the member entry with the variant_part tag
    member_by_tag(gimli::DW_TAG_variant_part).then(
        // parse the enum discriminant (another parser)
        enum_discriminant().and(
            // attempt to parse each of these child parsers
            // for the children of the current entry
            parse_children((
                // helper parser to extract an enum variant with a known name
                // where the fields are in tuple form `Ok(_)`, and shallow-
                // resolve the type of the first field
                enum_named_tuple_variant("Ok", (resolve_type_shallow(),)).map(
                    |(discriminant, (inner_field,))| {
                        // we only expect one field, so unwrap it
                        (discriminant, inner_field)
                    },
                ),
                // same again, but for the Err variant
                enum_named_tuple_variant("Err", (resolve_type_shallow(),)).map(
                    |(discriminant, (inner_field,))| {
                        (discriminant, inner_field)
                    },
                ),
            )),
        )),
))
.parse(db, entry)?;
```
In this example we've parsed a pretty complex type definition, but using many reusable components. For example, `enum_discriminant` is a generic parser useful for any enum. The `enum_named_tuple_variant` parser is similarly useful for any enum with tuple fields, and is also used in the `Option` definition. Not to mention the more abstract combinators like `then` and `and`, and combinators for walking members like `parse_children` and `member_by_tag`.
The goal of having a parser combinator framework like this is to make it easy for applications using `rudy-dwarf` to build their own custom functionality. For example, a debugger that wanted to provide support for async programs could use this combinator framework to extract information about the layout of `tokio` structs. It's also possible to add validation steps in the parser combinator framework, so that any changes in the internal layout of important structs are caught loudly rather than causing silent bugs or failures.
## Using Salsa
When starting out on this project I was originally targeting a much broader use case: building a Rust-specific debugger from the ground up. As such, I was thinking about the overall architecture and looked to `rust-analyzer` for inspiration.

Eventually, I decided to split out `rudy` as its own thing and keep the scope smaller. But one part that stuck with me was to (a) have a long-running server process and (b) use `salsa` to do incremental computation and re-computation with caching. (As an aside: I'd highly recommend watching David Barsky & Lukas Wirth's talk about `salsa` in `rust-analyzer`.)
Those seem like useful qualities in a debugger too:
- Debugging info can be large (especially when considering the standard library debug info) and so DWARF is structured such that you can avoid parsing the entire thing up front.
- On the other hand, certain computations are fairly slow: like computing indexes of all symbols; or needing to traverse multiple entries to parse a struct.
Therefore: incrementally computing information and caching is a good idea!
Furthermore, you may have many debug sessions for a single binary. And so it's nice to have a long-running process that persists the cache between sessions.
And so we use `salsa` within `rudy-dwarf` and `rudy-db` as our caching mechanism.
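salsa's actual API is considerably more involved (tracked queries, dependency invalidation, and so on), but the core effect we rely on can be sketched with a hand-rolled memoized query. This toy code is not salsa, just an illustration of pay-once-then-reuse:

```rust
use std::collections::HashMap;

// Toy memoization, NOT salsa's API: the first call to symbol_index() pays
// the full indexing cost; later lookups in the same "session" reuse the
// cached result. Keeping a long-running server process means the cache
// also survives across debug sessions for the same binary.
struct Db {
    symbol_index: Option<HashMap<String, u64>>,
    index_builds: u32, // how many times we paid the expensive cost
}

impl Db {
    fn new() -> Self {
        Db { symbol_index: None, index_builds: 0 }
    }

    fn symbol_index(&mut self) -> &HashMap<String, u64> {
        if self.symbol_index.is_none() {
            // Pretend this is the slow part: walking all the DWARF units.
            self.index_builds += 1;
            let mut index = HashMap::new();
            index.insert("main".to_string(), 0x1000);
            index.insert("foo::bar".to_string(), 0x2040);
            self.symbol_index = Some(index);
        }
        self.symbol_index.as_ref().unwrap()
    }

    fn find_function(&mut self, name: &str) -> Option<u64> {
        self.symbol_index().get(name).copied()
    }
}

fn main() {
    let mut db = Db::new();
    assert_eq!(db.find_function("main"), Some(0x1000));
    assert_eq!(db.find_function("foo::bar"), Some(0x2040));
    // Two queries, but the index was only built once.
    assert_eq!(db.index_builds, 1);
    println!("index built {} time(s)", db.index_builds);
}
```

What salsa adds on top of a cache like this is automatic invalidation: when an input changes, only the queries that depend on it are recomputed.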
The results are quite nice (on macOS). Using this example code on some small example projects and a relatively large project (>700 dependencies):

- `rudy-db` computes a bunch of indexes up front. For the smaller examples this takes 20-30ms; on the larger project it can take upwards of 1-2s.
- From there, individual operations like "find function by name" tend to take around ~1ms for the smaller projects and ~10ms on the larger project.
- Once the relevant indexes for those operations are computed, those times become about 0.2ms and 3ms respectively.
The implication is that, much like with `rust-analyzer`, there is an up-front indexing cost; after that, most operations are pretty fast and only get faster. Keeping these times as small as possible is fairly important when considering that a debugger may want to run certain queries as the program is running (e.g. for watchpoints). And it's also nice to keep the UI snappy.
Overall though, I think there are a bunch of improvements to be made. I didn't properly understand `salsa` when first using it, nor how DWARF information is typically laid out on different platforms, particularly on Linux. The current approach works pretty well on macOS because Rust by default leaves debuginfo split from the binary, as a bunch of small files. That works really nicely when you are selectively computing indexes on individual files. However, on Linux we're doing the indexing for the entire binary once up front, which seems unnecessarily expensive.
I plan on writing a follow-up post that goes more into our use of `salsa` and my understanding of how to use it effectively!
## Digression: Why do we need better debugging tooling for Rust?
So that's Rudy! Please try it out, poke around, and let me know what you think: GitHub repo.

I wanted to conclude this post with some thoughts on the current state of debugging for Rust.

I've now been using Rust for over 10 years. In all that time, I think the only times I ever reached for an interactive debugger like lldb/gdb were to get stack traces for stack overflows.

In many ways this was part of the draw of Rust for me. I picked up Rust after flailing miserably trying to program in C. I had no systems programming background and didn't understand anything about pointers or memory management.

But having started out with Python/Java/Ruby, I did expect there to be a better debugging story for Rust. I just didn't find the experience of using `lldb` much better than adding tracing statements and reading through logs.
Recently I noticed that print debugging can be a bit of a trap for me. It's easy to get lured into a sense of productivity as you go through the loop:
- Read the logs.
- Narrow in on where the problem might be.
- Adjust tracing levels or add more logging (and incur a compilation cost).
- Repeat
I've invested many hours in making nice debug printed formats for my types, making it easy to set configurable log levels, etc. But sometimes it would be helpful to drop into an interactive debugger at a point in the program and be able to query for the state of the system.
I also think there's untapped potential here for better debug tooling that goes further than either of those two approaches.
And so that's the motivation behind Rudy: I want to make `lldb` debugging of a Rust program a viable alternative to print debugging when the occasion calls for it. But I also want to provide foundational tooling that can unlock the next generation of debug tooling: tooling that does for debugging what projects like `rust-analyzer` did for IDE support.