`rudy` is tooling for Rust-specific debuginfo. In short: when you run `cargo build`, cargo by default invokes `rustc` with `-g`, which outputs debug information along with the binary (exactly where the debug information lives is platform-specific). On macOS and Linux, `rustc` emits debugging information in a format called DWARF. Debuggers can then parse the DWARF information to access things like function and type definitions, and the mapping between addresses and source code.

`rudy` makes it easy to work with DWARF information to power debugging features like pretty-printing of raw memory addresses, method calling, and source location lookups. It currently consists of two main crates: `rudy-dwarf`, for directly interacting with DWARF information, and `rudy-db`, a higher-level interface for common debugging functionality implemented on top of `rudy-dwarf`.
The project has two main goals:

- As an immediate benefit: make the Rust debugging experience with `lldb` better via the `rudy-lldb` extension.
- Build foundational tooling to help with implementing Rust-specific debugging tools.
## Rudy LLDB
Probably the most exciting and relevant feature for most folks: `rudy-lldb` is a small application that uses `rudy-db` to provide an extension for the `lldb` debugger.

Here's a short demo:
How this works:

- You add the Python `rudy-lldb` client as a script to `~/.lldbinit`, e.g. `command script import /Users/sam/work/rudy/rudy-lldb/python/rudy_lldb.py`.
- Whenever a rudy `rd` command is invoked, the client will attempt to connect to a locally-running `rudy-lldb-server`. The client and server communicate over TCP.
- The `rudy-lldb` server uses `rudy-db` to query for debugging information and forwards any memory accesses or evaluations back to `lldb` to handle.
The core idea is to extend `lldb` with Rust-specific functionality. `rudy-dwarf` (more on it below) has built-in parsers for common standard library types like `String`, `Vec`, and `HashMap`, and it also understands the memory layout of Rust types like enums and structs. Similarly, `rudy-db` is able to differentiate between functions, methods (i.e. functions in `impl` blocks), and trait implementations, and makes all of those available to be called, along with a few other quality-of-life features like indexing into `Vec`s and `HashMap`s.
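To get a feel for what "understanding enum layout" means, here's a toy example (not rudy code) of reading an enum's discriminant straight out of memory, the way a debugger reads process memory. It uses `#[repr(u8)]` so the layout is guaranteed; default-repr Rust enums make no such promise (including niche optimizations), which is exactly why a debugger has to learn the layout from DWARF instead:

```rust
// Toy illustration: with #[repr(u8)] the discriminant is guaranteed to be
// the first byte of the value (RFC 2195), so we can read it from raw
// memory. Default-repr enums have unspecified layout, which is what the
// DWARF variant_part entries describe for the debugger.
#[repr(u8)]
enum Status {
    Idle,                 // discriminant 0
    Running { pid: u32 }, // discriminant 1
    Exited(i32),          // discriminant 2
}

fn discriminant_byte(s: &Status) -> u8 {
    // Read the first byte of the enum's memory, debugger-style.
    unsafe { *(s as *const Status as *const u8) }
}

fn main() {
    assert_eq!(discriminant_byte(&Status::Idle), 0);
    assert_eq!(discriminant_byte(&Status::Running { pid: 42 }), 1);
    assert_eq!(discriminant_byte(&Status::Exited(-1)), 2);
    println!("discriminants read straight from memory");
}
```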
It's still a young project, but my hope is that `rudy-lldb` can immediately make `lldb` actually useful for debugging Rust programs.
## Rudy Dwarf
`rudy-dwarf` is a "medium-level" library for interacting with DWARF info. It sits above `gimli` (which is the real DWARF-parsing workhorse) but is low-level enough that it should be suitable for debuggers to use.

Most of the core functionality comes down to:

- Indexes for fast lookups (e.g. by address, by name, by module)
- Parser combinators for extracting data from debugging entries
- A visitor trait for walking entries
## Dwarf Parsers
The parser is probably the most interesting part. This is where we can start to extract the layout of Rust types via reusable building blocks.

The `Parser` trait is as simple as:

```rust
pub trait Parser<T> {
    fn parse(&self, db: &dyn DwarfDb, entry: Die) -> Result<T>;
    // ...
}
```

The `db` parameter is a `salsa` database which we're using for incremental computation (more on that below), and `Die` is an opaque wrapper around an offset into a specific debugging file.
For example, we might have this DWARF debugging entry for a struct field:

```text
0x00000037: DW_TAG_member
              DW_AT_name ("id")
              DW_AT_type (0x000002e7 "u64")
              DW_AT_alignment (8)
              DW_AT_data_member_location (0x18)
              DW_AT_accessibility (DW_ACCESS_private)
```

which tells us the name of the field (`"id"`), its type, and layout information like its byte alignment and offset within the struct.
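To make `DW_AT_data_member_location (0x18)` concrete: it says the field lives 24 bytes past the start of the struct, so its address is the struct's base address plus 0x18. Here's a small illustration with a hypothetical `#[repr(C)]` struct (not from the post) laid out so its `id` field lands at that same offset:

```rust
use std::mem::offset_of;

// Hypothetical struct, arranged so `id` sits at byte offset 0x18 like the
// DWARF entry above. #[repr(C)] makes the offsets predictable; default-repr
// structs may be reordered by the compiler, which is why debuggers read
// field offsets out of DWARF rather than recomputing them.
#[repr(C)]
struct Record {
    flags: u8,    // offset 0x00 (padded to 8 before the next field)
    created: u64, // offset 0x08
    updated: u64, // offset 0x10
    id: u64,      // offset 0x18
}

fn main() {
    assert_eq!(offset_of!(Record, id), 0x18);

    // A debugger computes the field's address as base + offset:
    let r = Record { flags: 0, created: 1, updated: 2, id: 99 };
    let base = &r as *const Record as usize;
    let addr = base + offset_of!(Record, id);
    assert_eq!(unsafe { *(addr as *const u64) }, 99);
    println!("id read from base + 0x18");
}
```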
Since finding the offset of a field like this is fairly common, `rudy-dwarf` provides a `DataOffset` parser for it:

```rust
impl Parser<usize> for DataOffset {
    fn parse(&self, db: &dyn DwarfDb, entry: Die) -> Result<usize> {
        Ok(entry.udata_attr(db, gimli::DW_AT_data_member_location)?)
    }
}
```

which can be constructed with `data_offset()`.
If we want to get back all of the relevant information, we'll use multiple parsers and a combinator:

```rust
all((
    attr::<String>(gimli::DW_AT_name),
    data_offset(),
    entry_type(),
))
```

This constructs three parsers to get (1) the name attribute as a `String`, (2) the data offset as above, and (3) a reference to the entry's type (i.e. that location 0x000002e7, which points to the `u64` type). The `all` combinator turns these three parsers into a single parser returning a tuple of results.
And so if we're iterating through the entries representing the fields of a struct we can call:

```rust
let (name, data_offset, entry_type) = all((
    attr::<String>(gimli::DW_AT_name),
    data_offset(),
    entry_type(),
)).parse(db, die)?;
```

to get back those three fields.
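To show why this style composes so nicely, here's a toy, self-contained version of an `all`-style combinator. It is deliberately simplified (no salsa database, an "entry" is just a map of attribute names to strings, and only a two-parser `All`); none of these types are rudy-dwarf's actual ones:

```rust
use std::collections::HashMap;

// Toy stand-ins: an "entry" is a map from attribute name to value, and a
// parser extracts a T from it (or fails with a message).
type Entry = HashMap<&'static str, String>;

trait Parser<T> {
    fn parse(&self, entry: &Entry) -> Result<T, String>;
}

// Reads a single attribute, like attr::<String>(...) in the post.
struct Attr(&'static str);
impl Parser<String> for Attr {
    fn parse(&self, entry: &Entry) -> Result<String, String> {
        entry
            .get(self.0)
            .cloned()
            .ok_or_else(|| format!("missing attribute {}", self.0))
    }
}

// An all()-style combinator: run both parsers on the same entry and bundle
// the results into a tuple, failing if either one fails.
struct All<A, B>(A, B);
impl<T1, T2, A: Parser<T1>, B: Parser<T2>> Parser<(T1, T2)> for All<A, B> {
    fn parse(&self, entry: &Entry) -> Result<(T1, T2), String> {
        Ok((self.0.parse(entry)?, self.1.parse(entry)?))
    }
}

fn main() {
    let entry: Entry = [
        ("DW_AT_name", "id".to_string()),
        ("DW_AT_type", "u64".to_string()),
    ]
    .into();

    let (name, ty) = All(Attr("DW_AT_name"), Attr("DW_AT_type"))
        .parse(&entry)
        .unwrap();
    assert_eq!((name.as_str(), ty.as_str()), ("id", "u64"));
    println!("parsed field {name}: {ty}");
}
```

Because `All` is itself a `Parser`, it nests: combinators built this way can wrap each other arbitrarily deeply, which is what makes the `Result<T, E>` parser below possible.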
This use of parser combinators is powerful because it lets us create reusable building blocks for common DWARF patterns. The above triple could easily be turned into a `StructField` parser. We have other combinators to apply a parser to all member entries, which means we can parse all struct fields easily. And so on.
Where this gets really cool is in seeing how it makes it possible to traverse complex Rust types and abstract away the details. The layout for a Rust enum is fairly arcane, but we can build a parser for the `Result<T, E>` type with:

```rust
let (name, size, (discriminant, (ok, err))) = all((
    // get the result name, e.g. `Result<String, Error>`
    attr::<String>(gimli::DW_AT_name),
    // the size of the `Result` enum
    attr::<usize>(gimli::DW_AT_byte_size),
    // find the member entry with the variant_part tag
    member_by_tag(gimli::DW_TAG_variant_part).then(
        // parse the enum discriminant (another parser)
        enum_discriminant().and(
            // attempt to parse each of these child parsers
            // for the children of the current entry
            parse_children((
                // helper parser to extract an enum variant with a known name
                // where the fields are in tuple form `Ok(_)`, and shallow-
                // resolve the type of the first field
                enum_named_tuple_variant("Ok", (resolve_type_shallow(),)).map(
                    |(discriminant, (inner_field,))| {
                        // we only expect one field, so unwrap it
                        (discriminant, inner_field)
                    },
                ),
                // same again, but for the Err variant
                enum_named_tuple_variant("Err", (resolve_type_shallow(),)).map(
                    |(discriminant, (inner_field,))| {
                        (discriminant, inner_field)
                    },
                ),
            )),
        )),
))
.parse(db, entry)?;
```
In this example we've parsed a pretty complex type definition, but using many reusable components. For example, `enum_discriminant` is a generic parser useful for any enum. The `enum_named_tuple_variant` parser is similarly useful for any enum with tuple fields, and is also used in the `Option` definition. Not to mention the more abstract combinators like `then` and `and`, and combinators for walking members like `parse_children` and `member_by_tag`.
The goal of having a parser combinator framework like this is to make it easy for applications using `rudy-dwarf` to build their own custom functionality. For example, a debugger that wanted to provide support for async programs could use this combinator framework to extract information about the layout of `tokio` structs. It's also possible to add validation steps in the parser combinator framework, so that any changes in the internal layout of important structs are caught loudly rather than causing silent bugs or failures.
## Using Salsa
When starting out on this project I was originally targeting a much broader use case: building a Rust-specific debugger from the ground up. As such, I was thinking about the overall architecture and looked to `rust-analyzer` for inspiration.

Eventually, I decided to split out `rudy` as its own thing and keep the scope smaller. But one part that stuck with me was to (a) have a long-running server process and (b) use `salsa` to do incremental computation and re-computation with caching. (As an aside: I'd highly recommend watching David Barsky & Lukas Wirth's talk about `salsa` in `rust-analyzer`.)
Those seem like useful qualities in a debugger too:
- Debugging info can be large (especially when considering the standard library debug info) and so DWARF is structured such that you can avoid parsing the entire thing up front.
- On the other hand, certain computations are fairly slow: like computing indexes of all symbols; or needing to traverse multiple entries to parse a struct.
Therefore: incrementally computing information and caching is a good idea!
Furthermore, you may have many debug sessions for a single binary. And so it's nice to have a long-running process that persists the cache between sessions.
And so we use `salsa` within `rudy-dwarf` and `rudy-db` as our caching mechanism.
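salsa's actual API is considerably more involved (tracked queries, dependency invalidation, and so on), but the core effect we rely on can be sketched with a hand-rolled memoized query. This toy code is not salsa, just an illustration of pay-once-then-reuse:

```rust
use std::collections::HashMap;

// Toy memoization, NOT salsa's API: the first call to symbol_index() pays
// the full indexing cost; later lookups in the same "session" reuse the
// cached result. Keeping a long-running server process means the cache
// also survives across debug sessions for the same binary.
struct Db {
    symbol_index: Option<HashMap<String, u64>>,
    index_builds: u32, // how many times we paid the expensive cost
}

impl Db {
    fn new() -> Self {
        Db { symbol_index: None, index_builds: 0 }
    }

    fn symbol_index(&mut self) -> &HashMap<String, u64> {
        if self.symbol_index.is_none() {
            // Pretend this is the slow part: walking all the DWARF units.
            self.index_builds += 1;
            let mut index = HashMap::new();
            index.insert("main".to_string(), 0x1000);
            index.insert("foo::bar".to_string(), 0x2040);
            self.symbol_index = Some(index);
        }
        self.symbol_index.as_ref().unwrap()
    }

    fn find_function(&mut self, name: &str) -> Option<u64> {
        self.symbol_index().get(name).copied()
    }
}

fn main() {
    let mut db = Db::new();
    assert_eq!(db.find_function("main"), Some(0x1000));
    assert_eq!(db.find_function("foo::bar"), Some(0x2040));
    // Two queries, but the index was only built once.
    assert_eq!(db.index_builds, 1);
    println!("index built {} time(s)", db.index_builds);
}
```

What salsa adds on top of a cache like this is automatic invalidation: when an input changes, only the queries that depend on it are recomputed.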
The results are quite nice (on macOS). Using this example code on some small example projects and a relatively large project (>700 dependencies):

- `rudy-db` computes a bunch of indexes up front. For the smaller examples this takes 20-30ms; on the larger project it can take upwards of 1-2s.
- From there, individual operations like "find function by name" tend to take around ~1ms for the smaller projects and ~10ms on the larger project.
- Once the relevant indexes for those operations are computed, those times become about 0.2ms and 3ms respectively.
The implication is that, much like with `rust-analyzer`, there is an up-front indexing cost; after that, most operations are pretty fast and only get faster. Keeping these times as small as possible is fairly important when considering that a debugger may want to run certain queries as the program is running (e.g. for watchpoints). And it's also nice to keep the UI snappy.
Overall though, I think there are a bunch of improvements to be made. I didn't properly understand `salsa` when first using it, nor how DWARF information is typically laid out on different platforms, particularly on Linux. The current approach works pretty well on macOS because Rust by default leaves debuginfo split from the binary, as a bunch of small files. That works really nicely when you are selectively computing indexes on individual files. However, on Linux we're doing the indexing for the entire binary once up front, which seems unnecessarily expensive.
I plan on writing a follow-up post that goes more into our use of `salsa` and my understanding of how to use it effectively!
## Digression: Why do we need better debugging tooling for Rust?
So that's Rudy! Please try it out, poke around, and let me know what you think: GitHub repo.

I wanted to conclude this post with some thoughts on the current state of debugging for Rust.

I've now been using Rust for over 10 years. In all that time, I think the only times I ever reached for an interactive debugger like lldb/gdb were to get stack traces for stack overflows.

In many ways this was part of the draw of Rust for me. I picked up Rust after flailing miserably trying to program in C. I had no systems programming background and didn't understand anything about pointers or memory management.

But having started out with Python/Java/Ruby, I did expect there to be a better debugging story for Rust. I just didn't find the experience of using `lldb` much better than adding tracing statements and reading through logs.
Recently I noticed that print debugging can be a bit of a trap for me. It's easy to get lured into a sense of productivity as you go through the loop:
- Read the logs.
- Narrow in on where the problem might be.
- Adjust tracing levels or add more logging (and incur a compilation cost).
- Repeat
I've invested many hours in making nice debug printed formats for my types, making it easy to set configurable log levels, etc. But sometimes it would be helpful to drop into an interactive debugger at a point in the program and be able to query for the state of the system.
I also think there's untapped potential here for better debug tooling that goes further than either of those two approaches.
And so that's the motivation behind Rudy: I want to make `lldb` debugging of a Rust program a viable alternative to print debugging when the occasion calls for it. But I also want to provide foundational tooling that can unlock the next generation of debug tooling: tooling that does for debugging what projects like `rust-analyzer` did for IDE support.