Building a Rust Crate Summarizer with Workers AI: Lessons Learned
July 25, 2025
Next week, I'm joining Cloudflare on the Workers AI team. I'm excited about joining because I believe the primitives on their developer platform make it a great choice for folks building AI tooling and infrastructure.
But belief is one thing. To put that belief to the test, I decided to build something I've personally been wanting: a service that summarizes Rust crates to give AI agents better context. This project forced me to confront the realities of the platform: the good and the bad. I wrestled with WASM, discovered the power of durable workflows, and was genuinely surprised by how simple the AI integration was.
Here's what I learned.
Background
For the past five months or so, I've been experimenting with using AI in my day-to-day workflows. I used it to help me build a Rust debugger for macOS, and released part of that work as Rudy.
Rust is a fantastic target for AI agents: the combination of a strong type system, detailed compiler messages, and linting tooling like clippy results in a robust feedback loop for agents writing software. Most of my workflow looks similar to antirez's. I use the AI as a pairing buddy to talk through large structural changes, and as a tool to write the tedious parts of the code.
However, the biggest challenge I've been facing with Rust specifically is the lack of knowledge about niche or recently modified crates. For my debugger, the AI would frequently attempt to use mach, a crate that has been abandoned since around 2020. I've also been trying to use salsa, which is "Very much a WORK IN PROGRESS at this point". While Claude seems to roughly know about salsa, it has undergone a few large API changes recently, so Claude keeps misusing APIs and getting stuck.
Typically what happens is: Claude gets a ton of compiler errors -> realizes it's not using salsa correctly -> attempts to find documentation or the salsa source code to figure out why -> gets stuck and decides to instead delete the code or tests until it compiles and pretends it succeeded.


I figured it should be possible to have a service (perhaps an MCP server) that can augment agents like Claude Code with information about the crate: basic description, important APIs, etc.
Implementation
The rough flow of what I wanted was:
- An API to request indexing a crate by name and version
- Index requests get put on an indexing queue
- We have some processing tasks that pull requests off the queue -> Download the crate -> Parse it -> Do "analysis" -> Store the analysis somewhere
- Users can access the analysis via an API endpoint, e.g. GET /crates/<name>/<version>/analysis (a rough sketch of these endpoints follows this list).
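To make that flow concrete, here's a minimal sketch of the two endpoints as a Worker fetch handler. The INDEX_QUEUE binding name and the getAnalysis() lookup are illustrative placeholders, not the actual code:
// A minimal sketch of the API Worker, assuming a queue binding named INDEX_QUEUE;
// getAnalysis() is a hypothetical lookup (e.g. against R2 or D1).
interface Env {
  INDEX_QUEUE: Queue<{ name: string; version: string }>
}

declare function getAnalysis(env: Env, name: string, version: string): Promise<unknown | null>

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url)

    // POST /index with { "name": ..., "version": ... } enqueues an indexing request
    if (request.method === 'POST' && url.pathname === '/index') {
      const body = (await request.json()) as { name: string; version: string }
      await env.INDEX_QUEUE.send(body)
      return Response.json({ status: 'queued' }, { status: 202 })
    }

    // GET /crates/<name>/<version>/analysis returns the stored analysis, if any
    const match = url.pathname.match(/^\/crates\/([^/]+)\/([^/]+)\/analysis$/)
    if (request.method === 'GET' && match) {
      const analysis = await getAnalysis(env, match[1], match[2])
      return analysis
        ? Response.json(analysis)
        : Response.json({ status: 'not_indexed' }, { status: 404 })
    }

    return new Response('not found', { status: 404 })
  },
}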
Parsing Rust Code
For my v0, I figured I could have a single endpoint that does the full download + analyze loop without any fancy storage/caching of intermediate results or queues.
My first hurdle was parsing Rust code. The obvious choice was tree-sitter, but its standard Node.js bindings rely on native C++ modules, which are a non-starter in the Workers sandbox.
The next logical step was WASM. Both tree-sitter and my own custom Rust parser could be compiled to WASM.
In theory, this should be straightforward. In practice, I spent the better part of a day trying to get my Worker to correctly import the compiled module.
I wrestled with wasm-bindgen and wasm-pack, trying to generate a compatible ES module, but kept hitting dead ends with import resolution errors inside the Worker runtime.
The core issue is that the Workers runtime, while V8-based, has its own module loading logic that differs from both browsers and Node.js.
I would typically hit one of two errors:
Error: Failed to initialize WASM Rust parser: Error: ENOENT: no such file or directory, open './wasm/tree-sitter-rust.wasm'
❯ WasmRustParser.initialize src/parser/wasm-parser.ts:53:13
51| this.initialized = true
52| } catch (error) {
53| throw new Error(`Failed to initialize WASM Rust parser: ${error}`)
| ^
54| }
55| }
or
WebAssembly.instantiate(): Imports argument must be present and must be an object
at null.<anonymous> (index.js:84:34)
There's probably a way to make it work, but not one that I could figure out in the time I had given myself.
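For reference, the direct-import pattern I was aiming for looks roughly like the sketch below. My understanding (which I never got working end-to-end) is that Wrangler exposes a .wasm import as a precompiled WebAssembly.Module that you instantiate yourself; tree-sitter's generated bindings expect their own Emscripten-style import object, which is where I kept getting stuck:
// Wrangler's module rules treat .wasm imports as a precompiled WebAssembly.Module
import parserModule from './wasm/tree-sitter-rust.wasm'

function instantiateParser(): WebAssembly.Instance {
  // The import object here is a placeholder; the real module declares its own imports
  return new WebAssembly.Instance(parserModule, { env: {} })
}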
This is where I had the realization: I was fighting the platform. While examples of importing WASM were scarce, there were plenty of examples of using Rust-based WASM Workers directly. I created a dedicated rust-parser Worker -- a simple service whose only job is to accept Rust code and return a JSON AST.
It was trivial to deploy and call via RPC. In a production setting I would want to understand the performance implications of adding the RPC call, but in my informal testing it worked great.
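For a sense of how small the parser service is, here's a sketch of it as a WorkerEntrypoint. parseRustSource() stands in for the actual WASM-backed parsing, and RUST_PARSER is an assumed service-binding name:
import { WorkerEntrypoint } from 'cloudflare:workers'

// Stand-in for the real WASM-backed parsing logic
declare function parseRustSource(source: string): unknown

export class RustParser extends WorkerEntrypoint {
  // Returns a JSON-serializable AST for the given Rust source
  async parse(source: string): Promise<unknown> {
    return parseRustSource(source)
  }
}

// From the main Worker, the call is just a method on the service binding:
//   const ast = await env.RUST_PARSER.parse(crateSource)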
What I liked: Architecturally, adding another Worker is lightweight. It's a clean way to isolate complex dependencies like a WASM parser, turning a frustrating build-time problem into a simple runtime RPC call.
What I disliked: Importing third-party WASM modules in a Worker shouldn't be this difficult. The path of least resistance was to pay the RPC overhead, which feels like a workaround for a tooling or documentation gap.
Discovering Workflows
Now that I could reliably parse Rust code, I was at the "draw the rest of the owl" step.
Going in, I had roughly zero understanding of the different Cloudflare products, so my thought process was roughly:
- I need a job queue -- great, Cloudflare has queues
- This is cool, I can write workers that read from the queue.
- I'll have my workers write to D1 (Cloudflare's SQLite-as-a-service) to keep track of state.
And so I started building just that. However, I was getting a bit nervous about how long my processing worker would need to run. Based on the Worker limits, it looked like my CPU time would be capped at 10ms on the free tier. Even though that doesn't seem to count async I/O time, I was worried I would blow through it. Which got me thinking: am I doing this wrong? Workers seem to be designed to be small, lightweight functions.
It turns out that worry was misplaced: Workers that run as queue consumers have different limits, with much more generous CPU time allowances. But this exploration nevertheless led me to discover Cloudflare Workflows.
Workflows was an "aha" moment for me. I got to delete a bunch of code for dealing with state management, errors, and retries, and the result was a much more robust pipeline than I would have bothered to implement for myself. At this point the concept of Workers started to click: the Worker is the primitive inside the platform. By picking a relatively small primitive (i.e. JavaScript functions as opposed to arbitrary code), Cloudflare has much more leverage to build additional abstractions on top of it, such as a durable execution engine composed of small functions. This is presumably all fairly obvious if you've been following along with the serverless world.
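To show why it clicked, here's roughly what the indexing workflow looks like as a sketch. The binding names (CRATES_BUCKET, RUST_PARSER) and the fetchCrateSource() helper are illustrative, not lifted from the repo:
import { WorkflowEntrypoint, WorkflowEvent, WorkflowStep } from 'cloudflare:workers'

type Params = { name: string; version: string }

// Assumed bindings: an R2 bucket for artifacts and the rust-parser service from earlier
interface Env {
  CRATES_BUCKET: R2Bucket
  RUST_PARSER: { parse(source: string): Promise<unknown> }
}

// Stand-in for downloading and unpacking the crate source
declare function fetchCrateSource(name: string, version: string): Promise<string>

export class IndexCrateWorkflow extends WorkflowEntrypoint<Env, Params> {
  async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
    const { name, version } = event.payload

    // Each step is durable: it is retried on failure and its result is persisted,
    // so none of the bookkeeping below needs hand-rolled state management.
    const source = await step.do('fetch crate source', () => fetchCrateSource(name, version))

    await step.do('store source in R2', async () => {
      const key = `crates/${name}/${version}/source.rs`
      await this.env.CRATES_BUCKET.put(key, source)
      return key
    })

    await step.do('parse and store AST', async () => {
      const ast = await this.env.RUST_PARSER.parse(source)
      const key = `crates/${name}/${version}/parser.json`
      await this.env.CRATES_BUCKET.put(key, JSON.stringify(ast))
      return key
    })
  }
}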
Here's what the final architecture looks like:


The result is a pretty robust workflow:
- A request to index a crate comes in -> gets put on a queue
- In the crate-processing task, pull from the queue and kick off a workflow (the consumer is sketched after this list) to:
- Fetch the crate, store in R2
- Parse the crate, store the parsed result in R2
- Subsequent API requests for that crate's info return whether it has been processed or not
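And the queue consumer mentioned in the list ends up tiny: it just turns each message into a workflow instance (the binding name here is again an assumption):
type Params = { name: string; version: string }

// Assumed binding to the workflow sketched earlier
interface Env {
  INDEX_CRATE_WORKFLOW: Workflow
}

export default {
  async queue(batch: MessageBatch<Params>, env: Env): Promise<void> {
    for (const message of batch.messages) {
      // One workflow instance per index request; the workflow handles retries internally
      await env.INDEX_CRATE_WORKFLOW.create({ params: message.body })
      message.ack()
    }
  },
}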
What I liked: Workers as a primitive makes it close to trivial to build complex use cases without needing to worry about any of the orchestration parts.
What I disliked: Cloudflare has a lot of products, and it wasn't obvious to me which to use when. I was missing a conceptual overview to help me think about my use case. And I'm still not sure if I should have tried to implement everything in one worker, or if multiple workers is the best practice.
AI-enhanced
Finally, it was time to actually use Workers AI!
I was making this up as I went, but the idea was to feed the AI different levels of detail about the codebase and get it to provide summaries, e.g. "here are the public APIs of the crate: what does it do?", or "here's a struct and its methods: what is its purpose?". I also had a vague idea that it might be nice to fit RAG in there somewhere, so that for an arbitrary query we could find relevant snippets.
This was about as easy as it gets. Adding support for the AI models is a few lines of config:
"ai": {
"binding": "AI"
},
and some typical AI SDK code:
async generateAISummary(prompt: string): Promise<string> {
  try {
    const { response } = await this.env.AI.run(
      '@cf/meta/llama-3.1-8b-instruct',
      {
        messages: [
          {
            role: 'system',
            content: getSystemPrompt(),
          },
          {
            role: 'user',
            content: prompt,
          },
        ],
        max_tokens: 200,
      },
    )
    return response || 'AI summary unavailable'
  } catch (error) {
    console.warn('AI summary generation failed:', error)
    return 'AI summary unavailable'
  }
}
It's pretty cool that it doesn't even require an API key to run this. The RAG part was even more impressive. With AutoRAG, simply by putting the parsed Rust code in R2, I had already done all the work needed to make it queryable. All I had to do was go to the dashboard and turn it on. There was no setup, no vector database to manage, and no embedding code to write. This was the easiest part of the entire project by far, and I think it's evidence of the power of the existing primitives.
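Querying it from a Worker is similarly small. This is a sketch based on my reading of the AutoRAG binding, with "crate-docs" as a made-up instance name:
// Ask the AutoRAG instance (which indexes the parsed crate files in R2) a question
const answer = await env.AI.autorag('crate-docs').aiSearch({
  query: 'what parser combinators are there?',
})
console.log(answer.response) // generated answer grounded in the retrieved documents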
At that point, my proof of concept was officially done -- functionally complete and the infrastructure working perfectly.
However, to be truly useful the AI responses require tuning. Here are a couple of examples:
Suppose you want to know what the undocumented function rudy_dwarf::modules::get_containing_module does.
You get back:
**Summary:**
The `get_containing_module` function retrieves the path of the module that contains a given DIE (Debug Information Entry) in a DWARF database.
**Parameters:**
- `db`: A reference to a `DwarfDb` object, which represents the DWARF database.
- `die`: A `Die` object, which represents a single DIE in the database.
**Return Value:**
The function returns an `Option` containing a `String` representing the path of the containing module, or `None` if the DIE is not found in the database.
**Key Logic:**
1. The function first calculates the index of the module that contains the DIE's file using the `module_index` function.
2. It then finds the DIE's offset in the module using the `die.offset()` method.
3. The function uses the `find_by_offset` method on the module index to locate the DIE's range in the module.
4. If the
That's verbatim the response, including the abrupt end. I have max_tokens: 200, so presumably that's why it gets cut off.
It's not too bad. The summary is actually pretty accurate. That is indeed what it does. I could probably steal that and add it to the documentation. It's way too long though. It includes details that a caller would never need to care about (the "key logic").
Similarly, the RAG search is okay, e.g. for the query "what parser combinators are there?" I get back:
### Parser Combinators
The available parser combinators are defined in the [crates/rudy-dwarf/0.4.0/parser.json](#) document. They include:
- `and`: Combine this parser with another, applying both and combining results
- `filter`: Turns the input into an optional output if the provided parser succeeds
- `map`: Transform the output of this parser
- `map_with_entry`: Transform the output of this parser with access to the database or entry
- `map_res`: Transform the output of this parser with a result
- `then`: Chain this parser with another, where the second operates on the first's result
- `context`: Add context to errors from this parser
These combinators are part of the `Parser` trait, which is a core part of the parsing functionality in the Rudy Dwarf library.
Note: The [crates/rudy-dwarf/0.4.0/parser.json](#) document provides more detailed information about each combinator, including their methods and usage.
Which is pretty accurate. However, it refers to the original parsed document parser.json, which isn't ideal since that file isn't actually accessible to the caller.
But it's not bad. There's promise here!
Originally this project was more about trying out the Cloudflare platform. But I still believe there's a need for this kind of service... which is why I plan to keep building it out. From here the main gap is refining the AI implementation: writing better prompts and passing in the right level of context. The code currently lives here.
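As a first pass at that tuning, I'd expect to constrain the output shape in the system prompt and give the model enough tokens that it stops getting cut off mid-sentence. The prompt below is illustrative, not what's in the repo:
const SUMMARY_SYSTEM_PROMPT = `You summarize Rust APIs for coding agents.
Answer in at most three sentences: what the item does and when a caller would use it.
Do not describe internal implementation details.`

async function generateShortSummary(env: { AI: Ai }, prompt: string): Promise<string> {
  const { response } = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    messages: [
      { role: 'system', content: SUMMARY_SYSTEM_PROMPT },
      { role: 'user', content: prompt },
    ],
    max_tokens: 512, // generous enough that answers are not truncated
  })
  return response || 'AI summary unavailable'
}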
What I would want to do first is work out a decent local testing setup where I can iterate on different parsing outputs and prompts without hitting Workers AI directly. On one hand, it's cool that I can do local development backed by the actual AI APIs in a super easy way. But it was hard to know how expensive that would be while developing.
In hindsight, looking at the billing: analyzing just a couple of crates (my rudy-dwarf crate as above, and serde) just once, I spent about 3k neurons (Workers AI's billing unit, similar to tokens), which works out to about 3 cents. So not too bad, but it was hard to anticipate how much it would be.
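The approach I have in mind is to put the model call behind a small interface and swap in a canned implementation while iterating locally, so prompt and parsing changes don't burn neurons. The names here are illustrative:
interface Summarizer {
  summarize(prompt: string): Promise<string>
}

// Real implementation backed by the Workers AI binding
class WorkersAiSummarizer implements Summarizer {
  constructor(private ai: Ai) {}

  async summarize(prompt: string): Promise<string> {
    const { response } = await this.ai.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [{ role: 'user', content: prompt }],
    })
    return response || 'AI summary unavailable'
  }
}

// Canned implementation for local iteration and deterministic tests
class FakeSummarizer implements Summarizer {
  async summarize(prompt: string): Promise<string> {
    return `FAKE SUMMARY (${prompt.length} chars of context)`
  }
}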
What I liked: Incredibly easy to get this all set up. This was the easiest part of the whole project by far.
What I disliked: Not much! Again, there are quite a few products that could use some unifying documentation, and choosing a model is a little overwhelming.
Misc: Testing and Monorepos
On the topic of testing: the area where I spent the most development time (other than the WASM issues) was figuring out the right combination of JavaScript bundling tools, testing setup, and Worker configuration files.
On one hand, it's awesome that you can run a simulated version of Cloudflare locally. It's called Miniflare, and in my understanding this is what powers the local development experience when running wrangler dev.
At my previous job we heavily used AWS. Testing interactions with AWS itself was one of the bigger coverage gaps, and so we started using tools like localstack to do better end-to-end testing. So it was very cool to see that I could simply run my workers locally, and even have a fully functioning setup with queues, workflows, databases, and storage.
However, it was challenging to get everything hooked up correctly. I spent a long time trawling through example repositories trying to find the right way to configure things so that I could run tests, with the appropriate bindings, while keeping TypeScript happy.
Eventually the thing that unblocked me was Cloudflare's own mcp-server-cloudflare project. Following the patterns in that repository helped me clean up my structure, although there weren't many examples of tests that I could follow.
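For reference, here's roughly what a test looks like with @cloudflare/vitest-pool-workers, which runs tests inside workerd with the bindings from your wrangler config (the route is the analysis endpoint from earlier; the import path is a placeholder):
import { env, createExecutionContext, waitOnExecutionContext } from 'cloudflare:test'
import { describe, expect, it } from 'vitest'
import worker from '../src/index'

describe('crate analysis API', () => {
  it('returns 404 for a crate that has not been indexed yet', async () => {
    const request = new Request('http://example.com/crates/serde/1.0.0/analysis')
    const ctx = createExecutionContext()

    const response = await worker.fetch(request, env, ctx)
    await waitOnExecutionContext(ctx)

    expect(response.status).toBe(404)
  })
})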
What I liked: Being able to run the entire workflow end-to-end locally.
What I disliked: I would love to see more guidance and/or tooling for multi-worker repositories. It would also be great to have something like a workspace-level configuration for wrangler. Also, I would love to write snapshot tests and use fixture data, but without access to the filesystem it's not obvious how I would do this.
Conclusion
Going into this, my hypothesis was that Cloudflare's existing infrastructure primitives would make it a powerful platform for building AI applications. And so far, I'm feeling validated.
The journey definitely had its challenges. Initially navigating the available products was quite overwhelming, and I'm still left wondering whether there's a tooling gap, a documentation gap, or simply user error in how I was trying to do WASM, multiple workers, and testing. But those all seem like tractable problems.
Shortly after finishing this project, I started working on a turn-based strategy game (a port of a friend's tabletop game). On that project, having more experience and fluency with the Cloudflare platform made things go a lot smoother. I also got to see the power of Durable Objects being used for websocket-based multiplayer games.
It's an impressive set of products and technologies that Cloudflare has built, and I'm excited to join in and help!