Skip to content

Proposal: New Architecture for sqlx::query!() Macros #1598

@abonander

Description

@abonander

Status Quo

Currently, the default mode of operation for the sqlx::query!() family of macros is to connect to a running database server (or open that database in-process for SQLite) as specified by the DATABASE_URL environment variable, and then use database-specific commands to parse and analyze the passed query, to get information on bind parameters and output columns and generate code based on that information.

As an option, the macros can also operate in an "offline" mode where instead of connecting to a database server, they read a sqlx-data.json file created by cargo sqlx prepare which contains the same data and use that to generate code.

The implementation is relatively straightforward and functions well enough, but has some drawbacks:

Compiling a project with a lot of query!() macro invocations is painfully slow.

Since each macro invocation involves opening and working with a TCP connection to a database server from within an asynchronous runtime, the query macros execute a lot more code than your typical procedural macro. This is compounded by the fact that procedural macros are exclusively compiled in debug mode which means no optimizations (which async code relies pretty heavily on to be fast).

I have tried playing around with setting rustflags in a config.toml to force sqlx-macros to compile in release mode but it didn't really help and seemed to actually worsen compile times if it did anything at all; the most noticeable thing that it did was clobber all the cached compiler artifacts for the whole dependency tree, requiring a full recompile.

Macro invocations are currently executed serially which further compounds the problem.

Currently as a mitigation we only start a single runtime and then cache it in a lazy_static!(); since proc macros are executed in a separate process that is kept running for the duration of the compiler, this saves some overhead per invocation. I've been meaning to see if caching a Pool of connections saves any overhead as well. I imagine it would, at least somewhat. We could cache analysis results as well, but that would only help for queries that are copy-pasted a lot.

Many people find the idea of procedural macros opening arbitrary TCP connections to be highly unpalatable.

Oftentimes when someone comes across SQLx and starts reading about the query macros, their first reaction is something along the lines of, "wow, the compiler actually lets you do that? That's disgusting!"

Fair enough, the current procedural macro environment is extremely laissez-faire and there have been calls to lock down what procedural macros (and build scripts) can actually do, thanks to sensationalized proof-of-concept projects showing how a proc macro executed by a transitive dependency could steal sensitive information like SSH keys and the like.

Personally, I think the threat is kind of overblown, or at least rather moot, considering a malicious crate in your dependency tree could accomplish all kinds of nastiness already, just at runtime instead of compile time. I fear that a big push to sandbox procedural macros and build scripts could lure people into a false sense of security and make them feel like they don't need to audit their dependencies. But, I digress.

The query!() macros don't play that well with IDEs.

This situation has improved drastically compared to when we first introduced SQLx, to the point where IntelliJ-Rust and Rust Analyzer can both provide code completion for sqlx::query!(). However, it can still take some time to get results, in part thanks to point 1 here, and the macros aren't "pure" in the sense that their output could change from run to run based on schema changes, without the compiler or IDEs being able to pick up on it. They should be working from cached output where possible, and be able to invalidate that cache in a way that IDE plugins can learn about.

Offline mode is finicky to use and it's easy to forget to run cargo sqlx prepare before committing.

Offline mode isn't easily discoverable (in part thanks to just how dense the documentation on sqlx::query!() has gotten... we really need to move that stuff to an MDBook or something). It's also easy to forget to run cargo sqlx prepare before committing, which often leads to really confusing CI errors.

In the past, what I've done is add cargo sqlx prepare && git add sqlx-data.json to .git/hooks/precommit to make sure I don't forget, but this makes committing take forever thanks to point 1.

The macros don't have a way to ensure that the database is up-to-date with migrations.

This is somewhat on purpose as we don't want the macros automatically executing migrations which may end up breaking the user's local development database and/or clobbering manually generated test data. Still, it does get annoying.

Proposal

Instead of having the macros connect directly to the database server at compile time, they could communicate with a daemon process that the user starts and leaves running while working on their project.

I'm imagining this daemon process as another subcommand of sqlx-cli, perhaps cargo sqlx watch or something like that. It would watch target/sqlx which the macros would write their query strings to as files. I considered using a crate like interprocess to let the macros and daemon talk directly, but I think communicating with the filesystem is better because that's a medium that the compiler natively understands as I'll elaborate on in a second.

Since a cargo installed binary is built in release mode by default, the latency of the actual request to parse and analyze the query should improve quite a bit.

The macros could submit their request to the daemon and then return some dummy code (or even a temporary error, like Waiting for query analysis. Is `cargo sqlx watch` running?), telling the compiler to watch target/sqlx for the daemon to write the result of processing the query. Then the compiler could move on to the next macro invocation instead of blocking on the completion of the first one. This would let the macros essentially run in parallel, which should significantly reduce overall compile times.

Assuming #570 is implemented as part of this rearchitecting, the daemon could also automatically update entries in .sqlx/ for successfully parsed queries so that cargo sqlx prepare doesn't have to be manually run before committing.

This would also be more amenable to sandboxing as we would only require write access to target/ and read access to the project, which wouldn't be doing anything more than what build scripts are already expected to do, and any sandboxing scheme that is implemented would be required to support. Support for Windows named pipes or Unix domain sockets would probably be right out in this case unless we specifically asked for it, and that would be an uphill battle.

The daemon could also periodically re-run the analysis of queries to pick up changes to the database schema.

@mehcode suggested that the daemon could automatically migrate the database if necessary, so no other manual setup would be needed. It could even spin up a temporary database so migrations could be automatically tested and synchronized with the query macros without breaking a local development database.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions