Description
Describe the enhancement requested
dev@arrow.apache.org mailing list thread: https://lists.apache.org/thread/f0xb61z4yw611rw0v8vf9rht0qtq8opc
Usecase
InfluxDB IOx / 3.0 would like to allow customers to create prepared SQL statements with parameters so they can send parameterized queries and parameter values to the server. Without this feature, they have to do the parameter substitution on the client side, which is open to SQL injection attacks and (if they use a pre-existing library) may not apply the same parameter typing as our SQL implementation.
Given that the JDBC driver doesn't yet support binding parameters to prepared statements (see #33961), I am not sure how widely used the parameter support is, but I think interest is growing -- for example, apache/arrow-rs#4797 adds client-side support to the Rust implementation.
Background: Stateless services
A common design pattern in cloud services is that the request from a client can be handled by one of a number of identical backend servers as shown in the diagram below.
Subsequent requests may be processed by different backend servers. Any state needed to continue a session is sent to the client which passes it back in subsequent requests.
This design can be used to support features such as zero-downtime deployments and automatic workload-based scaling. It also has the nice property that there is no server-side state to clean up (via timeout or other mechanism).
                                                                    ┌────────────────────┐
                                                          ┌ ─ ─ ─ ─▶│      Server 1      │
                                                                    └────────────────────┘
                                                          │
                                                                    ┌────────────────────┐
┌────────────────────┐                                    │         │      Server 2      │
│     FlightSQL      │                                              └────────────────────┘
│       Client       │─ ─ ─ ─ ─ ▶ ... Network ... ─ ─ ─ ─ ┘
│                    │                                                       ...
└────────────────────┘
                                                                    ┌────────────────────┐
  ActionCreatePreparedStatementRequest                              │      Server N      │
  handled by Server 1                                               └────────────────────┘
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
                                                                    ┌────────────────────┐
                                                                    │      Server 1      │
                                                                    └────────────────────┘
                                                                    ┌────────────────────┐
┌────────────────────┐                                              │      Server 2      │
│     FlightSQL      │                                              └────────────────────┘
│       Client       │─ ─ ─ ─ ─ ▶ ... Network ... ─ ─ ─ ─ ┐
│                    │                                                       ...
└────────────────────┘                                    │
                                                                    ┌────────────────────┐
                                                          └ ─ ─ ─ ─▶│      Server N      │
  ActionPreparedStatementExecute                                    └────────────────────┘
  handled by Server N
Problem
As currently specified, I don't think we can implement FlightSQL prepared statements with parameters with such a stateless design.
In IOx, the handle returned from ActionCreatePreparedStatementRequest contains the original SQL query text, among other things. Thus the subsequent call to ActionPreparedStatementExecute has access to the SQL query.
However, the DoPut call with CommandPreparedStatementQuery that binds the parameters does not return anything to the client that it could pass along in later calls to ActionPreparedStatementExecute. Thus there is no way for the server that processes ActionPreparedStatementExecute to know the values of the parameters.
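The gap described above can be illustrated with a minimal sketch. It is not IOx's actual handle format or the Flight SQL API: the JSON-plus-base64 encoding, the `StatelessServer` class, and its method names are all assumptions made up for illustration. The only point it shows is that state placed in the handle survives a hop between servers, while bound parameters have no such carrier.

```python
import base64
import json


def encode_handle(state: dict) -> bytes:
    """Pack statement state into an opaque handle (hypothetical encoding)."""
    return base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode("utf-8"))


def decode_handle(handle: bytes) -> dict:
    """Recover the statement state from a handle produced by encode_handle."""
    return json.loads(base64.urlsafe_b64decode(handle).decode("utf-8"))


class StatelessServer:
    """Hypothetical backend that keeps nothing in memory between requests."""

    def __init__(self, name: str) -> None:
        self.name = name

    def create_prepared_statement(self, sql: str) -> bytes:
        # The SQL text travels inside the handle, so any server can read it later.
        return encode_handle({"sql": sql})

    def bind_parameters(self, handle: bytes, params: list) -> None:
        # As currently specified, nothing goes back to the client here,
        # so the values are gone once this request finishes.
        return None

    def execute(self, handle: bytes) -> dict:
        state = decode_handle(handle)
        return {"server": self.name, "sql": state["sql"], "params": state.get("params")}


server_1 = StatelessServer("server-1")
server_n = StatelessServer("server-n")

handle = server_1.create_prepared_statement("SELECT * FROM cpu WHERE host = $1")
server_1.bind_parameters(handle, ["server-a"])

# A different server handles execution: it sees the SQL but not the parameters.
result = server_n.execute(handle)
```

In this sketch `result["sql"]` is recovered on the second server, while `result["params"]` is `None`, which mirrors the problem stated above.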
FlightSQL sequence Diagram
Here is the sequence diagram from https://arrow.apache.org/docs/format/FlightSql.html for reference
Strawman Proposal
One way to support a stateless implementation of prepared statements with bind parameters would be to extend the response returned from calling DoPut with CommandPreparedStatementQuery to include a new CommandPrepareStatementQueryResponse, similar to the existing DoPutUpdateResult:
Lines 1782 to 1792 in 15a8ac3:
/*
 * Returned from the RPC call DoPut when a CommandStatementUpdate
 * CommandPreparedStatementUpdate was in the request, containing
 * results from the update.
 */
message DoPutUpdateResult {
  option (experimental) = true;
  // The number of records updated. A return value of -1 represents
  // an unknown updated record count.
  int64 record_count = 1;
}
The proposed new message:
/**
 * Response returned when `DoPut` is called with `CommandPreparedStatementQuery`.
 */
message DoPutStatementPrepareResult {
  option (experimental) = true;
  // (Potentially updated) opaque handle for the prepared statement on the server.
  // All subsequent requests for this prepared statement must use this new handle, if specified.
  bytes prepared_statement_handle = 1;
}
I think this would be a fairly low-overhead and easy extension. Existing clients that support bind parameters would require an update, but existing servers would not. Given that bind parameters are just starting to be used more, I think the overall ecosystem impact would be low.
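To make the proposed flow concrete, here is a minimal sketch of how the updated handle could carry the bound parameters. Everything in it is an assumption for illustration: the JSON-plus-base64 handle encoding and the function names `do_put_bind`, `pick_handle`, and `execute` are hypothetical and are not part of Flight SQL or any Arrow library. It also assumes that an empty or missing handle in the response means "not specified", so a client talking to a server that returns nothing keeps its old handle.

```python
import base64
import json
from typing import Optional


def encode_handle(state: dict) -> bytes:
    """Pack statement state into an opaque handle (hypothetical encoding)."""
    return base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode("utf-8"))


def decode_handle(handle: bytes) -> dict:
    """Recover the statement state from a handle produced by encode_handle."""
    return json.loads(base64.urlsafe_b64decode(handle).decode("utf-8"))


def do_put_bind(handle: bytes, params: list) -> bytes:
    """Server side: fold the bound parameters into a new handle.

    The return value plays the role of the proposed
    DoPutStatementPrepareResult.prepared_statement_handle field.
    """
    state = decode_handle(handle)
    state["params"] = params
    return encode_handle(state)


def pick_handle(current: bytes, returned: Optional[bytes]) -> bytes:
    """Client side: adopt the updated handle if one was specified."""
    return returned if returned else current


def execute(handle: bytes) -> tuple:
    """Server side: everything needed to run the query comes from the handle."""
    state = decode_handle(handle)
    return state["sql"], state.get("params")


# Create on one server, bind on another, execute on a third:
# no server has to remember anything between the calls.
handle = encode_handle({"sql": "SELECT * FROM cpu WHERE host = $1"})
handle = pick_handle(handle, do_put_bind(handle, ["server-a"]))
sql, params = execute(handle)
```

The `pick_handle` fallback is what would let an updated client keep working against a server that does not return a new handle.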
See also
See this discussion for more context: https://github.com/apache/arrow-rs/pull/4797/files#r1319807938
Component(s)
FlightRPC