Skip to content

Fixes for the Avatica JDBC driver#12709

Merged
clintropolis merged 3 commits intoapache:masterfrom
paul-rogers:220625-jdbc
Jul 27, 2022
Merged

Fixes for the Avatica JDBC driver#12709
clintropolis merged 3 commits intoapache:masterfrom
paul-rogers:220625-jdbc

Conversation

@paul-rogers
Copy link
Copy Markdown
Contributor

PR #12636 amends Druid to make a single pass through the Calcite planner. This was originally done in support of the catalog project to avoid resolving the same catalog object multiple times, which could encounter race conditions if the resolutions occurred on either side of a change to the catalog.

While discussing that change, @clintropolis called our attention to the "query optimized" way that Druid handles parameters, which then raised questions about how we ensure the proposed changes do not break our JDBC support. @gianm suggested we focus testing on JDBC.

That testing revealed that Druid's implementation of the JDBC protocol is a bit off, which may have led to some of the confusion around when to prepare and when to expect parameters. Issue #12682 and issue #12684 spell out the gaps.

While JDBC has two kinds of statements (Statement and PreparedStatement), Druid has just one that attempts to do what the two JDBC statements do. That is, it tries to be both prepared and non-prepared.

Also, while JDBC separates the idea of statement and ResultSet, Druid combines the two: closing the result set closes the statement as well. While this might be handy, for a Statement, it does work at cross-purposes to the PreparedStatement which is designed to be prepared once and executed multiple times. Closing the statement after the first execution forces PreparedStatement to be identical to Statement (except that the former takes parameters.)

Corrects the Druid JDBC Driver Behavior

We could argue about the merits of the Druid design. However, JDBC is a standard and the correct behavior is spelled out in the JDBC 4.1 Spec. That behavior is:

  • A Statement is a reusable object that can execute multiple SQL statements, using a ResultSet. Closing the ResultSet does not close the statement. Druid can, however, close the ResultSet on EOF.
  • A PreparedStatement is a reusable object created with a (typically parameterized) SQL statement. Each execution is given a set of parameter values, and produces a ResultSet. Again, closing the ResultSet does not close the statement, which allows the statement to be executed with a different set of parameters.

To correct the Druid behavior, the Avatica JDBC implementation is refactored to clearly model the JDBC concepts:

  • DruidJdbcStatement models a (non-prepared) Statement.
  • DruidJdbcPreparedStatement models a PreparedStatement.
  • AbstractDruidJdbcStatement is the common base class for the above.
  • DruidJdbcResultSet represents a result set (a running query).

While PR #12636 went on to modify the planner and the SqlLifecyle layers, this PR leaves those layers unchanged, and implements the desired behavior with the existing code. The hope is that the result is a bit easier on reviewers. This code will be modified again later to support the single-pass planner.

The Avatica driver tests were modified to be consistent with the JDBC standard. The original code didn't pass, but the revised code does.

The result is that we now have a clean SQL stack for JDBC, HTTP and internal use that make a single planner pass when possible, and use two passes for the JDBC prepare/execute model.

Flaky Test: DruidAvaticaHandlerTest.testConcurrentQueries()

Also fixes a flaky test: DruidAvaticaHandlerTest.testConcurrentQueries(). This test has occasionally failed in builds. The problem was first mentioned in 2017 in Issue #4408 , which was claimed to be fixed. When run in a tight loop, however, the test eventually fails with an NPE. The issue is traced to the context on the DruidConnection: it is passed to each statement, where it is shared and concurrently modified. The fix is to make a copy for each new statement which avoids queries trying to share the same map.

Verified the problem by running the test 100 times in a tight loop. Verified the fix the same way.

Minor Revisions

Careful inspection of the code identified several other improvements:

  • Hold the yielder thread pool on the connection, not statement, so we get some benefit from caching. (It might be even better to define the pool globally.)
  • Define a SqlRequest to hold the quad of (SQL, context, parameters, auth result), which makes function signatures simpler.
  • Shifted some functionality from DruidMeta to DruidConnection. DruidMeta resolves connections, then connections work with statements.

This PR has:

  • been self-reviewed.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

@paul-rogers
Copy link
Copy Markdown
Contributor Author

Build and test on ARM64 CPU architecture (1) failed with:

Automatic restarts limited: Please try restarting this job later or contact support@travis-ci.com.

This failure is unrelated to this change. Let's ignore it.

Build takes 10 hours. Restarting to fix an IT that expects too much detail in an error response.

* items can be filled in later as needed (except for the SQL
* and auth result, which are required.)
*/
public class SqlRequest
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsure on naming, seems a bit strange to me for something with request in the name to have things called result inside of it. I think since SqlQuery is the HTTP request object and there is a method to create one of these from one of those it adds to the perceived strangeness of the naming.

In some ways this seems like the JDBC analog of SqlQuery, though also with the auth information decorated on it, should we swap the names of SqlQuery and SqlRequest? or call this DruidQueryStatement .. something? idk naming is hard

Comment thread sql/src/main/java/org/apache/druid/sql/SqlRequest.java Outdated
Comment thread sql/src/main/java/org/apache/druid/sql/SqlRequest.java Outdated
Comment thread sql/src/test/java/org/apache/druid/sql/avatica/DruidAvaticaHandlerTest.java Outdated
Correctly implement regular and prepared statements
Correctly implement result sets
Fix race condition with contexts
Clarify when parameters are used
Prepare for single-pass through the planner
@paul-rogers
Copy link
Copy Markdown
Contributor Author

Renamed SqlRequest to SqlQueryPlus. Rebased to resolve merge conflict.

Comment thread sql/src/main/java/org/apache/druid/sql/avatica/DruidMeta.java Outdated
@paul-rogers
Copy link
Copy Markdown
Contributor Author

paul-rogers commented Jul 15, 2022

Ready for re-review. The build failed due to a flaky IT recorded in this issue. Otherwise, the code should be ready for final review.

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

The only thing i'm unsure about is if the executor change for running stuff changes the characteristics of anything, but not sure how to find out without merging it to see

throw DruidMeta.logFailure(new ISE("Too many open statements, limit is [%,d]", maxStatements));
}

@SuppressWarnings("GuardedBy")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this still need suppressed?

@clintropolis clintropolis merged commit a8b155e into apache:master Jul 27, 2022
@paul-rogers
Copy link
Copy Markdown
Contributor Author

Release note item:

In previous releases, Druid would automatically close the JDBC Statement when the ResultSet is closed. Druid would close the ResultSet on EOF. Druid would close the statement on any exception. This behavior is, however, non-standard.

In this release, Druid's JDBC driver follows the JDBC standards more closely:

  • The ResultSet closes automatically on EOF, but does not close the Statement or PreparedStatement. Your code must close these statements, perhaps by using a try-with-resources block.
  • The PreparedStatement can now be used multiple times with different parameters. (Previously this was not true since closing the ResultSet closed the PreparedStatement.)
  • If any call to a Statement or PreparedStatement raises an error, the client code must still explicitly close the statement. According to the JDBC standards, statements are not closed automatically on errors. This allows you to obtain information about a failed statement before closing it.

If you have code that depended on the old behavior, you may have to change your code to add the required close statement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants