Skip to content

Sql statement api error messaging fixes. #14629

Merged
cryptoe merged 3 commits intoapache:masterfrom
cryptoe:sql_statement_fixes
Jul 20, 2023
Merged

Sql statement api error messaging fixes. #14629
cryptoe merged 3 commits intoapache:masterfrom
cryptoe:sql_statement_fixes

Conversation

@cryptoe
Copy link
Copy Markdown
Contributor

@cryptoe cryptoe commented Jul 20, 2023

This patch contains the following fixes:

  1. Error message changes when durable storage is not enabled.
  2. Changing the undocumented context param selectDestination values from TASK_REPORT and DURABLE_STORAGE to taskReport and durableStorage.
  3. Follow up PR from Adding null fix for rows and col stats information. #14617
  4. Documentation of context param selectDestination.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@cryptoe cryptoe added Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 and removed Area - Documentation labels Jul 20, 2023
@cryptoe cryptoe added this to the 27.0 milestone Jul 20, 2023
Comment thread docs/multi-stage-query/reference.md Outdated

For queries where you want to use fault tolerance for workers, set `faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to `true`.

For select queryies which want to write the final result to `durableStoage`, set `selectDestination`:`durableStorage`. Which shuffle mesh the job uses can still be controller by `durableShuffleStorage` flag ie. a combination where `selectDestination`:`durableStorage` and `durableShuffleStorage`:`false` is perfectly valid.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. spelling of queries and durableStorage
  2. Something seems off in the second sentence. I think it should be "controlled" instead of "controller".
  3. durableStoage (first use) should be written as "durable storage" (without double quotes) since we are using it as a noun instead of code word.
  4. We should use something else instead of "shuffle mesh" in the docs since it's not a well-documented term in the docs and could be simplified further. Wdyt about simplifying it to something with following structure:
Set `selectDestination`:`durableStorage` for select queries that want to write the final
results to durable storage instead of the task reports. Saving the results in the durable
storage allows users to .. (benefits for the user)

The location where the workers can write the intermediate results is independent of the
location where the final results get stored. Therefore "durableShuffleStorage":false and
"selectDestination":"durableStorage" is a valid configuration to use in the query context, that
instructs the controller to persist only the final result in the durable storage, and not the
intermediate results. 

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like you structure better. Going ahead with that.

* Writes all the results directly to the report.
*/
TASK_REPORT(false),
TASKREPORT("taskReport", false),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use a snake case for the static variable names. Please revert it to the original values.
Since this is changed for equality, we can create a static method that checks if the String provided equals the enum destination.

Suggested change
TASKREPORT("taskReport", false),
TASK_REPORT("taskReport", false),

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done so that we are able to still use the query context enum methods. Also I think ResultFormat class also uses a similar trick. Checkout objectlines

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

* Writes the results as frame files to durable storage. Task report can be truncated to a preview.
*/
DURABLE_STORAGE(true);
DURABLESTORAGE("durableStorage", true);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DURABLESTORAGE("durableStorage", true);
DURABLE_STORAGE("durableStorage", true);

);
public static final String JSON_STRING = "{\"numTotalRows\":1,\"totalSizeInBytes\":1,\"resultFormat\":\"object\",\"dataSource\":\"ds\",\"pages\":[{\"numRows\":1,\"sizeInBytes\":1,\"id\":0}]}";
public static final String JSON_STRING_1 = "{\"numTotalRows\":1,\"totalSizeInBytes\":1,\"resultFormat\":\"object\",\"dataSource\":\"ds\",\"sampleRecords\":[[\"1\"],[\"2\"],[\"3\"]],\"pages\":[{\"numRows\":1,\"sizeInBytes\":1,\"id\":0}]}";
public static final String JSON_STRING = "{\"numTotalRows\":1,\"totalSizeInBytes\":1,\"resultFormat\":\"object\",\"dataSource\":\"ds\",\"pages\":[{\"id\":0,\"numRows\":1,\"sizeInBytes\":1}]}";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a few tests that ensure that nulls in the size or rows are not serialized?

key,
StringUtils.format("a value of enum [%s]", clazz.getSimpleName()),
StringUtils.format(
"referring to one of the values[%s] of enum [%s]",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's inconsistent spacing between values[%s] and enum [%s]

@LakshSingla
Copy link
Copy Markdown
Contributor

LakshSingla commented Jul 20, 2023

TY for picking up the older comments from the reviews!


For select queryies which want to write the final result to `durableStoage`, set `selectDestination`:`durableStorage`. Which shuffle mesh the job uses can still be controller by `durableShuffleStorage` flag ie. a combination where `selectDestination`:`durableStorage` and `durableShuffleStorage`:`false` is perfectly valid.
Set `selectDestination`:`durableStorage` for select queries that want to write the final results to durable storage instead of the task reports. Saving the results in the durable
storage allows users to fetch large result sets. The location where the workers write the intermediate results is different than the location where final results get stored. Therefore, `durableShuffleStorage`:`false` and
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
storage allows users to fetch large result sets. The location where the workers write the intermediate results is different than the location where final results get stored. Therefore, `durableShuffleStorage`:`false` and
storage allows users to fetch large result sets. The location where the workers write the intermediate results can be different from the location where the final results get stored. Therefore, `durableShuffleStorage`:`false` and

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The location where the workers write the intermediate results is always different hence I would like to keep the wording as is unless you feel strongly about it.

* Writes all the results directly to the report.
*/
TASK_REPORT(false),
TASKREPORT("taskReport", false),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

Copy link
Copy Markdown
Contributor

@LakshSingla LakshSingla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nitty comment, overall PR LGTM

@cryptoe cryptoe merged commit 77e0c16 into apache:master Jul 20, 2023
cryptoe added a commit that referenced this pull request Jul 20, 2023
* Error messaging fixes.

* Static check fix

* Review comments

(cherry picked from commit 77e0c16)
sergioferragut pushed a commit to sergioferragut/druid that referenced this pull request Jul 21, 2023
* Error messaging fixes.

* Static check fix

* Review comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Documentation Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants