Sql statement api error messaging fixes. #14629
| For queries where you want to use fault tolerance for workers, set `faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to `true`. | ||
| For select queryies which want to write the final result to `durableStoage`, set `selectDestination`:`durableStorage`. Which shuffle mesh the job uses can still be controller by `durableShuffleStorage` flag ie. a combination where `selectDestination`:`durableStorage` and `durableShuffleStorage`:`false` is perfectly valid. |
- spelling of queries and durableStorage
- Something seems off in the second sentence. I think it should be "controlled" instead of "controller".
- `durableStoage` (first use) should be written as "durable storage" (without double quotes), since we are using it as a noun rather than as a code word.
- We should use something other than "shuffle mesh" in the docs, since it's not a well-documented term there and could be simplified further. Wdyt about simplifying it to something with the following structure:
Set `selectDestination`:`durableStorage` for select queries that want to write the final
results to durable storage instead of the task reports. Saving the results in the durable
storage allows users to .. (benefits for the user)
The location where the workers can write the intermediate results is independent of the
location where the final results get stored. Therefore "durableShuffleStorage":false and
"selectDestination":"durableStorage" is a valid configuration to use in the query context, that
instructs the controller to persist only the final result in the durable storage, and not the
intermediate results.
I like your structure better. Going ahead with that.
| * Writes all the results directly to the report. | ||
| */ | ||
| TASK_REPORT(false), | ||
| TASKREPORT("taskReport", false), |
We should use snake case for the static variable names. Please revert them to the original values.
Since this was changed for equality checks, we can instead create a static method that checks whether the provided String equals the enum destination.
| TASKREPORT("taskReport", false), | |
| TASK_REPORT("taskReport", false), |
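For illustration, the static-method approach suggested above could look roughly like the sketch below. This is not the code from the PR: the field names `contextValue` and `truncatesTaskReport` and the methods `matches` and `fromString` are hypothetical, and the meaning of the boolean is assumed from the Javadoc in the diff.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Sketch only: keeps snake-case constant names and compares against the
// user-facing string through a static lookup instead of renaming constants.
enum SelectDestination {
    TASK_REPORT("taskReport", false),
    DURABLE_STORAGE("durableStorage", true);

    // Hypothetical field: the value users put in the query context.
    private final String contextValue;
    // Hypothetical field: assumed to mean "task report may be truncated to a preview".
    private final boolean truncatesTaskReport;

    SelectDestination(String contextValue, boolean truncatesTaskReport) {
        this.contextValue = contextValue;
        this.truncatesTaskReport = truncatesTaskReport;
    }

    String getContextValue() {
        return contextValue;
    }

    boolean truncatesTaskReport() {
        return truncatesTaskReport;
    }

    // Exact, case-sensitive comparison with the user-facing string; null never matches.
    boolean matches(String value) {
        return contextValue.equals(value);
    }

    // Resolves a user-facing string to a constant, or fails listing the valid values.
    static SelectDestination fromString(String value) {
        for (SelectDestination destination : values()) {
            if (destination.matches(value)) {
                return destination;
            }
        }
        String valid = Arrays.stream(values())
                .map(SelectDestination::getContextValue)
                .collect(Collectors.joining(", "));
        throw new IllegalArgumentException(
                String.format("Unknown selectDestination [%s], expected one of [%s]", value, valid));
    }
}

class SelectDestinationDemo {
    public static void main(String[] args) {
        System.out.println(SelectDestination.fromString("durableStorage"));
    }
}
```

The trade-off raised in the reply below still applies: a lookup like this is separate from any generic enum helper that resolves values through `valueOf`, so callers would need to go through `fromString` explicitly.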
This was done so that we can still use the query context enum methods. Also, I think the ResultFormat class uses a similar trick. Check out objectlines.
| * Writes the results as frame files to durable storage. Task report can be truncated to a preview. | ||
| */ | ||
| DURABLE_STORAGE(true); | ||
| DURABLESTORAGE("durableStorage", true); |
| DURABLESTORAGE("durableStorage", true); | |
| DURABLE_STORAGE("durableStorage", true); |
| ); | ||
| public static final String JSON_STRING = "{\"numTotalRows\":1,\"totalSizeInBytes\":1,\"resultFormat\":\"object\",\"dataSource\":\"ds\",\"pages\":[{\"numRows\":1,\"sizeInBytes\":1,\"id\":0}]}"; | ||
| public static final String JSON_STRING_1 = "{\"numTotalRows\":1,\"totalSizeInBytes\":1,\"resultFormat\":\"object\",\"dataSource\":\"ds\",\"sampleRecords\":[[\"1\"],[\"2\"],[\"3\"]],\"pages\":[{\"numRows\":1,\"sizeInBytes\":1,\"id\":0}]}"; | ||
| public static final String JSON_STRING = "{\"numTotalRows\":1,\"totalSizeInBytes\":1,\"resultFormat\":\"object\",\"dataSource\":\"ds\",\"pages\":[{\"id\":0,\"numRows\":1,\"sizeInBytes\":1}]}"; |
Can you add a few tests that ensure that nulls in the size or rows are not serialized?
| key, | ||
| StringUtils.format("a value of enum [%s]", clazz.getSimpleName()), | ||
| StringUtils.format( | ||
| "referring to one of the values[%s] of enum [%s]", |
There's inconsistent spacing between `values[%s]` and `enum [%s]`.
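A minimal sketch of the message with consistent spacing before both bracketed placeholders. It uses plain `String.format` instead of the project's `StringUtils.format`, and the helper `expectedValueMessage` and the enum `SampleDestination` are hypothetical names used only for this example.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical enum used only to exercise the helper below.
enum SampleDestination { TASK_REPORT, DURABLE_STORAGE }

final class EnumErrorMessages {
    private EnumErrorMessages() {
    }

    // Builds the "expected value" fragment with a space before each "[%s]".
    static <E extends Enum<E>> String expectedValueMessage(Class<E> clazz) {
        String values = Arrays.stream(clazz.getEnumConstants())
                .map(Enum::name)
                .collect(Collectors.joining(", "));
        return String.format(
                "referring to one of the values [%s] of enum [%s]",
                values,
                clazz.getSimpleName());
    }
}

class EnumErrorMessagesDemo {
    public static void main(String[] args) {
        System.out.println(EnumErrorMessages.expectedValueMessage(SampleDestination.class));
    }
}
```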
TY for picking up the older comments from the reviews!
| For select queryies which want to write the final result to `durableStoage`, set `selectDestination`:`durableStorage`. Which shuffle mesh the job uses can still be controller by `durableShuffleStorage` flag ie. a combination where `selectDestination`:`durableStorage` and `durableShuffleStorage`:`false` is perfectly valid. | ||
| Set `selectDestination`:`durableStorage` for select queries that want to write the final results to durable storage instead of the task reports. Saving the results in the durable | ||
| storage allows users to fetch large result sets. The location where the workers write the intermediate results is different than the location where final results get stored. Therefore, `durableShuffleStorage`:`false` and |
| storage allows users to fetch large result sets. The location where the workers write the intermediate results is different than the location where final results get stored. Therefore, `durableShuffleStorage`:`false` and | |
| storage allows users to fetch large result sets. The location where the workers write the intermediate results can be different from the location where the final results get stored. Therefore, `durableShuffleStorage`:`false` and |
The location where the workers write the intermediate results is always different, hence I would like to keep the wording as is unless you feel strongly about it.
| * Writes all the results directly to the report. | ||
| */ | ||
| TASK_REPORT(false), | ||
| TASKREPORT("taskReport", false), |
LakshSingla left a comment
A nitty comment, overall PR LGTM
* Error messaging fixes. * Static check fix * Review comments (cherry picked from commit 77e0c16)
* Error messaging fixes. * Static check fix * Review comments
This patch contains the following fixes:
- `selectDestination` values from `TASK_REPORT` and `DURABLE_STORAGE` to `taskReport` and `durableStorage`.
- `selectDestination`.

This PR has: