Improve the wording around the InvalidNullByteException #13794
LakshSingla wants to merge 2 commits into apache:master
abhishekagarwal87 left a comment:
Thank you for making this error message more actionable.
```diff
 if (!allowNullBytes && b == 0) {
-  throw new InvalidNullByteException();
+  throw new InvalidNullByteException(
+      "Unable to add the frame because it contains null bytes. This usually happens when the added string columns "
```
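The check in the hunk above scans the encoded bytes of each string value. Here is a minimal sketch of that kind of scan, assuming the values are UTF-8 encoded. `NullByteScan` and `firstNullByte` are hypothetical names, not Druid code. Returning the position of the null byte instead of only throwing gives the error message something concrete to report. Every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so a 0x00 byte can only come from the character U+0000. A byte-level scan therefore finds exactly the NUL characters in the original string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not Druid code): locate the first 0x00 byte in a
// UTF-8 encoded value so an error can say where the null byte is.
final class NullByteScan {
  private NullByteScan() {}

  /** Returns the index of the first 0x00 byte, or -1 if there is none. */
  static int firstNullByte(byte[] utf8) {
    for (int i = 0; i < utf8.length; i++) {
      if (utf8[i] == 0) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // '\0' is the NUL character; UTF-8 encodes it as the single byte 0x00.
    byte[] bytes = ("ab" + '\0' + "c").getBytes(StandardCharsets.UTF_8);
    System.out.println(firstNullByte(bytes)); // prints 2
  }
}
```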
Thanks for the improved message. The user, however, knows nothing about frames. Can we word this from the user's perspective?
Druid does not support null (0x00) bytes in strings. File %s, row %d, column %s contains null bytes: [%s].
The string would be encoded so that control characters appear as \U0000, letting the user see the position of the null bytes.
Maybe we don't know the row number (in a form useful to the user). If not, just list the column.
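Here is a sketch of the suggested wording, assuming the caller knows the source file, row number and column. `NullByteMessage` and its methods are hypothetical and are not part of Druid's API. Control characters are rendered as Java-style backslash-u escapes, so a NUL shows up as `\u0000` inside the brackets. The two-argument overload drops the file and row for the case where no user-meaningful row number is available.

```java
import java.util.Locale;

// Hypothetical sketch of the reviewer's proposed wording; these names are
// illustrative and not part of Druid's API.
final class NullByteMessage {
  private NullByteMessage() {}

  /** Replaces each ISO control character with a backslash-u hex escape so NULs are visible. */
  static String escapeControlChars(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (Character.isISOControl(c)) {
        // The doubled backslash yields one literal backslash in the output.
        sb.append(String.format(Locale.ROOT, "\\u%04X", (int) c));
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Message used when the source file and row number are known. */
  static String message(String file, long row, String column, String value) {
    return String.format(Locale.ROOT,
        "Druid does not support null (0x00) bytes in strings. "
            + "File %s, row %d, column %s contains null bytes: [%s].",
        file, row, column, escapeControlChars(value));
  }

  /** Fallback when no user-meaningful row number is available. */
  static String message(String column, String value) {
    return String.format(Locale.ROOT,
        "Druid does not support null (0x00) bytes in strings. "
            + "Column %s contains null bytes: [%s].",
        column, escapeControlChars(value));
  }

  public static void main(String[] args) {
    String value = "ab" + '\0' + "c";
    System.out.println(message("data.json", 3, "name", value));
    System.out.println(message("name", value));
  }
}
```

`Locale.ROOT` keeps the row number formatted with ASCII digits regardless of the JVM's default locale.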
This is a case of unparsable data. Should we have caught it at the time we read the data rather than when writing to a frame? Should we invoke our bad-row logic to skip this row? That logic should log the bad row for later re-ingestion, but I don't think we've added that ability.
If we catch the problem on read, then the check here is more of an assertion. That said, the data might have been created by an expression, so it is still worth validating here.
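If the check moves to read time, as suggested above, a bad row could be rejected and recorded instead of failing the whole write. The following is a minimal sketch under that assumption. `ReadTimeNullCheck`, `firstColumnWithNull`, `filter` and the `maxBadRows` limit are all hypothetical. They stand in for whatever parse-exception handling the ingestion path actually uses. Rows are modeled as plain `Map<String, Object>` values rather than Druid input rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical read-time check; not Druid's input-row or parse-exception API.
final class ReadTimeNullCheck {
  private ReadTimeNullCheck() {}

  /** Returns the first column whose string value contains a NUL character, if any. */
  static Optional<String> firstColumnWithNull(Map<String, Object> row) {
    for (Map.Entry<String, Object> e : row.entrySet()) {
      Object v = e.getValue();
      if (v instanceof String && ((String) v).indexOf('\0') >= 0) {
        return Optional.of(e.getKey());
      }
    }
    return Optional.empty();
  }

  /**
   * Returns the rows without NULs, appends a description of each rejected row
   * to badRows, and throws once more than maxBadRows rows have been rejected.
   */
  static List<Map<String, Object>> filter(
      List<Map<String, Object>> rows, int maxBadRows, List<String> badRows) {
    List<Map<String, Object>> good = new ArrayList<>();
    for (int i = 0; i < rows.size(); i++) {
      Optional<String> column = firstColumnWithNull(rows.get(i));
      if (column.isPresent()) {
        badRows.add("row " + i + ", column " + column.get());
        if (badRows.size() > maxBadRows) {
          throw new IllegalStateException("Too many rows with null bytes: " + badRows);
        }
      } else {
        good.add(rows.get(i));
      }
    }
    return good;
  }

  public static void main(String[] args) {
    Map<String, Object> ok = new LinkedHashMap<>();
    ok.put("name", "abc");
    Map<String, Object> bad = new LinkedHashMap<>();
    bad.put("name", "a" + '\0' + "b");
    List<String> badRows = new ArrayList<>();
    List<Map<String, Object>> kept = filter(Arrays.asList(ok, bad), 10, badRows);
    System.out.println(kept.size() + " " + badRows); // prints 1 [row 1, column name]
  }
}
```

Collecting the rejected rows, rather than only counting them, is what would make the "log the bad row for later re-ingestion" idea possible.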
This pull request has been marked as stale due to 60 days of inactivity.
Raised an improved PR tackling this issue.
Description
This PR improves the wording around the mysterious InvalidNullByteException. The exception occurs when a string added to a frame contains a null (0x00) byte, which frames use internally as a delimiter for string columns. Frames are currently used only by MSQ, and this error is raised only when the ingested external data contains hidden null bytes, which in most cases can safely be sanitized.
This PR has: