Support Json unquote function #8407
    auto & factory = FunctionFactory::instance();
    ColumnsWithTypeAndName columns({input_column});
    ColumnNumbers argument_column_numbers;
    for (size_t i = 0; i < columns.size(); ++i)
        argument_column_numbers.push_back(i);

    ColumnsWithTypeAndName arguments;
    for (const auto argument_column_number : argument_column_numbers)
        arguments.push_back(columns.at(argument_column_number));

    const String func_name = "cast_json_as_string";
    auto builder = factory.tryGet(func_name, context);
    if (!builder)
        throw TiFlashTestException(fmt::format("Function {} not found!", func_name));
    auto func = builder->build(arguments, nullptr);
    auto * function_build_ptr = builder.get();
    if (auto * default_function_builder = dynamic_cast<DefaultFunctionBuilder *>(function_build_ptr);
        default_function_builder)
    {
        auto * function_impl = default_function_builder->getFunctionImpl().get();
        if (auto * function_cast_json_as_string = dynamic_cast<FunctionsCastJsonAsString *>(function_impl);
            function_cast_json_as_string)
        {
            function_cast_json_as_string->setOutputTiDBFieldType(field_type);
        }
        else
        {
            throw TiFlashTestException(fmt::format("Function {} not found!", func_name));
        }
    }
This seems unnecessary, because DAGExpressionAnalyzerHelper is called when raw_function_test is false.
I don't quite follow your point. This method was introduced only so that tests can set the TiDB field type here.
    byte_length = std::min(byte_length, orig_length);
    if (byte_length < element_write_buffer.count())
        context.getDAGContext()->handleTruncateError("Data Too Long");
    write_buffer.write(reinterpret_cast<char *>(container_per_element.data()), byte_length);
It looks like if byte_length > element_write_buffer.count(), this will append random bytes. Is that the expected behavior?
Also, if there were a method to get the current position in write_buffer, we would not need to write the temporary result into element_write_buffer and then copy it into write_buffer after the byte-length check.
byte_length is expected to be less than or equal to orig_length, so the byte_length > element_write_buffer.count() case should not occur.
Setting a char length here is also uncommon, so I used a temporary result to keep the code readable.
Theoretically speaking, though, don't we still need to handle the case of byte_length > element_write_buffer.count()? Maybe we should throw an exception in charLengthToByteLengthFromUTF8 if ret > length.
byte_length = std::min(byte_length, orig_length); is executed after charLengthToByteLengthFromUTF8, so byte_length <= orig_length. Not sure whether this answers your question.
But charLengthToByteLengthFromUTF8 cannot guarantee this if the input is not a valid UTF-8 string, so I suggest throwing an exception in charLengthToByteLengthFromUTF8 if ret > length.
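To make the concern in this thread concrete, here is a minimal sketch of a char-length to byte-length conversion that can never return more than the buffer length, even for malformed or truncated UTF-8. It is not TiFlash's charLengthToByteLengthFromUTF8; the names `utf8SeqLen` and `charLenToByteLenClamped` are hypothetical, and treating an invalid lead byte as a one-byte character is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of TiFlash): length of a UTF-8 sequence
// as declared by its lead byte. Invalid lead bytes count as one byte.
inline std::size_t utf8SeqLen(unsigned char lead)
{
    if (lead < 0x80)
        return 1; // ASCII
    if ((lead >> 5) == 0x6)
        return 2; // 110xxxxx
    if ((lead >> 4) == 0xE)
        return 3; // 1110xxxx
    if ((lead >> 3) == 0x1E)
        return 4; // 11110xxx
    return 1; // continuation or invalid lead byte
}

// Hypothetical helper: number of bytes covered by the first `char_len`
// characters of data[0, byte_len). The result never exceeds byte_len.
inline std::size_t charLenToByteLenClamped(const char * data, std::size_t byte_len, std::size_t char_len)
{
    std::size_t pos = 0;
    for (std::size_t chars = 0; chars < char_len && pos < byte_len; ++chars)
    {
        std::size_t step = utf8SeqLen(static_cast<unsigned char>(data[pos]));
        if (step > byte_len - pos)
            step = byte_len - pos; // sequence is cut off at the end of the buffer
        pos += step;
    }
    assert(pos <= byte_len); // the invariant the reviewers are asking for
    return pos;
}
```

With the clamp inside the helper, a caller would not need a separate std::min against the original length. Whether to clamp silently, as here, or to throw on malformed input, as suggested in the thread, is a design choice this sketch does not settle.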
  FormatImpl<FromDataType>::execute(vec_from[i], element_write_buffer, &type, nullptr);
  size_t byte_length = element_write_buffer.count();
- if (tp.flen() > 0)
+ if (tp.flen() >= 0)
Yes, it is an existing bug.
Signed-off-by: yibin <huyibin@pingcap.com>
        json_binary.toStringInBuffer(element_write_buffer);
    }

    size_t orig_length = element_write_buffer.count();
Should L475-L483 be inside the else branch above?
Yeah, that avoids useless work in the null case. I'll move it.
        reinterpret_cast<char *>(container_per_element.data()),
        orig_length,
        tidb_tp->flen());
    byte_length = std::min(byte_length, orig_length);
It looks like this is not necessary, since charLengthToByteLengthFromUTF8 should ensure that its return value does not exceed orig_length?
    JsonBinary::JsonBinaryWriteBuffer element_write_buffer(container_per_element);
    JsonBinary json_binary(
        data_from[current_offset],
        StringRef(&data_from[current_offset + 1], json_length - 1));
    json_binary.toStringInBuffer(element_write_buffer);
    size_t orig_length = element_write_buffer.count();
    auto byte_length = charLengthToByteLengthFromUTF8(
        reinterpret_cast<char *>(container_per_element.data()),
        orig_length,
        tidb_tp->flen());
    byte_length = std::min(byte_length, orig_length);
    if (byte_length < element_write_buffer.count())
        context.getDAGContext()->handleTruncateError("Data Too Long");
    write_buffer.write(reinterpret_cast<char *>(container_per_element.data()), byte_length);
How about replacing the block above with:
    auto start_pos = write_buffer.offset();
    JsonBinary json_binary(
        data_from[current_offset],
        StringRef(&data_from[current_offset + 1], json_length - 1));
    json_binary.toStringInBuffer(write_buffer);
    auto end_pos = write_buffer.offset();
    auto orig_length = end_pos - start_pos;
    auto byte_length = charLengthToByteLengthFromUTF8(
        reinterpret_cast<char *>(write_buffer.data() + start_pos),
        orig_length,
        tidb_tp->flen());
    byte_length = std::min(byte_length, orig_length);
    if (byte_length < orig_length)
    {
        context.getDAGContext()->handleTruncateError("Data Too Long");
        write_buffer.setOffset(start_pos + byte_length);
    }
?
Yeah, you're right. I just think this code path is not commonly used (casting JSON as a fixed-length char is valid but unusual), and even when it is used, performance won't drop significantly, so I chose the temporary buffer to keep the code simpler.
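The two approaches discussed here differ only in where the serialized value lands first. As a rough illustration of the in-place variant, the sketch below records the start position, renders directly into the destination, and rolls back any excess bytes. It uses std::string as a stand-in for the output buffer; `AppendResult` and `appendWithLimit` are hypothetical names, and the limit is given in bytes, whereas the PR derives it from a character length.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical result type: bytes kept and whether anything was cut off.
struct AppendResult
{
    std::size_t written;
    bool truncated;
};

// Hypothetical helper: `render` appends one serialized value to `out`.
// At most `max_bytes` of that value are kept; the rest is rolled back.
template <typename Render>
AppendResult appendWithLimit(std::string & out, Render && render, std::size_t max_bytes)
{
    const std::size_t start_pos = out.size();
    render(out); // serialize straight into the destination, no temporary buffer
    const std::size_t orig_length = out.size() - start_pos;
    const std::size_t byte_length = std::min(orig_length, max_bytes);
    if (byte_length < orig_length)
        out.resize(start_pos + byte_length); // drop the bytes past the limit
    return {byte_length, byte_length < orig_length};
}
```

The temporary-buffer version that the author kept trades one extra copy for not needing rollback support on the destination buffer, which matches the reasoning given in the reply above.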
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: SeaRise, windtalker
What problem does this PR solve?
Issue Number: close #8334
Problem Summary:
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note