ORC-1116: [C++] Fix csv-import tool when exporting long bytes #1044
| Diff line number | Diff line change |
|---|---|
| | `@@ -24,6 +24,8 @@` |
| 24 | `#include "wrap/gmock.h"` |
| 25 | `#include "wrap/gtest-wrapper.h"` |
| 26 | |
| 27 | `#include <fstream>` |
| 28 | |
| 29 | `TEST (TestCSVFileImport, test10rows) {` |
| 30 | `// create an ORC file from importing the CSV file` |
| 31 | `const std::string pgm1 = findProgram("tools/src/csv-import");` |
| | `@@ -90,3 +92,36 @@ TEST (TestCSVFileImport, testTimezoneOption) {` |
| 92 | `EXPECT_EQ("", error);` |
| 93 | `}` |
| 94 | `}` |
| 95 | |
| 96 | `TEST (TestCSVFileImport, testLongString) {` |
| 97 | `// create an ORC file from importing the CSV file` |
| 98 | `const std::string pgm1 = findProgram("tools/src/csv-import");` |
| 99 | `const std::string csvFile = "/tmp/test_csv_import_test_long_string.csv";` |
| 100 | `const std::string orcFile = "/tmp/test_csv_import_test_long_string.orc";` |
| 101 | `const std::string schema = "'struct<_a:string,b_:binary,_c:varchar(10)>'";` |
| 102 | `std::string output;` |
| 103 | `std::string error;` |
| 104 | |
| 105 | `std::ofstream csvFileStream(csvFile, std::ios::binary \| std::ios::out \| std::ios::trunc);` |
| 106 | `if(csvFileStream.is_open())` |
| 107 | `{` |
| 108 | `std::string longStr;` |
| 109 | `longStr.resize(4 * 1024 * 1024 + 1, 'x');` |
| 110 | `csvFileStream << "str1," << longStr << ",var1\n";` |
| 111 | `csvFileStream << "str2," << longStr << ",var2\n";` |
| 112 | `csvFileStream.close();` |
| 113 | `}` |
| 114 | |
| 115 | `EXPECT_EQ(0, runProgram({pgm1, schema, csvFile, orcFile}, output, error));` |
| 116 | `EXPECT_EQ("", error);` |
| 117 | |
| 118 | `// verify the ORC file content` |
| 119 | `const std::string pgm2 = findProgram("tools/src/orc-contents");` |
| 120 | `std::string option = "--columns=0,2";` |
| 121 | `const std::string expected =` |
| 122 | `"{\"_a\": \"str1\", \"_c\": \"var1\"}\n"` |
| 123 | `"{\"_a\": \"str2\", \"_c\": \"var2\"}\n";` |
| 124 | `EXPECT_EQ(0, runProgram({pgm2, option, orcFile}, output, error));` |
| 125 | `EXPECT_EQ(expected, output);` |
| 126 | `EXPECT_EQ("", error);` |
| 127 | `}` |

Member: Hmm. The test case still seems to pass without your patch. A valid test case should fail on the main branch without your patch. Could you take another look at this and make sure?

Contributor, Author: I am not sure whether this is a code problem or some other issue. The output looks like `/home/kai/github/orc/tools/test/TestCSVFileImport.cc:126: Failure`, followed by `$ git diff main --stat`, `$ md5sum tools/test/TestCSVFileImport.cc`, and `$ cat /etc/redhat-release`.

Contributor: Without the fix, the test failed in my environment too (I'm on Ubuntu 16.04.6). Invalid pointers can lead to undefined behavior, so I think it's OK if this test catches the bug in some environments but not all.

Member: Could you try it on

Contributor, Author: I tried it on macOS 10.13.6; the test passed both with and without the fix.

Contributor: I can't reproduce the problem on macOS either.

Contributor, Author: Maybe the code compiled by clang++ does not reuse the invalidated address, so the referenced data is still there. It is undefined behaviour.

Member: Got it. Thank you for confirming, @stiga-huang and @KyleGrains.

Contributor, Author: Thank you too, @stiga-huang and @dongjoon-hyun.
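The review thread attributes the environment-dependent failure to invalid pointers, i.e. undefined behavior that may or may not show up depending on whether freed memory gets reused. The csv-import source itself is not part of this excerpt, so the following is only a hypothetical sketch of that bug class, not the actual patch: raw pointers taken from a growable buffer dangle once an append forces a reallocation, while offsets recorded at append time stay valid. The names `Field`, `appendField`, `view`, and `resolveRow` are illustrative assumptions, not ORC APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical field record: an offset and length into a shared buffer
// rather than a raw pointer. Offsets survive reallocation; pointers do not.
struct Field {
  std::size_t offset;
  std::size_t length;
};

// Appends `value` to `buffer` and records where it starts. If the append makes
// `buffer` reallocate, every `const char*` previously taken from buffer.data()
// dangles, but the recorded offset is still correct.
Field appendField(std::string& buffer, std::string_view value) {
  Field field{buffer.size(), value.size()};
  buffer.append(value.data(), value.size());
  return field;
}

// Turns a recorded field back into a view. Call this only after the last
// append: the view is valid only while `buffer` is not modified again.
std::string_view view(const std::string& buffer, const Field& field) {
  assert(field.offset + field.length <= buffer.size());
  return std::string_view(buffer.data() + field.offset, field.length);
}

// Resolves all fields of a row at once, after the buffer has stopped growing.
std::vector<std::string_view> resolveRow(const std::string& buffer,
                                         const std::vector<Field>& fields) {
  std::vector<std::string_view> row;
  row.reserve(fields.size());
  for (const Field& field : fields) {
    row.push_back(view(buffer, field));
  }
  return row;
}
```

Deferring pointer resolution until the buffer has reached its final size is what keeps the views valid here; whether the real fix uses offsets or some other approach cannot be seen from this diff.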
Thank you for adding this.
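The `testLongString` input is two CSV rows, each carrying a (4 MiB + 1)-byte field between two short string columns. As a rough sanity check of that setup outside GoogleTest, the sketch below re-creates the file and measures it; `writeLongStringCsv`, `fileSize`, and `kExpectedCsvSize` are hypothetical names, and the sketch does not exercise csv-import itself. Each row is 5 bytes (`strN,`) + 4,194,305 bytes + 6 bytes (`,varN\n`) = 4,194,316 bytes, so the file should be 8,388,632 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper mirroring the setup in testLongString: writes two CSV
// rows whose middle column is a (4 MiB + 1)-byte run of 'x'.
// Returns false if the file cannot be opened or written.
bool writeLongStringCsv(const std::string& path) {
  std::ofstream out(path, std::ios::binary | std::ios::out | std::ios::trunc);
  if (!out.is_open()) {
    return false;
  }
  const std::string longStr(4 * 1024 * 1024 + 1, 'x');
  out << "str1," << longStr << ",var1\n";
  out << "str2," << longStr << ",var2\n";
  out.close();
  return !out.fail();
}

// Size in bytes of the file at `path`, measured by opening it at its end.
// Returns 0 if the file cannot be opened.
std::size_t fileSize(const std::string& path) {
  std::ifstream in(path, std::ios::binary | std::ios::ate);
  if (!in.is_open()) {
    return 0;
  }
  const std::streamoff end = in.tellg();
  return end < 0 ? 0 : static_cast<std::size_t>(end);
}

// Expected size: per row, "strN," (5 bytes) + 4,194,305 bytes + ",varN\n" (6 bytes).
constexpr std::size_t kExpectedCsvSize = 2 * (5 + (4 * 1024 * 1024 + 1) + 6);
static_assert(kExpectedCsvSize == 8388632, "two rows of 4,194,316 bytes each");
```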