Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Dec 10, 2022

Which issue does this PR close?

re #4460

Rationale for this change

A follow up from #4547 which parsed CSV writer output

After #4547, the sqllogictest code writes RecordBatches to csv and then reparses it. This can lead to misleading results if the data had ' ' in it, for example.

What changes are included in this PR?

  1. Generate Vec<Vec<String>> directly
  2. Use arrow::util::display for string display rather than the writer. This is what pretty_format_batches uses underneath so should be more consistent with existing tests
  3. Check that all output RecordBatches have the same schema

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Dec 10, 2022
SELECT var_pop(c6) FROM aggregate_test_100
----
2.615633434202189e37
26156334342021890000000000000000000000
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these differences are due the difference in arrow::util::display::array_value_to_string and the CSV writer. I think the changes are an improvement

@alamb alamb force-pushed the alamb/sqllogic_test_less_parse branch from 89c6b3b to 596b3e1 Compare December 10, 2022 11:53
@xudong963 xudong963 self-requested a review December 10, 2022 11:55
Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clear, thanks @alamb

})
.collect();
// Convert to normal string representation
let s = arrow::util::display::array_value_to_string(col, row)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following is more simple and reduces assignment. (85~97)

Suggested change
let s = arrow::util::display::array_value_to_string(col, row)
let mut s = arrow::util::display::array_value_to_string(col, row)
.map_err(DFSqlLogicTestError::Arrow)?;
// apply subsequent normalization depending on type
if matches!(col.data_type(), DataType::Utf8 | DataType::LargeUtf8) && s.is_empty() {
// All empty strings are replaced with this value
s = "(empty)".to_string()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea -- done in d0e7db9

@alamb alamb changed the title Output sqllogictests with arrow display Output sqllogictests with arrow display rather than CSV writer Dec 11, 2022
@xudong963 xudong963 merged commit fb85049 into apache:master Dec 11, 2022
@alamb alamb deleted the alamb/sqllogic_test_less_parse branch August 8, 2023 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants