Closed
Labels
bug (Something isn't working)
Description
Describe the bug
Using datafusion version 42.2.0.
Follow up to #13092, which was fixed by #13117 thanks to @Omega359.
However, that fix does not catch mistakes such as reordered columns. For example, if table A has columns a, b and table B has columns b, a, then DataFusion will happily compute the union by position, putting values in the wrong columns.
So why not just compare the entire schema? Or at least the column names and types (i.e. ignoring metadata)? The docs explicitly say that the schemas must be equal.
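To make the suggestion concrete, here is a minimal std-only sketch of the check being asked for. It is not DataFusion's implementation: schemas are modelled as plain slices of hypothetical `(name, type)` string pairs instead of Arrow fields, and metadata is not modelled at all, which has the same effect as ignoring it. Both function names are made up for illustration.

```rust
/// Hypothetical helper: succeed only if both schemas have the same column
/// names and types in the same order. Each column is a (name, type) pair.
fn check_union_compatible(
    left: &[(&str, &str)],
    right: &[(&str, &str)],
) -> Result<(), String> {
    if left.len() != right.len() {
        return Err(format!(
            "column count mismatch: {} vs {}",
            left.len(),
            right.len()
        ));
    }
    for (i, (l, r)) in left.iter().zip(right.iter()).enumerate() {
        if l.0 != r.0 {
            return Err(format!("column {}: name {:?} vs {:?}", i, l.0, r.0));
        }
        if l.1 != r.1 {
            return Err(format!("column {}: type {:?} vs {:?}", i, l.1, r.1));
        }
    }
    Ok(())
}

/// Hypothetical helper: for each left column, find the index of the right
/// column with the same name and type. Returns None if the column counts
/// differ or any left column has no match. Assumes column names are unique.
fn projection_by_name(
    left: &[(&str, &str)],
    right: &[(&str, &str)],
) -> Option<Vec<usize>> {
    if left.len() != right.len() {
        return None;
    }
    left.iter()
        .map(|l| right.iter().position(|r| r.0 == l.0 && r.1 == l.1))
        .collect()
}
```

With the schemas from this report, `check_union_compatible` rejects A = (a, b) against B = (b, a), while `projection_by_name` yields the index order needed to reorder B's columns to match A before a positional union.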
To Reproduce
#[tokio::test]
async fn test_union() {
    use datafusion::assert_batches_sorted_eq;
    use datafusion::common::arrow::array::{ArrayRef, StringArray};
    use datafusion::common::arrow::record_batch::RecordBatch;
    use datafusion::prelude::SessionContext;
    use std::sync::Arc;

    let ctx = SessionContext::new();

    // Table A: columns in the order (a, b).
    let a = ctx
        .read_batch(
            RecordBatch::try_from_iter([
                ("a", Arc::new(StringArray::from(vec!["a"])) as ArrayRef),
                ("b", Arc::new(StringArray::from(vec!["b"])) as ArrayRef),
            ])
            .unwrap(),
        )
        .unwrap();

    // Table B: same columns and types, but in the order (b, a).
    let b = ctx
        .read_batch(
            RecordBatch::try_from_iter([
                ("b", Arc::new(StringArray::from(vec!["b"])) as ArrayRef),
                ("a", Arc::new(StringArray::from(vec!["a"])) as ArrayRef),
            ])
            .unwrap(),
        )
        .unwrap();

    let union = a.union(b).unwrap();

    // Every row should have "a" in column a and "b" in column b.
    assert_batches_sorted_eq!(
        [
            "+---+---+",
            "| a | b |",
            "+---+---+",
            "| a | b |",
            "| a | b |",
            "+---+---+",
        ],
        &union.collect().await.unwrap()
    );
}
Expected behavior
Test passes.
Actual behavior
assertion `left == right` failed:
expected:
[
    "+---+---+",
    "| a | b |",
    "+---+---+",
    "| a | b |",
    "| a | b |",
    "+---+---+",
]
actual:
[
    "+---+---+",
    "| a | b |",
    "+---+---+",
    "| a | b |",
    "| b | a |",
    "+---+---+",
]