Closed
Labels: enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When attempting to prune containers such as Parquet row groups using predicates on boolean columns (e.g. a flag column), the pruning logic does not prune anything.
For example, a query like
select * from my_parquet_based_table where my_flag_column = true
will not prune any row groups based on the my_flag_column predicate.
Describe the solution you'd like
I would like pruning to work for predicates on boolean columns, i.e. add support in: https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_optimizer/pruning.rs
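As a rough illustration of the requested support, a boolean predicate can be rewritten into a condition on the column's per-container min/max statistics, roughly the way min/max pruning works for other column types: the container can be skipped when the rewritten condition is false. This is a toy model under stated assumptions. The Predicate enum, rewrite_bool, and the <col>_min / <col>_max naming are hypothetical and are not DataFusion's actual types or API; NULL handling is not modeled.

```rust
// Hypothetical toy model (not DataFusion's types): rewrite a boolean predicate
// on one column into a text expression over that column's min/max statistics.
// A container may be skipped when the rewritten expression evaluates to false.
#[derive(Debug)]
enum Predicate {
    /// Bare boolean column, e.g. `WHERE my_flag_column`.
    Column(String),
    /// `col = literal`, e.g. `WHERE my_flag_column = true`.
    Eq(String, bool),
    /// `NOT col`.
    NotColumn(String),
}

/// Returns the statistics expression as text. The `<col>_min` / `<col>_max`
/// column names are an assumption made for illustration only.
fn rewrite_bool(pred: &Predicate) -> String {
    match pred {
        // Some row can be true only if the column's maximum is true.
        Predicate::Column(c) | Predicate::Eq(c, true) => format!("{c}_max = true"),
        // Some row can be false only if the column's minimum is false.
        Predicate::Eq(c, false) | Predicate::NotColumn(c) => format!("{c}_min = false"),
    }
}

fn main() {
    let preds = [
        Predicate::Column("b1".to_string()),
        Predicate::Eq("my_flag_column".to_string(), true),
        Predicate::Eq("my_flag_column".to_string(), false),
        Predicate::NotColumn("b1".to_string()),
    ];
    for p in &preds {
        println!("{:?} => {}", p, rewrite_bool(p));
    }
}
```

Under this scheme, a row group whose max is false can be skipped for my_flag_column = true, and one whose min is true can be skipped for my_flag_column = false. A bare boolean column predicate such as b1 is treated the same as b1 = true, since a WHERE filter only keeps rows where it is true.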
Here is an example test that fails:
diff --git a/datafusion/src/physical_optimizer/pruning.rs b/datafusion/src/physical_optimizer/pruning.rs
index 3a5a64c6f..d2e93b9b5 100644
--- a/datafusion/src/physical_optimizer/pruning.rs
+++ b/datafusion/src/physical_optimizer/pruning.rs
@@ -508,6 +508,16 @@ mod tests {
             }
         }
 
+        fn new_bool<'a>(
+            min: impl IntoIterator<Item = Option<bool>>,
+            max: impl IntoIterator<Item = Option<bool>>,
+        ) -> Self {
+            Self {
+                min: Arc::new(min.into_iter().collect::<BooleanArray>()),
+                max: Arc::new(max.into_iter().collect::<BooleanArray>()),
+            }
+        }
+
         fn min(&self) -> Option<ArrayRef> {
             Some(self.min.clone())
         }
@@ -927,8 +937,8 @@ mod tests {
     #[test]
     fn prune_api() {
         let schema = Arc::new(Schema::new(vec![
-            Field::new("s1", DataType::Utf8, false),
-            Field::new("s2", DataType::Int32, false),
+            Field::new("s1", DataType::Utf8, true),
+            Field::new("s2", DataType::Int32, true),
         ]));
 
         // Prune using s2 > 5
@@ -953,4 +963,35 @@ mod tests {
 
         assert_eq!(result, expected);
     }
+
+
+    #[test]
+    fn prune_api_bool() {
+        let schema = Arc::new(Schema::new(vec![
+            Field::new("b1", DataType::Boolean, true),
+        ]));
+
+        let statistics = TestStatistics::new().with(
+            "b1",
+            ContainerStats::new_bool(
+                vec![Some(false), Some(false), Some(true), None, Some(false)], // min
+                vec![Some(false), Some(true), Some(true), None, None], // max
+            ),
+        );
+
+        // For predicate "b1" (boolean expr)
+        // b1 [false, false] ==> no rows should pass
+        // b1 [false, true] ==> some rows could pass
+        // b1 [true, true] ==> some rows could pass
+        // b1 [NULL, NULL] ==> no rows could pass
+        // b1 [false, NULL] ==> no rows could pass
+        let expr = col("b1");
+        let expected = vec![false, true, true, false, false];
+
+        let p = PruningPredicate::try_new(&expr, schema).unwrap();
+        let result = p.prune(&statistics).unwrap();
+
+        assert_eq!(result, expected);
+    }
+
 }
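The expected vector in prune_api_bool can be reproduced with a small standalone sketch that decides, per container, whether col = value could match any row using only the min/max values. The helpers may_match_bool_eq and prune_bool_eq are hypothetical and not part of DataFusion; they only mimic the shape of PruningPredicate::prune, which returns one bool per container (true = must scan, false = can prune).

```rust
/// Hypothetical helper (not DataFusion's API): can `col = value` match any row
/// of a container, given that container's boolean min/max statistics?
/// `true` means "must scan", `false` means "can prune".
fn may_match_bool_eq(value: bool, min: Option<bool>, max: Option<bool>) -> bool {
    match (value, min, max) {
        // `col = true` (or a bare `col`) can match only if the max is true.
        (true, _, Some(max)) => max,
        // `col = false` can match only if the min is false.
        (false, Some(min), _) => !min,
        // Needed statistic is missing: mirror the issue's test, which expects
        // such containers to be pruned.
        _ => false,
    }
}

/// Applies the check to every container, returning one bool per container.
fn prune_bool_eq(value: bool, mins: &[Option<bool>], maxes: &[Option<bool>]) -> Vec<bool> {
    mins.iter()
        .zip(maxes)
        .map(|(&min, &max)| may_match_bool_eq(value, min, max))
        .collect()
}

fn main() {
    // Same min/max statistics as the prune_api_bool test in the diff.
    let mins = [Some(false), Some(false), Some(true), None, Some(false)];
    let maxes = [Some(false), Some(true), Some(true), None, None];
    // Prints [false, true, true, false, false], matching `expected`.
    println!("{:?}", prune_bool_eq(true, &mins, &maxes));
}
```

A more conservative rule would return true whenever the needed statistic is missing, since unknown statistics cannot prove that no row matches; with that choice the last two entries of the expected vector would become true.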