
Support pruning for boolean columns #490

@alamb

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Pruning containers, such as Parquet row groups, based on boolean columns (e.g. a flag column) does not work.

For example, a query like

select * from my_parquet_based_table where my_flag_column = true

will not prune any row groups based on the my_flag_column predicate.

Describe the solution you'd like
I would like pruning to occur for boolean columns, i.e. support for them should be added here: https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_optimizer/pruning.rs

Here is an example test that fails:

diff --git a/datafusion/src/physical_optimizer/pruning.rs b/datafusion/src/physical_optimizer/pruning.rs
index 3a5a64c6f..d2e93b9b5 100644
--- a/datafusion/src/physical_optimizer/pruning.rs
+++ b/datafusion/src/physical_optimizer/pruning.rs
@@ -508,6 +508,16 @@ mod tests {
             }
         }
 
+        fn new_bool<'a>(
+            min: impl IntoIterator<Item = Option<bool>>,
+            max: impl IntoIterator<Item = Option<bool>>,
+        ) -> Self {
+            Self {
+                min: Arc::new(min.into_iter().collect::<BooleanArray>()),
+                max: Arc::new(max.into_iter().collect::<BooleanArray>()),
+            }
+        }
+
         fn min(&self) -> Option<ArrayRef> {
             Some(self.min.clone())
         }
@@ -927,8 +937,8 @@ mod tests {
     #[test]
     fn prune_api() {
         let schema = Arc::new(Schema::new(vec![
-            Field::new("s1", DataType::Utf8, false),
-            Field::new("s2", DataType::Int32, false),
+            Field::new("s1", DataType::Utf8, true),
+            Field::new("s2", DataType::Int32, true),
         ]));
 
         // Prune using s2 > 5
@@ -953,4 +963,35 @@ mod tests {
 
         assert_eq!(result, expected);
     }
+
+
+    #[test]
+    fn prune_api_bool() {
+        let schema = Arc::new(Schema::new(vec![
+            Field::new("b1", DataType::Boolean, true),
+        ]));
+
+        let statistics = TestStatistics::new().with(
+            "b1",
+            ContainerStats::new_bool(
+                vec![Some(false), Some(false), Some(true), None, Some(false)], // min
+                vec![Some(false), Some(true),  Some(true), None, None ], // max
+            ),
+        );
+
+        // For predicate "b1" (boolean expr)
+        // b1 [false, false] ==> no rows should pass
+        // b1 [false, true] ==> some rows could pass
+        // b1 [true, true] ==> some rows could pass
+        // b1 [NULL, NULL]  ==> no rows could pass
+        // b1 [false, NULL]  ==> no rows could pass
+        let expr = col("b1");
+        let expected = vec![false, true, true, false, false];
+
+        let p = PruningPredicate::try_new(&expr, schema).unwrap();
+        let result = p.prune(&statistics).unwrap();
+
+        assert_eq!(result, expected);
+    }
+
 }
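
The expectations in the test comments above can be sketched as a standalone decision rule. This is only an illustration, not DataFusion's implementation: `may_contain_true` and `prune_bool` are hypothetical names, and plain `Option<bool>` slices stand in for the `BooleanArray` statistics. It assumes, as the test does, that a container whose min or max is NULL is reported as "no rows pass"; a more conservative rule might instead keep containers whose statistics are unknown.

```rust
/// Hypothetical helper (not part of DataFusion): decides whether a container
/// could hold a row where the boolean column is `true`, given only its
/// min/max statistics.
fn may_contain_true(min: Option<bool>, max: Option<bool>) -> bool {
    match (min, max) {
        // With false < true, some value can be `true` only if max is `true`.
        (Some(_), Some(max)) => max,
        // Assumption taken from the test in this issue: a NULL min or max
        // is treated as "no rows pass".
        _ => false,
    }
}

/// Returns one flag per container: `true` means the container must be kept
/// (read), `false` means it can be skipped for the predicate `b1`.
fn prune_bool(mins: &[Option<bool>], maxes: &[Option<bool>]) -> Vec<bool> {
    mins.iter()
        .zip(maxes)
        .map(|(&min, &max)| may_contain_true(min, max))
        .collect()
}

fn main() {
    // Same statistics as in `prune_api_bool`.
    let mins = [Some(false), Some(false), Some(true), None, Some(false)];
    let maxes = [Some(false), Some(true), Some(true), None, None];

    let result = prune_bool(&mins, &maxes);
    assert_eq!(result, vec![false, true, true, false, false]);
    println!("{:?}", result);
}
```

Only `max` matters for the predicate `b1` (equivalently `b1 = true`); a predicate such as `NOT b1` would look at `min` instead.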

Labels: enhancement
