Description
Describe the bug
When a WHERE clause combines many conditions, planning the query overflows the stack.
This is even more limiting inside tokio, because tokio's worker threads default to a 2 MiB stack.
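If the recursion depth cannot be reduced, one stopgap is to run planning on a thread with a larger stack. For its worker threads, tokio exposes this through `tokio::runtime::Builder::thread_stack_size`. To stay self-contained, the sketch below uses only `std::thread::Builder::stack_size`. `plan_depth` and `run_with_stack` are hypothetical stand-ins, not DataFusion functions, and 64 MiB is an arbitrary example size, not a recommendation.

```rust
use std::thread;

/// Hypothetical stand-in for a recursive planner: recurses once per level and
/// keeps a small buffer live in every frame so each level uses real stack space.
fn plan_depth(remaining: usize) -> usize {
    let frame_padding = std::hint::black_box([0u8; 256]);
    if remaining == 0 {
        return 0;
    }
    // frame_padding[0] is always 0; reading it after the call keeps the buffer alive.
    1 + plan_depth(remaining - 1) + frame_padding[0] as usize
}

/// Runs `f` on a new thread with a `stack_bytes`-sized stack and returns its result.
fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("planning thread panicked")
}

fn main() {
    let depth = run_with_stack(64 * 1024 * 1024, || plan_depth(10_000));
    println!("recursed {depth} levels");
}
```

This only raises the ceiling: a query with enough conditions will still overflow any fixed stack size.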
To Reproduce
The reproduction is adapted from the code @mcassels provided in issue #1434. It plans a query of the form:
SELECT * FROM table WHERE <condition0> OR <condition1> OR ...
use datafusion::{
    arrow::datatypes::{DataType, Field, Schema},
    common::Result,
    config::ConfigOptions,
    error::DataFusionError,
    logical_expr::{
        logical_plan::builder::LogicalTableSource, AggregateUDF, ScalarUDF, TableSource,
    },
    sql::{
        planner::{ContextProvider, SqlToRel},
        sqlparser::{dialect::GenericDialect, parser::Parser},
        TableReference,
    },
};
use std::{collections::HashMap, sync::Arc};

// mimalloc is used as the global allocator, as in the original reproduction.
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;

#[tokio::main]
async fn main() -> Result<()> {
    // In the reported debug-build run, 255 conditions planned successfully
    // and 256 overflowed the stack.
    for num_conditions in [255, 256] {
        // Build "column1 = 'value0' OR column1 = 'value1' OR ...".
        let where_clause = (0..num_conditions)
            .map(|i| format!("column1 = 'value{i}'"))
            .collect::<Vec<String>>()
            .join(" OR ");
        let sql = format!("SELECT * from table1 where {where_clause};");
        get_optimized_plan(sql).await?;
        println!("query succeeded with {num_conditions} conditions");
    }
    Ok(())
}

/// Parses `sql` and converts it to a logical plan (no optimizer passes are run).
async fn get_optimized_plan(sql: String) -> Result<()> {
    let schema_provider = TestSchemaProvider::new();
    let dialect = GenericDialect {};
    let ast = Parser::parse_sql(&dialect, &sql).unwrap();
    let statement = &ast[0];
    let sql_to_rel = SqlToRel::new(&schema_provider);
    sql_to_rel.sql_statement_to_plan(statement.clone())?;
    Ok(())
}

struct TestSchemaProvider {
    options: ConfigOptions,
    tables: HashMap<String, Arc<dyn TableSource>>,
}

impl TestSchemaProvider {
    pub fn new() -> Self {
        let mut tables = HashMap::new();
        // The field name must match the column referenced in the query ("column1").
        tables.insert(
            "table1".to_string(),
            create_table_source(vec![Field::new("column1", DataType::Utf8, false)]),
        );
        Self {
            options: Default::default(),
            tables,
        }
    }
}

fn create_table_source(fields: Vec<Field>) -> Arc<dyn TableSource> {
    Arc::new(LogicalTableSource::new(Arc::new(
        Schema::new_with_metadata(fields, HashMap::new()),
    )))
}

impl ContextProvider for TestSchemaProvider {
    fn get_table_provider(&self, name: TableReference) -> Result<Arc<dyn TableSource>> {
        match self.tables.get(name.table()) {
            Some(table) => Ok(table.clone()),
            _ => Err(DataFusionError::Plan(format!(
                "Table not found: {}",
                name.table()
            ))),
        }
    }

    fn get_function_meta(&self, _name: &str) -> Option<Arc<ScalarUDF>> {
        None
    }

    fn get_aggregate_meta(&self, _name: &str) -> Option<Arc<AggregateUDF>> {
        None
    }

    fn get_variable_type(&self, _variable_names: &[String]) -> Option<DataType> {
        None
    }

    fn options(&self) -> &ConfigOptions {
        &self.options
    }
}

Output
query succeeded with 255 conditions
thread 'main' has overflowed its stack
fatal runtime error: stack overflow

In this run, 256 conditions were enough to overflow the stack. This happens only in debug builds; see the related discussion in #1434 (comment).
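A likely reason the failure depends on the number of conditions: sqlparser parses a chain of `OR`s left-associatively, so N conditions become a left-deep tree nested about N levels, and any code that walks the tree recursively keeps roughly one set of frames per level on the stack at once. Unoptimized debug builds typically use larger frames, which would be consistent with only debug mode overflowing at this size. The toy `Expr`, `build_or_chain`, and `depth` below are hypothetical illustrations, not DataFusion's or sqlparser's types.

```rust
/// Toy expression tree (hypothetical; not DataFusion's or sqlparser's AST).
#[allow(dead_code)]
enum Expr {
    Eq(String, String),
    Or(Box<Expr>, Box<Expr>),
}

/// Builds `c0 OR c1 OR ... OR c{n-1}` left-associatively,
/// i.e. ((c0 OR c1) OR c2) ... (assumes n >= 1).
fn build_or_chain(n: usize) -> Expr {
    let cond = |i: usize| Expr::Eq("column1".to_string(), format!("value{i}"));
    let mut expr = cond(0);
    for i in 1..n {
        expr = Expr::Or(Box::new(expr), Box::new(cond(i)));
    }
    expr
}

/// Recursive nesting depth: one call per level, like a recursive planner or visitor.
fn depth(expr: &Expr) -> usize {
    match expr {
        Expr::Eq(..) => 1,
        Expr::Or(l, r) => 1 + depth(l).max(depth(r)),
    }
}

fn main() {
    let expr = build_or_chain(256);
    println!("depth = {}", depth(&expr));
}
```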
Expected behavior
Queries with many conditions should be planned without overflowing the stack.
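One way to meet this expectation, sketched here as an assumption rather than taken from DataFusion, is to walk long `OR` chains with an explicit heap-allocated work stack instead of recursion. The toy `Expr`, `build_or_chain`, and `flatten_or` are hypothetical.

```rust
/// Toy expression tree (hypothetical; not DataFusion's or sqlparser's AST).
#[allow(dead_code)]
enum Expr {
    Eq(String, String),
    Or(Box<Expr>, Box<Expr>),
}

/// Builds the left-deep chain `c0 OR c1 OR ... OR c{n-1}` (assumes n >= 1).
fn build_or_chain(n: usize) -> Expr {
    let cond = |i: usize| Expr::Eq("column1".to_string(), format!("value{i}"));
    let mut expr = cond(0);
    for i in 1..n {
        expr = Expr::Or(Box::new(expr), Box::new(cond(i)));
    }
    expr
}

/// Collects the leaf conditions of an OR tree in left-to-right order using a Vec
/// as the work stack, so traversal depth is bounded by heap memory rather than
/// by the thread's stack size.
fn flatten_or(expr: &Expr) -> Vec<&Expr> {
    let mut leaves = Vec::new();
    let mut stack: Vec<&Expr> = vec![expr];
    while let Some(e) = stack.pop() {
        match e {
            Expr::Or(l, r) => {
                // Push the right child first so the left child is visited first.
                stack.push(r);
                stack.push(l);
            }
            leaf => leaves.push(leaf),
        }
    }
    leaves
}

fn main() {
    let expr = build_or_chain(1_000);
    let leaves = flatten_or(&expr);
    println!("{} conditions", leaves.len());
}
```

Dropping a deeply nested `Box` tree is itself recursive by default, so a complete fix would also need to avoid deep recursion in `Drop`; the example keeps the chain at 1,000 conditions for that reason.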
Additional context
I can see two possible directions for this problem.
Approach #1
Some functions, such as select_to_plan and plan_selection, take their parameters by value instead of by reference or behind a Box. Large values passed this way can take up space in every stack frame, which may make the stack grow faster during recursion.
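To illustrate the size difference this approach refers to: passing a value by value moves the whole value into the call (in practice usually via a copy on the stack), while a reference or `Box` is a single pointer. `LargeNode` is a hypothetical stand-in with an arbitrary 512-byte payload; the real sqlparser `Select` and `Expr` types have different sizes.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a large AST node such as a parsed SELECT.
struct LargeNode {
    payload: [u64; 64], // 64 * 8 = 512 bytes
}

/// By value: the whole 512-byte node is moved into the call.
fn plan_by_value(node: LargeNode) -> usize {
    node.payload.len()
}

/// By reference: only a pointer is passed.
fn plan_by_ref(node: &LargeNode) -> usize {
    node.payload.len()
}

fn main() {
    println!("LargeNode: {} bytes", size_of::<LargeNode>());
    println!("&LargeNode: {} bytes", size_of::<&LargeNode>());
    println!("Box<LargeNode>: {} bytes", size_of::<Box<LargeNode>>());

    let node = LargeNode { payload: [0; 64] };
    let by_ref = plan_by_ref(&node);
    let by_value = plan_by_value(node);
    println!("{by_ref} == {by_value}");
}
```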
I also found that some enums are allocated on the stack:
https://www.reddit.com/r/rust/comments/zbla3j/how_does_enums_work_where_are_they_allocated/
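The same concern applies to enums: a Rust enum is stored inline and is at least as large as its largest variant, so even a small variant occupies the full size wherever the enum sits on the stack. Boxing the large variant moves its payload to the heap. The `Node` and `NodeBoxed` types below are hypothetical.

```rust
use std::mem::size_of;

/// Hypothetical AST enum: every value is as large as its biggest variant.
#[allow(dead_code)]
enum Node {
    Leaf(u8),
    Big([u64; 32]), // 256 bytes stored inline
}

/// Same enum with the large variant boxed: the payload lives on the heap.
#[allow(dead_code)]
enum NodeBoxed {
    Leaf(u8),
    Big(Box<[u64; 32]>),
}

fn main() {
    // Even a `Node::Leaf` occupies the full size of `Node`.
    println!("Node: {} bytes", size_of::<Node>());
    println!("NodeBoxed: {} bytes", size_of::<NodeBoxed>());
}
```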
Approach #2
Running the example above under AddressSanitizer, the error was reported in fmt::Display, but I'm not sure exactly where it originates.
This may be related to the Rust issue rust-lang/rust#45838.
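An AddressSanitizer report pointing at `fmt::Display` would be consistent with formatting also being recursive: a `Display` impl for a recursive expression type usually formats each child with `write!`, which re-enters `fmt` once per nesting level (with `core::fmt` frames in between). The toy `Expr` below is hypothetical and is not DataFusion's implementation.

```rust
use std::fmt;

/// Toy expression (hypothetical; not DataFusion's `Expr`).
enum Expr {
    Col(String),
    Or(Box<Expr>, Box<Expr>),
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Col(name) => write!(f, "{name}"),
            // Each nested `write!` re-enters `fmt` for the child, so formatting a
            // left-deep chain of N ORs needs stack proportional to N.
            Expr::Or(l, r) => write!(f, "{l} OR {r}"),
        }
    }
}

fn main() {
    let e = Expr::Or(
        Box::new(Expr::Or(
            Box::new(Expr::Col("a".into())),
            Box::new(Expr::Col("b".into())),
        )),
        Box::new(Expr::Col("c".into())),
    );
    println!("{e}");
}
```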