Lag function creates unwanted projection #17630

@timsaucer

Describe the bug

When a lag window expression is passed to with_column on a DataFrame, the result contains an unwanted extra projection: the raw window expression appears as its own column alongside the named one. See the minimal reproducible example below.

This appears related to #12000

To Reproduce

use arrow::array::{Int32Array, RecordBatch};
use arrow::datatypes::{DataType, Field, Schema};
use datafusion::error::Result as DataFusionResult;
use datafusion::prelude::SessionContext;
use datafusion_catalog::MemTable;
use datafusion_expr::col;
use datafusion_functions_window::expr_fn::lag;
use std::sync::Arc;

#[tokio::test]
async fn with_column_lag() -> DataFusionResult<()> {
    let schema = Schema::new(vec![Field::new("a", DataType::Int32, true)]);

    let batch = RecordBatch::try_new(
        Arc::new(schema.clone()),
        vec![Arc::new(Int32Array::from(vec![1, 2, 3, 4, 5]))],
    )?;

    let ctx = SessionContext::new();

    let provider = MemTable::try_new(Arc::new(schema), vec![vec![batch]])?;
    ctx.register_table("t", Arc::new(provider))?;

    let df = ctx.table("t").await?;

    let lag_expr = lag(col("a"), Some(1), None);

    df.with_column("lag_val", lag_expr)?.show().await?;

    Ok(())
}

Generates output:

+---+---------------------------------------------------------------------------------+---------+
| a | lag(t.a,Int64(1),NULL) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING | lag_val |
+---+---------------------------------------------------------------------------------+---------+
| 1 |                                                                                 |         |
| 2 | 1                                                                               | 1       |
| 3 | 2                                                                               | 2       |
| 4 | 3                                                                               | 3       |
| 5 | 4                                                                               | 4       |
+---+---------------------------------------------------------------------------------+---------+

Expected behavior

The output should contain only two columns, a and lag_val.
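To pin down the expected result, here is a small standalone model in plain Rust. It does not use DataFusion: `lag_model` and `expected_columns` are hypothetical helpers written only for illustration. The sketch assumes a single partition, rows processed in input order, and no default value, which matches the reproducer.

```rust
/// Hypothetical helper (not a DataFusion API): models lag(col, offset)
/// over one partition in input order. Rows with no earlier row at the
/// given offset yield None, which `show()` renders as an empty cell.
fn lag_model(values: &[i32], offset: usize) -> Vec<Option<i32>> {
    (0..values.len())
        .map(|i| i.checked_sub(offset).map(|j| values[j]))
        .collect()
}

/// Column names the issue expects after `with_column("lag_val", ...)`:
/// the original column plus the newly named one, and nothing else.
fn expected_columns() -> Vec<&'static str> {
    vec!["a", "lag_val"]
}
```

Calling `lag_model(&[1, 2, 3, 4, 5], 1)` reproduces the values in the lag_val column of the output above. The bug is the extra, unnamed copy of that column, not its values.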

