Streaming model creation fails with incompatible schema error #1235

@adam-staros95

Describe the bug

Creating a streaming table fails with an incompatible schema error when the --full-refresh flag is passed to dbt run. The problem appears when the underlying Parquet files have different columns across S3 paths.

Steps To Reproduce

Create a dbt model as follows:

-- users.sql
{% set env = get_databricks_env_name() %}
{% set s3_path = var(env ~ '_s3_path') %}
{% set model_name = 'Users' %}
{{
    config(
        catalog='bronze_'~env,
        alias=model_name|lower,
        materialized='streaming_table'
    )
}}

select *
from
    stream read_files(
        '{{ s3_path }}{{ get_model_prefix_per_env(env, model_name) }}', format => 'parquet'
    )

Then execute dbt run --select users --full-refresh

Expected behavior

The streaming table should be created and its schema should be inferred correctly.

Screenshots and log output

Database Error in model users (models/bronze/users.sql)
  Table 'users' has a user-specified schema that is incompatible with the schema
   inferred from its query.
  "
  Streaming tables are stateful and remember data that has already been
  processed. If you want to recompute the table from scratch, please full refresh
  the table.
                
  
  Declared schema:
  root
   |-- columns list
  
  
  Inferred schema:
  root
   |-- columns list with extra columns not listed in `Declared schema`
  compiled code at target/run/<project_name>/models/bronze/users.sql
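
To see exactly which columns differ between the two schema trees in an error like the one above, they can be compared mechanically. The following is a minimal sketch and not part of dbt or dbt-databricks. It assumes the trees use Spark's usual ` |-- name: type (nullable = ...)` layout; the function names, the sample column names, and the sample path are hypothetical and only serve the illustration.

```python
import re

# Matches a top-level column line such as " |-- email: string (nullable = true)".
# Nested struct/array fields (" |    |-- ...") deliberately do not match.
COLUMN_LINE = re.compile(
    r"^\s*\|-- (?P<name>[^:]+): (?P<type>.+?)(?: \(nullable = \w+\))?\s*$"
)


def parse_schema_block(error_text, header):
    """Return {column: type} for the schema tree that follows `header`."""
    columns = {}
    in_block = False
    for line in error_text.splitlines():
        stripped = line.strip()
        if stripped == header:
            in_block = True
            continue
        if not in_block or not stripped or stripped == "root":
            continue
        if not stripped.startswith("|"):
            # First non-tree line after the header ends the block.
            break
        match = COLUMN_LINE.match(line)
        if match:
            columns[match.group("name")] = match.group("type")
    return columns


def schema_drift(error_text):
    """Compare the declared and inferred schema trees in an error message."""
    declared = parse_schema_block(error_text, "Declared schema:")
    inferred = parse_schema_block(error_text, "Inferred schema:")
    shared = declared.keys() & inferred.keys()
    return {
        "added": sorted(inferred.keys() - declared.keys()),
        "removed": sorted(declared.keys() - inferred.keys()),
        "retyped": sorted(c for c in shared if declared[c] != inferred[c]),
    }


# Hypothetical error excerpt; the column names are made up for illustration.
SAMPLE = """\
  Declared schema:
  root
   |-- id: long (nullable = true)
   |-- name: string (nullable = true)


  Inferred schema:
  root
   |-- id: long (nullable = true)
   |-- name: string (nullable = true)
   |-- email: string (nullable = true)
  compiled code at target/run/example_project/models/bronze/users.sql
"""

print(schema_drift(SAMPLE))
```

For the sample text, `added` contains only `email`, which corresponds to the situation reported here: the inferred schema has extra columns that are not listed in the declared schema.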

System information

The output of dbt --version:

Core:
  - installed: 1.10.5 
  - latest:    1.10.13 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - databricks: 1.10.14 - Up to date!
  - redshift:   1.8.1   - Update available!
  - postgres:   1.8.2   - Update available!
  - spark:      1.9.3   - Up to date!

The operating system you're using:
macOS Sequoia

The output of python --version:
Python 3.9.6

Labels: bug