Open
Labels: bug (Something isn't working)
Description
Describe the bug
Creating a streaming table model fails with an incompatible schema error when the --full-refresh flag is passed to the dbt run command. The problem appears when the underlying Parquet files have different columns across S3 paths.
Steps To Reproduce
Create a dbt model as follows:
-- users.sql
{% set env = get_databricks_env_name() %}
{% set s3_path = var(env ~ '_s3_path') %}
{% set model_name = 'Users' %}
{{
  config(
    catalog='bronze_'~env,
    alias=model_name|lower,
    materialized='streaming_table'
  )
}}
select *
from
  stream read_files(
    '{{ s3_path }}{{ get_model_prefix_per_env(env, model_name) }}', format => 'parquet'
  )
And execute dbt run --select users --full-refresh
Expected behavior
The streaming table should be created and its schema should be inferred correctly.
Screenshots and log output
Database Error in model users (models/bronze/users.sql)
Table 'users' has a user-specified schema that is incompatible with the schema
inferred from its query.
"
Streaming tables are stateful and remember data that has already been
processed. If you want to recompute the table from scratch, please full refresh
the table.
Declared schema:
root
|-- columns list
Inferred schema:
root
|-- columns list with extra columns not listed in `Declared schema`
compiled code at target/run/<project_name>/models/bronze/users.sql
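To see exactly which columns the inferred schema adds, the two `root` listings from the error can be compared. Below is a minimal Python sketch for doing so, assuming the schema lines follow the ` |-- name: type` layout that Spark prints. The helper names and the column names (`id`, `name`, `email`) are hypothetical placeholders, not the actual columns of this table.

```python
import re

# Spark prints top-level schema fields as " |-- name: type (nullable = ...)".
# Nested fields start with " |    |--" and are intentionally not matched.
FIELD_RE = re.compile(r"^\s*\|--\s*(?P<name>[^:\s]+)\s*:")


def schema_columns(schema_text):
    """Return the top-level column names found in a printed schema tree."""
    columns = []
    for line in schema_text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            columns.append(match.group("name"))
    return columns


def extra_columns(declared_text, inferred_text):
    """Return columns present in the inferred schema but not in the declared one."""
    declared = set(schema_columns(declared_text))
    return [c for c in schema_columns(inferred_text) if c not in declared]


# Hypothetical schemas for illustration only; paste the real listings instead.
declared = """root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
"""
inferred = """root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- email: string (nullable = true)
"""

print(extra_columns(declared, inferred))  # ['email']
```

Pasting the real "Declared schema" and "Inferred schema" blocks into `declared` and `inferred` lists the columns that exist only in some of the S3 paths.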
System information
The output of dbt --version:
Core:
- installed: 1.10.5
- latest: 1.10.13 - Update available!
Your version of dbt-core is out of date!
You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation
Plugins:
- databricks: 1.10.14 - Up to date!
- redshift: 1.8.1 - Update available!
- postgres: 1.8.2 - Update available!
- spark: 1.9.3 - Up to date!
The operating system you're using:
macOS Sequoia
The output of python --version:
Python 3.9.6