Skip to content
This repository was archived by the owner on May 2, 2023. It is now read-only.

Conversation

@dbeatty10
Copy link
Contributor

resolves #21

History

#22 was an initial attempt at unblocking data-diff --dbt for DuckDB. This is a second attempt.

Happy to iterate on this PR to fix things for DuckDB the proper way 🤗 Just let me know anything that needs to be addressed.

Enabling 3-part fully-qualified names (FQN)

This PR enables all of the following for DuckDB connections:

  • 1-part paths
  • 2-part paths
  • 3-part paths

The latter is especially important for the following reasons:

  1. DuckDB 0.7.0 adds support for attaching multiple databases
  2. It appears that prior to DuckDB 0.7.0, supplying a database/catalog in a 3-part path is allowed and handled gracefully

Implementation details

Both DuckDB and Redshift have lots of similarities with PostgreSQL. So for this implementation, I just copied-pasted two methods from redshift to duckdb:

  1. select_table_schema
  2. _normalize_table_path

Caveats

  • I only tested with with DuckDB 0.6.1 -- I didn't test this with 0.7.0.
  • 3-part paths worked for me with duckcli 0.2.1 + DuckDB 0.6.1, even when I supplied a random identifier for the database
  • 3-part paths worked for me with dbt-duckdb 1.3.4 + DuckDB 0.6.1, even when I supplied a random identifier for the database
  • I didn't thoroughly investigate why 3-part paths worked in all of these cases, but I suspect DuckDB 0.6.1 just ignores the database specifier if supplied.

See it in action

Wanna see it in action? Follow the simple instructions here.

@erezsh
Copy link
Contributor

erezsh commented Feb 24, 2023

Please base your PR on this recently merged PR - #18

Use a similar implementation. (using lower() is probably wrong for duckdb)

Also add DuckDB to the simple test that it introduces.

@dbeatty10
Copy link
Contributor Author

@erezsh I took a look at #18 and tried to align with it, namely:

  • Copying the implementations of select_table_schema and _normalize_table_path from Postgres rather than Redshift
  • Added DuckDB to the simple test that it introduces

Happy to iterate further as-needed.

@dlawin
Copy link
Contributor

dlawin commented Feb 24, 2023

File "/home/runner/work/sqeleton/sqeleton/sqeleton/databases/base.py", line 503, in _query_cursor
    c.execute(sql_code)
duckdb.CatalogException: Catalog Error: Table with name duckdb does not exist!
Did you mean "duckdb_types"?

@dbeatty10
Copy link
Contributor Author

Thanks for sharing that error message from the CI checks @dlawin.

Are there any instructions I can follow for running the CI checks locally?

@erezsh
Copy link
Contributor

erezsh commented Feb 24, 2023

@dbeatty10 See https://github.com/datafold/data-diff/blob/master/CONTRIBUTING.md#development-setup

@dlawin
Copy link
Contributor

dlawin commented Feb 24, 2023

1 and 2 part drop statements are working, but seems that 3 part is not:
DROP TABLE "duckdb"."main"."tbl__pti4o" - > duckdb.CatalogException: Catalog Error: Table with name duckdb does not exist!

@dlawin
Copy link
Contributor

dlawin commented Feb 24, 2023

Hmm I think the issue here is that duckdb doesn't seem to support 3 part paths. Only <schema>.<object>

@erezsh
Copy link
Contributor

erezsh commented Feb 24, 2023

It should. https://duckdb.org/docs/sql/statements/attach#name-qualification

Are you sure you're using the newest duckdb?

@dlawin
Copy link
Contributor

dlawin commented Feb 24, 2023

seems to be added in 0.7.0

@dbeatty10
Copy link
Contributor Author

I have not tested how 3-part paths for DELETE statements works for the latest DuckDB (0.7.0), but I did just bump pyproject.toml to 0.7.0.

Could you try re-running the workflow?

More detail

Hmm I think the issue here is that duckdb doesn't seem to support 3 part paths. Only <schema>.<object>

I just did some more testing with DuckDB 0.6 (which is not the newest version).

It does support 3-part paths for SELECT statements, but it does not support 3-part paths for DELETE statements.

image

@dbeatty10
Copy link
Contributor Author

Warning: poetry.lock is not consistent with pyproject.toml. You may be getting improper dependencies. Run poetry lock [--no-update] to fix it.

Looks like I need to generate a new lock file after updating pyproject.toml.

@dlawin
Copy link
Contributor

dlawin commented Feb 24, 2023

Warning: poetry.lock is not consistent with pyproject.toml. You may be getting improper dependencies. Run poetry lock [--no-update] to fix it.

Looks like I need to generate a new lock file after updating pyproject.toml.

Oh I beat you to it. Works for me locally and the tests are passing here

@dlawin dlawin requested a review from erezsh February 24, 2023 20:47
@erezsh erezsh merged commit e5e51e3 into datafold:master Feb 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error when connecting to DuckDB with data-diff --dbt

3 participants