Added support for spark importer table level comments#761

Merged
jochenchrist merged 34 commits into datacontract:main from robert-altmiller:main
May 22, 2025
robert-altmiller commented May 18, 2025

Right now the Spark importer captures only column-level comments in the Data Contract Specification (DCS); it does not capture table-level comments. I have added code that sets model.description = table_comment in the DCS, so the table comment also shows up in the final ODCS contract after the DCS is exported to ODCS. The image below shows what happened when I tested these changes.

image

image

robert-altmiller commented May 19, 2025

We want all three methods here for performance reasons. The last method, "Describe Table Extended..." in the '_table_comment_from_spark()' Python function, is slow if it has to run for hundreds of tables. You would expect that method to cover all Databricks cluster types running the correct DBR runtime, but it fails if column masking is enabled on the Delta table you are reading from. In that case, the WorkspaceClient() method is the only one that can fetch the table_comment.
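
As a rough illustration of the fallback idea described above (this is not the code merged in this PR), the sketch below tries a list of comment fetchers in order and returns the first non-empty result, logging failures quietly instead of raising. The names `first_table_comment`, `comment_via_workspace_client`, and `make_describe_fetcher` are hypothetical. The sketch assumes the table comment appears in the DESCRIBE TABLE EXTENDED output as a row whose `col_name` is "Comment", and that the Databricks SDK is installed and authenticated when the WorkspaceClient fetcher is used.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger(__name__)

# A fetcher takes a table name and returns its comment, or None if unavailable.
CommentFetcher = Callable[[str], Optional[str]]


def comment_via_workspace_client(table_name: str) -> Optional[str]:
    """Fetch the comment through the Databricks SDK.

    Assumes table_name is a full name (catalog.schema.table).
    """
    # Imported lazily so this sketch still loads when databricks-sdk is absent.
    from databricks.sdk import WorkspaceClient

    return WorkspaceClient().tables.get(full_name=table_name).comment


def make_describe_fetcher(spark) -> CommentFetcher:
    """Build a fetcher that parses DESCRIBE TABLE EXTENDED from a SparkSession."""

    def fetch(table_name: str) -> Optional[str]:
        rows = spark.sql(f"DESCRIBE TABLE EXTENDED {table_name}").collect()
        for row in rows:
            # Assumption: the table comment is the row named "Comment";
            # a column that is itself named "Comment" would be matched first.
            if row["col_name"] == "Comment":
                return row["data_type"]
        return None

    return fetch


def first_table_comment(
    table_name: str, fetchers: Iterable[CommentFetcher]
) -> Optional[str]:
    """Try each fetcher in order and return the first non-empty comment."""
    for fetch in fetchers:
        try:
            comment = fetch(table_name)
        except Exception as exc:  # e.g. missing SDK, no auth, masked columns
            logger.debug("Comment lookup failed for %s: %s", table_name, exc)
            continue
        if comment:
            return comment
    return None
```

With a live session this could be called as `first_table_comment("catalog.schema.table", [comment_via_workspace_client, make_describe_fetcher(spark)])`, keeping the slow DESCRIBE call as the last resort.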

I have also changed how the error messages are displayed so that they do not confuse end users calling data_contract.import_from_source("spark", "<table_name>").

image
image

acreese11 and others added 2 commits May 20, 2025 15:03
Enhance SparkImporter with logging and improve table comment retrieva…
@jochenchrist jochenchrist merged commit 7257137 into datacontract:main May 22, 2025
5 checks passed
jochenchrist commented

Thanks for your contribution
