Added support for spark importer table level comments #761
Merged
jochenchrist merged 34 commits into datacontract:main on May 22, 2025
We want all three methods here for performance reasons. The last method, "DESCRIBE TABLE EXTENDED ...", in the '_table_comment_from_spark()' Python function is slow if it has to run for hundreds of tables. You would expect that method to cover all Databricks cluster types on the correct DBR runtime, but it fails if column masking is enabled on the Delta table you are reading from. In that case, the WorkspaceClient() method is the only one that can fetch the table comment. I have also changed how the error messages are formatted so that they do not confuse end users calling data_contract.import_from_source("spark", "<table_name>").
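To illustrate the fallback idea described above, here is a minimal sketch, not the code in this PR. It tries a list of lookup callables in order and returns the first non-empty comment, logging failures at debug level rather than surfacing them to the user. The name `first_table_comment` and the `CommentLookup` type are hypothetical; in practice each callable would wrap one of the retrieval methods, with the cheapest first and the slow DESCRIBE TABLE EXTENDED query last.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger(__name__)

# Hypothetical type: a lookup takes a table name and returns its comment, or None.
CommentLookup = Callable[[str], Optional[str]]


def first_table_comment(
    table_name: str, lookups: Sequence[CommentLookup]
) -> Optional[str]:
    """Return the first non-empty comment produced by the lookups, tried in order."""
    for lookup in lookups:
        try:
            comment = lookup(table_name)
        except Exception as exc:  # e.g. a query rejected because of column masking
            # Log quietly and move on to the next method.
            name = getattr(lookup, "__name__", repr(lookup))
            logger.debug("Lookup %s failed for %s: %s", name, table_name, exc)
            continue
        if comment:
            return comment
    # No method produced a comment; the caller decides how to handle this.
    return None
```

Ordering the callables from cheapest to most expensive means the slow path only runs for tables where the faster ones return nothing or raise.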
Enhance SparkImporter with logging and improve table comment retrieva…
Thanks for your contribution



Right now the spark importer only captures column-level comments in the Data Contract Specification (DCS); it does not capture table-level comments by setting model.description = table_comment. I have added code that captures the table comment in DCS, so it also shows up in the final ODCS contract after DCS is exported to ODCS. The image below shows what happened when I tested these changes.
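As a rough sketch of the behaviour described here, using plain dictionaries rather than the importer's actual model classes: the table comment, when present, becomes the model's description alongside the existing field-level descriptions. The helper name `build_model` and the sample table are hypothetical.

```python
from typing import Any, Dict, Optional


def build_model(
    fields: Dict[str, Dict[str, Any]], table_comment: Optional[str] = None
) -> Dict[str, Any]:
    """Build a DCS-style model dict, adding a description only when a comment exists."""
    model: Dict[str, Any] = {"type": "table", "fields": fields}
    # Skip empty or whitespace-only comments so no blank description is emitted.
    if table_comment and table_comment.strip():
        model["description"] = table_comment.strip()
    return model


# Hypothetical example: one column comment plus a table-level comment.
orders = build_model(
    {"order_id": {"type": "string", "description": "Unique order identifier"}},
    table_comment="One row per customer order",
)
```

Leaving the key out when there is no comment keeps the exported contract free of empty description entries.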