-
Notifications
You must be signed in to change notification settings - Fork 333
fix: consider sqlmesh major/minor for migrate and rollback #3446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Looks like dbt-fabric published a new "patch" version and renamed a public function I'm attempting to exclude this patch version for the moment. |
|
Looks like PySpark Pandas has an issue that will be fixed in v4 because of the removal of distutils from Python 3.12 https://github.com/apache/spark/blob/v3.5.3/python/pyspark/sql/pandas/utils.py#L24 Still working out how to handle this. |
|
This was when the Bitnami image moved to Python 3.12 Looks like we should be able to use this image https://hub.docker.com/layers/bitnami/spark/3.5.3-debian-12-r0/images/sha256-0c045caae9ec3498fa2ddbe8d34ecd7d96178113502cac4feb2a58e837af2f6c?context=explore |
|
Going to wait on feedback/suggestions now with the PySpark failure. |
| [ | ||
| migrations, | ||
| major_minor(SQLGLOT_VERSION) != versions.minor_sqlglot_version, | ||
| major_minor(SQLMESH_VERSION) != versions.minor_sqlmesh_version, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the motivation for this change? The sqlmesh minor version was omitted intentionally to avoid migration when it's not needed even if the minor version was bumped. Generally there are only 2 reasons to migrate: if there's a new migration script (the migrations collection) or if the sqlglot version changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the issue carefully and now I understand the motivation better. However, migrate_rows not only controls whether the backups are made but also whether the migration should be performed. I suggested decoupling the 2 and only do backups on sqlmesh minor versions changes without affecting the migrate_rows flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading your comments and digging in to the code a bit more I believe I can see what you're talking about.
Hopefully I've addressed your concerns by only performing a table copy when it's a sqlmesh minor change without impacting the flag that drives the complex snapshot and environment migration process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this looks great, thank you! However, I'm not sure why the refactor was performed in the other method (def migrate)? Is that change necessary? If not, can we please revert it.
|
Hello - we've fixed the Pyspark issue, so you should be good to revert the Dockerfile change and rebase |
627ccc9 to
7769448
Compare
izeigerman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once green. Thanks for addressing comments!
7769448 to
9554f26
Compare
Fixes #3445
I was considering on top of this if it would be sensible to extend/copy the circle ci migration test.