Removing uniqueness constraints on tables table#6718
Conversation
|
This is an important bugfix when you want to add the same table via multiple schemas in vertica or when you have a presto table replicated across multiple different databases. Logically speaking a table is only unique within the <database, schema>. |
There was a problem hiding this comment.
I like having more insight into what alembic is doing. I think that level of verbosity is a good idea for such risky operations like database upgrading.
|
There's a bigger question here as to whether we want any sort of unique constraint on this. You could have different owners that want to have diverging definition on top of the same table. There are pro/cons here... Contrainst in multi-engine alembic is a shitshow, we have functions somewhere to help with this. Check out We may want a new "attempt_remove_constraint" to wrap up the |
|
I kind of agree with you: Currently, the SQL definition aspect is tied to a table, and that too the SQL itself is not shown on the CRUD Ui so itsn't apparent then 'which version of the table to choose'. I agree that ideally we want no uniqueness. I am happy to change this to no uniqueness constraint whatsoever and also exposing the SQL definition on the CRUD UI. What would you suggest I do here: Keep the <database, schema, table> uniqueness but do the refactoring for the constraint stuff, OR should we remove constraints all together and/or expose the SQL definition on the CRUD ? Basically, the current scenario of uniqueness of table-name alone is broken and I am interested in getting past that pain point first. |
4bc8b0d to
b67e699
Compare
agrawaldevesh
left a comment
There was a problem hiding this comment.
@mistercrunch, I need your help fixing the failing unit tests (example https://travis-ci.org/agrawaldevesh/incubator-superset/jobs/482223605).
Do people really use the superset CLI's import_datasources and export_datasources that writes the sqla Databses and Druid cluster configuration to a YAML file and back ?
The failing unit tests (dict_import_export_tests.py's test_import_table_override_append) test this feature that one can export a YAML file, change some stuff and re-import it: Which should replace the existing configuration.
The problem is that the "id" field is not exposed in the YAML file, and so it determines uniqueness using the table-name for the SqlaTable. That's exactly the behavior we are trying to change. So I don't know how to fix these tests without actually having SqlaTable have unique fields :( :(.
So is this behavior of YAML import/export really used ?
There was a problem hiding this comment.
This hacking of is-sqlite checking was necessary to force it to drop the unique constraint. In the absence of this (even with batch_alter_table, recreate=always) it wasn't really creating a new copy of the table with the uniq constraint removed.
This is a bit hacky for sure, but it works for mysql, sqlite and postgresql. Further, it only pays the recreate cost when needed.
I think it would be a bit premature to refactor this and move into utils, but I would be happy to put that there just as well.
There was a problem hiding this comment.
Since no one should be using sqlite in production, that shouldn't be a big deal. Deleting the entry that created the constraint in the original migration would fix the non-production use cases.
There was a problem hiding this comment.
I am not sure of the changes in this entire file: They are working around test failures but I think they are hack and don't help. I will revert them once we have agreed about what to do about the failing tests
There was a problem hiding this comment.
Oh. Man. I don't remember too much about this import/export stuff, I helped out the person who wrote it originally a bit. I remember something about keeping track of the source system's id to allow for merging. Sorry you have to deal with this.
We should really just have an extra column with a uuid for syncing objects across databases.
There was a problem hiding this comment.
Removed any table constraints as suggested by @mistercrunch
There was a problem hiding this comment.
@agrawaldevesh I'm not sure why we should be removing this uniqueness constraint. I've actually proposed having a more restrictive uniqueness constraint per #5449.
In general there have been missing/incomplete uniqueness constraints in Superset which have caused a number of issues, especially in the CRUD views.
There was a problem hiding this comment.
Will remove the debug statements before commiting
There was a problem hiding this comment.
The reason I am removing all the table unique constraints is that in certain databases like postgresql, the uniq-constraint is on <database-id, schema, table> (caused by an old migration b4456560d4f3_change_table_unique_constraint.py), but that very same migration errors out in mysql, wherein the constraint is on . Whereas the constraint is on <database-id, table_name> in sqlite.
As you said, alembic migrations are a royal shitshow and frankly I think this is not scaleable :(.
Anyway, at least the DB migration aspect of this diff works for all database types.
|
Ping @mistercrunch ... The test is failing because of the unit test issue with dict_import_export_tests.py. I need your advise on what to do here. I am not sure how to make this change without breaking/changing existing yaml save/restore functionality. At the same time, I think I need this fix since it prevents me from adding the different presto/vertica table names that have same name but diff schema. |
|
Personally I feel like the right thing to do is to remove the unique constraint. For the sake of moving forward in your environment, you can probably just drop the constraint directly against the database. The import/export logic is not great. It's been buggy since it was contributed. Let me see if I can help. |
|
Oh btw another ruse to go around the fact that some contraints will never get deleted by alembic on some database engines, is to go on the original migration that created the constraint in the first place. While this doesn't fix existing environments, it fixes the issue for brand new installations. |
|
I did not follow you :) can you please paraphrase it ?
…On Wed, Jan 23, 2019 at 9:22 PM Maxime Beauchemin ***@***.***> wrote:
Oh btw another ruse to go around the fact that some contraints will never
get deleted by alembic on some database engines, is to go on the original
migration that created the constraint in the first place. While this
doesn't fix existing environments, it fixes the issue for brand new
installations.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#6718 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ATnI_-m4JaaGvnn9GIFAZdrox6KulAZcks5vGUMPgaJpZM4aHO3s>
.
|
There was a problem hiding this comment.
Should we log something? Like "Constraint couldn't be removed, you may want to do it manually" ?
There was a problem hiding this comment.
I agree we should log something :)
|
Oh let me try to explain again. Since there are no easy way to clean up constraints on some database engines, we can go and edit the original database migration that created the constraint in the first place and edit it to remove the line of code that creates the constraint. This does not fix existing installation, but it prevents new installation from creating the constraint in the first place. |
|
@mistercrunch, Thanks for the suggestion of editing the original migration file directly. I feel its a bit hacky :) It also goes against the alembic approach. I feel the problem isn't related to sqlite itself: Meaning, yes the alembic code needs to be different for sqlite vs non sqlite codepaths but that can be refactored away. I think the migration code I have here is fine, except that it breaks some obscure feature of export/import. So I am wondering if you think we should first try to fix that and if that does not work, then fall back to modifying the older migration scripts ? What do you think ? Or I can always give up and hack it to what you are suggesting :) |
60cec17 to
7d2c4a8
Compare
Codecov Report
@@ Coverage Diff @@
## master #6718 +/- ##
===========================================
- Coverage 73.42% 56.07% -17.36%
===========================================
Files 75 525 +450
Lines 10084 23227 +13143
Branches 0 2777 +2777
===========================================
+ Hits 7404 13024 +5620
- Misses 2680 9794 +7114
- Partials 0 409 +409
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #6718 +/- ##
==========================================
+ Coverage 56.07% 56.07% +<.01%
==========================================
Files 525 525
Lines 23220 23227 +7
Branches 2777 2777
==========================================
+ Hits 13020 13024 +4
- Misses 9791 9794 +3
Partials 409 409
Continue to review full report at Codecov.
|
|
@mistercrunch, the PR is ready to be merged in. As we discussed, it simply expands the uniqueness constraints as also suggested by @john-bodley. It causes no other backward incompat issues. @kristw, I am not sure what the tag risk-db-migration means. This diff adds a new migration but it does not require a manual step, just the usual superset db upgrade command. |
7d2c4a8 to
b39b893
Compare
Summary: Superset code enforces (in Table crud view pre_add) that the table is unique within <database, schema, table_name). Indeed in commit 15b67b2 (in 2016), the model was updated to reflect that. However, it was never ported over to a migration. I am fixing that in this diff. I am choosing to make this be a new migration instead of fixing an existing one since I want to fix existing installations also cleanly. I also considered removing the uniqueness constraint, but that won't work: First because anyway there are other places where the <database, schema, table> uniqueness is enforced in code. But also, the .sql field isn't a first citizen yet: The schema of the table is picked up from the table-name and the sql part is only used when creating the explore query. So indeed we want this uniqueness constraint. (Also it breaks the unit tests in dict_import_export_tests.py) [Perhaps it can be removed when we have true .sql support, but for now the user would have to create a database view and he can use that as the 'table name'. That way he gets schema inference also] Also added INFO logging to the alembic migration.
b39b893 to
01b0218
Compare
|
@agrawaldevesh We flag any PR with db-migration to be aware of them before including in release and testing on staging/production environment. |
This reverts commit c4fb7a0.
Summary: Superset code enforces (in Table crud view pre_add) that the table is unique within <database, schema, table_name). Indeed in commit 416abff (in 2016), the model was updated to reflect that. However, it was never ported over to a migration. I am fixing that in this diff. I am choosing to make this be a new migration instead of fixing an existing one since I want to fix existing installations also cleanly. I also considered removing the uniqueness constraint, but that won't work: First because anyway there are other places where the <database, schema, table> uniqueness is enforced in code. But also, the .sql field isn't a first citizen yet: The schema of the table is picked up from the table-name and the sql part is only used when creating the explore query. So indeed we want this uniqueness constraint. (Also it breaks the unit tests in dict_import_export_tests.py) [Perhaps it can be removed when we have true .sql support, but for now the user would have to create a database view and he can use that as the 'table name'. That way he gets schema inference also] Also added INFO logging to the alembic migration.
* Revert "creating new circular-json safe stringify and replacing one call (apache#6772)" This reverts commit 147bfb6. * Revert "Improve Unicode support for MSSQL (apache#6690)" This reverts commit 313890c. * Revert "Fix uniqueness constraints on tables table (apache#6718)" This reverts commit 04312db.
Summary: The tables table was inconsistent wrt to the schema resulting
from the migrations and the model schema. The migrations said there was
uniqueness constraint on table_name, but the model schema said there
should be one on (database_id, table_name)
I hit this bug when trying to add the same table via two different
databases in the UI. It refused me and it wasn't immediately apparent
what was the cause (since the code referred to the uniquenes
constraint). After a bunch of debugging I narrowed down to this
"missing" migration script: That migrates from the uniq(table_name) to
uniq(database_id, schema, table_name) and fixes both the model and
migration script
And as an aside, also adds ctags file to .gitignore