Skip to content

Removing uniqueness constraints on tables table#6718

Merged
mistercrunch merged 1 commit into
apache:masterfrom
agrawaldevesh:fix-uniqueness-table
Jan 29, 2019
Merged

Removing uniqueness constraints on tables table#6718
mistercrunch merged 1 commit into
apache:masterfrom
agrawaldevesh:fix-uniqueness-table

Conversation

@agrawaldevesh
Copy link
Copy Markdown

Summary: The tables table was inconsistent wrt to the schema resulting
from the migrations and the model schema. The migrations said there was
uniqueness constraint on table_name, but the model schema said there
should be one on (database_id, table_name)

I hit this bug when trying to add the same table via two different
databases in the UI. It refused me and it wasn't immediately apparent
what was the cause (since the code referred to the uniquenes
constraint). After a bunch of debugging I narrowed down to this
"missing" migration script: That migrates from the uniq(table_name) to
uniq(database_id, schema, table_name) and fixes both the model and
migration script

And as an aside, also adds ctags file to .gitignore

@agrawaldevesh
Copy link
Copy Markdown
Author

This is an important bugfix when you want to add the same table via multiple schemas in vertica or when you have a presto table replicated across multiple different databases. Logically speaking a table is only unique within the <database, schema>.

Comment thread superset/migrations/alembic.ini Outdated
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like having more insight into what alembic is doing. I think that level of verbosity is a good idea for such risky operations like database upgrading.

@mistercrunch
Copy link
Copy Markdown
Member

mistercrunch commented Jan 19, 2019

There's a bigger question here as to whether we want any sort of unique constraint on this. You could have different owners that want to have diverging definition on top of the same table. There are pro/cons here...

Contrainst in multi-engine alembic is a shitshow, we have functions somewhere to help with this. Check out superset.utils.core.*constraint.* You can see how they are used in previous migrations.

We may want a new "attempt_remove_constraint" to wrap up the try logic.

@agrawaldevesh
Copy link
Copy Markdown
Author

I kind of agree with you: Currently, the SQL definition aspect is tied to a table, and that too the SQL itself is not shown on the CRUD Ui so itsn't apparent then 'which version of the table to choose'. I agree that ideally we want no uniqueness.

I am happy to change this to no uniqueness constraint whatsoever and also exposing the SQL definition on the CRUD UI.

What would you suggest I do here: Keep the <database, schema, table> uniqueness but do the refactoring for the constraint stuff, OR should we remove constraints all together and/or expose the SQL definition on the CRUD ?

Basically, the current scenario of uniqueness of table-name alone is broken and I am interested in getting past that pain point first.

@agrawaldevesh agrawaldevesh force-pushed the fix-uniqueness-table branch 8 times, most recently from 4bc8b0d to b67e699 Compare January 21, 2019 02:19
Copy link
Copy Markdown
Author

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mistercrunch, I need your help fixing the failing unit tests (example https://travis-ci.org/agrawaldevesh/incubator-superset/jobs/482223605).

Do people really use the superset CLI's import_datasources and export_datasources that writes the sqla Databses and Druid cluster configuration to a YAML file and back ?

The failing unit tests (dict_import_export_tests.py's test_import_table_override_append) test this feature that one can export a YAML file, change some stuff and re-import it: Which should replace the existing configuration.

The problem is that the "id" field is not exposed in the YAML file, and so it determines uniqueness using the table-name for the SqlaTable. That's exactly the behavior we are trying to change. So I don't know how to fix these tests without actually having SqlaTable have unique fields :( :(.

So is this behavior of YAML import/export really used ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hacking of is-sqlite checking was necessary to force it to drop the unique constraint. In the absence of this (even with batch_alter_table, recreate=always) it wasn't really creating a new copy of the table with the uniq constraint removed.

This is a bit hacky for sure, but it works for mysql, sqlite and postgresql. Further, it only pays the recreate cost when needed.

I think it would be a bit premature to refactor this and move into utils, but I would be happy to put that there just as well.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since no one should be using sqlite in production, that shouldn't be a big deal. Deleting the entry that created the constraint in the original migration would fix the non-production use cases.

Comment thread superset/models/helpers.py Outdated
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure of the changes in this entire file: They are working around test failures but I think they are hack and don't help. I will revert them once we have agreed about what to do about the failing tests

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Man. I don't remember too much about this import/export stuff, I helped out the person who wrote it originally a bit. I remember something about keeping track of the source system's id to allow for merging. Sorry you have to deal with this.

We should really just have an extra column with a uuid for syncing objects across databases.

Comment thread superset/connectors/sqla/models.py Outdated
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed any table constraints as suggested by @mistercrunch

Copy link
Copy Markdown
Member

@john-bodley john-bodley Jan 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agrawaldevesh I'm not sure why we should be removing this uniqueness constraint. I've actually proposed having a more restrictive uniqueness constraint per #5449.

In general there have been missing/incomplete uniqueness constraints in Superset which have caused a number of issues, especially in the CRUD views.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove the debug statements before commiting

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I am removing all the table unique constraints is that in certain databases like postgresql, the uniq-constraint is on <database-id, schema, table> (caused by an old migration b4456560d4f3_change_table_unique_constraint.py), but that very same migration errors out in mysql, wherein the constraint is on . Whereas the constraint is on <database-id, table_name> in sqlite.

As you said, alembic migrations are a royal shitshow and frankly I think this is not scaleable :(.

Anyway, at least the DB migration aspect of this diff works for all database types.

@agrawaldevesh agrawaldevesh changed the title Fix uniqueness constraints on tables table Removing uniqueness constraints on tables table Jan 22, 2019
@agrawaldevesh
Copy link
Copy Markdown
Author

Ping @mistercrunch ... The test is failing because of the unit test issue with dict_import_export_tests.py. I need your advise on what to do here. I am not sure how to make this change without breaking/changing existing yaml save/restore functionality. At the same time, I think I need this fix since it prevents me from adding the different presto/vertica table names that have same name but diff schema.

@mistercrunch
Copy link
Copy Markdown
Member

Personally I feel like the right thing to do is to remove the unique constraint.

For the sake of moving forward in your environment, you can probably just drop the constraint directly against the database.

The import/export logic is not great. It's been buggy since it was contributed. Let me see if I can help.

@mistercrunch
Copy link
Copy Markdown
Member

Oh btw another ruse to go around the fact that some contraints will never get deleted by alembic on some database engines, is to go on the original migration that created the constraint in the first place. While this doesn't fix existing environments, it fixes the issue for brand new installations.

@agrawaldevesh
Copy link
Copy Markdown
Author

agrawaldevesh commented Jan 24, 2019 via email

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we log something? Like "Constraint couldn't be removed, you may want to do it manually" ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should log something :)

@mistercrunch
Copy link
Copy Markdown
Member

Oh let me try to explain again. Since there are no easy way to clean up constraints on some database engines, we can go and edit the original database migration that created the constraint in the first place and edit it to remove the line of code that creates the constraint.

This does not fix existing installation, but it prevents new installation from creating the constraint in the first place.

@kristw kristw added risk:db-migration PRs that require a DB migration !deprecated-label:bug Deprecated label - Use #bug instead labels Jan 24, 2019
@agrawaldevesh
Copy link
Copy Markdown
Author

@mistercrunch, Thanks for the suggestion of editing the original migration file directly. I feel its a bit hacky :) It also goes against the alembic approach.

I feel the problem isn't related to sqlite itself: Meaning, yes the alembic code needs to be different for sqlite vs non sqlite codepaths but that can be refactored away.

I think the migration code I have here is fine, except that it breaks some obscure feature of export/import. So I am wondering if you think we should first try to fix that and if that does not work, then fall back to modifying the older migration scripts ?

What do you think ? Or I can always give up and hack it to what you are suggesting :)

@agrawaldevesh agrawaldevesh force-pushed the fix-uniqueness-table branch 2 times, most recently from 60cec17 to 7d2c4a8 Compare January 25, 2019 20:25
@codecov-io
Copy link
Copy Markdown

codecov-io commented Jan 25, 2019

Codecov Report

Merging #6718 into master will decrease coverage by 17.35%.
The diff coverage is 70%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #6718       +/-   ##
===========================================
- Coverage   73.42%   56.07%   -17.36%     
===========================================
  Files          75      525      +450     
  Lines       10084    23227    +13143     
  Branches        0     2777     +2777     
===========================================
+ Hits         7404    13024     +5620     
- Misses       2680     9794     +7114     
- Partials        0      409      +409
Impacted Files Coverage Δ
superset/connectors/sqla/models.py 78.08% <100%> (ø) ⬆️
superset/models/helpers.py 84.35% <66.66%> (-1.11%) ⬇️
superset/assets/src/components/Checkbox.jsx 100% <0%> (ø)
...ations/deckgl/layers/Polygon/PolygonChartPlugin.js 0% <0%> (ø)
...ets/src/dashboard/components/dnd/DragDroppable.jsx 97.14% <0%> (ø)
...c/visualizations/deckgl/layers/Polygon/Polygon.jsx 0% <0%> (ø)
...ssets/src/visualizations/presets/MapChartPreset.js 0% <0%> (ø)
superset/assets/src/components/EditableTitle.jsx 77.14% <0%> (ø)
...assets/src/visualizations/Iframe/transformProps.js 0% <0%> (ø)
superset/assets/src/setup/setupPlugins.js 0% <0%> (ø)
... and 442 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 83ee917...01b0218. Read the comment docs.

@codecov-io
Copy link
Copy Markdown

Codecov Report

Merging #6718 into master will increase coverage by <.01%.
The diff coverage is 70%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6718      +/-   ##
==========================================
+ Coverage   56.07%   56.07%   +<.01%     
==========================================
  Files         525      525              
  Lines       23220    23227       +7     
  Branches     2777     2777              
==========================================
+ Hits        13020    13024       +4     
- Misses       9791     9794       +3     
  Partials      409      409
Impacted Files Coverage Δ
superset/connectors/sqla/models.py 78.08% <100%> (ø) ⬆️
superset/models/helpers.py 84.35% <66.66%> (-1.11%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8100a8f...7d2c4a8. Read the comment docs.

@agrawaldevesh
Copy link
Copy Markdown
Author

@mistercrunch, the PR is ready to be merged in. As we discussed, it simply expands the uniqueness constraints as also suggested by @john-bodley. It causes no other backward incompat issues.

@kristw, I am not sure what the tag risk-db-migration means. This diff adds a new migration but it does not require a manual step, just the usual superset db upgrade command.

Summary: Superset code enforces (in Table crud view pre_add) that the
table is unique within <database, schema, table_name). Indeed in commit
15b67b2 (in 2016), the model was
updated to reflect that. However, it was never ported over to a
migration.

I am fixing that in this diff. I am choosing to make this be a new
migration instead of fixing an existing one since I want to fix existing
installations also cleanly.

I also considered removing the uniqueness constraint, but that won't
work: First because anyway there are other places where the <database,
schema, table> uniqueness is enforced in code. But also, the .sql field
isn't a first citizen yet: The schema of the table is picked up from the
table-name and the sql part is only used when creating the explore
query. So indeed we want this uniqueness constraint. (Also it breaks the
unit tests in dict_import_export_tests.py)
[Perhaps it can be removed when we have true .sql support, but for now
the user would have to create a database view and he can use that as the
'table name'. That way he gets schema inference also]

Also added INFO logging to the alembic migration.
@mistercrunch mistercrunch merged commit c4fb7a0 into apache:master Jan 29, 2019
@kristw
Copy link
Copy Markdown
Contributor

kristw commented Jan 29, 2019

@agrawaldevesh We flag any PR with db-migration to be aware of them before including in release and testing on staging/production environment.

john-bodley added a commit that referenced this pull request Jan 30, 2019
john-bodley added a commit that referenced this pull request Jan 31, 2019
* Revert "creating new circular-json safe stringify and replacing one call (#6772)"

This reverts commit 11a7ad0.

* Revert "Improve Unicode support for MSSQL (#6690)"

This reverts commit c44ae61.

* Revert "Fix uniqueness constraints on tables table (#6718)"

This reverts commit c4fb7a0.
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.34.0 First shipped in 0.34.0 labels Feb 28, 2024
qfcwell pushed a commit to qfcwell/superset that referenced this pull request May 12, 2026
Summary: Superset code enforces (in Table crud view pre_add) that the
table is unique within <database, schema, table_name). Indeed in commit
416abff (in 2016), the model was
updated to reflect that. However, it was never ported over to a
migration.

I am fixing that in this diff. I am choosing to make this be a new
migration instead of fixing an existing one since I want to fix existing
installations also cleanly.

I also considered removing the uniqueness constraint, but that won't
work: First because anyway there are other places where the <database,
schema, table> uniqueness is enforced in code. But also, the .sql field
isn't a first citizen yet: The schema of the table is picked up from the
table-name and the sql part is only used when creating the explore
query. So indeed we want this uniqueness constraint. (Also it breaks the
unit tests in dict_import_export_tests.py)
[Perhaps it can be removed when we have true .sql support, but for now
the user would have to create a database view and he can use that as the
'table name'. That way he gets schema inference also]

Also added INFO logging to the alembic migration.
qfcwell pushed a commit to qfcwell/superset that referenced this pull request May 12, 2026
* Revert "creating new circular-json safe stringify and replacing one call (apache#6772)"

This reverts commit 147bfb6.

* Revert "Improve Unicode support for MSSQL (apache#6690)"

This reverts commit 313890c.

* Revert "Fix uniqueness constraints on tables table (apache#6718)"

This reverts commit 04312db.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels !deprecated-label:bug Deprecated label - Use #bug instead risk:db-migration PRs that require a DB migration 🚢 0.34.0 First shipped in 0.34.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants