Add support for period character in table names#7453

Merged: villebro merged 9 commits into apache:master from villebro:table_metadata on May 26, 2019
@villebro
Member

@villebro villebro commented May 4, 2019

CATEGORY

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

In SQL Lab, table names are currently assumed to follow one of two conventions:

  • schema.table or
  • table

This is handled in the frontend by assuming that a period in the table name always separates the schema from the table name. There is no standardized way for SQLAlchemy inspectors to return table names: some return schema.table, others only table. Some databases (at least Apache Drill and Postgres) support periods in schema and table names, and their inspectors don't return schema-prefixed table names. This causes problems when querying tables, because TableSelector strips away everything before the period character. This PR moves the logic from the frontend to the backend and makes the behavior configurable per engine.
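
To make the failure mode concrete, here is a minimal Python sketch of the old assumption. The function name `naive_table_name` is hypothetical (the real logic lived in the JavaScript TableSelector component), and splitting at the first period is an assumption made for illustration.

```python
def naive_table_name(name: str) -> str:
    """Hypothetical stand-in for the old frontend rule: drop everything
    up to and including the first period (assumed split point)."""
    _, sep, rest = name.partition(".")
    return rest if sep else name


# Inspector returned a schema-prefixed name: stripping works as intended.
prefixed = naive_table_name("schema.table")

# Inspector returned a bare name: there is nothing to strip.
bare = naive_table_name("table")

# Inspector returned a bare name that itself contains a period: the first
# half of the table name is mistaken for a schema and discarded.
dotted = naive_table_name("test.table")
```

In the last case the query would be issued against `table` rather than `test.table`, which is the bug described above.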

This proposal changes table name handling in the following way:

  1. An attribute try_remove_schema_from_table_name is added to db_engine_specs (defaults to True). When True, get_table_names() and get_view_names() check whether a table name starts with the schema name followed by a period and, if so, remove the schema name from the table name. Example: schema.table becomes table, while table remains unchanged.
  2. Table names are passed as dicts {'schema': 'schema_name', 'table': 'table_name'} when handed to the frontend. Previously they were either of the format table or schema.table. This removes any ambiguity in the frontend.
  3. SQL Lab UI now works in the following way:
    • If no schema is selected, table/view names are displayed in the dropdown as schema.table.
    • If a schema is selected, tables are shown as table only in the dropdown.
      Filtering follows the same rule: when no schema is chosen, the filter substring is compared against schema.table, so the schema name can be included in the filter string.
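
The three changes above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code from this PR: only the flag name try_remove_schema_from_table_name and the dict shape {'schema': ..., 'table': ...} come from the description, while the helper names `remove_schema_prefix`, `to_frontend_dicts`, `display_label` and `filter_tables` are hypothetical, and the display and filter logic actually lives in the JavaScript frontend.

```python
from typing import Dict, List, Optional


def remove_schema_prefix(schema: str, name: str) -> str:
    """Strip 'schema.' from the front of name, if present (hypothetical helper)."""
    prefix = schema + "."
    return name[len(prefix):] if name.startswith(prefix) else name


def to_frontend_dicts(
    schema: str,
    raw_names: List[str],
    try_remove_schema_from_table_name: bool = True,
) -> List[Dict[str, str]]:
    """Turn inspector output into unambiguous schema/table dicts."""
    result = []
    for name in raw_names:
        if try_remove_schema_from_table_name:
            name = remove_schema_prefix(schema, name)
        result.append({"schema": schema, "table": name})
    return result


def display_label(entry: Dict[str, str], selected_schema: Optional[str] = None) -> str:
    """Show 'table' when a schema is selected, otherwise 'schema.table'."""
    if selected_schema:
        return entry["table"]
    return "{}.{}".format(entry["schema"], entry["table"])


def filter_tables(
    entries: List[Dict[str, str]],
    substring: str,
    selected_schema: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Match the filter substring against the label that would be displayed."""
    return [e for e in entries if substring in display_label(e, selected_schema)]


# One inspector result is schema-prefixed, the other is a bare name with a period.
entries = to_frontend_dicts("public", ["public.users", "test.table"])
label_without_schema = display_label(entries[1])
label_with_schema = display_label(entries[1], selected_schema="public")
matches = filter_tables(entries, "public.te")
```

Because the prefix is only removed when it equals the schema name, a table literally named test.table in schema public keeps its full name, and the filter string public.te still finds it when no schema is selected.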

SCREENSHOTS

When no schema is chosen, table names are displayed as schema.table:
Screenshot 2019-05-09 at 21 09 07

If a schema is selected, only the table name is shown (in this case the table name is test.table, not table in schema test):
Screenshot 2019-05-09 at 21 09 43

TEST PLAN

Tested locally on Postgres and SQLite. JS unit tests were updated to match the new data structures, and Python unit tests were added to test table name fetching.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@cgivre

@codecov-io

codecov-io commented May 10, 2019

Codecov Report

Merging #7453 into master will increase coverage by 0.05%.
The diff coverage is 37.03%.

@@            Coverage Diff             @@
##           master    #7453      +/-   ##
==========================================
+ Coverage   65.17%   65.23%   +0.05%     
==========================================
  Files         433      433              
  Lines       21428    21433       +5     
  Branches     2360     2358       -2     
==========================================
+ Hits        13966    13981      +15     
+ Misses       7342     7332      -10     
  Partials      120      120
Impacted Files                                          Coverage Δ
superset/cli.py                                         36.01% <0%> (ø) ⬆️
.../assets/src/SqlLab/components/SqlEditorLeftBar.jsx   40.42% <0%> (+3.88%) ⬆️
superset/utils/core.py                                  88.21% <100%> (+0.06%) ⬆️
superset/assets/src/components/TableSelector.jsx        84.16% <100%> (-0.52%) ⬇️
superset/security.py                                    75% <100%> (+0.22%) ⬆️
superset/db_engine_specs.py                             62.35% <35.71%> (+0.87%) ⬆️
superset/models/core.py                                 83.74% <69.23%> (+0.28%) ⬆️
superset/views/core.py                                  72.82% <8%> (-0.1%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1ae000a...75387cd.

@villebro
Member Author

@cgivre please take a look at this PR. If this works out this should make merging your Drill PR slightly easier.

@cgivre
Contributor

cgivre commented May 10, 2019

@villebro I'll take a look this weekend. This pretty much will solve the issues with the Drill integration.

Comment thread superset/db_engine_specs.py Outdated
Member Author

db wasn't type annotated because annotating it caused a circular import (models.core already has a reference to db_engine_specs). This should probably be refactored, but that is outside the scope of this PR.

@villebro changed the title from "[WIP] Refine table name handling in SQL Lab" to "Add support for period in table name" on May 11, 2019
@villebro changed the title from "Add support for period in table name" to "Add support for period character in table name" on May 11, 2019
@villebro changed the title from "Add support for period character in table name" to "Add support for period character in table names" on May 11, 2019
@cgivre
Contributor

cgivre commented May 12, 2019

@villebro I tried this out and it worked really well even without the Drill PR! I think pretty much the only thing that Drill needs now for the integration is the time grains! Thanks for your help with this.
LGTM +1

@villebro
Member Author

Thanks for verifying that this works @cgivre . @mistercrunch @john-bodley @betodealmeida would really appreciate help reviewing this, as I think this is one of the last hurdles to being able to support engines that rely on the presence of non-standard characters in schema/table names.

@villebro
Member Author

Kind reminder to committers that this is pending review. It would be great to get this reviewed and merged so that the Drill integration, which is blocked by this PR, can be finalized.

@john-bodley
Member

john-bodley commented May 20, 2019

@villebro my main comment, which is somewhat related to #7490, is this: the cluster/schema/table name construct can be quite complicated, and historically we've often flattened these names into a single string (and then split it or used regular expressions to extract the components). Should we move towards using a class (possibly a dataclass) to represent these objects everywhere in a canonical way?
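
For illustration only, a class along these lines might look like the sketch below. Nothing here is part of this PR or of Superset: the name `TableRef` and its methods are hypothetical, and the cluster component is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TableRef:
    """Hypothetical canonical representation of a table reference.

    Keeping schema and table as separate fields means a period inside
    either one never has to be interpreted as a separator.
    """

    table: str
    schema: Optional[str] = None

    def label(self) -> str:
        """Human-readable name, prefixed with the schema when it is known."""
        return "{}.{}".format(self.schema, self.table) if self.schema else self.table

    def to_dict(self) -> Dict[str, Optional[str]]:
        """Same shape as the dicts this PR hands to the frontend."""
        return {"schema": self.schema, "table": self.table}


dotted = TableRef(table="test.table", schema="public")
bare = TableRef(table="users")
```

Since the dataclass is frozen, instances are hashable and compare by value, so they could be used as dictionary keys or set members instead of flattened strings.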

@villebro
Member Author

Good point @john-bodley, having a dedicated class with proper parsing/formatting functionality would probably be a good idea. Do you feel this should be addressed as part of this PR, or should a new PR be started for it?

@john-bodley
Member

I think a separate PR is fine as it’s probably a large change.

@villebro villebro merged commit f7d3413 into apache:master May 26, 2019
@mistercrunch added the 🏷️ bot label (a label used by `supersetbot` to keep track of which PRs were auto-tagged with release labels) on Feb 28, 2024
@mistercrunch added the 🚢 0.34.0 label (first shipped in 0.34.0) on Feb 28, 2024
qfcwell pushed a commit to qfcwell/superset that referenced this pull request May 12, 2026
* Move schema name handling in table names from frontend to backend

* Rename all_schema_names to get_all_schema_names

* Fix js errors

* Fix additional js linting errors

* Refactor datasource getters and fix linting errors

* Update js unit tests

* Add python unit test for get_table_names method

* Add python unit test for get_table_names method

* Fix js linting error

Labels: 🏷️ bot, size/L, 🚢 0.34.0
