Skip to content

feat: rename TABLE_NAMES_CACHE_CONFIG to DATA_CACHE_CONFIG#11509

Merged
ktmud merged 4 commits into
apache:masterfrom
ktmud:data-cache
Nov 14, 2020
Merged

feat: rename TABLE_NAMES_CACHE_CONFIG to DATA_CACHE_CONFIG#11509
ktmud merged 4 commits into
apache:masterfrom
ktmud:data-cache

Conversation

@ktmud
Copy link
Copy Markdown
Member

@ktmud ktmud commented Oct 31, 2020

SUMMARY

💥 Breaking Change

Rename config value TABLE_NAMES_CACHE_CONFIG to DATA_CACHE_CONFIG, and save datasource query results to this cache, too.

Motivation

Companies like Airbnb have different security levels for Superset itself and the datasources Superset connects to: anyone can login to the same Superset, but not everyone have access to all data. Superset has SecurityManager to check whether a user has access to certain datasource, but all query results are still stored in the same cache, regardless of users' access levels. If a malicious party obtained access to the cache server via certain Superset user in a high-risk region, they get access to all cached data.

By separating cache storage for Superset metadata and the actual data users consume, we can have a better insulation of sensitive data by configuring different cache backend for high-risk regions while keeping the cache for all the Superset metadata in sync.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TEST PLAN

CI

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration: Superset admins need to manually update cache configs
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

@codecov-io
Copy link
Copy Markdown

codecov-io commented Oct 31, 2020

Codecov Report

Merging #11509 (226fbe9) into master (68693c7) will increase coverage by 4.26%.
The diff coverage is 65.68%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #11509      +/-   ##
==========================================
+ Coverage   62.86%   67.12%   +4.26%     
==========================================
  Files         889      889              
  Lines       43055    43062       +7     
  Branches     4017     4017              
==========================================
+ Hits        27065    28905    +1840     
+ Misses      15811    14060    -1751     
+ Partials      179       97      -82     
Flag Coverage Δ
cypress 56.05% <ø> (?)
javascript 62.81% <ø> (ø)
python 62.87% <65.68%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/utils/decorators.py 100.00% <ø> (+34.17%) ⬆️
superset/utils/screenshots.py 36.36% <0.00%> (ø)
superset/viz_sip38.py 0.00% <0.00%> (ø)
superset/utils/cache.py 57.57% <54.38%> (-22.43%) ⬇️
superset/models/dashboard.py 75.50% <75.00%> (+0.12%) ⬆️
superset/viz.py 59.40% <88.88%> (+0.07%) ⬆️
superset/__init__.py 100.00% <100.00%> (ø)
superset/common/query_context.py 83.23% <100.00%> (-4.72%) ⬇️
superset/config.py 90.07% <100.00%> (-0.04%) ⬇️
superset/db_engine_specs/hive.py 84.04% <100.00%> (+0.06%) ⬆️
... and 194 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 68693c7...226fbe9. Read the comment docs.

Comment thread superset/common/query_context.py Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import the cache_manager instead of cache to make it clear which cache we are using.

Comment thread superset/common/query_context.py Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and cache is not needed because of #9020

Comment thread superset/models/core.py Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small refactor to make the memoize function more flexible.

Comment thread superset/utils/cache.py Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved over from superset.utils.decorators

Comment thread superset/utils/cache_manager.py Outdated
Copy link
Copy Markdown
Member Author

@ktmud ktmud Oct 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't add backward compatibility because the data objects put into this cache will become much much larger. If a Superset admin has configured in-memory cache for TABLE_NAMES_CACHE_CONFIG, they might run into problems. Therefore it's better to make things break to make them aware of this change.

For most users, the migration is as simple as renaming the config value.

@mistercrunch mistercrunch added the risk:breaking-change Issues or PRs that will introduce breaking changes label Oct 31, 2020
@bkyryliuk
Copy link
Copy Markdown
Member

I will be on PTO next week, won't have a change to give a full review. 1 suggestion it would be nice to add a unit test that will document the expected behavior and will prevent from the regressions, tests can impersonate the user to hit various endpoints.

@ktmud
Copy link
Copy Markdown
Member Author

ktmud commented Oct 31, 2020

I will be on PTO next week, won't have a change to give a full review. 1 suggestion it would be nice to add a unit test that will document the expected behavior and will prevent from the regressions, tests can impersonate the user to hit various endpoints.

I added two test assertions in cache_tests.py. Do you think we need more?

Comment thread UPDATING.md Outdated
Comment thread superset/config.py Outdated
Comment thread superset/config.py Outdated
Comment thread superset/utils/cache.py Outdated
Comment thread superset/utils/cache.py Outdated
@ktmud ktmud force-pushed the data-cache branch 2 times, most recently from 45ce586 to c0eb415 Compare November 1, 2020 23:29
Comment thread UPDATING.md Outdated
Comment thread superset/models/dashboard.py Outdated
Comment thread superset/utils/cache_manager.py Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure the global default is applied as documented.

@ktmud ktmud force-pushed the data-cache branch 4 times, most recently from 93729d6 to 9543b47 Compare November 9, 2020 05:46
@ktmud ktmud force-pushed the data-cache branch 2 times, most recently from c2a46af to ca25463 Compare November 10, 2020 20:34
@ktmud
Copy link
Copy Markdown
Member Author

ktmud commented Nov 10, 2020

@john-bodley @etr2460 So previously the test tests/databases/api_tests.py::TestDatabaseApi::test_create_database_conn_fail was failing because I setup a cache backend for DATA_CACHE_CONFIG while in the old test config TABLE_NAMES_CACHE_CONFIG was set to cache_type: null.

Not sure why this cache would fail test_create_database_conn_fail though. Maybe @dpgaspar have some idea?

@ktmud
Copy link
Copy Markdown
Member Author

ktmud commented Nov 10, 2020

Here's a test run that fails when the only change comparing to master is to enable TABLE_NAMES_CACHE_CONFIG.

Comment thread tests/databases/api_tests.py Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

count = 2 seems to assume some tests run before this test. If you just initialized the database and run this test alone, this test will fail.

Copy link
Copy Markdown

@graceguo-supercat graceguo-supercat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please resolve conflicts. otherwise LGTM.

Copy link
Copy Markdown
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, however I didn't quite see how the "timeout search path" happens, is that something happening outside this PR in local configs or implicitly via Flask-Caching?

Comment thread superset/utils/cache.py Outdated
Comment on lines 49 to 72
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@ktmud
Copy link
Copy Markdown
Member Author

ktmud commented Nov 12, 2020

LGTM, however I didn't quite see how the "timeout search path" happens, is that something happening outside this PR in local configs or implicitly via Flask-Caching?

I think it is referring to this logic here: https://github.com/apache/incubator-superset/blob/600a6fa92a0bbe5bfd93371db5ced6af556c3697/superset/viz.py#L423-L433

@villebro
Copy link
Copy Markdown
Member

LGTM, however I didn't quite see how the "timeout search path" happens, is that something happening outside this PR in local configs or implicitly via Flask-Caching?

I think it is referring to this logic here:

https://github.com/apache/incubator-superset/blob/600a6fa92a0bbe5bfd93371db5ced6af556c3697/superset/viz.py#L423-L433

Ah sorry, I misinterpreted the comment, never mind.

Copy link
Copy Markdown
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good layer of optional additional security, thanks for implementing.

Copy link
Copy Markdown
Member

@bkyryliuk bkyryliuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great

Comment thread superset/utils/cache.py Outdated
@ktmud ktmud force-pushed the data-cache branch 2 times, most recently from 0811d14 to 60283e2 Compare November 14, 2020 05:48
@ktmud ktmud merged commit 4cfcaeb into apache:master Nov 14, 2020
@ktmud ktmud deleted the data-cache branch November 14, 2020 06:35
auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020
)

* feat: rename TABLE_NAMES_CACHE_CONFIG to DATA_CACHE_CONFIG

The corresponding cache will now also cache the query results.

* Slice use DATA_CACHE_CONFIG CACHE_DEFAULT_TIMEOUT

* Add test for default cache timeout

* rename FAR_FUTURE to ONE_YEAR_IN_SECS
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.0.0 First shipped in 1.0.0 labels Mar 12, 2024
qfcwell pushed a commit to qfcwell/superset that referenced this pull request May 12, 2026
)

* feat: rename TABLE_NAMES_CACHE_CONFIG to DATA_CACHE_CONFIG

The corresponding cache will now also cache the query results.

* Slice use DATA_CACHE_CONFIG CACHE_DEFAULT_TIMEOUT

* Add test for default cache timeout

* rename FAR_FUTURE to ONE_YEAR_IN_SECS
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels risk:breaking-change Issues or PRs that will introduce breaking changes size/XL 🚢 1.0.0 First shipped in 1.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants