Schema Viewer Drawer #3291
Note: schema updates now work through a periodic celery task that runs the queries to get column names, types, etc. The results are stored in the new schema tables. Whenever the schema is fetched from the UI, it directly queries the data in these tables. Since the schema is set to refresh only every 30 min (https://github.com/getredash/redash/blob/master/redash/settings/__init__.py#L48), this is likely why the […]. We can either increase the frequency of schema updates (the quicker option, but not as good) or have a one-off schema refresh that is done on init so that the schema is available. I'll look into the latter.
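To make the flow in the note above concrete, here is a minimal sketch, assuming a single metadata table in an in-memory SQLite database. The table name `table_metadata` and the functions `init_db`, `refresh_schema`, `get_schema` and `create_data_source` are hypothetical names chosen for illustration, not the code in this PR. It shows the refresh step writing introspection results into a table, the UI path reading only from that table, and the proposed one-off refresh on init.

```python
import sqlite3

# All names below are hypothetical; this is not the PR's actual code.

def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS table_metadata ("
        "data_source_id INTEGER, table_name TEXT, "
        "column_name TEXT, column_type TEXT, "
        "UNIQUE (data_source_id, table_name, column_name))"
    )

def refresh_schema(conn, data_source_id, fetch_columns):
    """What the periodic task would do: introspect and store the result."""
    rows = [(data_source_id, t, c, ty) for t, c, ty in fetch_columns()]
    with conn:  # commit on success, roll back on error
        conn.execute(
            "DELETE FROM table_metadata WHERE data_source_id = ?",
            (data_source_id,),
        )
        conn.executemany(
            "INSERT INTO table_metadata VALUES (?, ?, ?, ?)", rows
        )

def get_schema(conn, data_source_id):
    """What a UI request would do: read stored rows only."""
    cur = conn.execute(
        "SELECT table_name, column_name, column_type FROM table_metadata "
        "WHERE data_source_id = ? ORDER BY table_name, column_name",
        (data_source_id,),
    )
    schema = {}
    for table, column, col_type in cur:
        schema.setdefault(table, []).append({"name": column, "type": col_type})
    return schema

def create_data_source(conn, data_source_id, fetch_columns):
    """One-off refresh on init, so the schema exists before the first
    periodic run (which may be up to 30 minutes away)."""
    refresh_schema(conn, data_source_id, fetch_columns)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    fake_introspection = lambda: [("users", "id", "integer"),
                                  ("users", "email", "text")]
    print(get_schema(conn, 1))  # nothing stored yet
    create_data_source(conn, 1, fake_introspection)
    print(get_schema(conn, 1))  # populated without waiting for the timer
```

Without the one-off refresh, a freshly created data source would return an empty schema until the next scheduled run.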
a47575c to ab13344
I've created db-seed.js for this purpose 🤔, so this kind of dependency could be created by running npm run cypress db-seed prior to all tests, and this would be avoided among them:
// create_query_spec.js - a few upper lines that were not shown
const pg = {
  name: 'test',
  options: {
    dbname: 'postgres',
    host: 'postgres',
    password: 'postgres',
    user: 'postgres',
  },
  type: 'pg',
};
LMK what you think haha
PS: if you are just testing, ignore this 😅
@gabrieldutra I was, in fact, just testing, though I could use some help. I cannot reproduce locally the percy issue that shows up here. In fact, when I run the create_query_spec.js test on master locally, the DOM snapshots seem to be missing the schema data (screenshot included below). On the other hand, the snapshot for this PR seems to show the schema locally (screenshot also included below).
Any idea what might be going on here or how I can reproduce this?
I have faced some issues when I was doing changes in frontend and handling Cypress in development mode. The frontend code seems not to be shared within the docker container. I'll handle this further, but a quick fix to make it respond properly is, after starting the cypress server just like you did, to run npm run start for the webpack development server and open cypress with CYPRESS_baseUrl=http://localhost:8080 npm run cypress open
Also, I don't know if it's related, but I noticed the Chinook data source is not showing schema info in the preview.
I'll try to reproduce this locally and give you some help with Percy anyway
Edit: the Chinook issue is probably related to the missing schema queue in one of the files (docker-compose.production.yml perhaps)
I have faced some issues when I was doing changes in frontend and handling Cypress in development mode. The frontend code seems not to be shared within the docker container.
Does the Cypress Docker Compose configuration use VOLUMEs?
Cypress is using docker-compose.cypress.yml when in CI and the development docker-compose.yml when not.
Edit: Forgot to mention the volumes haha, but the first one doesn't use them and the second one does.
However, it uses http://localhost:5000, which I guess doesn't use webpack to watch files, so the frontend in this case only updates after a rebuild. The two options I see to make it friendlier to the developer would be either adding an npm run start to a frontend container in docker-compose.yml or adding this outside docker in the cypress scripts.
I think we should add to the instructions to run npm run build before running Cypress tests. Running npm in the container is not possible, because the container will not have Node (currently it does, but it's a temporary thing).
Awesome! Updating docker-compose.cypress.yml did the trick! I didn't realize cypress had its own yml file. Thank you for your help @gabrieldutra!
e6be093 to 3c4e8c8
I've rebased the PR again and the original percy issue is fixed. Note that currently percy is failing for an expected reason - there are 2 new tables added - […]
Don't forget to add the […]
3c4e8c8 to 95d3ff6
I've added the schemas queue in a couple of other spots as you suggested. However, I was hesitant at first to add it since the […]
I will do a review of all the Docker Compose files and add […]. I do realize now that everyone who is using the AMIs we build uses a Docker Compose setup without this queue, which means that: 1) this queue is growing in size, but nothing is processing it; 2) they don't get schema refreshes. 🤦♂️
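The two consequences listed above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `simulate` function and the `queries` queue name are hypothetical, and plain deques stand in for the real broker. One refresh job is enqueued per tick on the schemas queue, and a worker only drains the queues it is configured to listen on.

```python
from collections import deque

def simulate(ticks, worker_queues):
    """Return (jobs left on the schemas queue, refreshes actually run).

    `worker_queues` lists the queue names the single worker listens on.
    Queue names and this function are illustrative assumptions only.
    """
    queues = {"queries": deque(), "schemas": deque()}
    refreshed = 0
    for tick in range(ticks):
        # The scheduler enqueues a refresh job regardless of who listens.
        queues["schemas"].append(f"refresh_schemas@{tick}")
        # The worker only processes queues it was started with.
        for name in worker_queues:
            while queues[name]:
                queues[name].popleft()
                if name == "schemas":
                    refreshed += 1
    return len(queues["schemas"]), refreshed

if __name__ == "__main__":
    # 48 ticks corresponds to one day of 30-minute refreshes.
    print(simulate(48, ["queries"]))             # worker ignores schemas
    print(simulate(48, ["queries", "schemas"]))  # worker listens on schemas
```

With a worker that does not listen on the schemas queue, every job stays queued and no refresh runs. Adding the queue name to the worker's list drains it.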
Change of plans: #3325.
Is removing this redis caching of schema information intentional? Is there a performance impact?
Thanks for pointing this out @washort!
It was intentional because from what I recall back in the Berlin work-week, I think @arikfr was saying he felt that using redis to store schema was a bit of a hack and he would prefer it stored in a table. Of course, we could be storing the data in tables and have additional caching for performance, but I felt this added complexity of maintaining both a cache and tables for the same data was perhaps not worth the performance gain.
I did a quick test on my machine and with 5 runs of the old vs. the new get_schema() function, the redis one averages 7.2ms per call and the new one (from this PR) averages 44ms per call. It's a big relative difference, but 44ms isn't so bad. Though of course this could be worse in different scenarios, e.g. a slower network/machine or more data. I suppose I will defer this decision to @arikfr.
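For anyone wanting to repeat this kind of comparison, a minimal sketch of averaging wall-clock time over a handful of calls could look like the following. The helper `average_ms` and the two stand-in functions are hypothetical; they only sleep, so the printed numbers say nothing about the real get_schema() implementations.

```python
import time

def average_ms(func, runs=5):
    """Average wall-clock time of func() over `runs` calls, in milliseconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total / runs * 1000.0

# Hypothetical stand-ins for the old (cache) and new (tables) code paths.
def get_schema_from_cache():
    time.sleep(0.002)

def get_schema_from_tables():
    time.sleep(0.010)

if __name__ == "__main__":
    print(f"cache:  {average_ms(get_schema_from_cache):.1f} ms per call")
    print(f"tables: {average_ms(get_schema_from_tables):.1f} ms per call")
```

Five runs is a small sample, so a larger `runs` value and a warm-up call would give steadier numbers.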
you'll need a migration to create these tables
Ah! I had missed this, thank you!
95d3ff6 to 2ac1607
2ac1607 to 0c1813c
Similar blocks of code found in 2 locations. Consider refactoring.
Similar blocks of code found in 2 locations. Consider refactoring.
Missing whitespace around operator
0c1813c to f072e64
f072e64 to 9a950cc
Looks like there is no re-open button? I guess a new PR will be needed. Btw, the migration has a bug. The foreign key references of […]
Yeah, @emtwo would you mind opening a new PR please?
* Process extra column metadata for a few sql-based data sources.
* Add Table and Column metadata tables.
* Periodically update table and column schema tables in a celery task.
* Fetching schema returns data from table and column metadata tables.
* Add tests for backend changes.
* Front-end shows extra table metadata and uses new schema response.
* Delete datasource schema data when deleting a data source.
* Process and store data source schema when a data source is first created or after a migration.
* Tables should have a unique name per datasource.
* Addressing review comments.
* Update migration file for mixins.
* Appease PEP8
* Upgrade migration file for rebase.
* Cascade delete.
* Adding org_id
* Remove redundant column and table prefixes.
* Non-existing tables and columns should be filtered out on the server side not client side.
* Fetching table samples should be optional and should happen in a separate task per table.
* Allow users to force a schema refresh.
* Use updated_at to help prune old schema metadata periodically.
* Using settings.SCHEMAS_REFRESH_QUEUE
* fix for getredash#2426 test
* more stable test_interactive_new
* Closes #927, #928: Schema refresh improvements.
* Closes #934, #935: Remove type from schema browser and don't show empty example column in schema drawer (#936)
* Speed up schema fetch requests with fewer postgres queries.
* Add column metadata to Athena glue processing.
* Fix bug assuming 'metadata' exists for every table.
* Closes #939: Persisted, existing table metadata should be updated.
* Sample processing should be rate-limited.
* Add cli command for refreshing data samples.
* Schema refreshes should not overwrite column 'example' field.
* refresh_samples() should filter tables_to_sample on the datasource's id being sampled
* Correctly wrap long text in schema drawer.
Co-authored-by: Alison <github@bankofknowledge.net>
Schema Improvements Part 2: Add data source config options.
Adding BigQuery schema drawer with data types and samples.



This is a fresh PR with the code from #2990 rebased and linted.
It is ready for review now. This PR is the first of a series of PRs for schema enhancements. I will link the subsequent PRs here as they become available.
[1] Schema viewer drawer #3291 (this one)
[2] Schema admin configuration #3292
[3] Schema query samples #3293
[4] Data source descriptions #3401