Schema Viewer Drawer#3291

Merged
jezdez merged 21 commits into getredash:master from emtwo:emtwo/schema1_info_drawer
Mar 13, 2019
Conversation

@emtwo

@emtwo emtwo commented Jan 16, 2019

This is a fresh PR with the code from #2990 rebased and linted.

It is ready for review now. This PR is the first of a series of PRs for schema enhancements. I will link the subsequent PRs here as they become available.

[1] Schema viewer drawer #3291 (this one)
[2] Schema admin configuration #3292
[3] Schema query samples #3293
[4] Data source descriptions #3401

@emtwo
Author

emtwo commented Jan 16, 2019

Note: schema updates now work through a periodic Celery task that runs the queries to get column names, types, etc. The results are stored in the new schema tables, and whenever the schema is fetched from the UI, it is read directly from these tables.
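A minimal sketch of that flow, assuming simplified table layouts: `refresh_schema` plays the role of the periodic Celery task and writes introspected tables and columns into `table_metadata` / `column_metadata`, and `get_schema` reads them back without touching the data source. SQLite stands in for Redash's metadata database; the column sets, the `introspected` input shape, and both function bodies are illustrative assumptions, not the PR's code.

```python
import sqlite3

# Illustrative layout only; the PR's models carry more fields.
DDL = """
CREATE TABLE table_metadata (
    id INTEGER PRIMARY KEY,
    data_source_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    UNIQUE (data_source_id, name)
);
CREATE TABLE column_metadata (
    id INTEGER PRIMARY KEY,
    table_id INTEGER NOT NULL REFERENCES table_metadata(id),
    name TEXT NOT NULL,
    type TEXT,
    UNIQUE (table_id, name)
);
"""


def refresh_schema(conn, data_source_id, introspected):
    """Periodic-task side: persist what the data source reported.

    `introspected` maps table name -> [(column name, column type), ...],
    standing in for the result of the query runner's introspection queries.
    """
    for table_name, columns in introspected.items():
        conn.execute(
            "INSERT OR IGNORE INTO table_metadata (data_source_id, name) VALUES (?, ?)",
            (data_source_id, table_name),
        )
        (table_id,) = conn.execute(
            "SELECT id FROM table_metadata WHERE data_source_id = ? AND name = ?",
            (data_source_id, table_name),
        ).fetchone()
        for column_name, column_type in columns:
            # Replace on the (table_id, name) key so type changes are picked up.
            conn.execute(
                "INSERT OR REPLACE INTO column_metadata (table_id, name, type) VALUES (?, ?, ?)",
                (table_id, column_name, column_type),
            )
    conn.commit()


def get_schema(conn, data_source_id):
    """UI side: read straight from the metadata tables, no data source query."""
    rows = conn.execute(
        """
        SELECT t.name, c.name, c.type
        FROM table_metadata AS t
        JOIN column_metadata AS c ON c.table_id = t.id
        WHERE t.data_source_id = ?
        ORDER BY t.name, c.name
        """,
        (data_source_id,),
    ).fetchall()
    tables = {}
    for table_name, column_name, column_type in rows:
        tables.setdefault(table_name, []).append({"name": column_name, "type": column_type})
    return [{"name": name, "columns": columns} for name, columns in tables.items()]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    refresh_schema(conn, 1, {"users": [("id", "integer"), ("email", "text")]})
    print(get_schema(conn, 1))
```

The point of the split is that the UI request cost no longer depends on the data source: it is only as stale as the last refresh, which is why a freshly started instance can show an empty schema until the task runs.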

Since the schema is set to refresh only every 30 min (https://github.com/getredash/redash/blob/master/redash/settings/__init__.py#L48), this is likely why the percy/redash visual error shows up.

We can either increase the frequency of schema updates (the quicker option, but not as good) or run a one-off schema refresh on init so that the schema is available right away. I'll look into the latter.
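One way the one-off refresh on init could look, as a sketch: at startup, find data sources that have no stored table metadata yet and queue a refresh for each. `bootstrap_missing_schemas` and the `enqueue_refresh` callback (a stand-in for sending a task to the schemas queue) are hypothetical names, and `conn` is assumed to be a `sqlite3` connection with `data_sources` and `table_metadata` tables.

```python
def bootstrap_missing_schemas(conn, enqueue_refresh):
    """Queue a one-off schema refresh for every data source with no stored
    table metadata, so the schema browser isn't empty until the next
    periodic run. Returns the data source ids that were queued.
    """
    rows = conn.execute(
        """
        SELECT ds.id
        FROM data_sources AS ds
        WHERE NOT EXISTS (
            SELECT 1 FROM table_metadata AS t WHERE t.data_source_id = ds.id
        )
        ORDER BY ds.id
        """
    ).fetchall()
    queued = [data_source_id for (data_source_id,) in rows]
    for data_source_id in queued:
        enqueue_refresh(data_source_id)  # hypothetical: send to the schemas queue
    return queued
```

Checking for missing rows, rather than refreshing everything on every boot, keeps restarts cheap for instances with many data sources.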

@emtwo emtwo requested a review from arikfr January 16, 2019 17:46
@emtwo emtwo force-pushed the emtwo/schema1_info_drawer branch from a47575c to ab13344 Compare January 17, 2019 20:30
Member

@gabrieldutra gabrieldutra Jan 21, 2019

I created db-seed.js for this purpose 🤔, so this kind of dependency could be created by running npm run cypress db-seed before all tests, which would avoid repeating it in each of them:

// create_query_spec.js (a few lines above this were not shown)
const pg = {
  name: 'test',
  options: {
    dbname: 'postgres',
    host: 'postgres',
    password: 'postgres',
    user: 'postgres',
  },
  type: 'pg',
};

LMK what you think haha

PS: if you are just testing, ignore this 😅

Author

@emtwo emtwo Jan 21, 2019

@gabrieldutra I was, in fact, just testing, though I could use some help. I cannot reproduce locally the Percy issue that shows up here. In fact, when I run the create_query_spec.js test on master locally, the DOM snapshots seem to be missing the schema data (screenshot included below). On the other hand, the snapshot for this PR does show the schema locally (screenshot also included below).

Any idea what might be going on here or how I can reproduce this?

Screenshot from master
screen shot 2019-01-21 at 5 18 28 pm

Screenshot from this PR:
screen shot 2019-01-21 at 5 25 07 pm

Member

@gabrieldutra gabrieldutra Jan 21, 2019

I have faced some issues when I was doing changes in frontend and handling Cypress in development mode. The frontend code seems not to be shared within the docker container. I'll look into this further, but a quick fix to make it respond properly is, after starting the Cypress server as you did, to run npm run start for the webpack development server and then open Cypress with CYPRESS_baseUrl=http://localhost:8080 npm run cypress open

Member

@gabrieldutra gabrieldutra Jan 21, 2019

Also, I don't know if it's related, but I noticed the Chinook data source is not showing schema info in the preview.

I'll try to reproduce this locally and give you some help with Percy anyway

Edit: the Chinook issue is probably related to the missing schema queue in one of the files (docker-compose.production.yml perhaps)

Member

I have faced some issues when I was doing changes in frontend and handling Cypress in development mode. The frontend code seems not to be shared within the docker container.

Does the Cypress Docker Compose configuration use volumes?

Member

@gabrieldutra gabrieldutra Jan 22, 2019

Cypress uses docker-compose.cypress.yml in CI and the development docker-compose.yml otherwise.
Edit: I forgot to mention the volumes haha: the first one doesn't use them and the second one does.

However, it uses http://localhost:5000, which I guess doesn't use webpack to watch files, so in this case the frontend only updates after a rebuild. The two options I see to make it friendlier for developers would be either adding an npm run start to a frontend container in docker-compose.yml or running it outside Docker in the Cypress scripts.

Copy link
Copy Markdown
Member

I think we should add an instruction to run npm run build before running the Cypress tests. Running npm in the container is not possible, because the container will not have Node (currently it does, but that's a temporary thing).

Comment thread docker-compose.yml Outdated
Member

@gabrieldutra gabrieldutra Jan 21, 2019

Got it haha: there are other docker-compose.yml files inside .circleci. Just add this to them and Percy should do fine 🚀

Edit: Only docker-compose.cypress.yml affects the Percy screenshots
Edit2: There are probably other files where this may be necessary:
queues

Author

Awesome! Updating docker-compose.cypress.yml did the trick! I didn't realize Cypress had its own yml file. Thank you for your help @gabrieldutra!

Member

You're welcome @emtwo! 🙂

@emtwo emtwo force-pushed the emtwo/schema1_info_drawer branch 3 times, most recently from e6be093 to 3c4e8c8 Compare January 22, 2019 15:30
@emtwo
Author

emtwo commented Jan 22, 2019

I've rebased the PR again and the original percy issue is fixed. Note that currently percy is failing for an expected reason - there are 2 new tables added - column_metadata and table_metadata that show up in the schema viewer.

@gabrieldutra
Member

I've rebased the PR again and the original percy issue is fixed. Note that currently percy is failing for an expected reason - there are 2 new tables added - column_metadata and table_metadata that show up in the schema viewer.

Don't forget to add the schema queue in the other files (such as docker-compose.production.yml) as this could cause some bugs in the future 😁

@emtwo emtwo force-pushed the emtwo/schema1_info_drawer branch from 3c4e8c8 to 95d3ff6 Compare January 22, 2019 17:45
@emtwo
Author

emtwo commented Jan 22, 2019

I've rebased the PR again and the original percy issue is fixed. Note that currently percy is failing for an expected reason - there are 2 new tables added - column_metadata and table_metadata that show up in the schema viewer.

Don't forget to add the schema queue in the other files (such as docker-compose.production.yml) as this could cause some bugs in the future 😁

I've added the schemas queue in a couple of other spots as you suggested. I was hesitant at first, because the schemas queue was already used in the code before this PR but did not appear in any of the Docker Compose files. Without knowing the use cases of every Docker Compose file, it's hard to tell whether it's required in each of them, though I'm sure adding it doesn't hurt. Perhaps @arikfr has more insight into this.

@arikfr
Member

arikfr commented Jan 23, 2019

I will do a review of all the Docker Compose files and add schemas where needed.

I do realize now that everyone who is using the AMIs we build uses a Docker Compose setup without this queue. This means that: 1) this queue keeps growing, but nothing is processing it; 2) they don't get schema refreshes. 🤦‍♂️
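Both symptoms follow from how workers pick queues: a task routed to a queue that no worker was started with is never consumed. A toy simulation, assuming a broker with one FIFO per queue name; `ToyBroker` and `simulate` are hypothetical stand-ins for Celery and Redis, not their API, and the queue list is illustrative.

```python
from collections import deque


class ToyBroker:
    """One FIFO per queue name: a stand-in for Celery + Redis, not its API."""

    def __init__(self, queue_names):
        self.queues = {name: deque() for name in queue_names}

    def enqueue(self, queue_name, task):
        self.queues[queue_name].append(task)

    def run_worker(self, consumed_queues):
        """Drain only the queues this worker was started with."""
        done = []
        for name in consumed_queues:
            pending = self.queues[name]
            while pending:
                done.append(pending.popleft())
        return done


def simulate(worker_queues, cycles=3):
    broker = ToyBroker(["queries", "scheduled_queries", "celery", "schemas"])
    processed = []
    for cycle in range(cycles):
        # The scheduler keeps routing schema refreshes to "schemas"...
        broker.enqueue("schemas", "refresh_schemas#%d" % cycle)
        broker.enqueue("queries", "execute_query#%d" % cycle)
        # ...but a worker never looks at a queue it wasn't told about.
        processed += broker.run_worker(worker_queues)
    return processed, len(broker.queues["schemas"])
```

With a worker list that omits "schemas", every refresh stays in the backlog and only query tasks run; adding "schemas" drains it.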

@arikfr
Member

arikfr commented Jan 23, 2019

Change of plans: #3325.

Comment thread redash/models/__init__.py Outdated

Is removing this Redis caching of schema information intentional? Is there a performance impact?

Author

Thanks for pointing this out @washort!

It was intentional: from what I recall from the Berlin work week, @arikfr said he felt that using Redis to store the schema was a bit of a hack and that he would prefer it stored in a table. We could, of course, store the data in tables and add caching on top for performance, but I felt the added complexity of maintaining both a cache and tables for the same data was probably not worth the performance gain.

I did a quick test on my machine: over 5 runs each of the old and the new get_schema() function, the Redis version averages 7.2 ms per call and this one (from this PR) averages 44 ms per call. That's a big relative difference, but 44 ms isn't so bad, though it could be worse in other scenarios, e.g. a slower network or machine, or more data. I'll defer this decision to @arikfr.
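For reference, the "tables plus a cache" alternative mentioned above amounts to a read-through cache with explicit invalidation after each refresh. This sketch is hypothetical (the PR itself drops the Redis cache): an in-process dict stands in for Redis, `fetch_from_tables` stands in for the new get_schema(), and `average_ms` mirrors the 5-run timing comparison without reproducing its numbers.

```python
import time


class CachedSchemaReader:
    """Read-through cache in front of a slower, table-backed schema fetch.

    `now` is injectable so expiry can be tested without sleeping.
    """

    def __init__(self, fetch_from_tables, ttl_seconds=300, now=time.monotonic):
        self._fetch = fetch_from_tables
        self._ttl = ttl_seconds
        self._now = now
        self._entries = {}  # data_source_id -> (expires_at, schema)

    def get_schema(self, data_source_id):
        entry = self._entries.get(data_source_id)
        if entry is not None and entry[0] > self._now():
            return entry[1]  # cache hit
        schema = self._fetch(data_source_id)  # miss or expired: read the tables
        self._entries[data_source_id] = (self._now() + self._ttl, schema)
        return schema

    def invalidate(self, data_source_id):
        """Call after the periodic refresh rewrites the tables."""
        self._entries.pop(data_source_id, None)


def average_ms(fn, runs=5):
    """Mean wall-clock milliseconds per call over `runs` calls."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000 / runs
```

The extra moving part is the invalidation hook: every code path that rewrites the tables must also call `invalidate`, which is the maintenance cost weighed against the latency gain above.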


Great, just curious.

Comment thread redash/models/__init__.py Outdated

you'll need a migration to create these tables
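A sketch of what such a migration needs to express, written as plain SQL run through SQLite rather than as an Alembic revision (which is what Redash uses): the two new tables, an `org_id` foreign key targeting `organizations(id)`, one table name per data source, and cascading deletes from a data source down to its columns. Table and column names are illustrative assumptions.

```python
import sqlite3

# Tables that already exist in Redash; created here only so the demo runs.
EXISTING = """
CREATE TABLE organizations (id INTEGER PRIMARY KEY);
CREATE TABLE data_sources (id INTEGER PRIMARY KEY);
"""

# Plain-SQL stand-in for the migration's upgrade step.
UPGRADE = """
CREATE TABLE table_metadata (
    id INTEGER PRIMARY KEY,
    org_id INTEGER NOT NULL REFERENCES organizations(id),
    data_source_id INTEGER NOT NULL REFERENCES data_sources(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    UNIQUE (data_source_id, name)
);
CREATE TABLE column_metadata (
    id INTEGER PRIMARY KEY,
    org_id INTEGER NOT NULL REFERENCES organizations(id),
    table_id INTEGER NOT NULL REFERENCES table_metadata(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    type TEXT
);
"""

# Downgrade drops in reverse dependency order.
DOWNGRADE = """
DROP TABLE column_metadata;
DROP TABLE table_metadata;
"""


def connect_and_upgrade():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript(EXISTING)
    conn.executescript(UPGRADE)
    return conn
```

Exercising the constraints in a test (duplicate names, unknown org ids, deleting a data source) catches a mistyped foreign-key target before it reaches a real database.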

Author

Ah! I had missed this, thank you!

@emtwo emtwo force-pushed the emtwo/schema1_info_drawer branch from 95d3ff6 to 2ac1607 Compare January 24, 2019 19:48
@ghost ghost added the in progress label Jan 24, 2019
@emtwo emtwo force-pushed the emtwo/schema1_info_drawer branch from 2ac1607 to 0c1813c Compare January 30, 2019 19:47
Comment thread redash/query_runner/presto.py Outdated

Similar blocks of code found in 2 locations. Consider refactoring.

Comment thread redash/query_runner/athena.py Outdated

Similar blocks of code found in 2 locations. Consider refactoring.
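The duplication flagged in presto.py and athena.py is plausibly the step that folds information_schema.columns rows into the schema list. A sketch of a shared helper both runners could call after running their own introspection query, assuming each row is a dict with table_schema, table_name, column_name and data_type; `build_schema` is a hypothetical name, and the output shape is an assumption about the new schema response.

```python
def build_schema(rows):
    """Fold information_schema.columns-style rows into table dicts.

    Each row: {"table_schema", "table_name", "column_name", "data_type"}.
    Tables are keyed "schema.table"; first-seen order is preserved.
    """
    tables = {}
    for row in rows:
        full_name = "{}.{}".format(row["table_schema"], row["table_name"])
        table = tables.setdefault(full_name, {"name": full_name, "columns": []})
        table["columns"].append({"name": row["column_name"], "type": row.get("data_type")})
    return list(tables.values())
```

Each runner would then keep only what genuinely differs (the query or the Glue API call) and delegate the folding, so a change to the response shape happens in one place.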

Comment thread redash/tasks/queries.py Outdated

Missing whitespace around operator

Comment thread redash/models/__init__.py Outdated
@emtwo emtwo force-pushed the emtwo/schema1_info_drawer branch from 0c1813c to f072e64 Compare February 5, 2019 19:44
@emtwo emtwo force-pushed the emtwo/schema1_info_drawer branch from f072e64 to 9a950cc Compare February 6, 2019 19:02
@arikfr
Member

arikfr commented Mar 14, 2019

Looks like there is no re-open button? I guess a new PR will be needed.

Btw, the migration has a bug: the foreign key on org_id needs to reference organizations.id, not organizations.id.id.

@jezdez
Contributor

jezdez commented Mar 14, 2019

Yeah, @emtwo would you mind opening a new PR please?

@emtwo emtwo mentioned this pull request Mar 18, 2019
washort pushed a commit to mozilla/redash that referenced this pull request Mar 26, 2019
* Process extra column metadata for a few sql-based data sources.

* Add Table and Column metadata tables.

* Periodically update table and column schema tables in a celery task.

* Fetching schema returns data from table and column metadata tables.

* Add tests for backend changes.

* Front-end shows extra table metadata and uses new schema response.

* Delete datasource schema data when deleting a data source.

* Process and store data source schema when a data source is first created or after a migration.

* Tables should have a unique name per datasource.

* Addressing review comments.

* Update migration file for mixins.

* Appease PEP8

* Upgrade migration file for rebase.

* Cascade delete.

* Adding org_id

* Remove redundant column and table prefixes.

* Non-existing tables and columns should be filtered out on the server side not client side.

* Fetching table samples should be optional and should happen in a separate task per table.

* Allow users to force a schema refresh.

* Use updated_at to help prune old schema metadata periodically.

* Using settings.SCHEMAS_REFRESH_QUEUE
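The "use updated_at to help prune old schema metadata periodically" item can be sketched as touch-then-sweep: each refresh stamps every table it sees with the refresh start time, then deletes that data source's rows still carrying an older stamp. This is one plausible reading of the commit message, not its implementation; SQLite, the one-table layout, and `refresh_and_prune` are assumptions.

```python
SCHEMA = """
CREATE TABLE table_metadata (
    id INTEGER PRIMARY KEY,
    data_source_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    updated_at REAL NOT NULL,
    UNIQUE (data_source_id, name)
)
"""


def refresh_and_prune(conn, data_source_id, current_tables, refresh_started_at):
    """Stamp every table reported by this refresh, then sweep stale ones.

    Returns the number of rows pruned. Rows of other data sources are untouched.
    """
    for name in current_tables:
        cur = conn.execute(
            "UPDATE table_metadata SET updated_at = ? WHERE data_source_id = ? AND name = ?",
            (refresh_started_at, data_source_id, name),
        )
        if cur.rowcount == 0:  # first time this table has been seen
            conn.execute(
                "INSERT INTO table_metadata (data_source_id, name, updated_at) VALUES (?, ?, ?)",
                (data_source_id, name, refresh_started_at),
            )
    # Anything not stamped in this pass was no longer reported by the data source.
    cur = conn.execute(
        "DELETE FROM table_metadata WHERE data_source_id = ? AND updated_at < ?",
        (data_source_id, refresh_started_at),
    )
    conn.commit()
    return cur.rowcount
```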
jezdez pushed a commit to mozilla/redash that referenced this pull request May 13, 2019
* Process extra column metadata for a few sql-based data sources.

* Add Table and Column metadata tables.

* Periodically update table and column schema tables in a celery task.

* Fetching schema returns data from table and column metadata tables.

* Add tests for backend changes.

* Front-end shows extra table metadata and uses new schema response.

* Delete datasource schema data when deleting a data source.

* Process and store data source schema when a data source is first created or after a migration.

* Tables should have a unique name per datasource.

* Addressing review comments.

* Update migration file for mixins.

* Appease PEP8

* Upgrade migration file for rebase.

* Cascade delete.

* Adding org_id

* Remove redundant column and table prefixes.

* Non-existing tables and columns should be filtered out on the server side not client side.

* Fetching table samples should be optional and should happen in a separate task per table.

* Allow users to force a schema refresh.

* Use updated_at to help prune old schema metadata periodically.

* Using settings.SCHEMAS_REFRESH_QUEUE

* fix for getredash#2426 test

* more stable test_interactive_new

* Closes #927, #928: Schema refresh improvements.

* Closes #934, #935: Remove type from schema browser and don't show empty example column in schema drawer (#936)

* Speed up schema fetch requests with fewer postgres queries.

* Add column metadata to Athena glue processing.

* Fix bug assuming 'metadata' exists for every table.

* Closes #939: Persisted, existing table metadata should be updated.

* Sample processing should be rate-limited.

* Add cli command for refreshing data samples.

* Schema refreshes should not overwrite column 'example' field.

* refresh_samples() should filter tables_to_sample on the datasource's id being sampled

* Correctly wrap long text in schema drawer.
@jezdez jezdez mentioned this pull request May 13, 2019
jezdez pushed a commit to mozilla/redash that referenced this pull request May 16, 2019
* Process extra column metadata for a few sql-based data sources.

* Add Table and Column metadata tables.

* Periodically update table and column schema tables in a celery task.

* Fetching schema returns data from table and column metadata tables.

* Add tests for backend changes.

* Front-end shows extra table metadata and uses new schema response.

* Delete datasource schema data when deleting a data source.

* Process and store data source schema when a data source is first created or after a migration.

* Tables should have a unique name per datasource.

* Addressing review comments.

* Update migration file for mixins.

* Appease PEP8

* Upgrade migration file for rebase.

* Cascade delete.

* Adding org_id

* Remove redundant column and table prefixes.

* Non-existing tables and columns should be filtered out on the server side not client side.

* Fetching table samples should be optional and should happen in a separate task per table.

* Allow users to force a schema refresh.

* Use updated_at to help prune old schema metadata periodically.

* Using settings.SCHEMAS_REFRESH_QUEUE

* fix for getredash#2426 test

* more stable test_interactive_new

* Closes #927, #928: Schema refresh improvements.

* Closes #934, #935: Remove type from schema browser and don't show empty example column in schema drawer (#936)

* Speed up schema fetch requests with fewer postgres queries.

* Add column metadata to Athena glue processing.

* Fix bug assuming 'metadata' exists for every table.

* Closes #939: Persisted, existing table metadata should be updated.

* Sample processing should be rate-limited.

* Add cli command for refreshing data samples.

* Schema refreshes should not overwrite column 'example' field.

* refresh_samples() should filter tables_to_sample on the datasource's id being sampled

* Correctly wrap long text in schema drawer.

Co-authored-by: Alison <github@bankofknowledge.net>
washort pushed a commit to mozilla/redash that referenced this pull request Jun 10, 2019
washort pushed a commit to washort/redash that referenced this pull request Jun 12, 2019
* Process extra column metadata for a few sql-based data sources.

* Add Table and Column metadata tables.

* Periodically update table and column schema tables in a celery task.

* Fetching schema returns data from table and column metadata tables.

* Add tests for backend changes.

* Front-end shows extra table metadata and uses new schema response.

* Delete datasource schema data when deleting a data source.

* Process and store data source schema when a data source is first created or after a migration.

* Tables should have a unique name per datasource.

* Addressing review comments.

* Update migration file for mixins.

* Appease PEP8

* Upgrade migration file for rebase.

* Cascade delete.

* Adding org_id

* Remove redundant column and table prefixes.

* Non-existing tables and columns should be filtered out on the server side not client side.

* Fetching table samples should be optional and should happen in a separate task per table.

* Allow users to force a schema refresh.

* Use updated_at to help prune old schema metadata periodically.

* Using settings.SCHEMAS_REFRESH_QUEUE

* fix for getredash#2426 test

* more stable test_interactive_new

* Closes getredash#927, getredash#928: Schema refresh improvements.

* Closes getredash#934, getredash#935: Remove type from schema browser and don't show empty example column in schema drawer (getredash#936)

* Speed up schema fetch requests with fewer postgres queries.

* Add column metadata to Athena glue processing.

* Fix bug assuming 'metadata' exists for every table.

* Closes getredash#939: Persisted, existing table metadata should be updated.

* Sample processing should be rate-limited.

* Add cli command for refreshing data samples.

* Schema refreshes should not overwrite column 'example' field.

* refresh_samples() should filter tables_to_sample on the datasource's id being sampled

* Correctly wrap long text in schema drawer.

Co-authored-by: Alison <github@bankofknowledge.net>
jezdez pushed a commit to mozilla/redash that referenced this pull request Jun 13, 2019
washort pushed a commit to mozilla/redash that referenced this pull request Jun 27, 2019
This was referenced Jun 27, 2019
washort pushed a commit to mozilla/redash that referenced this pull request Jun 28, 2019
* Process extra column metadata for a few sql-based data sources.

* Add Table and Column metadata tables.

* Periodically update table and column schema tables in a celery task.

* Fetching schema returns data from table and column metadata tables.

* Add tests for backend changes.

* Front-end shows extra table metadata and uses new schema response.

* Delete datasource schema data when deleting a data source.

* Process and store data source schema when a data source is first created or after a migration.

* Tables should have a unique name per datasource.

* Addressing review comments.

* Update migration file for mixins.

* Appease PEP8

* Upgrade migration file for rebase.

* Cascade delete.

* Adding org_id

* Remove redundant column and table prefixes.

* Non-existing tables and columns should be filtered out on the server side not client side.

* Fetching table samples should be optional and should happen in a separate task per table.

* Allow users to force a schema refresh.

* Use updated_at to help prune old schema metadata periodically.

* Using settings.SCHEMAS_REFRESH_QUEUE

* fix for getredash#2426 test

* more stable test_interactive_new

* Closes #927, #928: Schema refresh improvements.

* Closes #934, #935: Remove type from schema browser and don't show empty example column in schema drawer (#936)

* Speed up schema fetch requests with fewer postgres queries.

* Add column metadata to Athena glue processing.

* Fix bug assuming 'metadata' exists for every table.

* Closes #939: Persisted, existing table metadata should be updated.

* Sample processing should be rate-limited.

* Add cli command for refreshing data samples.

* Schema refreshes should not overwrite column 'example' field.

* refresh_samples() should filter tables_to_sample on the datasource's id being sampled

* Correctly wrap long text in schema drawer.

Co-authored-by: Alison <github@bankofknowledge.net>

Schema Improvements Part 2: Add data source config options.

Adding BigQuery schema drawer with data types and samples.
emtwo pushed a commit to mozilla/redash that referenced this pull request Jul 15, 2019
emtwo pushed a commit to mozilla/redash that referenced this pull request Jul 17, 2019

Labels

None yet

Projects

None yet

5 participants