Skip to content

Closes #927, #928: Schema refresh improvements.#930

Merged
emtwo merged 1 commit intomasterfrom
emtwo/fix_schema_bugs
Apr 3, 2019
Merged

Closes #927, #928: Schema refresh improvements.#930
emtwo merged 1 commit intomasterfrom
emtwo/fix_schema_bugs

Conversation

@emtwo
Copy link
Copy Markdown

@emtwo emtwo commented Apr 3, 2019

This PR makes a variety of changes to fix bugs and more gracefully handle schema processing for data sources with many tables. A summary:

  • Batch insert and update of column and table metadata
  • Time limit on schema refresh increased to a soft limit of 5 min and hard limit of 10
  • Column types are truncated to fit 255 character limit
  • Fixed bug with str() function on TableMetadata accessing a non-existent variable
  • Refresh schema every 12 hours instead of 30 min
  • Check whether we should collect data samples before we spin off tasks to do so
  • When a user requests a refresh, do it asynchronously
  • Don’t force a schema refresh when there is no schema data to display (this clogs up the queue), just wait for the next refresh
  • Changes to data sources (in data_sources.py) should trigger a refresh for the changed data source only (instead of all data sources)

@emtwo emtwo requested a review from jezdez April 3, 2019 16:35
Copy link
Copy Markdown

@washort washort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice improvement, r+

Comment thread redash/settings/__init__.py Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the faster schema refresh do we still want this longer interval?

Copy link
Copy Markdown
Author

@emtwo emtwo Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary at the moment, but I'm being cautious because the column examples are still a bit slow since they do select * from table limit 1 for each table. We can always turn it back up if it seems too slow I think.

Comment thread redash/tasks/queries.py Outdated
@emtwo emtwo force-pushed the emtwo/fix_schema_bugs branch from 527eddc to 01ef94d Compare April 3, 2019 22:50
@emtwo emtwo force-pushed the emtwo/fix_schema_bugs branch from 01ef94d to f7c12a2 Compare April 3, 2019 22:52
@emtwo emtwo merged commit 1662f47 into master Apr 3, 2019
@emtwo emtwo deleted the emtwo/fix_schema_bugs branch April 3, 2019 23:05
emtwo pushed a commit that referenced this pull request Apr 4, 2019
emtwo pushed a commit that referenced this pull request Apr 4, 2019
emtwo pushed a commit that referenced this pull request Apr 4, 2019
@jezdez jezdez mentioned this pull request May 13, 2019
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants