Optimized tsvectors insertion 🚀 #3892
Merged
Summary
@bjester and I recently worked together on the slow performance of the `make set-tsvectors` command on hotfixes. Blaine came up with a CTE that reduced the execution time substantially. We analyzed the CTE and realized that our only concern was that the `published` boolean field is not indexed, which can dramatically slow down the query. Even indexing `published` shouldn't make much difference, because the database would still need to traverse about half of the leaf nodes anyway.

So, this PR optimizes the performance of `make set-tsvectors` by first loading the `channel_id`s and `tree_id`s of all published channels into Python memory. Our query then uses these values to query for tsvectors. Since `tree_id` is indexed, it seems a good choice for filtering.

Our optimization should bring the estimated completion time of `make set-tsvectors` down from 8 months to 8 hours.

Query plan: https://explain.depesz.com/s/jl2m#html
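The two-step approach described in the summary can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: it uses the standard-library `sqlite3` module instead of PostgreSQL, the table and column names (`channel`, `node`, `node_search`, `keywords`) are hypothetical, and `lower(title)` merely stands in for a real tsvector expression.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE channel (id TEXT PRIMARY KEY, tree_id INTEGER, published INTEGER);
        CREATE TABLE node (id INTEGER PRIMARY KEY, tree_id INTEGER, title TEXT);
        CREATE INDEX node_tree_id_idx ON node (tree_id);  -- tree_id is indexed
        CREATE TABLE node_search (node_id INTEGER PRIMARY KEY, channel_id TEXT, keywords TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO channel VALUES (?, ?, ?)",
        [("a", 1, 1), ("b", 2, 0), ("c", 3, 1)],
    )
    conn.executemany(
        "INSERT INTO node (tree_id, title) VALUES (?, ?)",
        [(1, "Algebra Basics"), (1, "Geometry"), (2, "Draft Lesson"), (3, "Cell Biology")],
    )
    return conn


def set_search_rows(conn):
    """Insert search rows for nodes of published channels; return how many were added."""
    # Step 1: bring the ids of published channels into Python memory once,
    # so the per-node query never has to filter on the non-indexed boolean.
    published = conn.execute(
        "SELECT id, tree_id FROM channel WHERE published = 1"
    ).fetchall()

    before = conn.total_changes
    # Step 2: for each published tree, select nodes through the indexed tree_id
    # column and skip nodes that already have a search row.
    for channel_id, tree_id in published:
        conn.execute(
            "INSERT INTO node_search (node_id, channel_id, keywords) "
            "SELECT n.id, ?, lower(n.title) FROM node n "
            "WHERE n.tree_id = ? "
            "AND NOT EXISTS (SELECT 1 FROM node_search s WHERE s.node_id = n.id)",
            (channel_id, tree_id),
        )
    conn.commit()
    return conn.total_changes - before
```

Calling `set_search_rows(build_demo_db())` fills `node_search` only for nodes in published trees. Running it a second time on the same connection adds nothing, because existing rows are skipped.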
Manual verification steps performed
Reviewer guidance
How can a reviewer test these changes?
Perform the manual verification steps and check the query plan.
References
Closes #3846.
Contributor's Checklist
PR process:
`CHANGELOG` label been added to this PR. Note: items with this label will be added to the CHANGELOG at a later time
`docs` label has been added if this introduces a change that needs to be updated in the user docs?
`requirements.txt` files also included in this PR
Studio-specific:
`notranslate` class been added to elements that shouldn't be translated by Google Chrome's automatic translation feature (e.g. icons, user-generated text)
`pages`, `components`, and `layouts` directories as described in the docs
Testing:
Reviewer's Checklist
This section is for reviewers to fill out.
`yarn` and `pip`)