Do not re-run components unnecessarily #469
Closed
ANogin wants to merge 15 commits into
added 9 commits
May 31, 2024 23:35
One way to look at the issue is that the task cache was emptied too eagerly. This changes it so that the task cache is only emptied when the job service is idle.
Without this fix the custom config could end up being ignored - this bug existed before, but the earlier changes here made it way easier to hit it.
Rather than manually querying the resource for whether the component was run (which would not return a hit when the component is still running), hook directly into Resource.remove_component to remove stale cache entries
This is needed because image_seq changes
added 4 commits
June 1, 2024 08:26
With my new changes, AbstractComponent._log_component_has_run_warning is never being hit in any other tests :)
Contributor
Author
I did a timing test of a gather of 2x ( where I used the approach from https://gist.github.com/vxgmichel/620eb3a02d97d3da9dacdc508a5d5321 to break the non-CPU time into “select” vs “blocked IO”.
Contributor
Author
Done
No, it's ~35-40% now.
Contributor
Author
@whyitfor convinced me this is not the way to go.
One sentence summary of this PR (This should go in the CHANGELOG!)
Avoid rerunning components unnecessarily, and eliminate the already unlikely case in which a component that should be rerun (because its configuration differs) is not rerun.
Link to Related Issue(s)
N/A
Please describe the changes in your request.
Previously, when executing a set of components (e.g. identifiers) on a resource, the job service would check whether the tags added by the previous run meant that more components should be executed, and would then run the full set again. With this change, only the additional components are executed; those that have already been executed are not executed again. However, a component run with a non-null config is always run, regardless of caching.
I observed speedups of around 10%, sometimes more, on some messy identify/unpack code.
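To make the rule above concrete, here is a minimal standalone sketch of the selection logic. It is not the code from this PR: `RunCache`, `select`, `record`, and the component and resource names are hypothetical and only illustrate the behavior described (skip components already run on a resource, unless a non-null config is passed).

```python
from dataclasses import dataclass, field
from typing import Hashable, Iterable, List, Optional, Set, Tuple


@dataclass
class RunCache:
    """Hypothetical record of (resource_id, component_id) pairs already run."""

    _seen: Set[Tuple[Hashable, str]] = field(default_factory=set)

    def select(
        self,
        resource_id: Hashable,
        component_ids: Iterable[str],
        config: Optional[object] = None,
    ) -> List[str]:
        """Return the components that still need to run on this resource."""
        if config is not None:
            # A non-null config bypasses the cache: always run everything.
            return list(component_ids)
        return [c for c in component_ids if (resource_id, c) not in self._seen]

    def record(self, resource_id: Hashable, component_ids: Iterable[str]) -> None:
        """Remember that these components have run on this resource."""
        self._seen.update((resource_id, c) for c in component_ids)


cache = RunCache()

# First pass: only one identifier matches the resource's current tags.
first = cache.select("res-1", ["magic_identifier"])
cache.record("res-1", first)

# The first pass added tags, so a second identifier now matches as well.
# Only the newly matching component is selected; the first is not rerun.
second = cache.select("res-1", ["magic_identifier", "zip_identifier"])
cache.record("res-1", second)

# With a non-null config, the component is selected even though it already ran.
forced = cache.select("res-1", ["magic_identifier"], config={"depth": 2})
```

In this sketch `first` is `["magic_identifier"]`, `second` is `["zip_identifier"]`, and `forced` is `["magic_identifier"]`. Evicting entries when a component is removed from a resource, or clearing the cache when the job service is idle, would be additional steps not shown here.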
Anyone you think should look at this, specifically?
@whyitfor