Merge populate_metadata PRs #5232
Merged
joshmoore merged 65 commits into ome:metadata53 from atarkowska:merge_populate on Apr 7, 2017
Most methods for dataset loading and parsing were left unimplemented. Now a `Dataset:`-style object can be passed to populate_metadata.py and images will be looked up by name. Note: there's a small bug with name lookup that will be corrected separately.
The assumptions for well/image naming in a plate or screen differ from those for image naming in a dataset, since there's no unique way to reference an image in a dataset the way well "A1" uniquely references a well in a plate. This commit loosens some of those rules to allow image columns and image name columns to work together in the case of datasets. The assumption is that, for population, the ID of the image in a dataset won't be known. Instead, the names of images will be used as unique identifiers. Currently only a warning is issued if a name is not unique.
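The name-based lookup described above can be sketched roughly as follows. This is not the code from populate_metadata.py: the function names, the `(image_id, name)` input shape, and the choice to return the first match for a duplicated name are all assumptions made for illustration.

```python
import logging
from collections import defaultdict

log = logging.getLogger("populate_sketch")


def index_images_by_name(images):
    """Map image name -> list of image IDs, warning on duplicate names.

    `images` is an iterable of (image_id, name) pairs; this input shape
    is hypothetical and stands in for the images loaded from a dataset.
    """
    by_name = defaultdict(list)
    for image_id, name in images:
        by_name[name].append(image_id)
    for name, ids in by_name.items():
        if len(ids) > 1:
            # Only a warning, mirroring the behaviour described in the commit.
            log.warning("Image name %r is not unique: IDs %s", name, ids)
    return dict(by_name)


def resolve_image(by_name, name):
    """Return the first ID recorded for `name`, or None if it is missing.

    Picking the first ID is an assumption of this sketch.
    """
    ids = by_name.get(name)
    return ids[0] if ids else None
```

Keeping every ID per name, rather than overwriting, leaves room to handle duplicates more strictly later.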
In general, populate_metadata.py looks to be due for a refactoring. The number of if-clauses, as well as the unhandled cases (like no catch-all for unknown targets in delete), is making this ever harder to work with. All tests passing.
In order to allow Projects to smartly handle multiple images with the same name (though not in the same dataset), the internals of ValueResolver have been hidden within a ValueWrapper class. ValueResolver chooses once which ValueWrapper to use internally, after which the various if/then blocks based on target object are no longer necessary (needs further refactoring). There *are* still if/then blocks based on column type. These could use some cleaning but will likely remain necessary for multiple-dispatch style handling.
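A minimal sketch of the "choose the wrapper once" idea, assuming a simple mapping from target type to wrapper class. All class names, column names, and lookup shapes here are hypothetical and do not reproduce the real ValueResolver or ValueWrapper.

```python
class PlateWrapperSketch:
    """Resolves well names against a plate (hypothetical stand-in)."""

    def __init__(self, wells_by_name):
        self.wells_by_name = wells_by_name

    def resolve(self, column, value):
        # The per-column branching remains, as the commit message notes.
        if column == "well":
            return self.wells_by_name[value]
        return value


class DatasetWrapperSketch:
    """Resolves image names against a dataset (hypothetical stand-in)."""

    def __init__(self, images_by_name):
        self.images_by_name = images_by_name

    def resolve(self, column, value):
        if column == "image_name":
            return self.images_by_name[value]
        return value


class ResolverSketch:
    """Picks a wrapper once, then delegates every lookup to it."""

    WRAPPERS = {"Plate": PlateWrapperSketch, "Dataset": DatasetWrapperSketch}

    def __init__(self, target_type, lookup):
        try:
            wrapper_class = self.WRAPPERS[target_type]
        except KeyError:
            # Explicit catch-all for unknown targets.
            raise ValueError("Unsupported target type: %s" % target_type)
        self.wrapper = wrapper_class(lookup)

    def resolve(self, column, value):
        # No branching on the target type here.
        return self.wrapper.resolve(column, value)
```

With this shape, supporting a new target type means adding one wrapper class and one dictionary entry rather than another branch in every method.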
For extremely large screens (idr0016), both adding and deleting map annotations led to either PG errors or Ice.MessageSizeMax exceptions. Both are now done in batches of 1000.
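The batching itself can be illustrated with a small, generic helper. This is only a sketch: `batches` and `process_in_batches` are hypothetical names, and `action` stands in for whatever save or delete call is made per batch.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int = 1000) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def process_in_batches(items: Sequence[T],
                       action: Callable[[List[T]], object],
                       size: int = 1000) -> int:
    """Call `action` once per batch and return the number of calls made."""
    calls = 0
    for batch in batches(items, size):
        action(batch)  # e.g. one save or delete request per batch
        calls += 1
    return calls
```

Capping the number of objects per request keeps each message below the size limit, at the cost of more round trips.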
This should be enabled once deletion is fixed
It just seems to add unnecessary complexity. Also add ns and id to CanonicalMapAnnotation.__str__
When a well is missing from a plate, a warning is printed. The same now happens when an image is missing from a dataset. Likely, a `--strict` argument should be added which will force the existence of all objects.
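The difference between the current warn-only behaviour and the proposed strict mode might look like the sketch below. The `strict` parameter and the `check_missing` function are hypothetical; no such option exists in the code described here.

```python
import logging

log = logging.getLogger("populate_sketch")


def check_missing(expected_names, found_names, strict=False):
    """Report names that were expected but not found.

    With strict=False (the behaviour described above) each missing name
    only produces a warning. With strict=True (a hypothetical mode) the
    first check that finds anything missing raises instead.
    """
    missing = sorted(set(expected_names) - set(found_names))
    if missing and strict:
        raise LookupError("Missing objects: %s" % ", ".join(missing))
    for name in missing:
        log.warning("Missing object: %s", name)
    return missing
```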
This makes it easier to deal with URLs which require escaping.
It makes more sense to test small batch sizes since the integration tests use very few annotations. The alternative is to keep batch size 1000, but this unnecessarily increases the time to run the tests.
BulkToMapAnnotation for large screens (+100K wells) was being killed by the OOMKiller. Strategies include:
* unload linked objects
* drop `andReturn` where possible
* return only sizes when possible
* write actively in loops
Storing all of the plate data for idr0016 led to death-by-OOM-killer during *initialization*. Now, Ice objects are no longer stored; very thin wrapper Data objects are stored instead.
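One way to picture such a thin object is a class that copies only the fields needed later and drops the reference to the heavy source object. The sketch below is an assumption about the general technique, not the actual Data classes: `ThinWell`, `to_thin`, and the dictionary standing in for a loaded server object are all hypothetical.

```python
class ThinWell:
    """Holds only the fields needed for later lookups."""

    # __slots__ avoids a per-instance __dict__, reducing memory per object.
    __slots__ = ("id", "row", "column")

    def __init__(self, id, row, column):
        self.id = id
        self.row = row
        self.column = column


def to_thin(heavy):
    """Copy the needed fields out of a heavy object.

    `heavy` is a plain dict here, standing in for a fully loaded
    server-side object with many linked children.
    """
    return ThinWell(heavy["id"], heavy["row"], heavy["column"])
```

Once the thin copy exists, the heavy object can be garbage collected as soon as nothing else refers to it.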
@joshmoore @manics this PR could potentially be rebased to develop, to make mapr compatible
Merging so that @aleksandra-tarkowska can continue the rebasing. The sum of this PR and those that follow will need to be reviewed for potential inclusion in 5.3.x.
--rebased-to #5241
What this PR does
merge #5218 #5220 #5222 #5223 #5224 #5226 #5227 #5229 (no conflicts)
TODO:
#5074 depends on #5215