Skip to content

populate_metadata.py: add batches to write_to_omero#4754

Merged
joshmoore merged 10 commits intoome:metadata52from
joshmoore:populate-batches
Jul 26, 2016
Merged

populate_metadata.py: add batches to write_to_omero#4754
joshmoore merged 10 commits intoome:metadata52from
joshmoore:populate-batches

Conversation

@joshmoore
Copy link
Copy Markdown
Member

@joshmoore joshmoore commented Jul 21, 2016

What this PR does

The new QueryContext implementations in populate_metadata.py were not batching in their write_to_omero methods. For very large screens (e.g. idr0016):

  • BulkToMapAnnotationContext tends to hit MESSAGESIZEMAX limits and
  • DeleteMapAnnotationContext hits Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 109734

Now both are done in batches of 1000.

Note: projections are also causing issues so batching is being added to invocations of projection as well.

Testing this PR

  1. A bulk annotation YML configuration is required. This is minimal possible via test/integration/metadata/test_populate.py.
  2. There should be no change in functionality and no overt slowdown in creation or deletion.

Related reading

cc: @eleanorwilliams

For extremely large screens (idr0016), both adding
map annotations as well as deleting them lead to either
PG errors or Ice.MessageSizeMax exceptions. Now both
are done in batches of 1000.
populate.add_argument("--batch",
type=long,
default=1000,
help="Number of objects to save at once")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about save at once -> process at once since it's used for deletes?


@mark.parametrize("fixture", METADATA_FIXTURES, ids=METADATA_IDS)
def testPopulateMetadata(self, fixture):
@mark.parametrize("batch_size", (None, 10, 1000))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFE for sometime in the future: From past experience writing this sort of thing in omero-features it's useful to have your input test data be at least 2*batch size but not exactly divisible.

@joshmoore
Copy link
Copy Markdown
Member Author

Note: projections are also causing issues so batching is being added to invocations of projection as well.

The list of map annotations and file annotations
have duplicates which cause secondary deletes to
fail.
@eleanorwilliams
Copy link
Copy Markdown

Tested using OMERO.server-5.2.3-170-db884ba-ice35-b45 which includes this PR, using idr0005/screenA and idr0009/screenA which had both previously failed in map annotation deletion. In both cases map annotations can now be deleted and created successfully in batches of 1000.

Also idr0009/screenA has the annotation.csv file gzipped and it was successfully unzipped and loaded.

@joshmoore
Copy link
Copy Markdown
Member Author

Thanks, @eleanorwilliams . Merging this has happy before it becomes a mega-PR.

@joshmoore joshmoore merged commit a51ab18 into ome:metadata52 Jul 26, 2016
@joshmoore joshmoore deleted the populate-batches branch July 26, 2016 12:02
@joshmoore joshmoore mentioned this pull request Dec 8, 2016
@atarkowska
Copy link
Copy Markdown
Member

--rebased-to #5220

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants