Skip to content

Show image with pagination#4511

Merged
joshmoore merged 9 commits intoome:developfrom
will-moore:show_image_with_pagination
Apr 6, 2016
Merged

Show image with pagination#4511
joshmoore merged 9 commits intoome:developfrom
will-moore:show_image_with_pagination

Conversation

@will-moore
Copy link
Copy Markdown
Member

This fixes the navigation to an image with the url webclient/?show=image-<id> when that image is not on the first page of a Dataset.

To test:

  • find a Dataset with pagination (>200 images)
  • browse to the second or other page, click an image and copy the link to it
  • paste the url in browser and check that this browses to show the image on the correct page
  • Bonus: The same should work for Orphaned images (if you have > 200 orphaned images)

Also added web integration test which should pass!

@gusferguson
Copy link
Copy Markdown

@will-moore

Tested with https://cowfish.openmicroscopy.org/merge/webclient/ Safari, Firfox and Chrome - when logged in as user and when not logged in, and when logged in as another user.

All behave as expected.
Good to merge.

p = omero.sys.ParametersI()
so = deepcopy(conn.SERVICE_OPTS)
so.setOmeroGroup(groupId)
if (datasetId is not None):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Particular reason for using parentheses here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too much JavaScript programming! I'll fix...

# Selecting a 'well' is really for selecting well_sample paths
# if a well is specified on its own, we return all the well_sample paths
# than match
def get_image_ids(datasetId=None, groupId=-1, ownerId=None):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a top level function, be documented and be integration tested individually.

@chris-allan
Copy link
Copy Markdown
Member

Functionally, I really like this as it is essential for larger datasets. Especially with the default page size of 200. We are regularly working with datasets which contain 1000+ images, with the 2x or 3x multiplier for most digital pathology use cases. What would be fabulous to know are the performance implications of the various additional queries. Maybe before and after timings on 100, 1000, 5000 and 10000 images?

@sbesson sbesson added the develop label Mar 7, 2016
@will-moore
Copy link
Copy Markdown
Member Author

Some performance metrics (on my local laptop):
Without this PR, 10k images takes approx 0.007 secs for both a dataset and orphaned image. Haven't tried this with fewer images since it's probably not affected by image count.
With this PR (timings are approx averages over a handful of repeats):

  • 10k images in a dataset: 0.22 secs
  • 10k images orphaned: 0.29 secs
  • 5k images in a dataset: 0.12 secs
  • 5k images orphaned: 0.15 secs
  • 1k images dataset: 0.037 secs
  • 1k images orphaned: 0.053 secs

@chris-allan
Copy link
Copy Markdown
Member

Great information to have! So only non-linear with respect to image count when < 5000. Probably just the floor of executing the query in the first place. Can you post up your test case in a Gist? Probably would be good to also test if it gets worse with the total number of images in the system/group or just the total number of images in the dataset or are orphaned.

I guess we have to ask ourselves whether we're willing to accept a 100ms+ overhead per 5000 images to properly select the page correctly and if we can optimise this away somehow.

@will-moore
Copy link
Copy Markdown
Member Author

My testing workflow for above is described at https://gist.github.com/will-moore/33dc5d021a428b161bae
I had about another 5000 images in the same group when doing the tests above. Do you think it's possible that objects we don't retrieve in the query would have a significant effect on performance? How many total images do we want in the group to know if this is the case?

I think it's clearly better to accept a 100ms delay per 5000 images than not to be able to find the image at-all. Obviously would be nice to optimise this but I can't think of a way to do this with iQuery?

@joshmoore
Copy link
Copy Markdown
Member

NB: I tried a few things to speed up https://github.com/openmicroscopy/openmicroscopy/pull/4511/files#diff-9704a8389cb2e3a7dd3f753858c9a1baR432 but didn't find anything faster. If there's something specific I should look at, let me know.

@will-moore
Copy link
Copy Markdown
Member Author

Thanks @joshmoore. @chris-allan anything else to do for this PR?

@chris-allan
Copy link
Copy Markdown
Member

Yes, I would expect the total number of objects to have an affect on the performance. You're performing two subqueries in an inclusion scenario. You can play with the queries themselves at the PostgreSQL level using EXPLAIN if you want to have an idea of the indexes, etc. being hit and the potential performance impact on databases with many images. Would be interesting to see how such a query performs on the IDR database. It could also be that your sort is what's actually incurring most of the overhead.

Agreed that it's better for this functionality to work and accept the 100ms+ delay. It would however be prudent to be mindful of that delay in the context of databases with large numbers of images.

@chris-allan
Copy link
Copy Markdown
Member

While I do think further performance investigation of a form similar to the above should be done the PR probably should be merged as is. Information on performance impacts can be added to this PR retroactively.

@snoopycrimecop snoopycrimecop mentioned this pull request Mar 22, 2016
@will-moore
Copy link
Copy Markdown
Member Author

@joshmoore Any way I can connect to an IDR server from my machine, or something else with similar image counts? See #4511 (comment). Non-production server would be best since we don't know the impact of these queries.

@joshmoore
Copy link
Copy Markdown
Member

@will-moore : you should be able to make use of https://trello.com/c/rVVBCZ4r/245-sysgro-copy-for-cambridge -- please report any OOMs or similar on that card.

@will-moore
Copy link
Copy Markdown
Member Author

Demo user on orca-3, there are 139597 orphaned images.
The paths_to_object call timing was:

  • Orphaned image: with this PR: 4.08 seconds, without this PR: 0.11 seconds.
  • Image in Dataset of 600 images: with this PR: 0.047 seconds, without this PR 0.011 seconds.

I didn't try to run any scripts to move image into large Dataset.

@will-moore
Copy link
Copy Markdown
Member Author

@joshmoore Since @chris-allan is happy that this PR probably should be merged as is, could you go ahead and merge this, so I can fix conflicts with #4534. Thanks.

@chris-allan
Copy link
Copy Markdown
Member

Given that sort of image count that level of performance degradation seem reasonable to me.

@joshmoore joshmoore merged commit 27e5da2 into ome:develop Apr 6, 2016
@jburel jburel added this to the 5.2.3 milestone May 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants