Show image with pagination by will-moore · Pull Request #4511 · ome/openmicroscopy

will-moore · 2016-03-03T23:47:47Z

This fixes the navigation to an image with the url webclient/?show=image-<id> when that image is not on the first page of a Dataset.

To test:

find a Dataset with pagination (>200 images)
browse to the second or other page, click an image and copy the link to it
paste the url in browser and check that this browses to show the image on the correct page
Bonus: The same should work for Orphaned images (if you have > 200 orphaned images)

Also added web integration test which should pass!

…ests

gusferguson · 2016-03-07T08:10:26Z

@will-moore

Tested with https://cowfish.openmicroscopy.org/merge/webclient/ Safari, Firfox and Chrome - when logged in as user and when not logged in, and when logged in as another user.

All behave as expected.
Good to merge.

chris-allan · 2016-03-07T09:56:11Z

+        p = omero.sys.ParametersI()
+        so = deepcopy(conn.SERVICE_OPTS)
+        so.setOmeroGroup(groupId)
+        if (datasetId is not None):


Particular reason for using parentheses here?

Too much JavaScript programming! I'll fix...

chris-allan · 2016-03-07T10:06:02Z

-    # Selecting a 'well' is really for selecting well_sample paths
-    # if a well is specified on its own, we return all the well_sample paths
-    # than match
+    def get_image_ids(datasetId=None, groupId=-1, ownerId=None):


This should be a top level function, be documented and be integration tested individually.

chris-allan · 2016-03-07T10:25:17Z

Functionally, I really like this as it is essential for larger datasets. Especially with the default page size of 200. We are regularly working with datasets which contain 1000+ images, with the 2x or 3x multiplier for most digital pathology use cases. What would be fabulous to know are the performance implications of the various additional queries. Maybe before and after timings on 100, 1000, 5000 and 10000 images?

will-moore · 2016-03-08T13:50:56Z

Some performance metrics (on my local laptop):
Without this PR, 10k images takes approx 0.007 secs for both a dataset and orphaned image. Haven't tried this with fewer images since it's probably not affected by image count.
With this PR (timings are approx averages over a handful of repeats):

10k images in a dataset: 0.22 secs
10k images orphaned: 0.29 secs
5k images in a dataset: 0.12 secs
5k images orphaned: 0.15 secs
1k images dataset: 0.037 secs
1k images orphaned: 0.053 secs

chris-allan · 2016-03-08T14:24:21Z

Great information to have! So only non-linear with respect to image count when < 5000. Probably just the floor of executing the query in the first place. Can you post up your test case in a Gist? Probably would be good to also test if it gets worse with the total number of images in the system/group or just the total number of images in the dataset or are orphaned.

I guess we have to ask ourselves whether we're willing to accept a 100ms+ overhead per 5000 images to properly select the page correctly and if we can optimise this away somehow.

will-moore · 2016-03-10T14:40:22Z

My testing workflow for above is described at https://gist.github.com/will-moore/33dc5d021a428b161bae
I had about another 5000 images in the same group when doing the tests above. Do you think it's possible that objects we don't retrieve in the query would have a significant effect on performance? How many total images do we want in the group to know if this is the case?

I think it's clearly better to accept a 100ms delay per 5000 images than not to be able to find the image at-all. Obviously would be nice to optimise this but I can't think of a way to do this with iQuery?

joshmoore · 2016-03-10T15:39:37Z

NB: I tried a few things to speed up https://github.com/openmicroscopy/openmicroscopy/pull/4511/files#diff-9704a8389cb2e3a7dd3f753858c9a1baR432 but didn't find anything faster. If there's something specific I should look at, let me know.

will-moore · 2016-03-10T15:54:33Z

Thanks @joshmoore. @chris-allan anything else to do for this PR?

chris-allan · 2016-03-11T10:51:34Z

Yes, I would expect the total number of objects to have an affect on the performance. You're performing two subqueries in an inclusion scenario. You can play with the queries themselves at the PostgreSQL level using EXPLAIN if you want to have an idea of the indexes, etc. being hit and the potential performance impact on databases with many images. Would be interesting to see how such a query performs on the IDR database. It could also be that your sort is what's actually incurring most of the overhead.

Agreed that it's better for this functionality to work and accept the 100ms+ delay. It would however be prudent to be mindful of that delay in the context of databases with large numbers of images.

chris-allan · 2016-03-11T10:54:19Z

While I do think further performance investigation of a form similar to the above should be done the PR probably should be merged as is. Information on performance impacts can be added to this PR retroactively.

will-moore · 2016-03-24T12:50:11Z

@joshmoore Any way I can connect to an IDR server from my machine, or something else with similar image counts? See #4511 (comment). Non-production server would be best since we don't know the impact of these queries.

joshmoore · 2016-03-27T12:50:38Z

@will-moore : you should be able to make use of https://trello.com/c/rVVBCZ4r/245-sysgro-copy-for-cambridge -- please report any OOMs or similar on that card.

will-moore · 2016-04-04T09:38:20Z

Demo user on orca-3, there are 139597 orphaned images.
The paths_to_object call timing was:

Orphaned image: with this PR: 4.08 seconds, without this PR: 0.11 seconds.
Image in Dataset of 600 images: with this PR: 0.047 seconds, without this PR 0.011 seconds.

I didn't try to run any scripts to move image into large Dataset.

will-moore · 2016-04-04T11:22:59Z

@joshmoore Since @chris-allan is happy that this PR probably should be merged as is, could you go ahead and merge this, so I can fix conflicts with #4534. Thanks.

chris-allan · 2016-04-05T09:26:27Z

Given that sort of image count that level of performance degradation seem reasonable to me.

will-moore added 5 commits March 3, 2016 22:35

show.py paths_to_object() adds 'childPage' if dataset paginated

43887f4

test_show.py tests pagination of paths_to_object

6c2fca8

ome.tree.js uses pagination from paths_to_object

dfaba25

test_orphaned_pagination() not broken by orphaned images from other t…

d682355

…ests

flake8 fix

f72f9ca

chris-allan reviewed Mar 7, 2016
View reviewed changes

Remove parentheses from if () in show.py

948b7a9

chris-allan reviewed Mar 7, 2016
View reviewed changes

sbesson added the develop label Mar 7, 2016

will-moore added 3 commits March 7, 2016 23:47

Move get_image_ids() to top level in show.py

bc81ff8

Add tests for get_image_ids()

91e1ff2

Don't use settings.PAGE as param default in function def

d780eb7

snoopycrimecop mentioned this pull request Mar 22, 2016

Show tag tagset #4534

Merged

joshmoore merged commit 27e5da2 into ome:develop Apr 6, 2016

jburel added this to the 5.2.3 milestone May 4, 2016

Conversation

will-moore commented Mar 3, 2016

Uh oh!

gusferguson commented Mar 7, 2016

Uh oh!

chris-allan Mar 7, 2016

Choose a reason for hiding this comment

Uh oh!

will-moore Mar 7, 2016

Choose a reason for hiding this comment

Uh oh!

chris-allan Mar 7, 2016

Choose a reason for hiding this comment

Uh oh!

chris-allan commented Mar 7, 2016

Uh oh!

will-moore commented Mar 8, 2016

Uh oh!

chris-allan commented Mar 8, 2016

Uh oh!

will-moore commented Mar 10, 2016

Uh oh!

joshmoore commented Mar 10, 2016

Uh oh!

will-moore commented Mar 10, 2016

Uh oh!

chris-allan commented Mar 11, 2016

Uh oh!

chris-allan commented Mar 11, 2016

Uh oh!

will-moore commented Mar 24, 2016

Uh oh!

joshmoore commented Mar 27, 2016

Uh oh!

will-moore commented Apr 4, 2016

Uh oh!

will-moore commented Apr 4, 2016

Uh oh!

chris-allan commented Apr 5, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants