
improve graph operation performance (rebased onto metadata)#4380

Closed
joshmoore wants to merge 2 commits into ome:metadata from joshmoore:rebased/metadata/adjust-graph-type-query

Conversation

@joshmoore
Member

This is the same as gh-4378 but rebased onto metadata.


With thanks to @joshmoore, adjust how graph traversal uses Hibernate to determine the exact class of model objects. Testing is by CI.

--no-rebase
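As a rough illustration of the idea described above (plain Java collections standing in for Hibernate; all names here are hypothetical and not from the actual PR): resolving the exact subclass of a batch of objects in one pass, instead of issuing one lookup per object, trades many round trips for a single query.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch only: a toy "database" mapping object id to its
// exact subclass name, standing in for what Hibernate would resolve.
public class TypeLookupSketch {

    static final Map<Long, String> DB = Map.of(
            1L, "Image", 2L, "Plate", 3L, "Image");

    // Naive approach: one lookup per id (analogous to one query per object).
    public static Map<Long, String> perObject(List<Long> ids) {
        Map<Long, String> out = new HashMap<>();
        for (Long id : ids) {
            out.put(id, DB.get(id)); // one "round trip" each
        }
        return out;
    }

    // Batched approach: resolve all ids in a single pass (one query).
    public static Map<Long, String> batched(List<Long> ids) {
        return ids.stream().distinct()
                .collect(Collectors.toMap(id -> id, DB::get));
    }

    public static void main(String[] args) {
        System.out.println(batched(List.of(1L, 2L, 3L)));
    }
}
```

Both approaches return the same mapping; the point is only the number of lookups performed.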

@joshmoore
Member Author

--rebased-from #4378

@joshmoore
Member Author

Watching stack traces (very unscientifically):

Count 1:

        at org.hibernate.util.EqualsHelper.equals(EqualsHelper.java:33)
        at org.hibernate.type.NullableType.isEqual(NullableType.java:237)
        at org.hibernate.type.NullableType.isEqual(NullableType.java:233)
        at org.hibernate.type.AbstractType.isEqual(AbstractType.java:140)
        at org.hibernate.engine.EntityKey.equals(EntityKey.java:119)
        at java.util.HashMap.getEntry(HashMap.java:467)
        at java.util.HashMap.containsKey(HashMap.java:449)
        at org.hibernate.engine.StatefulPersistenceContext.containsEntity(StatefulPersistenceContext.java:371)
        at org.hibernate.type.ManyToOneType.scheduleBatchLoadIfNeeded(ManyToOneType.java:174)
        at org.hibernate.type.ManyToOneType.hydrate(ManyToOneType.java:160)
        at org.hibernate.type.EntityType.nullSafeGet(EntityType.java:227)
        at org.hibernate.loader.hql.QueryLoader.getResultColumnOrRow(QueryLoader.java:403)
        at org.hibernate.loader.Loader.getRowFromResultSet(Loader.java:647)
        at org.hibernate.loader.Loader.doQuery(Loader.java:745)
        at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:270)
        at org.hibernate.loader.Loader.doList(Loader.java:2449)
        at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2192)
        at org.hibernate.loader.Loader.list(Loader.java:2187)
        at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:452)
        at org.hibernate.hql.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:363)
        at org.hibernate.engine.query.HQLQueryPlan.performList(HQLQueryPlan.java:196)
        at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1260)
        at org.hibernate.impl.QueryImpl.list(QueryImpl.java:102)

Count 7:

...
  e.g.  at ome.services.graphs.GraphTraversal.reviewObject(GraphTraversal.java:1113)
        at ome.services.graphs.GraphTraversal.planOperation(GraphTraversal.java:624)
        at ome.services.graphs.GraphTraversal.planOperation(GraphTraversal.java:513)
        at omero.cmd.graphs.Delete2I.step(Delete2I.java:177)

Count 4:

        at ome.services.graphs.GraphTraversal.findObjectDetails(GraphTraversal.java:777) & 807
        at ome.services.graphs.GraphTraversal.getLinksToCache(GraphTraversal.java:890)
        at ome.services.graphs.GraphTraversal.cache(GraphTraversal.java:939)
        at ome.services.graphs.GraphTraversal.planOperation(GraphTraversal.java:616)
        at ome.services.graphs.GraphTraversal.planOperation(GraphTraversal.java:513)
        at omero.cmd.graphs.Delete2I.step(Delete2I.java:177)

Count 4:

        at ome.services.graphs.GraphTraversal.findObjectDetails(GraphTraversal.java:807)
        at ome.services.graphs.GraphTraversal.getLinksToCache(GraphTraversal.java:890)
        +
        at org.hibernate.impl.QueryImpl.list(QueryImpl.java:102)
        at ome.services.graphs.GraphTraversal.getLinksToCache(GraphTraversal.java:884)
        at ome.services.graphs.GraphTraversal.cache(GraphTraversal.java:939)
        at ome.services.graphs.GraphTraversal.planOperation(GraphTraversal.java:616)
        at ome.services.graphs.GraphTraversal.planOperation(GraphTraversal.java:513)
        at omero.cmd.graphs.Delete2I.step(Delete2I.java:177)

Count 9:

        at omero.cmd.graphs.BaseGraphPolicyAdjuster.review(BaseGraphPolicyAdjuster.java:107)
        +
        at ome.services.graphs.GraphTraversal.reviewObject(GraphTraversal.java:1113)
        at ome.services.graphs.GraphTraversal.planOperation(GraphTraversal.java:607)
        at ome.services.graphs.GraphTraversal.planOperation(GraphTraversal.java:513)

Count 17:

        at ome.services.graphs.GraphTraversal.getLinksToCache(GraphTraversal.java:884)
        at ome.services.graphs.GraphTraversal.cache(GraphTraversal.java:939)
        at ome.services.graphs.GraphTraversal.planOperation(GraphTraversal.java:616)
        at ome.services.graphs.GraphTraversal.planOperation(GraphTraversal.java:513)

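For context on the first trace: it bottoms out in HashMap.containsKey over Hibernate's EntityKey, whose equals compares the entity type and identifier, so the cost scales with how often hydration probes the persistence context. A simplified stand-in (hypothetical class, not Hibernate's actual code):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the hot path in the first trace: containsEntity
// is a hash-set lookup whose key equality compares entity name and id,
// loosely mimicking org.hibernate.engine.EntityKey.
public class EntityKeySketch {
    final String entityName;
    final Long id;

    EntityKeySketch(String entityName, Long id) {
        this.entityName = entityName;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        if (!(o instanceof EntityKeySketch)) return false;
        EntityKeySketch k = (EntityKeySketch) o;
        return entityName.equals(k.entityName) && id.equals(k.id);
    }

    @Override
    public int hashCode() {
        return entityName.hashCode() * 31 + id.hashCode();
    }

    // Analogous to StatefulPersistenceContext.containsEntity: one probe
    // per hydrated association, each invoking equals on collisions.
    public static boolean containsEntity(Set<EntityKeySketch> context,
                                         String name, long id) {
        return context.contains(new EntityKeySketch(name, id));
    }
}
```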
But then I hit an OOM:

omero.cmd.Delete2 Screen 3... failed: 'graph-fail'
        message=Java heap space
        stacktrace=java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2219)
        at java.util.ArrayList.grow(ArrayList.java:242)
        at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
        at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
        at java.util.ArrayList.add(ArrayList.java:440)
        at ome.services.graphs.GraphTraversal.getLinksToCache(GraphTraversal.java:899)
        at ome.services.graphs.GraphTraversal.cache(GraphTraversal.java:939)
        at ome.services.graphs.GraphTraversal.planOperation(GraphTraversal.java:616)
        at ome.services.graphs.GraphTraversal.planOperation(GraphTraversal.java:513)
        at omero.cmd.graphs.Delete2I.step(Delete2I.java:177)
        at omero.cmd.HandleI.steps(HandleI.java:437)
        at omero.cmd.HandleI$1.doWork(HandleI.java:365)
        at omero.cmd.HandleI$1.doWork(HandleI.java:361)
Elapsed time: 2876.641 secs.
Flags: [FAILURE, CANCELLED]

I recently lowered the memory on this machine from 24 GB to 16 GB, so this isn't necessarily a problem with this PR itself, but it's perhaps worth looking into further.

@mtbc
Member

mtbc commented Dec 12, 2015

In the expected case that the same data is being deleted by only one Delete2 at a time (given poor transaction isolation), this PR certainly shouldn't increase memory consumption, and the other PR changes code after the cache logic. So if the OOM is caused by this PR, something more surprising is wrong.

@joshmoore
Member Author

Stopping the experiment at 4 hours with 24 GB since the server is suffering:

omero@f03a444e90aa:~$ grep -i delete OMERO.server/var/log/Blitz.log
...
2015-12-13 16:49:57,299 INFO  [        ome.services.util.ServiceHandler] (2-thread-1)  Executor.doWork -- omero.cmd.HandleI.run[daac2c34-6090-4aa7-8803-9a51fd8defb3/IHandle0b1a1721-a350-4469-ac34-559f3c38c2ed, omero.cmd.graphs.Delete2I@442182]
2015-12-13 16:49:57,304 INFO  [       omero.cmd.graphs.Delete2I.@442182] (l.Server-8) Add callback: Ip8GDk+?yT^_Ea9b,A.$/dc1be059-b01f-4cdd-b7c4-d6591fc32bda
2015-12-13 16:49:57,306 INFO  [       omero.cmd.graphs.Delete2I.@442182] (l.Server-9) getResponse: null
omero@f03a444e90aa:~$ date
Sun Dec 13 20:39:18 UTC 2015
omero@f03a444e90aa:~$ top
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 74507 omero     20   0 33.994g 0.025t  18400 S  3775 10.1   5314:09 java

@mtbc
Member

mtbc commented Dec 14, 2015

I wonder if there's something horribly inefficient about Hibernate's type handling. Sigh. I guess this will require rather more investigation.

@joshmoore
Member Author

Understood. I'll concentrate on deleting in smaller chunks then! 😄 cc: @jburel, though @mtbc's other PR should be fine.
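Deleting in smaller chunks could be sketched roughly as follows (hypothetical helper, not OMERO API): split the target ids into fixed-size batches so the working set per delete stays bounded rather than caching links for the whole screen at once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: bound memory by splitting target ids into
// fixed-size batches; in real use each batch would be one delete request.
public class ChunkedDelete {

    public static List<List<Long>> chunks(List<Long> ids, int size) {
        List<List<Long>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            // Copy the sublist so each batch is independent of the source list.
            out.add(new ArrayList<>(ids.subList(i, Math.min(i + size, ids.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 10; i++) ids.add(i);
        System.out.println(chunks(ids, 4)); // three batches: 4 + 4 + 2 ids
    }
}
```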

@joshmoore joshmoore closed this Dec 17, 2015
@mtbc
Member

mtbc commented Jan 6, 2016

Well, this is a puzzle. Using this PR on a screen with over 4,000 images and over 5,000 comments (each comment will exercise the subclass code) I'm finding that this PR offers over 40% improvement in time. It would therefore be odd if the memory usage were much worse, as I'd expect that to slow things down.

However, it looks like I misread the Javadocs at the last minute before my break and I don't even need the extra synchronization, so I'll remove that, then maybe we can try again for a bit more comparative testing?

@joshmoore
Member Author

@mtbc : my memory issues could well be simply the size of what I was attempting since it was a screen with hundreds of plates each with hundreds of images.

@mtbc
Member

mtbc commented Jan 7, 2016

I don't know at what frequency redeployments tend to be needed, but certainly feel free to exclude and unexclude #4400, as that helps ensure it's not deleterious; it's not blocking any other work at the moment.
