Solr: Index all performance is too slow with full production data. #50

@eaquigley

Author Name: Kevin Condon (@kcondon)
Original Redmine Issue: 3457, https://redmine.hmdc.harvard.edu/issues/3457
Original Date: 2014-01-29
Original Assignee: Philip Durbin


Preliminary testing shows that "index all" takes too long with full production data.

Indexing 1861 dataverses: 41 minutes

Indexing 1900 datasets: 2 hours, 15 minutes. There are 52,000+ datasets.

These numbers were measured on dvn-3 with full production data for public dataverses and studies. Glassfish heap sizes of 512 MB and 10 GB showed the same performance.
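
To put the dataset timing in perspective, here is a rough extrapolation to the full 52,000 datasets. It assumes indexing cost scales linearly with the number of datasets, which may not hold in practice; the class and method names are made up for illustration and are not part of the application.

```java
public class IndexTimeEstimate {

    // Linear extrapolation: assumes the per-item indexing cost stays constant.
    public static double estimateMinutes(int sampleItems, double sampleMinutes, int totalItems) {
        return sampleMinutes / sampleItems * totalItems;
    }

    public static void main(String[] args) {
        // Measured above: 1900 datasets in 2 hours, 15 minutes (135 minutes).
        // 52,000 is the lower bound given for the total number of datasets.
        double minutes = estimateMinutes(1900, 135.0, 52000);
        System.out.println("Estimated minutes: " + Math.round(minutes));
        System.out.println("Estimated hours:   " + Math.round(minutes / 60.0));
    }
}
```

Under that assumption a full dataset index would take roughly 3,700 minutes, or about 62 hours.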


We see "java -server -jar start.jar" at https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding

-server? What does that mean?

man java says this...

   -server    Selects the Java HotSpot Server VM. For more information see
              Server-Class Machine Detection at
              http://java.sun.com/j2se/1.5.0/docs/guide/vm/server-class.html

... and if you follow that link you see this:

"Starting with J2SE 5.0, when an application starts up, the launcher can attempt to detect whether the application is running on a "server-class" machine and, if so, use the Java HotSpot Server Virtual Machine (server VM) instead of the Java HotSpot Client Virtual Machine (client VM). The aim is to improve performance even if no one configures the VM to reflect the application it's running. In general, the server VM starts up more slowly than the client VM, but over time runs more quickly."

Maybe this can help performance?
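
Before testing the flag, it may be worth checking which VM is already in use, since on many 64-bit JVMs the server VM is the default and -server would change nothing. A minimal sketch, assuming a HotSpot-based JVM that includes "Server" in its java.vm.name property; the class name VmCheck and its helper methods are made up for illustration.

```java
public class VmCheck {

    // Standard system property holding the name of the running VM.
    public static String vmName() {
        return System.getProperty("java.vm.name");
    }

    // Heuristic: HotSpot server VMs include "Server" in the VM name.
    public static boolean looksLikeServerVm(String vmName) {
        return vmName != null && vmName.contains("Server");
    }

    public static void main(String[] args) {
        String name = vmName();
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("java.vm.name  = " + name);
        System.out.println("server VM?    = " + looksLikeServerVm(name));
        System.out.println("max heap (MB) = " + maxHeapMb);
    }
}
```

Running this with the same java binary and options used for Solr or Glassfish would show whether -server makes any difference on dvn-3.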


Related issue(s): #623
Redmine related issue(s): 3430, 4062

