6665 index dv api fails #6704

Merged
kcondon merged 13 commits into develop from 6665-index-dv-api-fails
Mar 12, 2020

@sekmiller
Contributor

What this PR does / why we need it:
This makes indexing of linked dataverses more efficient by modifying the Solr documents of the owned datasets instead of doing a full re-index. When the datasets were fully re-indexed, dataverses with large numbers of datasets would fail to re-index.
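
To illustrate the difference between the two strategies, here is a minimal sketch, not the code in this PR. It models the search index as an in-memory dictionary rather than a real Solr core, and the field name `paths`, the helper `build_doc`, and both function names are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a search index: doc id -> field map.
Index = Dict[str, Dict[str, object]]


def full_reindex(index: Index, dataset_ids: List[str],
                 build_doc: Callable[[str], Dict[str, object]]) -> int:
    """Rebuild every owned dataset's document from scratch (the expensive path)."""
    for ds_id in dataset_ids:
        # build_doc is assumed to re-read all metadata for the dataset.
        index[ds_id] = build_doc(ds_id)
    return len(dataset_ids)


def add_linking_path(index: Index, dataset_ids: List[str], linking_path: str) -> int:
    """Touch only the path field of each already-indexed document (the cheap path)."""
    updated = 0
    for ds_id in dataset_ids:
        doc = index.get(ds_id)
        if doc is None:
            continue  # not indexed yet; skipped in this sketch
        paths = doc.setdefault("paths", [])
        if linking_path not in paths:
            paths.append(linking_path)  # all other fields stay untouched
            updated += 1
    return updated
```

The partial update does a small, fixed amount of work per dataset, whereas the full re-index pays the cost of rebuilding each document, which is the step that grows expensive for dataverses with many datasets.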

Which issue(s) this PR closes:

Closes #6665

Special notes for your reviewer:

Suggestions on how to test this:
Verify that a dataverse with a large number of datasets (mra, for one) can be re-indexed via the API. Also verify that linking dataverses still display the datasets of the linked dataverse.

Does this PR introduce a user interface change?:
no
Is there a release notes update needed for this change?:
none
Additional documentation:

@coveralls

coveralls commented Feb 27, 2020

Coverage decreased (-0.005%) to 19.438% when pulling 7856538 on 6665-index-dv-api-fails into fea57c1 on develop.

@kcondon
Contributor

kcondon commented Mar 3, 2020

@sekmiller It looks like batch index performance became a lot slower: from 18 hours to 6 days, estimated. I will do further testing to see whether it is a completion-rate issue, a memory/resource problem, or a few problem datasets.

Update: Ran it again, starting at 2pm on 3/3. It is still running, but only 26k of 95k datasets are indexed and there appear to be 4 minutes between indexing one dataset and the next. CPU ~98%; memory OK at 30% used; ~3500 open file descriptors. This increasingly slow indexing feels like an algorithm issue where it is reprocessing an entire list, which gets slower as the list gets larger. We've seen it in other batch jobs in the past. Just speculation though.
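
As a rough illustration of the kind of pattern speculated about above, and not a diagnosis of the actual indexing code, the following sketch counts item visits when every step re-walks the whole list processed so far, compared with a single pass. Both function names are hypothetical.

```python
def work_with_rescan(n: int) -> int:
    """Count item visits when each step re-walks everything processed so far."""
    processed = []
    visits = 0
    for item in range(n):
        processed.append(item)
        visits += len(processed)  # the whole list is touched again on every step
    return visits


def work_single_pass(n: int) -> int:
    """Count item visits when each item is handled exactly once."""
    return n
```

With re-scanning, the total work is n(n+1)/2, so the time per item keeps growing as the job progresses, which would match a batch that starts fast and slows down steadily.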

@kcondon kcondon assigned kcondon and sekmiller and unassigned kcondon Mar 3, 2020
@sekmiller sekmiller removed their assignment Mar 12, 2020
@kcondon kcondon self-assigned this Mar 12, 2020
@kcondon kcondon merged commit c574792 into develop Mar 12, 2020
@kcondon kcondon deleted the 6665-index-dv-api-fails branch March 12, 2020 20:56
@djbrooke djbrooke added this to the 4.20 milestone Mar 13, 2020
Development

Successfully merging this pull request may close these issues.

Index: Cannot api index a particular dataverse in develop, no errors, hangs, affects/prevents batch index too.

5 participants