
Create flow for building a fresh solr instance from a dump file #1843

Merged
mekarpeles merged 101 commits into internetarchive:master from cdrini:solr-builder
Apr 10, 2020

Conversation

@cdrini
Collaborator

@cdrini cdrini commented Jan 21, 2019

Progress towards #1067 (and, in general, all our Solr issues).

Description:
Sorry folks, this is a very messy PR, but I ran out of time to make it neater. There is a lot of room for improvement (more tests, general cleanup, etc.), but I think it is ready to merge. It has been run multiple times with few errors.

Technical:
General "How it works": A Jenkins job loads the dump file into a local Postgres instance, imports works/orphans, and then imports them into Solr. Solr needs to query the data while it is importing; that's why the DB is needed.
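To make the "How it works" flow concrete, here is a minimal Python sketch of the first two stages. It is not the PR's code: sqlite3 stands in for Postgres, the tab-separated dump layout (type, key, revision, last_modified, JSON) is an assumption, "orphans" are assumed to mean editions with no work, and load_dump, editions_by_work, build_solr_docs, and dump_line are hypothetical names. Posting the resulting documents to Solr is left out.

```python
import json
import sqlite3


def load_dump(conn, lines):
    """Stage 1: load dump rows into a local DB (sqlite3 stands in for Postgres here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS thing ("
        "thing_type TEXT, thing_key TEXT PRIMARY KEY, revision INTEGER, "
        "last_modified TEXT, data TEXT)"
    )
    # Assumed dump layout: type \t key \t revision \t last_modified \t JSON
    rows = (line.rstrip("\n").split("\t", 4) for line in lines if line.strip())
    conn.executemany("INSERT INTO thing VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def editions_by_work(conn):
    """Group editions by the work they reference; orphans (no work) go under None."""
    groups = {}
    cursor = conn.execute("SELECT data FROM thing WHERE thing_type = '/type/edition'")
    for (raw,) in cursor:
        edition = json.loads(raw)
        works = edition.get("works") or [{}]
        groups.setdefault(works[0].get("key"), []).append(edition)
    return groups


def build_solr_docs(conn):
    """Stage 2: build Solr-style documents by querying the loaded data.

    A work's document depends on its editions, which is why the indexer
    needs a queryable DB rather than a single pass over the dump file.
    """
    groups = editions_by_work(conn)
    docs = []
    cursor = conn.execute(
        "SELECT data FROM thing WHERE thing_type = '/type/work' ORDER BY thing_key"
    )
    for (raw,) in cursor:
        work = json.loads(raw)
        docs.append({
            "key": work["key"],
            "title": work.get("title"),
            "edition_count": len(groups.get(work["key"], [])),
        })
    # Orphaned editions are indexed as stand-alone documents in this sketch.
    for edition in groups.get(None, []):
        docs.append({"key": edition["key"], "title": edition.get("title"), "edition_count": 1})
    return docs


def dump_line(thing_type, record):
    """Format one hypothetical dump line for the usage example."""
    return "\t".join([thing_type, record["key"], "1", "2019-07-30T00:00:00", json.dumps(record)])


sample = [
    dump_line("/type/work", {"key": "/works/OL1W", "title": "A Work"}),
    dump_line("/type/edition", {"key": "/books/OL1M", "title": "A Work",
                                "works": [{"key": "/works/OL1W"}]}),
    dump_line("/type/edition", {"key": "/books/OL2M", "title": "Lonely Edition"}),
]
conn = sqlite3.connect(":memory:")
load_dump(conn, sample)
docs = build_solr_docs(conn)
print(docs)
```

Grouping editions before walking the works keeps this sketch to two queries; a real reindex over a full dump would more likely batch lookups per work key inside Postgres.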

Testing:
I have run it multiple times on OJF; it does occasionally have issues, but most are recoverable. Since this is more of a DevOps tool anyway, I'm OK with that. http://server.openjournal.foundation:8081/job/solrbuilder-reindex/

@cdrini cdrini added Module: Solr Issues related to the configuration or use of the Solr subsystem. [managed] Module: Docker Issues related to the configuration or use of Docker. [managed] WIP labels Jan 21, 2019
@cdrini cdrini requested review from hornc and tfmorris January 21, 2019 23:19
@cdrini cdrini changed the title WIP: Create flow for building a fresh solr instance from a dump file. WIP: Create flow for building a fresh solr instance from a dump file Jan 21, 2019
@cdrini cdrini requested a review from mekarpeles January 21, 2019 23:22
@brad2014 brad2014 added State: Work In Progress This issue is being actively worked on. [managed] and removed WIP labels Apr 29, 2019
@cdrini cdrini changed the title WIP: Create flow for building a fresh solr instance from a dump file Create flow for building a fresh solr instance from a dump file Jun 21, 2019
@cdrini cdrini force-pushed the solr-builder branch 2 times, most recently from b3041b4 to 6f0e3bd August 12, 2019 20:53
@mekarpeles
Member

Is this still WIP?

@cdrini
Collaborator Author

cdrini commented Aug 17, 2019

Yes, unfortunately; I have some final cleanup left.

@cdrini cdrini removed their assignment Aug 20, 2019
@mekarpeles mekarpeles changed the title Create flow for building a fresh solr instance from a dump file WIP: Create flow for building a fresh solr instance from a dump file Sep 9, 2019
@cdrini cdrini removed the State: Work In Progress This issue is being actively worked on. [managed] label Mar 2, 2020
@cdrini cdrini changed the title WIP: Create flow for building a fresh solr instance from a dump file Create flow for building a fresh solr instance from a dump file Mar 2, 2020
@cdrini cdrini removed their assignment Mar 2, 2020
@tfmorris
Contributor

tfmorris commented Mar 4, 2020

So is this getting close to being ready for review? I see that the WIP tag got removed.

I have a couple of dozen comments from a review that I began 14 months ago, but I imagine many of them are obsolete. I don't want to go to the trouble of re-reviewing them and updating them though, if this is still in flux.

When we last chatted 9 months ago, there was basic disagreement on whether making this simple, fast, and runnable by any developer were important goals. Has any of that changed? I believe these attributes are critical for multiple developers to be able to iterate quickly. PR #2246 attempts to improve along these dimensions, but it hasn't been looked at in a while.

@hornc hornc removed their assignment Mar 14, 2020
@tfmorris
Contributor

@cdrini Ping. Any feedback on my question from a week and a half ago?

}
}
parameters {
string(name: 'SOLR_BACKUP_GZIP', defaultValue: 'backup-2019-07-30.tar.gz', description: 'Location of the dump gzip in /storage/openlibrary/solr')
Member


Is it okay to hard-code against /storage, which is OJF-specific?
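One way to address this is to keep the OJF path only as a default that the environment can override. A minimal Python sketch, assuming a hypothetical SOLR_BACKUP_DIR environment variable; the actual job takes its path from the Jenkinsfile parameter above, so this only illustrates the idea:

```python
import os

# The OJF-specific location currently hard-coded in the Jenkinsfile.
DEFAULT_BACKUP_DIR = "/storage/openlibrary/solr"


def resolve_backup_path(filename, env=None):
    """Return the dump gzip path, letting a (hypothetical) SOLR_BACKUP_DIR
    environment variable override the OJF default."""
    env = os.environ if env is None else env
    backup_dir = env.get("SOLR_BACKUP_DIR", DEFAULT_BACKUP_DIR)
    return os.path.join(backup_dir, filename)


print(resolve_backup_path("backup-2019-07-30.tar.gz", env={}))
print(resolve_backup_path("backup-2019-07-30.tar.gz", env={"SOLR_BACKUP_DIR": "/tmp/solr"}))
```

Passing env explicitly keeps the function testable without touching the real process environment.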

@@ -0,0 +1,474 @@
from __future__ import division

import ConfigParser
Member


Check Python 2 vs. Python 3 here: the ConfigParser module was renamed configparser in Python 3.
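A common way to import the module under both interpreters is a try/except alias. This is a sketch, not the PR's fix; the "solr" section and "host" option are hypothetical:

```python
try:
    import configparser  # Python 3 name
except ImportError:
    import ConfigParser as configparser  # Python 2 name, aliased to the Python 3 one

# The ConfigParser class exists under both module names, so code written
# against the alias works on either interpreter.
parser = configparser.ConfigParser()
parser.add_section("solr")  # hypothetical section/option names
parser.set("solr", "host", "localhost")
print(parser.get("solr", "host"))
```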

:return: dict of key value pairs
:rtype: dict
"""
config = ConfigParser.ConfigParser()
Member


Same re: Python 2 vs. 3.
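A Python 3 version of a helper matching that docstring (":return: dict of key value pairs") might look like the sketch below. read_section and the [solr] section are hypothetical names, not the PR's code:

```python
import configparser
import os
import tempfile


def read_section(path, section):
    """Read one INI section into a plain dict of key/value strings.

    :param path: path to an INI-style config file
    :param section: section name to read
    :return: dict of key value pairs
    :rtype: dict
    """
    config = configparser.ConfigParser()
    # read_file raises if the file is missing; config.read() would silently skip it.
    with open(path) as f:
        config.read_file(f)
    # items(section) also includes any values inherited from [DEFAULT].
    return dict(config.items(section))


with tempfile.NamedTemporaryFile("w", suffix=".ini", delete=False) as tmp:
    tmp.write("[solr]\nhost = localhost\nport = 8983\n")
settings = read_section(tmp.name, "solr")
os.remove(tmp.name)
print(settings)
```

All values come back as strings; callers that need a port number would convert it themselves (or use config.getint).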

Member

@mekarpeles mekarpeles left a comment


This looks like a great first stab.
One potential Python 2 -> 3 blocker.
Merging, as there appear to be no production consequences.

@mekarpeles mekarpeles merged commit 8c0596a into internetarchive:master Apr 10, 2020
@cdrini cdrini deleted the solr-builder branch November 29, 2020 06:59

Labels

Module: Docker Issues related to the configuration or use of Docker. [managed] Module: Solr Issues related to the configuration or use of the Solr subsystem. [managed]


5 participants