Create flow for building a fresh solr instance from a dump file #1843
mekarpeles merged 101 commits into internetarchive:master
b3041b4 to 6f0e3bd
Is this still WIP?
Yes, unfortunately; have some final cleanup left.
So is this getting close to being ready for review? I see that the WIP tag got removed. I have a couple of dozen comments from a review that I began 14 months ago, but I imagine many of them are obsolete. I don't want to go to the trouble of re-reviewing and updating them, though, if this is still in flux. When we last chatted 9 months ago, there was basic disagreement on whether having this be simple, fast, and runnable by any developer were important goals. Has any of that changed? I believe these attributes are critical to letting multiple developers iterate quickly. PR #2246 attempts to improve along these dimensions, but it hasn't been looked at in a while.
@cdrini Ping. Any feedback on my question from a week and a half ago?
        }
    }
    parameters {
        string(name: 'SOLR_BACKUP_GZIP', defaultValue: 'backup-2019-07-30.tar.gz', description: 'Location of the dump gzip in /storage/openlibrary/solr')
Is it okay to hard-code against /storage, which is OJF-specific?
@@ -0,0 +1,474 @@
from __future__ import division

import ConfigParser
Check Python 2 vs. Python 3 here: the ConfigParser module is named configparser in Python 3.
    :return: dict of key value pairs
    :rtype: dict
    """
    config = ConfigParser.ConfigParser()
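One way to address the Python 2 vs. Python 3 concern with the ConfigParser import is a guarded import, sketched below. This is only an illustration, not the code in this PR: the helper name `load_section`, the `[db]` section, and its keys are hypothetical and exist purely for the example.

```python
import os
import tempfile

try:
    import configparser  # Python 3 module name
except ImportError:
    import ConfigParser as configparser  # Python 2 fallback


def load_section(path, section):
    """Read one INI section into a plain dict (hypothetical helper)."""
    config = configparser.ConfigParser()
    config.read(path)
    # items() yields (key, value) pairs; values are always strings.
    return dict(config.items(section))


# Usage with a throwaway config file (section and keys are made up).
with tempfile.NamedTemporaryFile("w", suffix=".ini", delete=False) as f:
    f.write("[db]\nhost = localhost\nport = 5432\n")
    ini_path = f.name

settings = load_section(ini_path, "db")
os.remove(ini_path)
```

Aliasing the Python 2 module to the Python 3 name keeps the rest of the file on a single spelling, so only the import block changes when Python 2 support is dropped.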
mekarpeles left a comment
This looks like a great first stab.
One potential Python 2 to Python 3 blocker.
Merging, as there appear to be no production consequences.
Progress towards #1067 (and, in general, all our solr issues).
Description:
Sorry folks, this is a very messy PR, but I ran out of time to make it neater. I think there is a lot of room for improvement with this PR (more tests, general cleanup, etc.), but I think it is ready to merge. It's been run multiple times with few errors.
Technical:
General "How it works": A Jenkins job loads the dump file into a local postgres instance, imports works/orphans, and then imports them into solr. Solr needs to query the data as it's importing; that's why the db is needed.
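The ordering described above can be sketched as a small staged runner. This is a rough illustration under stated assumptions, not the Jenkins job itself: `run_pipeline`, the stage names, and the placeholder stage bodies are all hypothetical, and the real work (loading postgres, indexing into solr) is replaced by list appends.

```python
def run_pipeline(stages, done=None):
    """Run (name, callable) stages in order, skipping names already in `done`.

    Stops at the first failing stage and returns the names completed so far,
    so a later call can resume from the stage that failed.
    """
    completed = list(done or [])
    for name, stage in stages:
        if name in completed:
            continue  # already finished in an earlier run
        try:
            stage()
        except Exception as exc:
            print("stage %s failed: %s" % (name, exc))
            break
        completed.append(name)
    return completed


# Placeholder stages mirroring the order in the description (names assumed).
log = []
stages = [
    ("load_dump_into_postgres", lambda: log.append("postgres")),
    ("import_works_and_orphans", lambda: log.append("works/orphans")),
    ("index_into_solr", lambda: log.append("solr")),
]

finished = run_pipeline(stages)
```

Passing the returned list back in as `done` skips the stages that already succeeded, which is one simple way to make a failed run recoverable without starting over.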
Testing:
Have run it multiple times on OJF; it does have issues on occasion, but most issues are recoverable. Since this is more of a dev ops tool anyways, I'm ok with that. http://server.openjournal.foundation:8081/job/solrbuilder-reindex/