Create flow for building a fresh solr instance from a dump file by cdrini · Pull Request #1843 · internetarchive/openlibrary

cdrini · 2019-01-21T23:19:12Z

Progress towards #1067 (and, in general, all our solr issues).

Description:
Sorry folks, this is a very messy PR, but I ran out of time to make it neater. I think there is a lot of room for improvement with this PR (more tests, general cleanup, etc.), but I think it is ready to merge. It's been run multiple times with few errors.

Technical:
General "How it works": A Jenkins job loads the dump file into a local postgres instance, imports works/orphans, and then imports them into solr. Solr needs to query the data as it's importing; that's why the db is needed.
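
The ordering described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the stage functions, the `STAGES` list, and `run_pipeline` are hypothetical names and are not taken from this PR's code. The sketch shows only that the Solr step runs last because it depends on the database populated by the earlier steps.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]


def load_dump_into_postgres(state: State) -> None:
    # Stand-in for loading the dump file into a local postgres instance.
    state["db_loaded"] = True


def import_works_and_orphans(state: State) -> None:
    # Stand-in for the works/orphans import, which needs the loaded db.
    if not state.get("db_loaded"):
        raise RuntimeError("postgres must be loaded before importing works/orphans")
    state["works_imported"] = True


def index_into_solr(state: State) -> None:
    # Stand-in for the Solr import; it queries the db while importing.
    if not state.get("db_loaded"):
        raise RuntimeError("solr indexing queries postgres, so the db must be loaded")
    state["solr_indexed"] = True


# Hypothetical stage list; the real Jenkins stages may be split differently.
STAGES: List[Tuple[str, Callable[[State], None]]] = [
    ("load_dump", load_dump_into_postgres),
    ("import_works_orphans", import_works_and_orphans),
    ("index_solr", index_into_solr),
]


def run_pipeline(stages: List[Tuple[str, Callable[[State], None]]], state: State) -> List[str]:
    """Run the stages in order and return the names of those that completed."""
    completed = []
    for name, stage in stages:
        stage(state)
        completed.append(name)
    return completed


if __name__ == "__main__":
    print(run_pipeline(STAGES, {}))
```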

Testing:
Have run it multiple times on OJF; it does have issues on occasion, but most issues are recoverable. Since this is more of a dev ops tool anyways, I'm ok with that. http://server.openjournal.foundation:8081/job/solrbuilder-reindex/

mekarpeles · 2019-08-16T05:40:25Z

Is this still WIP?

cdrini · 2019-08-17T23:43:43Z

Yes, unfortunately; have some final cleanup left.

tfmorris · 2020-03-04T21:15:23Z

So is this getting close to being ready for review? I see that the WIP tag got removed.

I have a couple of dozen comments from a review that I began 14 months ago, but I imagine many of them are obsolete. I don't want to go to the trouble of re-reviewing them and updating them though, if this is still in flux.

When we last chatted 9 months ago there was basic disagreement on whether having this be simple, fast, and runnable by any developer were important goals. Has any of that changed? I believe these attributes are critical to being able to iterate quickly by multiple developers. The PR #2246 attempts to improve along these dimensions, but it hasn't been looked at in a while.

tfmorris · 2020-03-14T22:06:57Z

@cdrini Ping. Any feedback on my question from a week and a half ago?

mekarpeles · 2020-04-10T21:27:58Z

scripts/solr_builder/Jenkinsfile

+    }
+  }
+  parameters {
+    string(name: 'SOLR_BACKUP_GZIP', defaultValue: 'backup-2019-07-30.tar.gz', description: 'Location of the dump gzip in /storage/openlibrary/solr')


is it okay to hard-code against /storage which is OJF specific?
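
One possible way to address this concern is to make the base directory overridable while keeping the current path as the default. The sketch below is a Python illustration of that idea only; the Jenkinsfile itself is Groovy, and the `SOLR_BACKUP_DIR` variable and `resolve_backup_path` helper are hypothetical names that do not appear in the PR.

```python
import os

# Default taken from the parameter description quoted above.
DEFAULT_BACKUP_DIR = "/storage/openlibrary/solr"


def resolve_backup_path(filename, environ=None):
    """Join the backup file name onto a configurable base directory.

    SOLR_BACKUP_DIR is a hypothetical environment variable used here
    for illustration; it is not defined anywhere in the PR.
    """
    environ = os.environ if environ is None else environ
    base_dir = environ.get("SOLR_BACKUP_DIR", DEFAULT_BACKUP_DIR)
    return os.path.join(base_dir, filename)


if __name__ == "__main__":
    print(resolve_backup_path("backup-2019-07-30.tar.gz"))
```

With this shape, a host other than OJF could set the variable instead of editing the job definition.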

mekarpeles · 2020-04-10T21:32:20Z

scripts/solr_builder/solr_builder/solr_builder_main.py

@@ -0,0 +1,474 @@
+from __future__ import division
+
+import ConfigParser


check python2 v. python3 here

mekarpeles · 2020-04-10T21:32:39Z

scripts/solr_builder/solr_builder/solr_builder_main.py

+    :return: dict of key value pairs
+    :rtype: dict
+    """
+    config = ConfigParser.ConfigParser()


same re: py2 v. py3
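
For context on the Python 2 v. 3 concern: the module is named `ConfigParser` in Python 2 and `configparser` in Python 3. A common compatibility pattern is a guarded import, sketched below. The `load_config` helper and its `section` argument are hypothetical and are not the function from this PR; only the import fallback and the `ConfigParser.read` / `items` calls are standard library behaviour.

```python
try:
    import configparser  # Python 3 module name
except ImportError:
    import ConfigParser as configparser  # Python 2 module name


def load_config(path, section):
    """Read one section of an INI file and return it as a plain dict.

    Hypothetical helper for illustration; values are returned as strings.
    """
    config = configparser.ConfigParser()
    # read() returns the list of files it managed to parse.
    if not config.read(path):
        raise IOError("could not read config file: %s" % path)
    return dict(config.items(section))
```

After this import, the rest of the module can refer to `configparser` regardless of the interpreter version.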

mekarpeles

This looks like a great 1st stab.
1 potential python2->3 blocker.
Merging as there appear to be no production consequences

cdrini added the Module: Solr, Module: Docker, and WIP labels Jan 21, 2019

cdrini requested review from hornc and tfmorris January 21, 2019 23:19

cdrini changed the title ~~WIP: Create flow for building a fresh solr instance from a dump file.~~ WIP: Create flow for building a fresh solr instance from a dump file Jan 21, 2019

cdrini requested a review from mekarpeles January 21, 2019 23:22

cdrini force-pushed the solr-builder branch from 9c0b290 to a43bfa3 Compare February 16, 2019 21:14

cdrini mentioned this pull request Feb 17, 2019

Subjects not being indexed into Solr #1896

Open

cdrini force-pushed the solr-builder branch from 7a05e3c to 29f894c Compare April 23, 2019 03:38

brad2014 added the State: Work In Progress label and removed the WIP label Apr 29, 2019

cdrini mentioned this pull request May 3, 2019

Solr should normalize ISBNs #609

Closed

brad2014 assigned cdrini May 4, 2019

cdrini mentioned this pull request May 8, 2019

Make solr-updater not error on malformed authors #2116

Merged

cdrini force-pushed the solr-builder branch from 29f894c to fba9bca Compare May 8, 2019 22:57

cdrini mentioned this pull request May 25, 2019

Create readonly postgres user for solrupdater #2150

Merged

cdrini force-pushed the solr-builder branch from 9173683 to 1e9bb9d Compare June 21, 2019 14:48

cdrini changed the title ~~WIP: Create flow for building a fresh solr instance from a dump file~~ Create flow for building a fresh solr instance from a dump file Jun 21, 2019

cdrini force-pushed the solr-builder branch from 1e9bb9d to 0b581b3 Compare July 22, 2019 19:51

tfmorris mentioned this pull request Jul 30, 2019

WIP - Solr enhancements #2246

Closed

cdrini force-pushed the solr-builder branch 2 times, most recently from b3041b4 to 6f0e3bd Compare August 12, 2019 20:53

cdrini removed their assignment Aug 20, 2019

cdrini mentioned this pull request Sep 9, 2019

Reindex documents into a new Solr on OJF #2222

Closed

3 tasks

mekarpeles changed the title ~~Create flow for building a fresh solr instance from a dump file~~ WIP: Create flow for building a fresh solr instance from a dump file Sep 9, 2019

cclauss force-pushed the master branch from 95fa9af to 788a8fb Compare September 23, 2019 19:55

cdrini mentioned this pull request Oct 9, 2019

remove editions search endpoint #2470

Merged

cdrini force-pushed the solr-builder branch from 6f0e3bd to ed096b5 Compare March 2, 2020 14:23

cdrini added 2 commits March 2, 2020 09:58

[solr-builder] Fix linting errors

30d6a28

[solr-builder] Fix last linting error + switch to requests

887ee0b

cdrini removed the State: Work In Progress label Mar 2, 2020

cdrini changed the title ~~WIP: Create flow for building a fresh solr instance from a dump file~~ Create flow for building a fresh solr instance from a dump file Mar 2, 2020

cdrini removed their assignment Mar 2, 2020

[solr-builder] Add deploy instructions to README

349fbaf

cdrini assigned hornc Mar 2, 2020

cdrini modified the milestones: Sprint 2020-02, Active Sprint Mar 2, 2020

[solr-builder] Fix solr running out of memory and exploding in CPU usage on prod

c702875

cdrini force-pushed the solr-builder branch from 40d9550 to c702875 Compare March 5, 2020 06:40

hornc removed their assignment Mar 14, 2020

mekarpeles added the Needs: Reviewer label Mar 16, 2020

cdrini modified the milestones: Sprint 2020-03, Next Sprint (Proposed) Apr 6, 2020

cdrini assigned mekarpeles and cdrini Apr 6, 2020

cdrini modified the milestones: Next Sprint (Proposed), Active Sprint Apr 6, 2020

cdrini removed the Needs: Reviewer label Apr 6, 2020

cdrini mentioned this pull request Apr 6, 2020

Update to Solr 8 (Latest) #3317

Closed

31 tasks

mekarpeles reviewed Apr 10, 2020

mekarpeles approved these changes Apr 10, 2020

mekarpeles merged commit 8c0596a into internetarchive:master Apr 10, 2020

cdrini deleted the solr-builder branch November 29, 2020 06:59

mekarpeles merged 101 commits into internetarchive:master from cdrini:solr-builder

5 participants
