New course key migration script. #160

jcdyer · 2017-08-15T16:20:46Z

I have a suspicion that the problem was with the way the queries depend on the previous queries getting committed. I know edx.org uses a funky transaction mode for their MySQL instance which might cause problems with this behavior, and might also cause differences with a development box that uses a newer version of mysql (that might use a different transaction mode). This implementation makes the individual queries independent of one another, and also only performs one update query per batch, rather than one per object.

I still need to do manual tests, but wanted to get feedback.

Compare the implementation to the source I borrowed from here: https://github.com/open-craft/problem-builder/pull/143/files#diff-cd075b20c253e110da4aef02f64019d0R38

@bradenmacdonald @feanil

feanil

Generally looks good. We should make batch_start configurable since we may need to stop and start this job as we tweak batch sizes based on end-user impact.

feanil · 2017-08-15T17:20:04Z

problem_builder/management/commands/copy_deprecated_course_ids.py

+        batch_size = options['batch_size']
+        sleep_time = options['sleep']
+        max_pk = Answer.objects.order_by('-pk')[0].pk
+        batch_start = 0


If we have to stop this command and re-start it, we'll want to start at a different ID. We should make this configurable. Optionally it could look at the lowest PK where the new field is empty? That may be more clever than necessary.

bradenmacdonald · 2017-08-15T17:59:30Z

problem_builder/migrations/0004_copy_course_ids.py

+
+from django.db import migrations
+
+def copy_course_id_to_course_key(apps, schema_editor):


Should we just replace this forward migration with a noop in this 2.6.5.patch1 build, so devops doesn't have to bother manually faking it?

If we try to merge this branch into master, it will cause some pain to have a different version here. If we don't try to merge this branch into master, this version of the management command will disappear and will never have lived on the master branch, which may make it hard to dig up later on, if it's needed for historical reasons.

My suspicion is that faking the migration will be the lowest amount of pain, but I'll defer to @feanil's judgment on which approach they'd prefer.

Actually, if we're just going to cut a release directly off this branch it shouldn't matter. We can still resolve conflicts and merge the branch afterward. I like your plan @bradenmacdonald. I've set the migration to noop.

Yep, exactly. Let's release with this exact branch, and when we merge to master later (which we should, to keep the code, as you point out) we can resolve the migration conflict in favor of the existing migration that is not a noop.

Nice, that will make things a lot simpler, thanks @jcdyer @bradenmacdonald .

jcdyer · 2017-08-17T12:29:11Z

@bradenmacdonald @feanil The current version is working for a manual test, but there still seems to be an error in the circle-ci test suite (unrelated to my PR). Sorry I squashed commits without thinking. I fixed an off-by-one error and did some other clean-up to get the script working (I had the queryset filtered using __ge instead of __gte, the circle.yml file had some fiddly bits that needed fiddling, etc.).

With 1,000,000 Answer entries, batches of 5000 (the default) and no sleep between batches, All 200 batches run in about 90 seconds. I reduced the default sleep time from 5 seconds between batches to 1 second between batches (Which means it will still spend more time sleeping than working, but not waste quite as much wall clock time).

Before this gets run, we will still need to

tag a release.
update edx-platform to use that release.
Do a test run on stage.

I have permission to do #1 (just need thumbs on this PR), and create a PR for #2 (after the tag exists), but will need devops to do #3.

I will out of town the rest of this week. I'll be able to do minor tasks (like #s 1 and 2 above), but won't be available for larger work. We can either pick this up next week, or I can hand it off to @bradenmacdonald to shepherd through.

bradenmacdonald

Looking good, but I think I spotted a couple bugs.

I'm not worried about the CircleCI issues - they're definitely unrelated (probably flaky tests or dependency issues that have already been fixed on master) and I think we can be quite sure this PR isn't introducing any regressions.

One other general comment: Could we please rename copy_deprecated_course_ids to something that includes problem_builder in the name? Because the management command namespace is shared among all django apps in the edxapp virtualenv, and that name is very generic and could conflict with something else in the future.

bradenmacdonald · 2017-08-17T20:56:42Z

problem_builder/management/commands/copy_deprecated_course_ids.py

+        offset = options['offset']
+        max_pk = Answer.objects.order_by('-pk')[0].pk
+        batch_start = offset
+        batch_stop = batch_size


Shouldn't this be batch_stop = batch_start + batch_size ?

Otherwise if I pass --batch-size 200 --offset 600 then we'd have batch_stop < batch_start

I tested this and found that --offset didn't seem to be working.

Fixed. Thanks

bradenmacdonald · 2017-08-17T21:27:33Z

problem_builder/management/commands/copy_deprecated_course_ids.py

+            queryset = Answer.objects.filter(
+                pk__gte=batch_start,
+                pk__lt=batch_stop,
+                course_key='',


Looking at https://github.com/open-craft/problem-builder/blob/master/problem_builder/migrations/0003_auto_20161124_0755.py it seems that when we created the course_key column, we made it nullable with a null default, so I suspect that the live server actually has null values in this column, not empty strings. (Note that in migration 0005 we later fixed that, but that migration isn't on edX.org yet).

Using this should work:

queryset = Answer.objects.filter( pk__gte=batch_start, pk__lt=batch_stop, ).filter(Q(course_key__isnull=True) | Q(course_key=''))

Ok. Updated.

bradenmacdonald

Thanks for those fixes! Conditional 👍 from me, if you also rename the copy_deprecated_course_ids command to something problem-builder specific as mentioned.

I tested this: Tested some data migrations on my devstack - small numbers of rows though
I read through the code
~~I checked for accessibility issues~~ n/a
Includes documentation: has docstrings

jcdyer · 2017-08-21T11:17:02Z

Renamed command to problem_builder_migrate_keys. I'm going to tag a release.

jcdyer · 2017-08-21T13:46:41Z

Looks like someone created v2.6.5patch1 a couple days ago. Should I rebase on top of that, or release independently?

bradenmacdonald · 2017-08-21T15:06:34Z

@jcdyer Hmm, not sure how that happened. I have deleted that tag as it was obviously incorrect. Please release independently. We'll have to use a slightly different name for the new tag, though.

jcdyer · 2017-08-21T16:14:55Z

Release tagged as v2.6.5patch2

feanil · 2017-08-21T17:03:50Z

@jcdyer when you have the edx-platform PR, tag me and @fredsmith on it. He's on call this week but I'll be shepherding this through.

bradenmacdonald · 2017-08-24T19:03:10Z

setup.py

    version='2.7.2',
    description='XBlock - Problem Builder',
-    packages=['problem_builder', 'problem_builder.v1'],
+    packages=['problem_builder', 'problem_builder.v1', 'problem_builder.management', 'problem_builder.management.commands'],


Tip: It's even better to just use find_packages()

* Fix version number * Make circle script more flexible for version bumps. * Remove vestigial tests. * Use find_packages in setup.py.

jcdyer · 2017-08-24T19:40:12Z

@bradenmacdonald This is ready for a final review before merging to master. I had to remove the tests because the master branch expects course_key to be non-null, which makes them 1. fail, and 2. useless if they're revised not to fail. The reason I'm keeping the management command around is to give it a presence in the history of the master branch.

bradenmacdonald · 2017-08-27T01:54:06Z

👍 Thanks @jcdyer !

feanil reviewed Aug 15, 2017

View reviewed changes

bradenmacdonald reviewed Aug 15, 2017

View reviewed changes

jcdyer force-pushed the cliff/migrate-course-key branch from f7d177f to 82db864 Compare August 17, 2017 12:03

bradenmacdonald requested changes Aug 17, 2017

View reviewed changes

bradenmacdonald approved these changes Aug 18, 2017

View reviewed changes

jcdyer force-pushed the cliff/migrate-course-key branch 2 times, most recently from 15ccaa4 to 3463336 Compare August 21, 2017 11:15

New course key migration script.

3463336

jcdyer mentioned this pull request Aug 22, 2017

Update problem-builder requirement to 2.6.5patch2 to migrate keys. openedx/openedx-platform#15862

Merged

Merge branch 'master' into cliff/migrate-course-key

91e8b52

bradenmacdonald reviewed Aug 24, 2017

View reviewed changes

Prepare code for merge to master branch.

ddcf831

* Fix version number * Make circle script more flexible for version bumps. * Remove vestigial tests. * Use find_packages in setup.py.

jcdyer force-pushed the cliff/migrate-course-key branch from af25c49 to 3b88683 Compare August 24, 2017 19:39

jcdyer force-pushed the cliff/migrate-course-key branch from 3b88683 to ddcf831 Compare August 24, 2017 19:41

bradenmacdonald merged commit 7665063 into master Aug 27, 2017

bradenmacdonald deleted the cliff/migrate-course-key branch August 27, 2017 01:54


		from django.db import migrations

		def copy_course_id_to_course_key(apps, schema_editor):

New course key migration script. #160

New course key migration script. #160

Uh oh!

Conversation

jcdyer commented Aug 15, 2017

Uh oh!

feanil left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jcdyer Aug 15, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jcdyer commented Aug 17, 2017

Uh oh!

bradenmacdonald left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bradenmacdonald left a comment

Choose a reason for hiding this comment

Uh oh!

jcdyer commented Aug 21, 2017

Uh oh!

jcdyer commented Aug 21, 2017

Uh oh!

bradenmacdonald commented Aug 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jcdyer commented Aug 21, 2017

Uh oh!

feanil commented Aug 21, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jcdyer commented Aug 24, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bradenmacdonald commented Aug 27, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

jcdyer Aug 15, 2017 •

edited

Loading

bradenmacdonald commented Aug 21, 2017 •

edited

Loading

jcdyer commented Aug 24, 2017 •

edited

Loading