Skip to content

Conversation

@mtyaka
Copy link
Contributor

@mtyaka mtyaka commented Jan 13, 2017

This is a follow-up to #14044, in which we fixed a bug that would cause problem-builder to fail in courses with long course keys (> 50 characters). Because the problem-builder xblock is used on edx.org and the affected production problem_builder_answer is large, we had to do the migration to extend the course_key column in two steps. For more info on that see https://github.com/edx/edx-platform/pull/14013.

The new problem-builder removes the temporary transition code and includes a migration that copies over values from the old deprecated course_id column to the new extended course_key column.

IMPORTANT: Before deploying v2.6.9 to production, edX will have to fake this migration and manually copy course_id values into course_key in batches because the production problem_builder_answer table is large and performing the migration in one go would degrade performance!

See the problem-builder PR included in the new release:
open-craft/problem-builder#138

Environments: edx.org and edge.edx.org

Partner information: hosted on edX Edge (Davidson College)

JIRA ticket: TNL-5932

Dependencies: open-craft/problem-builder#138

Testing instructions:

This patch should not change problem-builder functionality in any way. Verify that problem-builder still works correctly on the sandbox.

Sandbox:

Reviewers:


Settings

EDXAPP_INSTALL_PRIVATE_REQUIREMENTS: true
COMMON_EDXAPP_SETTINGS: openstack

@maxrothman
Copy link
Contributor

Out of curiosity, why did you pin to a commit rather than a tag? Is there not a release with that commit in it in the other repo?

@mtyaka
Copy link
Contributor Author

mtyaka commented Jan 23, 2017

@maxrothman Thanks for looking at this! I do plan to create a tag for it (and I'll update this PR when I do), but I would like to get open-craft/problem-builder#138 reviewed and merged before I do that, to avoid having to do a series of releases just to address any potential review comments.

If you could take a look at open-craft/problem-builder#138, that would be great.

@mtyaka mtyaka force-pushed the mtyaka/problem-builder-2.6.6 branch from 45bb38d to 5dccd19 Compare January 25, 2017 08:15
@mtyaka mtyaka changed the title WIP: Bump problem builder to v2.6.6. Bump problem builder to v2.6.6. Jan 26, 2017
@mtyaka
Copy link
Contributor Author

mtyaka commented Jan 26, 2017

open-craft/problem-builder#138 has been merged and the v2.6.6 tag created, so I updated this PR to pin to that tag. It is now ready for review.

@bdero
Copy link
Contributor

bdero commented Jan 27, 2017

👍

  • I tested this: I checked to make sure the sandbox has 2.6.6 and then smoke tested the problem builder examples on the sandbox.
  • I read through the code
  • I checked for accessibility issues Not necessary - there are no user-facing changes.
  • Includes documentation Not necessary - there are no UX or setup changes.

@mtyaka
Copy link
Contributor Author

mtyaka commented Jan 27, 2017

Thanks @bdero!

@maxrothman Thanks for reviewing open-craft/problem-builder#138. That has now been merged, a new v2.6.6 tag created and this PR updated to pin to the tag. If you could review this PR as well that would be great.
Care needs to be taken to fake the problem-builder migration contained in the new release and perform it manually in batches before deploying this to production.

@jibsheet
Copy link
Contributor

@cpennington FYI

@bradenmacdonald
Copy link
Contributor

@jibsheet @maxrothman would you guys be able to give us an ETA for this PR's merge+migration? Anything else needed from us at this point?

@maxrothman
Copy link
Contributor

maxrothman commented Feb 14, 2017

FYI @edx/devops and @cpennington there'll be some manual action required to fake the migration and run it in the background. See open-craft/problem-builder#138 for details.

@maxrothman
Copy link
Contributor

Also pinging @nedbat because this impacts people deploying the next named release

@maxrothman
Copy link
Contributor

I'll let you know when edX has done the migration work so this can be merged.

@nedbat
Copy link
Contributor

nedbat commented Feb 14, 2017

@mtyaka Could you add a section to the Gingko wiki page detailing what will need to be done when installing this? https://openedx.atlassian.net/wiki/display/COMM/Gingko Thanks.

@maxrothman
Copy link
Contributor

maxrothman commented Feb 14, 2017

@mtyaka just so I'm on the same page, does this release only add the migration that moves the data, or does it also drop the old column?
EDIT: figured it out 😳

@bradenmacdonald
Copy link
Contributor

@mtyaka Could you add a section to the Gingko wiki page detailing what will need to be done when installing this? https://openedx.atlassian.net/wiki/display/COMM/Gingko Thanks.

Shouldn't affect Open edX. Problem Builder is in edx-private.txt so it's used on edx.org but not part of a default Open edX install. Also even if people are using it, my impression was that manual action is mostly required for edx.org due to the large size of this particular db table - smaller-scale installations can probably just run the migration normally.

@maxrothman
Copy link
Contributor

A couple of things:

  • Could you take a crack at writing the manage.py shell script that performs the out-of-band migration? We're going to be publishing it in the next release for larger-scale deployers. It should have parameterized batch sizes and sleep times.
  • edX DevOps has scheduled the work necessary to merge this for about a month for now, so this can't merge until then. If there are reasons need it sooner, let me know and we can talk.

@bradenmacdonald
Copy link
Contributor

@maxrothman Would you be able to tell us the week when this is scheduled specifically, so we can schedule it into a sprint on our side as well?

I'll let @mtyaka answer about that shell script.

@mtyaka
Copy link
Contributor Author

mtyaka commented Feb 20, 2017

@maxrothman Yes, we can add a django management command to perform the migration in batches. We'll schedule some time to do that with @bradenmacdonald.

@maxrothman
Copy link
Contributor

@mtyaka though I won't stop you from writing a management command, I'm fine if you just post some python in the PR 😄

@bradenmacdonald we do 2-week sprints. Our current sprint ends Friday and this story is schedule for 2 sprints from now, so sometime between 3-4 weeks from now, depending on other obligations. We'll let you know if the schedule changes.

@mtyaka
Copy link
Contributor Author

mtyaka commented Feb 23, 2017

@maxrothman This PR adds the management command: open-craft/problem-builder#143

Do you want to take a look before I merge it?

@mtyaka mtyaka force-pushed the mtyaka/problem-builder-2.6.6 branch from 5dccd19 to 1098ac2 Compare February 24, 2017 11:05
@mtyaka mtyaka changed the title Bump problem builder to v2.6.6. Bump problem builder to v2.6.7. Feb 24, 2017
@mtyaka
Copy link
Contributor Author

mtyaka commented Feb 24, 2017

Rebased and updated to problem builder v.2.6.7, which contains the management command.

@mtyaka
Copy link
Contributor Author

mtyaka commented Mar 20, 2017

@maxrothman It looks like this one got lost in a flux, sorry this is taking so long. I made a change in open-craft/problem-builder#145 to remove the COUNT(*) query from the management command. Can you please take a look? If it looks good to you, I'll merge it, create a new problem-builder release, and update this PR to point to the new release.

@mtyaka mtyaka force-pushed the mtyaka/problem-builder-2.6.6 branch from 1098ac2 to 4e9ea78 Compare March 21, 2017 10:04
@mtyaka
Copy link
Contributor Author

mtyaka commented Mar 21, 2017

@maxrothman Thanks! open-craft/problem-builder#143 has been merged, new release created, and this PR updated to point to the new release.

@mtyaka mtyaka changed the title Bump problem builder to v2.6.7. Bump problem builder to v2.6.9. Mar 24, 2017
@mtyaka
Copy link
Contributor Author

mtyaka commented May 5, 2017

@jibsheet I heard Max left edX which is probably why this PR is left hanging. Can you please help us get this merged?

The important thing is that the new problem-builder release contains a data migration which copies values of old course_id column to the new extended course_key column. Since the edX production problem_builder.answer table is huge, this migration should NOT be run automatically as part of the normal deployment process, the data should be copied manually BEFORE deploying the new version to production instead.

To make copying easier, we prepared a management command, which performs copying in batches, with a configurable batch size and sleep time between batches.

@jibsheet
Copy link
Contributor

jibsheet commented May 5, 2017

@mtyaka Yes, Max has moved on. I believe we have a ticket but not sure of scheduling.

@fredsmith is our scrum master and @feanil is team lead, so they're best positioned to look at scheduling this work.

@mtyaka
Copy link
Contributor Author

mtyaka commented May 31, 2017

@fredsmith @feanil Any idea when this can get scheduled?

@feanil
Copy link
Contributor

feanil commented Jun 2, 2017

@mtyaka reviewing this again to gain context, a few questions/clarifications:

The idea would be:

  1. To merge this.
  2. Cut a new release from this.
  3. Do the migration that creates the new column? What's the app/number for this migration?
  4. Fake the migration that does the data migration?
  5. Run the data migration via the management command.
  6. Release the new code to production?

@mtyaka
Copy link
Contributor Author

mtyaka commented Jun 6, 2017

@feanil That's correct, with the exception of 3 (the migration that creates the new column was already deployed to production some time ago). The migration that created the new course_key column is in problem_builder app, number 0003_auto_20161124_0755 (link):

  1. To merge this.
  2. Cut a new release from this.
  3. Do the migration that creates the new column? What's the app/number for this migration? Migration problem_builder/0003_auto_20161124_0755 was already deployed to production as part of problem-builder release v2.6.5.
  4. Fake the migration that does the data migration? Yes, fake problem_builder / 0004_copy_course_ids.
  5. Run the data migration via the management command.
  6. Release the new code to production? Yes, note that v.2.6.9 contains another migration (problem_builder / 0005_auto_20170112_1021), which makes the course_key column non-nullable. This migration is quick and can be run in production, but it's important that the data migration performed via the management command in the previous step is complete before running this migration (otherwise this migration would fail because some course_key columns might be empty).

@feanil
Copy link
Contributor

feanil commented Jun 20, 2017

@mtyaka @bradenmacdonald I'll be on-call next week and can Shepard through this change assuming we don't see other issues that prevent that. Can we have someone on your side get it to green before then?

@mtyaka mtyaka force-pushed the mtyaka/problem-builder-2.6.6 branch from 4e9ea78 to f0b2a91 Compare June 20, 2017 13:59
@mtyaka
Copy link
Contributor Author

mtyaka commented Jun 20, 2017

Thanks @feanil! We'll get it to green.

@mtyaka
Copy link
Contributor Author

mtyaka commented Jun 20, 2017

@feanil Green and ready :)

@feanil feanil merged commit 00ba9c7 into openedx:master Jun 26, 2017
@feanil
Copy link
Contributor

feanil commented Jun 26, 2017

@mtyaka Unfortunately, the management is not being found when I go to run manage.py in our production system. I'm not sure what's going on there. I do see that the app is installed and the migrations are visible.

I also tried to run the management command via django-shell but it seems to be hanging on something, It's not responding to CTRL-C and has to be kill -9ed. I didn't have too much time to investigate.

Unfortunately I'm going to have to revert this for now.

@edx-pipeline-bot
Copy link
Contributor

EdX Release Notice: This PR has been deployed to the staging environment in preparation for a release to production on Tuesday, June 27, 2017.

@edx-pipeline-bot
Copy link
Contributor

EdX Release Notice: This PR has been deployed to the production environment.

@bradenmacdonald
Copy link
Contributor

@feanil Argh, that's a disappointment after all this time. Thanks for the attempt - we'll have to sort this out on our side ASAP. Do you remember if the management command (when run from the shell) produced any output at all?

@mtyaka Any ideas on why the management command would not be found?

@feanil
Copy link
Contributor

feanil commented Jun 28, 2017

@bradenmacdonald when I tried to run the command, I got the same error as I would for a command that doesn't exist(standard django error, no other useful info).

@mtyaka
Copy link
Contributor Author

mtyaka commented Jun 28, 2017

@feanil Hmm... that's weird. I'm not sure how that could happen. I'll try to reproduce.

@bradenmacdonald
Copy link
Contributor

@feanil Sorry, I was trying to ask whether there was any output when you ran "the management command via django-shell" and had to kill it. The script should be producing regular output as it does each batch, so if it hung without producing any output, that would be quite strange.

@feanil
Copy link
Contributor

feanil commented Jun 28, 2017

It seemed like it never got through the first batch. I tried making the batches smaller and printing more often and that seemed to have some effect but eventually it would lock up.

@mtyaka
Copy link
Contributor Author

mtyaka commented Jul 24, 2017

@feanil While trying to reproduce the issues you encountered I initially ran into the same problem where the management command would not be found, but I later realized that I was trying to run copy_deprecated_course_ids while the command is actually called copy_deprecated_course_id (without the "s"). That's an easy error to make especially since the migration is called 0004_copy_course_ids.

I then tried to reproduce the second part of the problem where the command hangs. I suspected it could be due to a memory leak because I previously didn't test the command with a large number of items in the database.

However after filling the database on my devstack with over 2 million rows, the management command was still working fine for me. The command would periodically print "Processed batch N" and the memory usage remained more or less constant while it worked through the batches.

I am not sure how to debug this further without being able to reproduce the problem. Any ideas @feanil @bradenmacdonald?

@bradenmacdonald bradenmacdonald deleted the mtyaka/problem-builder-2.6.6 branch May 18, 2019 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants