Add management command to manually migrate course_id field. #143
0ca3142 to 768c171
    batch_size = options['batch_size']
    sleep_time = options['sleep']
    queryset = Answer.objects.filter(course_key__isnull=True)
    batch_count = (queryset.count() / batch_size) + 1
Nit: This should do a floating point division with a ceiling operation instead. Otherwise, whenever the queryset length is divisible by batch_size, the result is one more batch than necessary. E.g.:
    batch_count = int(math.ceil(queryset.count() / float(batch_size)))
@bdero Thanks, done!
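The off-by-one described in the nit can be checked with a short standalone sketch. The function names are hypothetical, and `//` is used to reproduce the integer division that `/` performs on integers in Python 2, which is assumed to be what the original line ran under. The third variant is an integer-only alternative that does not appear in the PR.

```python
import math


def batches_floor_plus_one(count, batch_size):
    # Original calculation; `//` mimics Python 2 integer division.
    return count // batch_size + 1


def batches_ceil(count, batch_size):
    # Reviewer's suggestion: float division followed by a ceiling.
    return int(math.ceil(count / float(batch_size)))


def batches_int_ceil(count, batch_size):
    # Integer-only ceiling division (not from the PR); avoids floats entirely.
    return (count + batch_size - 1) // batch_size


for count in (0, 99, 100, 101):
    print(count,
          batches_floor_plus_one(count, 100),
          batches_ceil(count, 100),
          batches_int_ceil(count, 100))
```

The two approaches differ only when the count is an exact multiple of the batch size (including zero), which matches the reviewer's observation that the extra batch is harmless but unnecessary.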
This looks good @mtyaka - I just had one nit about the batch_count calculation, which you can choose to ignore if you want since I verified it won't cause an error either way. 👍
Normally the value of deprecated Answer.course_id will be copied into the new Answer.course_key field by a migration, but if your answer table is very large, you might want to use this management command to perform the copying in batches instead.
Rebased from 768c171
768c171 to 6b358f6
    batch_size = options['batch_size']
    sleep_time = options['sleep']
    queryset = Answer.objects.filter(course_key__isnull=True)
    batch_count = int(math.ceil(queryset.count() / float(batch_size)))
Does this actually do the equivalent of SELECT count(*)?
Yes, queryset.count() performs a SELECT count(*).
@maxrothman This is only executed once at the beginning of the command to be able to report progress. I don't know how large the table in edX production is, but I was assuming the count query won't take more than a minute or two and that it wasn't going to be a problem since it's only executed once at the beginning of command. However I don't have much experience in this area so I may be underestimating the time the count query takes or I am missing something else.
We can remove the count query and keep fetching batches until the query returns no results - in that case we will not be able to report "Processed batch i of N", but can only print "Processed batch i".
@mtyaka Can we just avoid counting how many batches there are, and run the batches one by one in a while loop until the batch happens to be empty?
@bradenmacdonald Sure, I opened a PR to do that at #145, but I am away until March 5th so if this needs to land before then somebody else will have to take care of merging & releasing this.
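For reference, the loop-until-empty idea from this thread can be sketched without Django. This is not the code from #145; `process_in_batches`, `fetch_unmigrated`, and `copy_course_id` are hypothetical names, and an in-memory list stands in for the answer table.

```python
def process_in_batches(fetch_batch, handle_batch, batch_size):
    """Handle successive batches until fetch_batch returns an empty one."""
    batch_number = 0
    while True:
        batch = fetch_batch(batch_size)
        if not batch:
            break  # nothing left to migrate
        handle_batch(batch)
        batch_number += 1
        # Without an up-front count, only the running index can be reported.
        print("Processed batch %d" % batch_number)
    return batch_number


# In-memory stand-in for rows whose course_key has not been filled in yet.
rows = [{"course_id": "course-%d" % i, "course_key": None} for i in range(5)]


def fetch_unmigrated(limit):
    # Plays the role of filtering on course_key__isnull=True with a limit.
    return [row for row in rows if row["course_key"] is None][:limit]


def copy_course_id(batch):
    for row in batch:
        row["course_key"] = row["course_id"]


total_batches = process_in_batches(fetch_unmigrated, copy_course_id, batch_size=2)
```

One thing to watch with this shape: the loop only terminates if every handled row stops matching the filter. A row whose `course_id` is itself empty would keep reappearing, so a real command would need to exclude or otherwise skip such rows.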
Thanks! This looks good.
Normally the value of the deprecated Answer.course_id will be copied into the new Answer.course_key field by a migration, but if your answer table is very large, you might want to use this management command to perform the copying in batches instead.
The new management command added by this patch lets you copy the column in batches, with a configurable batch size and sleep time between each batch.
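A rough illustration of what copying the column in batches with a pause between them amounts to at the SQL level, using an in-memory SQLite table rather than the Django ORM. The simplified schema (an integer `id` plus two text columns), the function name `copy_course_ids`, and the chosen batch size and sleep values are assumptions made for this sketch, not details taken from the patch.

```python
import sqlite3
import time


def copy_course_ids(conn, batch_size, sleep_time):
    """Copy course_id into course_key, batch_size rows at a time."""
    batches = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM problem_builder_answer "
            "WHERE course_key IS NULL ORDER BY id LIMIT ?",
            (batch_size,),
        )]
        if not ids:
            break
        placeholders = ",".join("?" * len(ids))
        conn.execute(
            "UPDATE problem_builder_answer SET course_key = course_id "
            "WHERE id IN (%s)" % placeholders,
            ids,
        )
        conn.commit()  # commit per batch so each transaction stays small
        batches += 1
        time.sleep(sleep_time)  # give other database clients room between batches
    return batches


# Hypothetical, simplified stand-in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE problem_builder_answer "
    "(id INTEGER PRIMARY KEY, course_id TEXT, course_key TEXT)"
)
conn.executemany(
    "INSERT INTO problem_builder_answer (course_id) VALUES (?)",
    [("course-%d" % i,) for i in range(7)],
)
conn.commit()

batches = copy_course_ids(conn, batch_size=3, sleep_time=0.01)
remaining = conn.execute(
    "SELECT COUNT(*) FROM problem_builder_answer WHERE course_key IS NULL"
).fetchone()[0]
```

Smaller batches and longer sleeps trade total run time for less contention on a busy table, which is presumably why both are configurable.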
Discussion: https://github.com/edx/edx-platform/pull/14327#issuecomment-280043022
Testing instructions:
[...] problem_builder_answer table.
[...] NULL values in the course_key field, so you'll have to temporarily modify the table to be able to nullify the values via mysql console:
[...] course_key field of all rows in the table is empty (SELECT * FROM problem_builder_answer).
[...] course_id have been copied into course_key (SELECT * FROM problem_builder_answer).
[...] --batch-size and --sleep values.