-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Block completion model #16047
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Block completion model #16047
Conversation
|
Thanks for the pull request, @jcdyer! It looks like you're a member of a company that does contract work for edX. If you're doing this work as part of a paid contract with edX, you should talk to edX about who will review this pull request. If this work is not part of a paid contract with edX, then you should ensure that there is an OSPR issue to track this work in JIRA, so that we don't lose track of your pull request. |
bradenmacdonald
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jcdyer Looks good and the migration works - had a few comments for you though.
lms/djangoapps/completion/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update_or_create is actually a standard method on the default Manager already, and this is shadowing that base class method. Can you just use that instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has two customizations over the standard create_or_update, one of which is important, and the other which is just a convenience:
Important: It only updates the entry when the existing entry has changed its completion value. This prevents the modified timestamp from getting updated when it shouldn't, which will prove important for determining when to perform a recalculation of aggregators.
Convenience: The caller is not required to pass a block type, because it can be inferred from the block key.
I agree it's not great to shadow create_or_update. I could go one of two ways with this:
- Eliminate the convenience feature, so this has the same behavior as the super() version, but still performs the check that values have changed before updating.
- Rename this (maybe
submit_completion) and keep the convenience function. I'm tempted to go with 2, to make it clear that we're working with a custom API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks for renaming - I think it's better not to shadow methods and change the behavior :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It currently allows True and False through without error - does that matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. This is only going to be applied to FloatFields. None is a possible value in a float field, but if you pass store true or false in a FloatField, they will just get stored as 1.0 and 0.0, (because bool is a subclass of float). In fact, there's really no reason to test 'hi'. A (slightly) more interesting question would be how the validator handles float('inf') and float('nan').
lms/envs/common.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we actually move this into openedx.core.djangoapps per the README notes at https://github.com/edx/edx-platform/tree/master/openedx ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe in the new openedx.features directory? I think that might be a good fit for it (though it's existence is probably newer than the last time the README was updated. @robrap might have an opinion on that. I believe he's been involved in some work on improving the way the edx-platform code base is organized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason when running migrations and just running the devstack LMS in #16112 I'm seeing:
/edx/app/edxapp/edx-platform/lms/djangoapps/completion/models.py:94: RemovedInDjango19Warning: Model class lms.djangoapps.completion.models.BlockCompletion doesn't declare an explicit app_label and either isn't in an application in INSTALLED_APPS or else was imported before its application was loaded. This will no longer be supported in Django 1.9.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be unrelated, but John Eskew passed on the following details around setting up an AppConfig while working on his Django upgrade work. It came up in the context of importing signals. But, the following quote can be found in these docs:
New applications should avoid default_app_config. Instead they should require the dotted path to the appropriate AppConfig subclass to be configured explicitly in INSTALLED_APPS.
John wrote:
Here's the commit from my branch:
edx@04d662cand here's the Django docs:
https://docs.djangoproject.com/en/1.11/ref/applications/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robrap, @bradenmacdonald: I added an AppConfig, and squashed my commits. Assuming tests pass, this should be ready to merge, but I no longer have merge permissions on edx-platform.
lms/djangoapps/completion/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also include a composite index (index_together) on user, course_key? I imagine that retrieving all the data for a particular user in a particular course will be a common use case. But we can also wait to optimize/index later based on actual usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need. MySQL can use the prefix of an index for indexing. One of my favorite resources on how indexes work is this five question quiz. After you finish the quiz, it gives very clear explanations for why the answers are right or wrong, which makes it a great learning resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think we will need to get the indexes right before this goes to production, because this table is going to get very large, and once it does, unless I'm mistaken about how MySQL does it, adding an index will be a prohibitively slow operation.
lms/djangoapps/completion/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove block_type from this unique_together, since it's derived from block_key, and if we ever tried to save two entries that differed only in their block_type value, we'd want that to raise an error since that's always an invalid case. With this constraint as written, such a situation would be accepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, perhaps only user, block_key should be the unique constraint, since the course_key is usually derived from the block_key as well. (Not sure exactly how that works with old/draft mongo though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right that I should keep the unique index precise. It will need to be (user, course_key, block_key) to handle old mongo. The other indexes will not need to be unique (but could be, in some cases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Block type still needs to be included in the composite indexes. Even though it's derived from block_key, it will be needed to create filtered separately, in order to create efficient aggregations that filter by block case (to meet edX's watched-videos need).`
lms/djangoapps/completion/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will work as an index to retrieve any of the following:
- all completions in a course
- all completions of a particular type in a course
- all completions of a particular type for a given user in a course
lms/djangoapps/completion/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will work as an index on:
- all completions for a given user
- all completions in a course for a given user
- the specific completion for a particular block for a particular user.
Things that aren't indexed:
- All completions for a particular block across all users
- All completions modified since a given time at any granularity. For aggregating smartly, do we need to add a
(user, course, modified)key? What about(course, modified)? I think we might need the former, but not the latter, because I think we will always need to compare the timestamp against the specific user's aggregations to know whether to update them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a naive question, but are block key's globally unique, and would one need to know to use the course key to make use of this index for the block key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe block keys are globally unique for split mongo, but they drop the run value from the course key in old-mongo, so if the same block exists in two runs of the same course (highly likely), then they need to be disambiguated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update to include an index for recently modified values and to rename the validator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we agreed that we would use "percent" rather than "ratio" in the spec? Is there a strong case for the need for inconsistency?
One additional argument for "percent", which I know you don't love, is that it is precisely the fact that this ratio represents a percent that provides the rules for what are valid and invalid ratios.
|
jenkins run all |
|
@robrap Do you (or the Taming the Monolith team) have an opinion on where this should live?
|
|
@jcdyer: At this point, it probably should stay where it is in lms/djangoapps/completion. |
|
@robrap: It has never seen the light of the master branch, so "at this point" is probably the best time to move it, if it should be moved. (Unless you mean "at this point" in the monolith taming process) |
robrap
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some import comments. Sorry I comment on something different each time in. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we try to import more explicitly, like AutoCreatedField, but maybe not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the migrations file. I think it's better to leave that in its autogenerated state, so that you can regenerate it as needed without having to reformat every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's excluded from linting for that reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Sorry. My brain misfired.
|
Yes. I mean at this point in the taming process. |
|
@bradenmacdonald This is ready for another round of review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very minor, but in the API design, the name of this model, and some methods, we use the term completion, though here and as the final argument to submit_completion you're using completed. Is it worth making those consistent now while we have the chance? Or is there a distinction I may have missed that explains the difference?
lms/djangoapps/completion/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks for renaming - I think it's better not to shadow methods and change the behavior :)
bradenmacdonald
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 @jcdyer Thanks for those changes! I have one nitty comment which is optional. This looks great to me overall. The indexes that we choose to use may change a bit as this evolves, but I think this is a great start.
- I tested this: verified that the migrations work
- I read through the code
- I checked for accessibility issues: n/a
- Includes documentation: docstrings.
|
Thanks @bradenmacdonald! @robrap This is ready for another round of review. |
robrap
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments and questions.
lms/djangoapps/completion/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No change needed. I added a comment on slack around naming conventions in case you are interested.
lms/djangoapps/completion/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was block_type intentionally or unintentionally left out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intentionally. The information is already visible to a human reader in block_key.
lms/djangoapps/completion/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can imagine your comments about the indices living in the code. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's useful information that I think might not be as widely known as it should be, but it's fundamental to how indexes work, and is by no means unique to this particular model (the indexes on the grades models were designed with this in mind, for certain. I would think a wiki page, or other training documentation would be a better place for it.
lms/djangoapps/completion/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a note somewhere about the purpose of block_type in the model and its relation to upcoming "aggregation models"?
lms/djangoapps/completion/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Possible repeat comment, but I don't see what Github did with mine.]
Checking if lack of block_type is intentional or unintentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I responded to the other version of this comment. Intentional.
dcdcc0f to
0c96659
Compare
|
I know this PR has already been approved, but I'm thinking about data storage, since this is going to be a big table. Since we have to have the course key for old mongo, and we're including the block type for indexing purposes, does it make sense not to store the full block key separately, and only store the block id? Then, instead of duplicating the data, we're just decomposing the block key into its constituent parts. And we can add a property to return a reconstructed block key whenever needed. I've been up since 4am, so I could easily be missing something, but I don't think this would cause any problems in terms of retrieval or indexing, as long as we update the indexes appropriately. The signature of |
Tempting, but that seems to go against the principle of treating opaque keys as actually opaque. It closes the door to future types of keys that have different or additional fields. If the concern is data size, I would also be tempted to suggest normalizing the data by creating a separate |
2e28925 to
5d9ddfe
Compare
|
@robrap This is squashed and ready to merge. I no longer have merge permission, so I'll leave that to you. |
|
@robrap @bradenmacdonald Can one of you merge this? It's got two approvals, and I believe it is ready to go. |
|
I'm going to let @bradenmacdonald manage all of these PRs if that's ok. I also might be replaced as reviewer at some point, but I'll let you know if that is the case. |
|
@jcdyer @robrap I just had one last thought before we merge: the "Everything About Database Migrations" doc mentions to ask: "Should the primary key be a bigint (more than 4b rows)?" I don't know what usage numbers are realistic for the coming few years, but if we imagine 10 million learners with 3 courses each and 100 blocks per course, that's 3 billion - do you think we should consider a bigint primary key now? Let me know your thoughts and then I'll merge. |
|
@bradenmacdonald Yep. I think that's a good idea. I'll add a commit. |
|
It looks like edX has never actually created a BigInt primary key field, and it wasn't quite as straightforward as |
|
@jcdyer There is at least one existing use of a bigint: https://github.com/edx/edx-platform/blob/9c4869c1d59899a1e4a0ed2bf9cf92c77acc4447/lms/djangoapps/coursewarehistoryextended/models.py#L36 I don't see the new code yet so I assume you're still working on it - let me know when it's ready for a spin. @feanil Quick check: any concerns or advice from devops re merging code with a new DB table that uses a bigint primary key? |
|
Oh good find. I did a grep for BigInteger (but apparently not BigInt) and only found one use (which wasn't a key), and then went looking for general purpose model-utility submodules. I think you're right though. Makes sense to stick with the future-compatible version. |
|
Hmm. I thought I committed it. Maybe I was on the wrong branch. I'll investigate in the morning. |
|
@bradenmacdonald Ready for review. |
|
FYI: @doctoryes: Thought you might be interested in the BigAutoField discussion and code as you work on the Django upgrades. |
openedx/core/djangolib/fields.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jcdyer ! Questions:
- Will this work with sqlite3, which is typically used in unittests?
- Is the Django version unsigned or signed? Unsigned makes more sense here, of course.
- Post-Django-upgrade, the initial migration will still point to this file's def. Do you advise just adding an import of BigAutoField to the top of this file post-upgrade?
- What's the back-of-envelope calculation for when edx.org will get > 4.2 billion of these completions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@doctoryes Good point. BIGINT is allowed for sqlite, but the current combination is causing the python test suite to fail with this sqlite error:
OperationalError: AUTOINCREMENT is only allowed on an INTEGER PRIMARY KEY
The django BigAutoField is signed. It uses bigint AUTO_INCREMENT on MySQL and integer AUTOINCREMENT on sqlite which works around the above issue, since sqlite uses 64-bit primary keys in any case and allows up to 8 byte ints in any column anyways regardless of its declared type.
@jcdyer I'll give a final review once the tests are fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like @doctoryes has asked most of the questions I would care about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @doctoryes!
The back-of-napkin calculation that @bradenmacdonald came up with was 10M users taking 3 courses eaches with 100 completable blocks per course = 3B entries. It will probably take longer than that to get there for a couple reasons:
- We won't (and can't) backfill this information.
- We aren't currently planning to fill empty records for all blocks when a user enrolls in a course so never-visited blocks are non-records (but this could change).
So we aren't likely to actually hit the cap for significantly longer if at all, but it's close enough to the right ballpark that I think a better-safe-than-sorry policy is warranted.
lms/djangoapps/completion/apps.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on the app configuration! Is this pass needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope.
26a3c9b to
8149a26
Compare
|
@bradenmacdonald @robrap This is ready for a (hopefully) final look before I squash commits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make this import in the migration the same as the one in models.py? Otherwise this will presumably generate a spurious and invalid migration when we upgrade django since the model will no longer match the migration history.
try:
from django.models import BigAutoField # New in django 1.10
except ImportError:
from openedx.core.djangolib.fields import BigAutoField|
Had one comment about the migration field import, but otherwise still 👍 from me.
|
|
@jcdyer Please let me know your thoughts on that import in the migration and squash this, then I'll merge :) |
cd9b226 to
6f56da5
Compare
|
@bradenmacdonald Migration updated and commits squashed. The only change is to the imports in migrations/0001_initial.py |
* Includes custom manager. * Includes percent validation. * Includes useful indices. * Subclasses TimeStampedModel OC-3086
6f56da5 to
2207ef7
Compare
2207ef7 to
4ab64f7
Compare
|
jenkins run bokchoy |
|
EdX Release Notice: This PR has been deployed to the staging environment in preparation for a release to production on Thursday, October 19, 2017. |
|
EdX Release Notice: This PR has been deployed to the production environment. |
|
EdX Release Notice: This PR has been rolled back from the production environment. |
OC-3086
Add a model to track completion of individual blocks.
Reviewers
Testing instructions
$ ./manage.py lms --settings=devstack migrate completion
$ ./manage.py lms --settings=devstack migrate completion zero
$ ./manage.py lms --settings=devstack migrate completion