Block completion model #16047

jcdyer · 2017-09-19T16:10:15Z

OC-3086

Add a model to track completion of individual blocks.

Reviewers

Initial (opencraft) reviewer: @bradenmacdonald
edX reviewer: @robrap

Testing instructions

Test creating and modifying BlockCompletion objects in the django shell:

$ ./manage.py lms --settings=devstack shell
...
>>> from lms.djangoapps.completion.models import BlockCompletion
>>> from django.contrib.auth.models import User
>>> from opaque_keys.edx.keys import CourseKey, UsageKey
>>> BlockCompletion.objects.submit_completion(user=User.objects.get(...), course_key=CourseKey.from_string('...'), block_key=UsageKey.from_string('...'), completed=0.5)
>>>

Ensure that values have to be between 0.0 and 1.0, and that duplicate completions aren't created for a given user/course/block.
Test migrations up and down.
$ ./manage.py lms --settings=devstack migrate completion
$ ./manage.py lms --settings=devstack migrate completion zero
$ ./manage.py lms --settings=devstack migrate completion

openedx-webhooks · 2017-09-19T16:10:27Z

Thanks for the pull request, @jcdyer! It looks like you're a member of a company that does contract work for edX. If you're doing this work as part of a paid contract with edX, you should talk to edX about who will review this pull request. If this work is not part of a paid contract with edX, then you should ensure that there is an OSPR issue to track this work in JIRA, so that we don't lose track of your pull request.

Create an OSPR issue for this pull request.

bradenmacdonald

@jcdyer Looks good and the migration works - had a few comments for you though.

bradenmacdonald · 2017-09-21T21:17:39Z

lms/djangoapps/completion/models.py

update_or_create is actually a standard method on the default Manager already, and this is shadowing that base class method. Can you just use that instead?

This has two customizations over the standard update_or_create, one of which is important, and the other just a convenience:

Important: It only updates the entry when the submitted completion value differs from the stored one. This prevents the modified timestamp from getting updated when it shouldn't, which will prove important for determining when to perform a recalculation of aggregators.

Convenience: The caller is not required to pass a block type, because it can be inferred from the block key.

I agree it's not great to shadow update_or_create. I could go one of two ways with this:

1. Eliminate the convenience feature, so this has the same behavior as the super() version, but still performs the check that values have changed before updating.

2. Rename this (maybe submit_completion) and keep the convenience function.

I'm tempted to go with 2, to make it clear that we're working with a custom API.

Got it. Thanks for renaming - I think it's better not to shadow methods and change the behavior :)

bradenmacdonald · 2017-09-21T21:22:09Z

lms/djangoapps/completion/tests/test_models.py

It currently allows True and False through without error - does that matter?

I don't think so. This is only going to be applied to FloatFields. None is a possible value in a float field, but if you store True or False in a FloatField, they will just get stored as 1.0 and 0.0 (because bool is a subclass of int, which converts cleanly to float). In fact, there's really no reason to test 'hi'. A (slightly) more interesting question would be how the validator handles float('inf') and float('nan').
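As a rough illustration of the inf/nan question, here is a minimal range validator. The name `validate_percent` and the use of ValueError are assumptions for this sketch; the validator in the PR may be named and structured differently (a Django validator would typically raise ValidationError).

```python
def validate_percent(value):
    """Hypothetical validator: accept only numbers in the closed range [0.0, 1.0]."""
    if value is None:
        raise ValueError("completion must not be None")
    number = float(value)  # True -> 1.0, False -> 0.0; 'hi' raises ValueError
    # NaN fails every comparison, so the chained range check rejects it
    # without a separate isnan() test; +inf and -inf fall outside the range.
    if not 0.0 <= number <= 1.0:
        raise ValueError("{!r} is not between 0.0 and 1.0".format(value))
    return number
```

With a check written this way, booleans pass through as 1.0 and 0.0, while infinities and NaN are rejected by the same comparison that rejects 1.1 and -0.1.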

bradenmacdonald · 2017-09-21T21:24:13Z

lms/envs/common.py

Should we actually move this into openedx.core.djangoapps per the README notes at https://github.com/edx/edx-platform/tree/master/openedx ?

Maybe in the new openedx.features directory? I think that might be a good fit for it (though its existence is probably newer than the last time the README was updated). @robrap might have an opinion on that. I believe he's been involved in some work on improving the way the edx-platform code base is organized.

For some reason when running migrations and just running the devstack LMS in #16112 I'm seeing:

/edx/app/edxapp/edx-platform/lms/djangoapps/completion/models.py:94: RemovedInDjango19Warning: Model class lms.djangoapps.completion.models.BlockCompletion doesn't declare an explicit app_label and either isn't in an application in INSTALLED_APPS or else was imported before its application was loaded. This will no longer be supported in Django 1.9.

This may be unrelated, but John Eskew passed on the following details around setting up an AppConfig while working on his Django upgrade work. It came up in the context of importing signals. But, the following quote can be found in these docs:

New applications should avoid default_app_config. Instead they should require the dotted path to the appropriate AppConfig subclass to be configured explicitly in INSTALLED_APPS.

John wrote:

Here's the commit from my branch:
edx@04d662c

and here's the Django docs:
https://docs.djangoproject.com/en/1.11/ref/applications/

@robrap, @bradenmacdonald: I added an AppConfig, and squashed my commits. Assuming tests pass, this should be ready to merge, but I no longer have merge permissions on edx-platform.

bradenmacdonald · 2017-09-21T21:26:34Z

lms/djangoapps/completion/models.py

Should we also include a composite index (index_together) on user, course_key? I imagine that retrieving all the data for a particular user in a particular course will be a common use case. But we can also wait to optimize/index later based on actual usage.

No need. MySQL can use the prefix of an index for indexing. One of my favorite resources on how indexes work is this five question quiz. After you finish the quiz, it gives very clear explanations for why the answers are right or wrong, which makes it a great learning resource.

Also, I think we will need to get the indexes right before this goes to production, because this table is going to get very large, and once it does, unless I'm mistaken about how MySQL does it, adding an index will be a prohibitively slow operation.
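The leftmost-prefix behavior is easy to see with a query planner. The sketch below uses SQLite from the Python standard library as a stand-in, since it applies the same leftmost-prefix rule as MySQL for composite indexes; the table layout and the index name `idx_user_course_block` are assumptions for illustration, not the schema from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE completion ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER, course_key TEXT, block_key TEXT, completion REAL)"
)
# One composite index; column order matters.
conn.execute(
    "CREATE INDEX idx_user_course_block"
    " ON completion (user_id, course_key, block_key)"
)


def query_plan(where_clause):
    """Return SQLite's plan description for a SELECT with the given WHERE clause."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM completion WHERE " + where_clause
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)


# Filtering on a leading prefix of the index can use it...
prefix_plan = query_plan("user_id = 1 AND course_key = 'course-v1:X+Y+Z'")
# ...but a filter that skips the leading columns cannot.
block_only_plan = query_plan("block_key = 'some-block'")
```

The plan for the (user, course) query names the composite index, so a separate index on just those two columns would be redundant. The block-only query falls back to a table scan.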

bradenmacdonald · 2017-09-21T21:27:36Z

lms/djangoapps/completion/models.py

I think we should remove block_type from this unique_together, since it's derived from block_key, and if we ever tried to save two entries that differed only in their block_type value, we'd want that to raise an error since that's always an invalid case. With this constraint as written, such a situation would be accepted.

In fact, perhaps only user, block_key should be the unique constraint, since the course_key is usually derived from the block_key as well. (Not sure exactly how that works with old/draft mongo though.)

I think you're right that I should keep the unique index precise. It will need to be (user, course_key, block_key) to handle old mongo. The other indexes will not need to be unique (but could be, in some cases).

Block type still needs to be included in the composite indexes. Even though it's derived from block_key, it needs to be filterable on its own, in order to create efficient aggregations that filter by block type (to meet edX's watched-videos need).

jcdyer · 2017-09-22T15:46:01Z

lms/djangoapps/completion/models.py

This will work as an index to retrieve any of the following:

all completions in a course

all completions of a particular type in a course

all completions of a particular type for a given user in a course

jcdyer · 2017-09-22T15:52:17Z

lms/djangoapps/completion/models.py

This will work as an index on:

all completions for a given user

all completions in a course for a given user

the specific completion for a particular block for a particular user.

Things that aren't indexed:

All completions for a particular block across all users

All completions modified since a given time at any granularity. For aggregating smartly, do we need to add a (user, course, modified) key? What about (course, modified)? I think we might need the former, but not the latter, because I think we will always need to compare the timestamp against the specific user's aggregations to know whether to update them.

This might be a naive question, but are block keys globally unique, and would one need to know to use the course key to make use of this index for the block key?

I believe block keys are globally unique for split mongo, but they drop the run value from the course key in old-mongo, so if the same block exists in two runs of the same course (highly likely), then they need to be disambiguated.

Update to include an index for recently modified values and to rename the validator.

robrap · 2017-09-22T17:15:10Z

lms/djangoapps/completion/tests/test_models.py

I thought we agreed that we would use "percent" rather than "ratio" in the spec? Is there a strong case for the need for inconsistency?

One additional argument for "percent", which I know you don't love, is that it is precisely the fact that this ratio represents a percent that provides the rules for what are valid and invalid ratios.

jcdyer · 2017-09-22T18:16:47Z

jenkins run all

jcdyer · 2017-09-22T18:49:04Z

@robrap Do you (or the Taming the Monolith team) have an opinion on where this should live?

lms/djangoapps/completion
openedx/core/djangoapps/completion
openedx/features/completion
other

robrap · 2017-09-22T19:13:09Z

@jcdyer: At this point, it probably should stay where it is in lms/djangoapps/completion.

jcdyer · 2017-09-22T19:14:36Z

@robrap: It has never seen the light of the master branch, so "at this point" is probably the best time to move it, if it should be moved. (Unless you mean "at this point" in the monolith taming process)

robrap

Some import comments. Sorry I comment on something different each time in. :)

robrap · 2017-09-22T19:24:42Z

lms/djangoapps/completion/migrations/0001_initial.py

I thought we try to import more explicitly, like AutoCreatedField, but maybe not.

This is the migrations file. I think it's better to leave that in its autogenerated state, so that you can regenerate it as needed without having to reformat every time.

I believe it's excluded from linting for that reason.

Agreed. Sorry. My brain misfired.

robrap · 2017-09-22T19:26:58Z

Yes. I mean at this point in the taming process.

jcdyer · 2017-09-22T19:58:43Z

@bradenmacdonald This is ready for another round of review.

bradenmacdonald · 2017-09-23T03:12:54Z

lms/djangoapps/completion/migrations/0001_initial.py

This is very minor, but in the API design, the name of this model, and some methods, we use the term completion, though here and as the final argument to submit_completion you're using completed. Is it worth making those consistent now while we have the chance? Or is there a distinction I may have missed that explains the difference?

bradenmacdonald · 2017-09-23T03:16:40Z

lms/djangoapps/completion/models.py

Got it. Thanks for renaming - I think it's better not to shadow methods and change the behavior :)

bradenmacdonald

👍 @jcdyer Thanks for those changes! I have one nitty comment which is optional. This looks great to me overall. The indexes that we choose to use may change a bit as this evolves, but I think this is a great start.

I tested this: verified that the migrations work
I read through the code
I checked for accessibility issues: n/a
Includes documentation: docstrings.

jcdyer · 2017-09-25T12:05:42Z

Thanks @bradenmacdonald!

@robrap This is ready for another round of review.

robrap

Minor comments and questions.

robrap · 2017-09-25T14:03:30Z

lms/djangoapps/completion/models.py

No change needed. I added a comment on slack around naming conventions in case you are interested.

robrap · 2017-09-25T14:14:33Z

lms/djangoapps/completion/models.py

Was block_type intentionally or unintentionally left out?

Intentionally. The information is already visible to a human reader in block_key.

robrap · 2017-09-25T14:21:56Z

lms/djangoapps/completion/models.py

I can imagine your comments about the indices living in the code. Thoughts?

It's useful information that I think might not be as widely known as it should be, but it's fundamental to how indexes work, and is by no means unique to this particular model (the indexes on the grades models were designed with this in mind, for certain). I would think a wiki page, or other training documentation, would be a better place for it.

robrap · 2017-09-25T14:53:48Z

lms/djangoapps/completion/models.py

Should we add a note somewhere about the purpose of block_type in the model and its relation to upcoming "aggregation models"?

robrap · 2017-09-25T16:09:39Z

lms/djangoapps/completion/models.py

[Possible repeat comment, but I don't see what Github did with mine.]
Checking if lack of block_type is intentional or unintentional.

I responded to the other version of this comment. Intentional.

jcdyer · 2017-09-28T11:01:02Z

@bradenmacdonald @robrap

I know this PR has already been approved, but I'm thinking about data storage, since this is going to be a big table. Since we have to have the course key for old mongo, and we're including the block type for indexing purposes, does it make sense not to store the full block key separately, and only store the block id? Then, instead of duplicating the data, we're just decomposing the block key into its constituent parts. And we can add a property to return a reconstructed block key whenever needed.

I've been up since 4am, so I could easily be missing something, but I don't think this would cause any problems in terms of retrieval or indexing, as long as we update the indexes appropriately. The signature of submit_completion() would stay the same.

bradenmacdonald · 2017-09-28T15:56:09Z

@jcdyer

does it make sense not to store the full block key separately, and only store the block id?

Tempting, but that seems to go against the principle of treating opaque keys as actually opaque. It closes the door to future types of keys that have different or additional fields.

If the concern is data size, I would also be tempted to suggest normalizing the data by creating a separate BlockReference table with columns (course_key, block_key, block_type). Then this model would have two foreign keys (one to User, one to BlockReference) and a float, so the data size would be much smaller. However, that would remove the ability to create indexes that involve any of those columns together with the user ID (among other issues), so I suspect that we need to stick with the denormalized version for performance.

jcdyer · 2017-10-03T20:06:50Z

@robrap This is squashed and ready to merge. I no longer have merge permission, so I'll leave that to you.

jcdyer · 2017-10-10T19:43:45Z

@robrap @bradenmacdonald Can one of you merge this? It's got two approvals, and I believe it is ready to go.

robrap · 2017-10-10T19:58:56Z

I'm going to let @bradenmacdonald manage all of these PRs if that's ok. I also might be replaced as reviewer at some point, but I'll let you know if that is the case.

bradenmacdonald · 2017-10-10T21:26:13Z

@jcdyer @robrap I just had one last thought before we merge: the "Everything About Database Migrations" doc mentions to ask: "Should the primary key be a bigint (more than 4b rows)?"

I don't know what usage numbers are realistic for the coming few years, but if we imagine 10 million learners with 3 courses each and 100 blocks per course, that's 3 billion - do you think we should consider a bigint primary key now? Let me know your thoughts and then I'll merge.
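The estimate is quick to check against integer limits. This sketch assumes the default auto-incrementing key is a signed 32-bit integer (which is what a plain INT column gives on MySQL); the learner, course, and block counts are the hypothetical figures from the comment above.

```python
learners = 10_000_000
courses_per_learner = 3
blocks_per_course = 100

projected_rows = learners * courses_per_learner * blocks_per_course

INT32_SIGNED_MAX = 2**31 - 1     # 2,147,483,647
INT32_UNSIGNED_MAX = 2**32 - 1   # 4,294,967,295
INT64_SIGNED_MAX = 2**63 - 1     # 9,223,372,036,854,775,807

print(projected_rows)                       # 3000000000
print(projected_rows > INT32_SIGNED_MAX)    # True
print(projected_rows > INT32_UNSIGNED_MAX)  # False
print(projected_rows > INT64_SIGNED_MAX)    # False
```

So the projected 3 billion rows would overflow a signed 32-bit key, would fit under an unsigned one with limited headroom, and are nowhere near the limit of a signed 64-bit key.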

jcdyer · 2017-10-11T00:44:34Z

@bradenmacdonald Yep. I think that's a good idea. I'll add a commit.

jcdyer · 2017-10-11T01:46:32Z

It looks like edX has never actually created a BigInt primary key field, and it wasn't quite as straightforward as models.BigIntegerField(primary_key=True), on account of the autoincrementing. I based my solution on this: https://stackoverflow.com/a/17035822/131084 and the implementation that is coming down the pipe in django 1.10 (https://docs.djangoproject.com/en/1.10/_modules/django/db/models/fields/#BigAutoField).

bradenmacdonald · 2017-10-11T20:07:21Z

@jcdyer There is at least one existing use of a bigint: https://github.com/edx/edx-platform/blob/9c4869c1d59899a1e4a0ed2bf9cf92c77acc4447/lms/djangoapps/coursewarehistoryextended/models.py#L36
I suppose you could reuse the custom field type it uses, though your approach that's compatible with the upcoming BigAutoField (the key difference is that it's signed) is probably better.

I don't see the new code yet so I assume you're still working on it - let me know when it's ready for a spin.

@feanil Quick check: any concerns or advice from devops re merging code with a new DB table that uses a bigint primary key?

jcdyer · 2017-10-12T00:48:34Z

Oh good find. I did a grep for BigInteger (but apparently not BigInt) and only found one use (which wasn't a key), and then went looking for general purpose model-utility submodules. I think you're right though. Makes sense to stick with the future-compatible version.

jcdyer · 2017-10-12T00:50:16Z

Hmm. I thought I committed it. Maybe I was on the wrong branch. I'll investigate in the morning.

robrap · 2017-10-12T11:51:51Z

Your changes are here: https://github.com/edx/edx-platform/pull/16186/files/2b9e9c5c1c8bd916de0a8a92b849e76ba3b8bce3..24979efb0f4ea1cb431504677745230600d6f588

jcdyer · 2017-10-12T14:31:26Z

@bradenmacdonald Ready for review.

robrap · 2017-10-12T14:44:16Z

FYI: @doctoryes: Thought you might be interested in the BigAutoField discussion and code as you work on the Django upgrades.

doctoryes · 2017-10-12T19:38:39Z

openedx/core/djangolib/fields.py

Hi @jcdyer ! Questions:

Will this work with sqlite3, which is typically used in unittests?

Is the Django version unsigned or signed? Unsigned makes more sense here, of course.

Post-Django-upgrade, the initial migration will still point to this file's def. Do you advise just adding an import of BigAutoField to the top of this file post-upgrade?

What's the back-of-envelope calculation for when edx.org will get > 4.2 billion of these completions?

@doctoryes Good point. BIGINT is allowed for sqlite, but the current combination is causing the python test suite to fail with this sqlite error:

OperationalError: AUTOINCREMENT is only allowed on an INTEGER PRIMARY KEY

The django BigAutoField is signed. It uses bigint AUTO_INCREMENT on MySQL and integer AUTOINCREMENT on sqlite which works around the above issue, since sqlite uses 64-bit primary keys in any case and allows up to 8 byte ints in any column anyways regardless of its declared type.
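Both halves of that explanation can be reproduced with the sqlite3 module from the Python standard library. The table names below are made up for the demonstration, and the exact error text may vary slightly between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Declaring the key as BIGINT reproduces the error quoted above.
try:
    conn.execute("CREATE TABLE bad (id BIGINT PRIMARY KEY AUTOINCREMENT)")
    bigint_error = None
except sqlite3.OperationalError as exc:
    bigint_error = str(exc)

# Declaring it as INTEGER works, and the column still holds 64-bit values.
conn.execute("CREATE TABLE good (id INTEGER PRIMARY KEY AUTOINCREMENT, v REAL)")
conn.execute("INSERT INTO good (id, v) VALUES (?, ?)", (2**40, 0.5))
# The next auto-assigned id continues from the largest id seen so far.
next_id = conn.execute("INSERT INTO good (v) VALUES (1.0)").lastrowid
```

Here `bigint_error` holds the AUTOINCREMENT complaint, while the INTEGER table accepts an id well beyond the 32-bit range and keeps counting from it.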

@jcdyer I'll give a final review once the tests are fixed.

Looks like @doctoryes has asked most of the questions I would care about.

Hi @doctoryes!

The back-of-napkin calculation that @bradenmacdonald came up with was 10M users taking 3 courses each with 100 completable blocks per course = 3B entries. It will probably take longer than that to get there for a couple reasons:

We won't (and can't) backfill this information.

We aren't currently planning to fill empty records for all blocks when a user enrolls in a course so never-visited blocks are non-records (but this could change).

So we aren't likely to actually hit the cap for significantly longer if at all, but it's close enough to the right ballpark that I think a better-safe-than-sorry policy is warranted.

doctoryes · 2017-10-12T19:42:09Z

lms/djangoapps/completion/apps.py

👍 on the app configuration! Is this pass needed?

jcdyer · 2017-10-13T19:29:45Z

@bradenmacdonald @robrap This is ready for a (hopefully) final look before I squash commits.

bradenmacdonald · 2017-10-13T20:25:31Z

lms/djangoapps/completion/migrations/0001_initial.py

Should we make this import in the migration the same as the one in models.py? Otherwise this will presumably generate a spurious and invalid migration when we upgrade django since the model will no longer match the migration history.

try:
    from django.db.models import BigAutoField  # New in django 1.10
except ImportError:
    from openedx.core.djangolib.fields import BigAutoField

bradenmacdonald · 2017-10-13T20:49:07Z

Had one comment about the migration field import, but otherwise still 👍 from me.

I tested this: followed the test instructions with the latest version (had to change completed argument to completion for submit_completion), verified that MySQL is using bigint datatype for the completion_blockcompletion id column.
I read through the code
I checked for accessibility issues: n/a
Includes documentation: docstrings.

bradenmacdonald · 2017-10-16T22:26:10Z

@jcdyer Please let me know your thoughts on that import in the migration and squash this, then I'll merge :)

jcdyer · 2017-10-17T18:33:35Z

@bradenmacdonald Migration updated and commits squashed. The only change is to the imports in migrations/0001_initial.py

* Includes custom manager.
* Includes percent validation.
* Includes useful indices.
* Subclasses TimeStampedModel

OC-3086

bradenmacdonald · 2017-10-17T20:25:08Z

jenkins run bokchoy

edx-pipeline-bot · 2017-10-18T15:43:39Z

EdX Release Notice: This PR has been deployed to the staging environment in preparation for a release to production on Thursday, October 19, 2017.

edx-pipeline-bot · 2017-10-20T17:24:47Z

EdX Release Notice: This PR has been deployed to the production environment.

edx-pipeline-bot · 2017-10-20T19:00:24Z

EdX Release Notice: This PR has been rolled back from the production environment.

bradenmacdonald reviewed Sep 21, 2017

jcdyer commented Sep 22, 2017

robrap reviewed Sep 22, 2017

bradenmacdonald reviewed Sep 23, 2017

bradenmacdonald approved these changes Sep 23, 2017

robrap approved these changes Sep 25, 2017

jcdyer force-pushed the cliff/block-completion-model branch from dcdcc0f to 0c96659 Compare September 28, 2017 10:42

jcdyer force-pushed the cliff/block-completion-model branch 3 times, most recently from 2e28925 to 5d9ddfe Compare October 3, 2017 15:53

jcdyer mentioned this pull request Oct 3, 2017

Handle completion events #16112

Merged

4 tasks

doctoryes reviewed Oct 12, 2017

jcdyer force-pushed the cliff/block-completion-model branch from 26a3c9b to 8149a26 Compare October 13, 2017 14:10

bradenmacdonald reviewed Oct 13, 2017

jcdyer mentioned this pull request Oct 17, 2017

Complete default blocks #16234

Merged

4 tasks

jcdyer force-pushed the cliff/block-completion-model branch from cd9b226 to 6f56da5 Compare October 17, 2017 18:32

Introduce BlockCompletion model.

6f89157

* Includes custom manager.
* Includes percent validation.
* Includes useful indices.
* Subclasses TimeStampedModel

OC-3086

jcdyer force-pushed the cliff/block-completion-model branch from 6f56da5 to 2207ef7 Compare October 17, 2017 18:36

Add BigAutoField for BlockCompletion primary key.

4ab64f7

jcdyer force-pushed the cliff/block-completion-model branch from 2207ef7 to 4ab64f7 Compare October 17, 2017 18:46

bradenmacdonald merged commit d64e0b9 into openedx:master Oct 17, 2017

bradenmacdonald deleted the cliff/block-completion-model branch October 17, 2017 22:06

tomaszgy mentioned this pull request Nov 30, 2017

Add block completion value as optional field in course_blocks.api. #16674

Merged

3 tasks
