-
Notifications
You must be signed in to change notification settings - Fork 4.2k
[BD-32] feat: add first batch of Open edX Filters #29449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for the pull request, @mariajgrimaldi! I've created BLENDED-1027 to keep track of it in Jira. More details are on the BD-32 project page. When this pull request is ready, tag your edX technical lead. |
|
Hi there! I hope you're all well! I'm tagging you here since this PR is part of the implementation plan of BD-32 or Hooks Extensions Framework. Please, let us know what you think. @felipemontoya @feanil @ormsbee @jmbowman |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the comment about removing the password.
I can see reasons to leave the password in, for example auditing the security by calling an API at have i been pwned, but that is a small possibility/feature at the cost of opening up a big hole in the security since other plugins could use the password for the wrong reasons.
In any case it would be easy to copy the data dict and remove anything in the list of censored strings (https://github.com/edx/edx-platform/blob/4f4be6538ae008a6d65cd7cc49d776aebfa9e40a/common/djangoapps/track/middleware.py#L64)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little ambivalent on this. Anything running with this level of permission can already do anything they want, security wise. So this is to prevent people accidentally doing horribly unsecure things. That being said, I can think of other fields that edX itself would also consider extremely sensitive, and not want filters to operate on (like certain demographics information).
In any case, I think it's fine to block these off for now, and then we can always add them back in later. I'm imagining that the LMS would always pass through the entire form data, and the PreRegisterFilter implementation would decide how much to pass on to the individual filters or not? And then if we really want to get fancy down the road, we could make that configurable via filter-level configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my concern, what developers could do with the form_data -mainly unintentionally-. I took your advice and implemented within the PreEnrollmentFilter class a way of extracting sensitive data -there's a lot of room for improvement-. Here it is: link to Open edX Filter PR #20
We could add more sensitive_form_data in a Django setting to make it more extensible. What do you think? Either way, I like this approach a lot.
ormsbee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of questions, and a few requests for additional tests.
common/djangoapps/student/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to log some of these parameters (user/course_key/mode) before running the filter? The reason I ask is that there are places where CourseEnrollmentException is caught and the message is never used or inspected. From what I can tell, the pipeline will log which step failed and echo out the exception, but that won't necessarily have the inputs that caused it to fail. I guess the main thing I want to understand is: when one of these filters does something wrong with a really unhelpful message like NoneType is not iterable, will we have enough information in the logs to understand how to reproduce the issue?
FWIW, I think that making EnrollmentNotAllowed a subclass of CourseEnrollmentException, and then doing this bit of except/raise here is a great idea. It maintains backwards compatibility in a lightweight and very self-contained way. Kudos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell, the pipeline will log which step failed and echo out the exception, but that won't necessarily have the inputs that caused it to fail.
You're right. Let's test those scenarios:
Case 1. The developer raises PreventEnrollment from its filter step:
With this config:
OPEN_EDX_FILTERS_CONFIG = {
"org.openedx.learning.student.login.requested.v1": {
"fail_silently": False,
"pipeline": [
"openedx_filters_samples.samples.pipeline.ModifyUserProfileBeforeLogin",
"openedx_filters_samples.samples.pipeline.StopLogin"
]
},
}
When trying to login:
Case 2. The filter step fails and raises NoneType error and fail_silently is False
It seems like too little and scarce information, that's why we've been working on a better way for logging this type of errors. For example:
Using this configuration:
OPEN_EDX_FILTERS_CONFIG = {
"org.openedx.learning.student.login.requested.v1": {
"fail_silently": False,
"pipeline": [
"openedx_filters_samples.samples.pipeline.ModifyUserProfileBeforeLogin",
"openedx_filters_samples.samples.pipeline.StopLogin"
],
"log_level": "info"/"debug"
},
}
Case 1.
Case 2.
I've been implementing this logging strategy for a later release in this PR: https://github.com/eduNEXT/openedx-filters/pull/17
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's great to see that you folks are already tackling this issue. 😄 Thanks for pointing me to that PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AuthFailedError defines a number of fields. Is constructing it like this sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not clear why we're using the pattern of using str(exc) as the message for these exceptions. Is the idea that we'd lose the original source exception information otherwise? Or do we expect that the exception message might actually be bubbled up to users directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I added more info! Now, the
PreLoginExceptionis looking like this and our filter step (in our samples plugin), like this -
We -as it is right now, this can change if necessary- expect the developer to describe the error in a friendly and helpful way in the exception message to show it to the end-user. This way, we define just one helpful message, and the error information is logged in for the developer to see. What do you think it's best? Maybe we could use a
developer_messageanduser_message🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it do the right thing when this configuration is specified but the pipeline is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, should we have a test with more than one step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Yes, it behaves like a noop. I'll attach some tests:
- Unit-test from openedx-filters: test_run_empty_pipeline
- On my dev environment:
Using this configuration ->
OPEN_EDX_FILTERS_CONFIG = {
"org.openedx.learning.student.login.requested.v1": {
"fail_silently": False,
"pipeline": [],
"log_level": "debug",
},
}
- Of course! I'll try that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please put in a test that returns this successfully when the filter is not running? That way, we can be absolutely sure that it's the filter causing the 400 error, and not that data is somehow being malformed in the test itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little ambivalent on this. Anything running with this level of permission can already do anything they want, security wise. So this is to prevent people accidentally doing horribly unsecure things. That being said, I can think of other fields that edX itself would also consider extremely sensitive, and not want filters to operate on (like certain demographics information).
In any case, I think it's fine to block these off for now, and then we can always add them back in later. I'm imagining that the LMS would always pass through the entire form data, and the PreRegisterFilter implementation would decide how much to pass on to the individual filters or not? And then if we really want to get fancy down the road, we could make that configurable via filter-level configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little concerned about how we maintain the contract here with form_data, since it seems like the LMS is free to remove a field at any time without the filter needing to be updated, which would potentially cause filters to break downstream. I'm not sure how to address that given the optional nature of a lot of form data. I'm not going to block this PR on it, since I think it's more important to get this out for people to try, and it's better than what exists now. But I am curious what your thoughts on it are these days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formally, we haven't had any discussion about this specific case. But what I think is, why don't we take a screenshot of the form_data contents as they are right now? And use that as v1. Then, if the form's contents change, we update the filter.
What do you think? @ormsbee @felipemontoya
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this scenario, i.e. if you bump up the version of the filter from v1 to v2, how do those two interoperate? Do they interoperate?
What I mean is, if you bump up a filter to v2, will support be immediately dropped for v1? If not, then how do you work in a mixed environment of some v1 filters and some v2 filters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the Open edX filters naming and versioning ADR, they must interoperate:
- Open edX core developers wanting to deprecate and eventually remove filters.
This is what I imagine it'd be:
# data.keys -> ["username", "email", "name", "password", "mailing_address"]
data = PreRegisterFilter.run(form_data=data) # Filter expects data.keys -> ["username", "email", "name"]
data = PreRegisterFilterV2.run(form_data=data) # Filter expects data.keys -> ["username", "email"]
data = PreRegisterFilterV3.run(form_data=data) # Filter expects data.keys -> ["username"]
For each filter, a different configuration:
OPEN_EDX_FILTERS_CONFIG = {
"org.openedx.learning.course.enrollment.started.v1": {
"fail_silently": False,
"pipeline": [
"openedx_filters_samples.samples.pipeline.ModifyModeBeforeEnrollment",
"openedx_filters_samples.samples.pipeline.StopEnrollment"
]
},
"org.openedx.learning.course.enrollment.started.v2": {
"fail_silently": False,
"pipeline": [
"openedx_filters_samples.samples.pipeline.ModifyModeBeforeEnrollmentV2",
"openedx_filters_samples.samples.pipeline.StopEnrollmentV2"
]
},
"org.openedx.learning.course.enrollment.started.v3": {
"fail_silently": False,
"pipeline": [
"openedx_filters_samples.samples.pipeline.ModifyModeBeforeEnrollmentV3",
"openedx_filters_samples.samples.pipeline.StopEnrollmentV3"
]
},
}
Filters, by definition, must return what they receive (the whole data.keys), but they'll be able to use just the subset that the filter defines. I imagine this evolution happening because we decide that the filter can't modify some fields 🤔
This is just me throwing some ideas. I'm thrilled to hear what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formally, we haven't had any discussion about this specific case. But what I think is, why don't we take a screenshot of the form_data contents as they are right now? And use that as v1. Then, if the form's contents change, we update the filter.
So we make the current fields a part of the filter's contract, and if edx-platform changes the field names for some reason, we do the translation at the filter level so that any pipeline steps that have been written are unaffected? That sounds good to me. The only question I'd have then is how we detect a break in the contract from edx-platform testing–i.e. if I change a field name right now, will it automatically get caught by a filter-related test breaking in CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the sentiment that we should not block the PR on the issue of maintaining the contract that form_data provides. I'm actually going to propose the counter argument. Should we at all try lock in the keys in form_data dict and maintain them? I think not.
There is always going to be some degree of flexibility due to the optional fields which will make this very difficult if at all possible. From my perspective, the Filter at the student registration is the key piece of extension that allows any sort of registration flow to be handled. A developer could now remove or dramatically alter the form at the registration page and send any sort of info, which the filter will receive and "normalize" into something that the core platform can use to create the final user.
E.g:
- a custom MFE for registration gathers names in spanish customs (2 names, two last names), national id, and phone (I've seen this kind of requirement many times).
- the filter processes this dict and creates a completely new
form_datathat is consistent with the openedx user models and views. Fullname, meta fields for ID and all that. Creates a password since it was not included and emails it to the user (horrible experience, but I have also have had this requested from us).
Filters, specially one like the filter before a student registers, are going to have some risk of breaking since we are passing in memory objects to them. The same would be true for a filter that receives a user object or enrollment object. Eventually it will have to deal with certain methods of the object changing. I think we have to embrace that and learn to write filters that are delicate in the way the handle the live data that is passed to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm on the side of not enforcing the form_data contract, it's up to filter developers to make sure their filters return sane data to the platform (it's similar to Django plugin apps - we have more freedom, but a plugin can easily break the LMS if implemented wrong).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ormsbee has any of this conversation swayed you to not fixing in place the content of form_data?
This is probably the last outstanding request to change the current form of this PR. If the issue is not blocking for you, would you agree that we go ahead with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felipemontoya: Yeah, it's not a blocker.
f5cdec0 to
6c84064
Compare
|
Hi there @ormsbee! Sorry for the delayed response, I was finishing some sprint tasks 😃 I'll be answering all your questions today! Thanks for the patience 🤘 |
1ba05d3 to
c27f00f
Compare
|
@felipemontoya I'd like to review this, but I'll only be available early next year due to end of year holidays. Don't block merging this on me. |
|
Hi there! I have some updates: We now have a beta release of Open edX Filters (v0.4.2), equipped with login, registration, and enrollment filters ready to use, as we see in this PR 😄 and in our samples repository. Now, this is what's left to address within this integration (non-blocking):
As I see it, we don't have any other blocking issue to solve. Do you think of any? What do you think is the status of this PR? |
8dc922c to
cbfdaa5
Compare
|
I spent a good while testing this and trying to make all sort behavior changes using filters. At this moment I'm very happy with the result. Thanks @mariajgrimaldi great work. After fixing tests and probably squashing the ~22 commits I'd say we are ready to merge. However, and this is not blocking for me, we could make the filters respond a little better after the filter is run and the platform code takes back control.
In the case of My greatest priority right now is to get this out and start gathering feedback so again, I would not block on any of the above. |
c1e4058 to
6cf20fe
Compare
6cf20fe to
94490c5
Compare
|
Your PR has finished running tests. There were no failures. |
94490c5 to
f21290a
Compare
|
The filters worked for me. |
|
@mariajgrimaldi @felipemontoya Awesome work on this! I'm looking forward to seeing this in a named release in the near future. 🚀 👍 I have one question/suggestion: |
|
Hi @giovannicimolin, thanks a lot for the review.
One of the key aspects of the filters architecture is that as a developer, you are easily capable of testing the code you are writting. This force-pushes the definition of the filters to the external library so that it is easy to import in your third party code to test. We thought we could extend this to the pipeline implementation as well, since this would allow us to have it available both in core-platform tests and plugin tests. |
|
Thank you @mariajgrimaldi @felipemontoya for this effort. I've tried it and it works like described. Very nice. I have a couple of questions though: 1- 2- The pipeline fails silently if something is not right with configuration. For example: I forgot to install my pre-registration filter package! the pipeline fails with a |
|
Hi there @shadinaif. Thank you so much for the review and the fresh insight!
|
For consistency I think if |
|
Thank you @mariajgrimaldi and @felipemontoya for the quick response! For the first point; I think the above scenario can be solved by having the critical filter run before the non-critical one. But this might not always solve the problem (order wise) If the above makes any sense to you; then I have a suggestion: @staticmethod
def get_fail_silently():
return FalseThis way, the default is raising exceptions; if the developer sees the filter as safe-to-fail, then an override of the method is needed. The inherited method can also read from customized configs; which is awesome Again; the above is just a suggestion it can be a future enhancement upon your vision (if it makes sense) In the end; this PR is good to go 🚀 and many thanks for your efforts 👍🏼 |
a48f647 to
dfeab79
Compare
|
Hi there @shadinaif! I just published a new openedx-filters version to Pypi that works as you suggested. Can you help me test it? 😄 @ormsbee can you help us with a review so we can merge? 🥇 |
|
Hi @mariajgrimaldi , it's awesome, works like a charm. Thank you! |
* Add PreEnrollmentFilter * Add PreRegisterFilter * Add PreLoginFilter
dfeab79 to
f29a4ee
Compare
|
@mariajgrimaldi 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future. |
|
Thanks a lot to everyone for all the reviews, comments and all the work poured into this initiative. Thanks to @mariajgrimaldi for all the revisions, updates and rebases. |
|
EdX Release Notice: This PR has been deployed to the staging environment in preparation for a release to production. |
|
EdX Release Notice: This PR has been deployed to the production environment. |
|
@mariajgrimaldi, @felipemontoya: Congrats folks! Would one of you be up for posting a quick update about the recent Hooks work in the forums? I think that a lot of folks would be interested in filters being added and the openedx-events + message bus discussions in https://github.com/eduNEXT/openedx-events/issues/38 and https://github.com/eduNEXT/openedx-events/issues/39. I can do it if you don't have time, but I feel like you folks should take your bows for this. Please also put a mention of it in the release page for Nutmeg. Thank you! |
* Add PreEnrollmentFilter * Add PreRegisterFilter * Add PreLoginFilter For more info: openedx/openedx-platform#29449 Some events that were already on the platform were also added: * Add COURSE_ENROLLMENT_CHANGED: sent after the enrollment update * Add COURSE_ENROLLMENT_CREATED event after the user's enrollment creation * Add COURSE_UNENROLLMENT_COMPLETED: sent after the user's unenrollment For more info: openedx/openedx-platform#28266 openedx/openedx-platform#28640
* Add PreEnrollmentFilter * Add PreRegisterFilter * Add PreLoginFilter For more info: openedx/openedx-platform#29449 Some events that were already on the platform were also added: * Add COURSE_ENROLLMENT_CHANGED: sent after the enrollment update * Add COURSE_ENROLLMENT_CREATED event after the user's enrollment creation * Add COURSE_UNENROLLMENT_COMPLETED: sent after the user's unenrollment For more info: openedx/openedx-platform#28266 openedx/openedx-platform#28640








Description
This PR adds the 1st Open edX Filters batch as part of the implementation plan of Hooks Extension Framework:
Supporting information
ADR(s) on:
Testing instructions
With this configuration, you won't be able to enroll, register or login. To change the behavior you can replace
Stop<process>withNoopFilteror remove it completely.Notes about the filters steps:
-modifiedSimple and straightforward implementations as a way of illustrating how they work. More complex steps are coming.
Deadline
None
Other information
Concerns
request.datacontaining the registration form information, including the user's password, which may result in a security leak, right? What can we do to avoid sending the password? maybe removing it from form_data?