Shuffle feature for multiple choice questions #1499
Conversation
common/lib/capa/capa/capa_problem.py
This pushes fixed islands to the end. I think that's reasonable behavior, but you don't actually test it in the set of tests.
|
Other than the one missing test case, looks good to me. @nedbat: am I missing anything? |
|
I don't see a test that confirms that different seeds produce different results. |
It would be better to write out this if/else over four lines. The flow is hard to see here. Also, why not if (shuffle), why the inverted test?
I notice you dealt with the fixed flag and the shuffle flag differently; the code might be easier to follow if the same idiom were used.
|
Thanks for the quick responses Cale and Ned. I've made the following fixes:
- I upgraded the rare "island" goes-to-tail case to be the official behavior and tested it, although obviously it's kind of an oddball case. Leaving it undefined (K&R style, sort of) the way I had it was probably not ideal. I expanded the changelog doc to document all this.
- I cleaned up the .coffee js if/else, and I also noticed that file writes its ifs with no space before the (, so I did that too.
- Added a test for a different seed.
I just kept the one commit and used --force to put it back up. I'll ping @rocha to make sure he's ok with the analytics end. I think Jane is traveling today, so I expect she won't do the testing until next week. |
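For context, the island-goes-to-tail behavior under discussion is roughly the following (a minimal sketch with illustrative names and a (name, fixed) tuple format, not the actual capa code): choices flagged fixed at the head and tail stay in place, and a fixed "island" in the middle lands after the shuffled block.

import random

def shuffle_choices(choices, rng):
    # Sketch: shuffle (name, fixed) pairs, keeping fixed head/tail in place.
    # Fixed choices stranded in the middle ("islands") are pushed to the end
    # of the shuffled block rather than keeping their authored position.
    head, tail = [], []
    i, j = 0, len(choices)
    while i < j and choices[i][1]:        # peel fixed choices off the front
        head.append(choices[i])
        i += 1
    while j > i and choices[j - 1][1]:    # peel fixed choices off the back
        tail.insert(0, choices[j - 1])
        j -= 1
    middle = choices[i:j]
    movable = [c for c in middle if not c[1]]
    islands = [c for c in middle if c[1]]
    rng.shuffle(movable)
    return head + movable + islands + tail

rng = random.Random(282)
print(shuffle_choices([('a', True), ('b', False), ('c', True), ('d', False)], rng))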
|
@nparlante It would be ideal if we could include some indication of whether the question was shuffled or not. IMHO we still have to figure out a better way to deal with randomization in problems, since relying on all python implementations producing the same random results from the same seed seems too fragile to me. The issue is larger than this Pull Request, however. |
|
@rocha -- hmm, I can confirm that the rng output changes (code below), at least sometimes, across releases -- d'oh! Could you point me to which line makes that log entry per student submit? Code to see if the random number generator changes across releases ... it does! 3.2: [2, 0, 1, 3]
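The check was roughly of this form (a sketch; the particular seed and list are just for illustration -- the point is that the printed order can differ between interpreter releases):

import random
import sys

# Seed, shuffle a small fixed list, and print the result; run this under
# different Python releases and compare. The ordering is not guaranteed to
# be the same across releases.
rng = random.Random(282)
order = list(range(4))
rng.shuffle(order)
print(sys.version.split()[0], order)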
|
I think that particular wrinkle is something we don't take into account in much/most of our analytics logs at the moment. In particular, I think we depend on using the seed to regenerate problems elsewhere in capa randomization too. |
|
Hmm, digging around in the python docs, the following section appears in the 3.2.x docs but not in the 2.x series. It looks to me like this gives the reproducibility feature we want ... too bad it's a future version we're not using! It does mean that we can take the pain one time at 2.x-to-3.x conversion time; after that it will be solved and we can forget about it. http://docs.python.org/3.2/library/random.html#notes-on-reproducibility 8.6.1. Notes on Reproducibility |
|
So I've come to the view that the correct course here is to leave the logging as it is, for these reasons:
- The most important bit -- their answer and its correctness -- is logged.
- I believe the current paradigm of the logging is that if you want to know something about the question, you load the question object and ask it directly. That all still works. Looking at the example log at the top of this thread ... we don't even log the number of options. The analytics code must load the question to orient itself about all that, which seems fine, as we don't want to duplicate that data in the logs. If someone wants to do analytics about shuffle order affecting correctness, they will have to load the question and interrogate it, just like now.
- There will be a problem doing analytics in a python 3.x world using data gathered in a 2.x world, but that seems pretty minor as 3.x problems go. |
|
@nparlante thanks for the research. I think your arguments are valid. One more question: why is the shuffling done in capa_problem.py? Moving it to MultipleChoiceResponse seems worth considering. |
|
Hmm, I'm not sure what you mean about MultipleChoiceResponse. Heh, right now the shuffle= sits down on the choicegroup in the xml. Do you mean to put the shuffle= up in multiplechoiceresponse, or do you mean something about where the implementation lives?
|
|
@nparlante I meant the implementation. Sorry I was not clear.
We still won't have the emitted events indicate whether the question was shuffled, but as you said earlier, that is a problem that affects other problem types as well. |
|
Good catch, @rocha. I agree, this randomization should be part of MultipleChoiceResponse in responsetypes.py. |
|
Sure, that file looks reasonable. Heh, it already includes a TODO about this sort of thing. I could override __init__ and shuffle the tree in there somewhere, or I could do it at render time. @rocha .. what do you think about where the shuffling should be done?
|
|
In general, we're only likely to be rendering once per request anyway, so I would lean towards putting it into get_html. |
|
@nparlante what @cpennington said :) Also, there is no need to add an additional attribute -- couldn't the existing randomize setting control this? |
|
Ok, get_html it is! @rocha for the randomization setting, my understanding is that the randomize setting controls when the seed is set and reset, and essentially the seed feature is independent of the problem type. The shuffling is one sort of "randomization" which is particular to multiple choice problems, using the seed. Heh, I happen to know we're going to have other types of "randomization", since Jeff Ericson @jaericson, sitting across from me, is working on another type of randomization which I think will turn out to be orthogonal to shuffle. So anyway, I think that's why it's right to have an independent "shuffle" attribute. |
|
+1 on having a separate shuffle attribute -- it is a separate use case. (randomize will control whether it's re-shuffled every time or not) |
|
@nparlante @shnayder makes sense! |
|
Great -- I was sort of 80% confident about the randomize thing, so it's good to hear that that makes sense. I think we've got a good consensus on what I need to do, so I'll ping you all when I get it moved over on this same PR. It seems like it's not too much work, but ahem ... well you never know! |
|
So I put the following code in to verify that render_html does not get called multiple times... And as you can guess, it turns out that assumption is false (I just ran test_lms, which was the laziest thing I could think of). It's not like Cale was giving me some sort of gold-plated guarantee on that prediction, but in the light of day, assuming that something never gets called twice was not the most reliable path. It's easy for stuff to get called and there are a lot of code paths, so it's just going to creep in, unless there's some sort of assert formally keeping it from happening. So anyway, it seems like the shuffle code should go in init or something in that neighborhood, so that's what I'm working on now. |
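The throwaway check was something along these lines (a reconstruction with made-up names, not the actual code from the branch): count calls per instance and blow up if the same problem gets rendered twice.

import collections

_call_counts = collections.Counter()

def assert_called_once(method):
    # Wrap a method and raise if it runs more than once on the same instance.
    def wrapper(self, *args, **kwargs):
        _call_counts[id(self)] += 1
        count = _call_counts[id(self)]
        assert count == 1, "called %d times on the same instance" % count
        return method(self, *args, **kwargs)
    return wrapper

class FakeProblem(object):
    @assert_called_once
    def get_html(self):
        return "<div>rendered</div>"

p = FakeProblem()
p.get_html()
try:
    p.get_html()               # a second render on the same instance
except AssertionError as err:
    print('caught:', err)      # this kind of assertion fired under test_lms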
|
Ok, it's basically the same code, but now housed in responsetypes.py. One difference is that here it was easy to do the shuffle early, so that the choice_0 ... numbering uses the shuffled version. As Calen points out above, this prevents us from leaking any info about what order the author chose, which seems desirable -- the author can just write the choices in whatever order they like, and we have them covered. This means that analytics will need to recreate the problem with its seed to decode what choice_0 means, since it doesn't even mean that it's the first choice in the authored ordering any more. |
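Roughly, the shape of the change (a sketch with illustrative names, using lxml the way capa does, not the actual responsetypes.py code): the choice elements are shuffled with the seeded rng before the choice_0, choice_1, ... names are assigned, so the names refer to displayed positions rather than authored ones.

import random
from lxml import etree

def shuffle_and_name(choicegroup, seed):
    # Shuffle <choice> children in place, then assign choice_N names.
    # Because naming happens after the shuffle, choice_0 is the first
    # *displayed* choice, not the first authored one.
    rng = random.Random(seed)
    choices = choicegroup.findall('choice')
    rng.shuffle(choices)
    for elem in choices:
        choicegroup.remove(elem)
    for i, elem in enumerate(choices):
        elem.set('name', 'choice_{0}'.format(i))
        choicegroup.append(elem)

group = etree.XML(
    '<choicegroup shuffle="true">'
    '<choice correct="false">a</choice><choice correct="true">b</choice>'
    '<choice correct="false">c</choice><choice correct="false">d</choice>'
    '</choicegroup>'
)
shuffle_and_name(group, 282)
print(etree.tostring(group).decode())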
|
@nparlante one of the most commonly used analyses is answer distribution. How will the shuffling affect such analysis? For example, normally instructors want to know how many people answered option a, how many option b, etc. |
|
The current style in the capa space that I've seen is that the problem itself gets loaded and asked directly when you want to interpret what a logged answer means. Cheers, Nick
|
|
@nparlante i agree that for most capa problems, it is necessary to check the original problem to know what it means. If I understand correctly, with this patch it would be absolutely necessary to use the original problem and the seed to be able to recreate the distribution of answers, correct? |
|
That's correct: the raw choice_0 in the log stream will mean different things for different seeds. My guess is that we should leave the log format as it is. The analytics code will need to load the problem with its seed to interpret the answers, as described above.
|
|
My sense is that some people working with the data may be working with the raw logs alone, without easy access to the problem definitions.
|
|
I would say that log-only analysis is not formally supported right now. If we want to support log-only analytics, then I'll say there are two paths:
(1) The easy way is that we record minimally what happened, but do not try to make the log self-contained. A variation of (1) I just thought of is we could always log using the original, unshuffled choice names, so the logged answer means the same thing regardless of the shuffle.
(2) Record all of the state/settings of the problem in the log, which would be a much bigger change.
I think the costs of the (1) approaches above are pretty low, so I'm leaning that way.
|
|
@sarina @cahrens But seriously, thanks so much for all your help, and especially this timely surge at the end to get this thing across the finish line. I'm sure I could have wrangled my end of it faster, but I also have a sense that this is really kind of a complicated feature, so tons of review was maybe ultimately the right thing. I'm thinking of the masking ... jeez that spans a bunch of conceptual levels. |
|
@rocha @mulby Hey Carlos and Gabe -- this should be ready to go for any last testing. My hope is to get it in the next .org release. It would be great actually to see some testing of the analytics data downstream. I've tracked it through capa_base basically, and have a few tests there, but I don't have much visibility of what happens after track_function() gets called. You can add shuffle="true" in a multiple choice question, and the masking will be active. Things to look for downstream:
Obviously you should never see mask_xx anywhere. I suppose you could just grep for it! |
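The gist of the masking, for anyone reading downstream (an illustrative sketch only -- the helper names here are made up, and the real implementation lives in capa): the browser only sees mask_N ids in display order, and the server keeps a per-seed mapping used to translate submissions back to real choice_N names before analytics events are emitted.

import random

def make_mask_map(choice_names, seed):
    # Map mask_N (display-order ids shown to the browser) to real choice names.
    rng = random.Random(seed)
    shuffled = list(choice_names)
    rng.shuffle(shuffled)
    return {'mask_{0}'.format(i): name for i, name in enumerate(shuffled)}

def unmask(answer, mask_map):
    # Translate a submitted mask_N id back to its real choice_N name.
    return mask_map.get(answer, answer)

mask_map = make_mask_map(['choice_0', 'choice_1', 'choice_2', 'choice_3'], 282)
print(unmask('mask_2', mask_map))   # prints whichever choice_N was displayed third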
|
Hi @nparlante. Thanks for such amazing work! We are starting the release of answer distributions computed through our pipeline next week. Will you be ok with waiting a little bit more before merging this PR? Our plan is to run a test with the new code, but we won't be able to do so until next week. |
|
@rocha Hey Carlos, thanks I think that should be fine. We don't want to slip too much of course, but what you describe sounds fine. And of course getting some thorough testing sounds great. |
|
We ran a test of this through the answer distribution pipeline, and it looks good. |
|
@nparlante It looks like there are still many suggestions from Sarina that haven't been addressed. But looking at the output of "problem_check" events emitted by Capa, the right values are output and they do work with our answer distributions. I tested both shuffle and answer-pool options.
As a sanity check, I also looked at other events generated by my testing. As might be expected, browser and implicit events contain the "mask_N" values, as does the state that is stored in courseware_studentmodule. Looking at save_problem_success events, the event/state/student_answers/input_id value is unmasked (e.g. "choice_X" instead of "mask_Y"). However, for reset_problem, the event/old_state/student_answers/input_id value is still masked ("mask_Y" instead of "choice_X"). (There is, of course, no new state, since it has been reset.) I'm not able to look at the values for state that are stored in problem_check events because I had to perform a "reset" before I could do a "check" again. |
|
@brianhw I'm satisfied my comments are addressed. @nparlante before merging, please make sure you're at 100% quality: https://jenkins.testeng.edx.org/job/edx-platform-report/3716/Diff_Quality_Report/? (there are some lingering pylints) Also please squash/reword your commits for a clean history (no "WIP", no "Respond to review" commits -- the number of commits you end up with is really just your own judgement call) |
|
@nparlante Please also fix the reset_problem so that it outputs unmasked data. |
If you had "num_minute" and "num_second" below, this should be "num_hour".
Gotcha, done.
|
@nparlante Thanks for fixing the reset, and responding to my comments. So just have to address Sarina's last comment, and get the tests to pass, quality improved, and commits squashed. 👍 |
|
@brianhw Thanks Brian. I really appreciated that you figured out the old_state case, as I was surely never going to think of that. I'll work on this other stuff. |
|
Ok, I'm working on the last few bits. I'm hoping the fast-forward clears up the failing tests. Woah, what's up with that huge list of commits. I must have done the rebase in some suboptimal way. |
|
Hmm, ok I'm going to re-do the rebase. (later) Ok, done. I'm hoping the spuriously failing tests will have fixed themselves. (even later) Yay, that worked for the tests. |
|
@sarina Hey Sarina, ok I think I've done everything, so I'm getting ready to click the old button, say late Mon. (Well, I need to do one last rebase.) There's one remaining pylint, which is that the check_problem function is long, which it is. I was thinking it was best to just leave that pylint as is, as a sort of reminder to apply decomposition when messing with that function. What do you think? |
|
You can wrap the function in disable pragmas: # pylint: disable=too-many-statements
def check_problem():
really_long_stuff
# pylint: enable=too-many-statements |
|
@sarina ok thanks Sarina. I'll give it a last lookover tonight and ship it. |
Features by Jeff Ericson and Nick Parlante squashed down to one commit.
- shuffle of choices within a choicegroup (Nick)
- answer-pool subsetting within a choicegroup (Jeff)
- masking of choice names within a choicegroup, used by shuffle and answer-pool (Nick)
- targeted-feedback within a multiplechoiceresponse (Jeff)
- delay-between-submissions capa feature (Jeff)
Shuffle feature for multiple choice questions
|
(woot)^1000 ! |
|
@jbau Thanks Jason -- I was torn between thinking up some witty epitaph for good ol' 1499 and the fear that it might jinx it. |
|
🌟 👍 🌟 |
|
w00t ✨ 🚀 be sure to delete your branch, too, once you've merged (it speeds up builds and deploy) |
|
Nice, congrats! |
@cpennington Could you check this one out?
@rocha Note the analytics issue below
@jinpa How about you do testing, it is your story I believe!
Just FYI: @jaericson @shnayder
Adds studio and lms support for an option to shuffle
the displayed order of multiple choice questions.
Works by shuffling the xml tree nodes in the problem
during the get_html process. The shuffling uses the problem's seed.
The added syntax for the xml and markdown is documented in the Changelog.
One concern was: will this mess up analytics?
My experiments suggest that analytics will work fine:
The question is -- when the question choices are displayed shuffled, is enough logged so that analytics still works?
I have a 4-option problem with shuffling enabled. In this case, the options are presented in the order b a d c (i.e. 1 0 3 2). I selected option c, the last one displayed (which happens to be correct), and then looking in courseware_studentmodulehistory I have the following row:
2013-10-22 22:23:40.658495|1.0|1.0|{"correct_map": {"i4x-Stanford-CS99-problem-408c0fcffb8c41ec87675d8c3f7a3b5b_2_1": {"hint": "", "hintmode": null, "correctness": "correct", "npoints": null, "msg": "", "queuestate": null}}, "input_state": {"i4x-Stanford-CS99-problem-408c0fcffb8c41ec87675d8c3f7a3b5b_2_1": {}}, "attempts": 2, "seed": 282, "done": true, "student_answers": {"i4x-Stanford-CS99-problem-408c0fcffb8c41ec87675d8c3f7a3b5b_2_1": "choice_2"}}||4|52
In student_answers I see choice_2, which is "c" in the 0-based numbering, so that's right. It looks to me like it has successfully recorded which choice I made, not getting confused by the fact that the options were displayed in a different order. The rest of the stuff in the row looks reasonable, but mostly it's Greek to me.
Because we're using standard python shuffle, knowing the seed, you can recreate the shuffle order. Maybe you never need to do this since the problem will just do it for you. Still. Noting that the logged seed above is 282:
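Something along these lines recreates the displayed order from the logged seed (a sketch, not the capa code -- whether it reproduces the exact b a d c ordering above depends on how the real code consumes the rng):

import random

# Recreate the display order for a 4-choice problem from the logged seed.
rng = random.Random(282)
order = list(range(4))   # authored positions 0..3
rng.shuffle(order)
print(order)             # the displayed ordering (the example above was b a d c, i.e. 1 0 3 2)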
Cute!