(improvement/AttentionalDecoder.predict_max) Avoid computation on rea… #74
base: master
Conversation
…ched EOS for prediction (See emanjavacas#73)

The current prediction time is quite slow, and we agree there might be room for improvement. After having a good look at it, it seemed clear that we were computing on items that technically did not need to be computed upon any further (strings that reached EOS). I propose here my refactor of the predict_max function that stops computing over elements that have reached EOS. There is probably still room for improvement here.

For a group of 19 sentences over 100 iterations:
- Average tagging time with default: 0.556127781867981 s
- Median tagging time with default: 0.5420029163360596 s
- Total tagging time with default: 55.612778186798096 s

For a group of 19 sentences over 100 iterations:
- Average tagging time with new: 0.4061899709701538 s
- Median tagging time with new: 0.40130531787872314 s
- Total tagging time with new: 40.61899709701538 s

That is a 27% reduction in total tagging time (lemma only).
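To make the idea concrete, here is a minimal, self-contained sketch of the strategy (illustrative names only, not the actual pie code): hypotheses that emit EOS are dropped from the active batch, so each later decoding step runs on a smaller tensor.

```python
import torch

def greedy_decode_skipping_eos(step_fn, state, batch_size, eos, max_len=50):
    # step_fn(state) -> (logits of shape (active, vocab), new state) is a hypothetical interface
    # final output, pre-filled with eos so finished rows need no further writes
    hyps = torch.full((batch_size, max_len), eos, dtype=torch.long)
    # maps each position in the shrinking active batch back to its original row
    batch_index = torch.arange(batch_size)

    for t in range(max_len):
        logits, state = step_fn(state)
        preds = logits.argmax(dim=-1)        # (active,)
        hyps[batch_index, t] = preds         # write into the full-size output
        keep = preds != eos                  # hypotheses that have not finished yet
        if not keep.any():
            break
        batch_index = batch_index[keep]      # shrink the active batch
        state = state[keep]                  # assumes the recurrent state is batch-first
    return hyps
```

The saving comes from the shrinking batch: once a hypothesis emits EOS, no further decoder step is computed for it.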
Force-pushed from cf58ee8 to ae15550: … performance fixes (avoid transposing when not needed)
I am bumping this one :)

Sorry, I won't have time for this in the coming months, busy PhD time.
I get it :)
pie/models/decoder.py
Outdated
```python
# We then iterate over score and inp (same sized tensors) and add to our
prediction_run_output = [eos for _ in range(batch)]

for ind, (hyp, sc) in enumerate(zip(inp.tolist(), score.tolist())):
```
I find this bit particularly confusing, and I think there is no need for a for loop here, which goes against the tensor-level operations used in the rest of the package.
I would rewrite all this using a mask. Check the following points:
- the keys in tensor_to_batch_indices are always 0 to the number of remaining hypotheses, so the use of a dictionary is redundant. You could keep a tensor with the target indices.
- prediction_run_output (by the way, I cannot imagine what the variable represents on the basis of the name alone; perhaps look for a more informative name) can be updated at once using masks (same for the scores).
As a strategy, you could define an output tensor of size (batch x max_length) for the output hyps once, and index it directly after each step. For the output scores, you can keep a 1D tensor and update it using a mask that indexes the words not yet finished. When the process is done, you can rearrange the hyps into the expected format.
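For reference, a rough sketch of the mask-based bookkeeping suggested here (names are illustrative, not the actual pie code): one pre-allocated (batch x max_length) output tensor and a 1D score tensor, both updated through an index tensor of the unfinished hypotheses.

```python
import torch

batch, max_length, vocab, eos = 4, 10, 5, 0
hyps = torch.full((batch, max_length), eos, dtype=torch.long)  # (batch x max_length) output
scores = torch.zeros(batch)                                    # 1D, one score per hypothesis
active = torch.arange(batch)                                   # rows not yet finished

for step in range(max_length):
    # stand-in for the decoder step over the still-active hypotheses
    logits = torch.randn(len(active), vocab)
    step_scores, step_preds = logits.max(dim=-1)

    hyps[active, step] = step_preds     # index the output tensor directly
    scores[active] += step_scores       # update scores through the same index tensor

    keep = step_preds != eos            # mask of hypotheses that did not emit eos
    active = active[keep]
    if len(active) == 0:
        break
```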
I do not completely get what you mean (unfortunately), and I am not sure this is going to make the code clearer. I'd argue it will actually make it less clear. You're the code owner, but aside from the variable name, I really feel the code is accessible and easy to tweak for outsiders as is. And we have loops here and there (for example beams), even if these might be different.
Done
pie/models/decoder.py
Outdated
```python
tensor_to_batch_indexes = {
    elem: tensor_to_batch_indexes[former_index]
    for elem, former_index in enumerate(keep.tolist())
}
```
See above. The keys are always 0 to n (remaining items) so the dictionary is redundant. I'd rather go for a tensor that lets you index into the output tensors.
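A tiny illustration of that suggestion (hypothetical values and names): since the keys are always 0..n-1, a 1D index tensor sliced with the boolean keep mask carries the same mapping as the dictionary comprehension above.

```python
import torch

batch_size, eos = 5, 0
batch_indexes = torch.arange(batch_size)   # original row of each still-active hypothesis

preds = torch.tensor([3, 0, 2, 0, 1])      # pretend step predictions (0 == eos)
keep = preds != eos                        # boolean mask over the active batch
batch_indexes = batch_indexes[keep]        # tensor([0, 2, 4]); replaces the dict
```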
Done.
emanjavacas left a comment
I've added some comments.
Another thing I think we'd need is some code that tests that this outputs the same as the non-optimized version.
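One way such a check could look (a sketch only: predict_max_reference stands for a kept copy of the unoptimized function, the argument names are illustrative, and the decoder and inputs would come from the project's test fixtures): run both versions on the same batch and assert identical hypotheses and near-identical scores.

```python
import torch

def test_predict_max_matches_reference(decoder, enc_outs, lengths):
    # both calls are assumed to return (hyps, scores) for the same inputs
    ref_hyps, ref_scores = decoder.predict_max_reference(enc_outs, lengths)
    new_hyps, new_scores = decoder.predict_max(enc_outs, lengths)
    assert ref_hyps == new_hyps
    assert torch.allclose(torch.as_tensor(ref_scores),
                          torch.as_tensor(new_scores), atol=1e-6)
```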
#74 (review) I don't see how you would like to test this. I mean, I ran things (which is not a test per se, in the sense of CI/unit tests) and the output was the same. But to test it, would we have to keep an artificial version of predict_max?
Thanks for the review. I'll try to implement what you ask for, because I still think that's a huge improvement.
I have moved from the loop to tensor use as required. The question of tests remains, which I am not sure how to handle.
Bumping this :)
* (improvement/AttentionalDecoder.predict_max) Avoid computation on reached EOS for prediction (See emanjavacas#73)
* (Improvement/AttentionalDecoder.predict_max) Lots of comments and some performance fixes (avoid transposing when not needed)
* Update torch
* (improvement/argmax.decoder) Improved a little more readability
* (improvement/decode) Removed loop by tensor use for prediction
* Do not update sneakily the torch requirements