
Conversation

@PonteIneptique
Contributor

(improvement/AttentionalDecoder.predict_max) Avoid computation on reached EOS for prediction (See #73)

The current prediction time is quite slow, and we agree there is room for improvement.

After taking a good look at it, it became clear that we kept computing on items that no longer needed it (strings that have already reached EOS).

I propose here a refactor of the predict_max function that stops computing over elements that have reached EOS. There is probably still room for improvement.

For a group of 19 sentences over 100 iterations:

Default predict_max:
Average tagging time: 0.556127781867981 s
Median tagging time: 0.5420029163360596 s
Total tagging time: 55.612778186798096 s

New predict_max:
Average tagging time: 0.4061899709701538 s
Median tagging time: 0.40130531787872314 s
Total tagging time: 40.61899709701538 s

That is roughly a 27 % reduction in total tagging time (lemma only).
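For illustration, here is a rough, loop-based sketch of the idea (placeholder names such as step_fn, bos and eos; not the actual patch): after each decoding step, finished hypotheses are written back to their original batch positions and dropped from the tensors fed to the next step.

```python
import torch

def predict_max_sketch(step_fn, hidden, bos, eos, batch_size, max_len):
    # final hypotheses and scores, indexed by original batch position
    hyps = [[] for _ in range(batch_size)]
    scores = [0.0] * batch_size
    # maps current tensor row -> original batch position
    row_to_batch = {row: row for row in range(batch_size)}
    inp = torch.full((batch_size,), bos, dtype=torch.long)

    for _ in range(max_len):
        logprobs, hidden = step_fn(inp, hidden)       # (n_active, vocab)
        step_scores, inp = logprobs.max(dim=-1)       # greedy pick per hypothesis
        for row, (tok, score) in enumerate(zip(inp.tolist(), step_scores.tolist())):
            batch_pos = row_to_batch[row]
            hyps[batch_pos].append(tok)
            scores[batch_pos] += score
        keep = inp != eos                             # hypotheses still running
        if not keep.any():
            break
        # shrink the working tensors to the unfinished hypotheses only
        inp, hidden = inp[keep], hidden[:, keep]      # GRU-style hidden assumed
        kept_rows = keep.nonzero(as_tuple=True)[0].tolist()
        row_to_batch = {new: row_to_batch[old] for new, old in enumerate(kept_rows)}

    return hyps, scores
```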

(Improvement/AttentionalDecoder.predict_max) Lots of comments and some performance fixes (avoid transposing when not needed)
@PonteIneptique force-pushed the improvement/AttentionDecoder.predict_max branch from cf58ee8 to ae15550 on August 8, 2020 at 07:17.
@PonteIneptique
Contributor Author

I am bumping this one :)

@emanjavacas
Owner

Sorry, I won't have time for this in the coming months; busy PhD time.

@PonteIneptique
Contributor Author

PonteIneptique commented Aug 31, 2020 via email

# We then iterate over score and inp (same sized tensors) and add to our
prediction_run_output = [eos for _ in range(batch)]

for ind, (hyp, sc) in enumerate(zip(inp.tolist(), score.tolist())):
Owner

I find this bit particularly confusing and I think there is no need for a for loop here, which goes against the tensor-level operations used in the rest of the package.

I would rewrite all this using a mask. Check the following points:

  • the keys in tensor_to_batch_indices are always 0 to the number of remaining hypotheses, so the use of a dictionary is redundant. You could keep a tensor with the target indices.
  • prediction_run_output (by the way, I cannot imagine what the variable represents on the basis of the name alone; perhaps look for a more informative name) can be updated at once using masks (same for the scores).

As a strategy, you could define an output tensor of size (batch x max_length) for the output hyps up front and index into it directly after each step. For the output scores, you can keep a 1D tensor and update it using a mask that indexes the words not yet finished. When the process is done, you can rearrange the hyps into the expected format.
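Roughly something like the following (a minimal sketch with placeholder names, assuming a GRU-style hidden state of shape (layers, batch, dim)): the hyps buffer is allocated once as (batch x max_length), the scores stay a 1D tensor, and a tensor of still-active batch positions replaces the dictionary.

```python
import torch

def predict_max_masked(step_fn, hidden, bos, eos, batch_size, max_len):
    device = hidden.device
    # fixed-size output buffers, indexed by original batch position
    hyps = torch.full((batch_size, max_len), eos, dtype=torch.long, device=device)
    scores = torch.zeros(batch_size, device=device)
    # original batch positions of the hypotheses still being decoded
    active = torch.arange(batch_size, device=device)
    inp = torch.full((batch_size,), bos, dtype=torch.long, device=device)

    for step in range(max_len):
        logprobs, hidden = step_fn(inp, hidden)       # (n_active, vocab)
        step_scores, inp = logprobs.max(dim=-1)
        hyps[active, step] = inp                      # write straight into the buffer
        scores[active] += step_scores                 # masked score update
        keep = inp != eos                             # words not yet finished
        if not keep.any():
            break
        active, inp, hidden = active[keep], inp[keep], hidden[:, keep]

    return hyps, scores
```

When the loop finishes, hyps already holds one row per original batch element, padded with eos, so rearranging it into the expected output format is a single pass.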

Contributor Author

I do not completely get what you mean (unfortunately), and I am not sure this is going to make the code clearer. I'd argue it will actually make it less clear. You're the code owner, but aside from the variable name, I really feel the code is accessible and easy to tweak for an outsider as it is. And we do have loops here and there (for example in the beams), even if those might be different cases.

Contributor Author

Done

Comment on lines 398 to 401
tensor_to_batch_indexes = {
    elem: tensor_to_batch_indexes[former_index]
    for elem, former_index in enumerate(keep.tolist())
}
Owner

See above. The keys are always 0 to n (remaining items), so the dictionary is redundant. I'd rather go for a tensor that lets you index into the output tensors.
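For instance (a toy illustration, not code from the PR):

```python
import torch

# toy example: 5 hypotheses, rows 1 and 3 have just emitted <eos>
batch_index = torch.arange(5)                          # original batch positions
keep = torch.tensor([True, False, True, False, True])

# instead of rebuilding a dict {new_row: old_batch_index}, filter the index tensor
batch_index = batch_index[keep]                        # tensor([0, 2, 4])

# it then indexes directly into the fixed-size output tensors
hyps = torch.zeros(5, 10, dtype=torch.long)
hyps[batch_index, 0] = torch.tensor([7, 8, 9])         # writes to original rows 0, 2, 4
```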

Contributor Author

Done.

@emanjavacas left a comment (Owner)

I've added some comments.

Another thing I think we'd need is some code that tests that this produces the same output as the non-optimized version.

@PonteIneptique
Contributor Author

#74 (review) I don't see how you would like to test this. I mean, I ran things (which is not a test per se, in the CI/unit-test sense) and the output was the same. But to test it properly, we'd have to keep an artificial copy of the old predict_max around?
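For instance, one possible shape for such a test (purely a sketch; predict_max_reference and the call signature are hypothetical, and it would require keeping a frozen copy of the old implementation around):

```python
import torch

def test_predict_max_matches_reference(decoder, encoded_batch):
    # hypothetical: `decoder` exposes both the optimized predict_max and a
    # frozen copy of the pre-refactor version; `encoded_batch` is any input
    torch.manual_seed(0)
    hyps_new, scores_new = decoder.predict_max(encoded_batch)
    torch.manual_seed(0)
    hyps_ref, scores_ref = decoder.predict_max_reference(encoded_batch)

    assert hyps_new == hyps_ref
    for s_new, s_ref in zip(scores_new, scores_ref):
        assert abs(s_new - s_ref) < 1e-6
```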

@PonteIneptique
Contributor Author

Thanks for the review. I'll try to implement what you're asking for, because I still think this is a huge improvement.

@PonteIneptique
Contributor Author

I have moved from the loop to tensor operations as requested. The question of tests remains, and I am not sure how to handle it.

@PonteIneptique
Contributor Author

Bumping this :)

* (improvement/AttentionalDecoder.predict_max) Avoid computation on reached EOS for prediction (See emanjavacas#73)

* (Improvement/AttentionalDecoder.predict_max) Lots of comments and some performance fixes (avoid transposing when not needed)

* Update torch

* (improvement/argmax.decoder) Improved readability a little more

* (improvement/decode) Replaced the loop with tensor operations for prediction

* Do not sneakily update the torch requirements