ProphetNet #7157
patrickvonplaten merged 79 commits into huggingface:master from qiweizhen:prophetnet_develop
Conversation
prophetnet modified
modify codes as suggested v1
add prophetnet test files
I opened the wrong PR yesterday; please check this version, thanks!

@qiweizhen - this looks great! Is this the complete PR? Can we close the "old" PR #6187 in favor of this one?

@qiweizhen the integration tests look great! @JetRunner, I think we can take it from here :-) I saw that there are models, such as "xprophetnet-large-wiki100-cased-xglue-ntg", that exist both under microsoft and under weizhen - @qiweizhen, are these models identical?

This PR is the complete version, as I rebased this branch onto the latest huggingface version following @JetRunner's directions. The models under Microsoft are what we actually used; those under qiweizhen were only for debugging, and I will delete them. Thank you for your help @patrickvonplaten @JetRunner

Awesome! Thanks a million for your work! We will take it from here :-)

@patrickvonplaten Hi, may I ask when ProphetNet could be added to Transformers? Is there anything I can help with to get it integrated?

Hey @qiweizhen, sorry for the delay on this. ProphetNet is my no. 1 priority next week; it should be merged by the end of next week. You have done your part - I might ping you with some further questions.

@qiweizhen - the integration tests are awesome! Thanks to them, it should be quite straightforward to integrate the model.

@qiweizhen - would it be ok for you if we add a

Sure! Thank you!
sgugger
left a comment
Thanks for all the work in the implementation! I'm not a fan of breaking the naming conventions that are in all our modeling files, the building blocks should be prefixed with ProphetNet in my opinion. I'm also wondering why ProphetNetForCausalLM is excluded from the common tests.
The rest is just nits.
sshleifer
left a comment
Excited to use this! Great contribution!
I wrote comments as if I were reviewing Patrick's code. If anything is written without sufficient explanation or is unclear, I'd be happy to clarify.
I read the `config_`, `modeling_`, and test files.
Things I noticed in PyCharm that I didn't include here (all related to `modeling_prophetnet`):
- `softmax` ONNX trace logic: deleted in BART without issue, but no strong preference.
- `NgramMultiheadAttention.forward`: the `need_weights` kwarg is unused.
- `ProphetNetDecoderLayer`: the `output_attentions` kwarg is unused.
- Why is it called `predict_attention_mask` instead of `decoder_attention_mask`? I think `main` is used instead of "encoder" also?
- There are two sets of logic for preparing causal masks: `prepare_attention_mask` and `prepare_predict_attention_mask`. I think these should both have docstrings/better names. I don't understand their role well enough to know exactly.
- In `prepare_predict_attention_mask`, are we assuming that batches are padded to `max_target_positions`?
- In `prepare_predict_attention_mask`, why do we expand `predict_causal_mask` to `max_target_positions`?
- I would type hint that `DecoderLayer` returns `Tuple`.
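The causal-mask helpers under discussion can be illustrated abstractly. This is a pure-Python sketch of a standard causal mask (not the PR's implementation, which also handles padding and the n-gram prediction streams): position i may attend only to positions <= i.

```python
# Illustrative sketch only, not the code from this PR.
def causal_mask(seq_len):
    """True where attention is allowed: position i sees positions <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

mask = causal_mask(3)
assert mask == [
    [True, False, False],
    [True, True, False],
    [True, True, True],
]
```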
```python
inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=100, return_tensors='pt')

# Generate Summary
summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=512, early_stopping=True)
```
If `num_beams=4, max_length=512` are config defaults (512 seems high), they should not be specified.
If 512 is meant to be the source `max_length`, as I suspect, `tokenizer.model_max_length` should be set so this is handled by default.
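The fallback behavior being suggested can be sketched abstractly (illustrative pure Python, not the actual transformers `generate` implementation):

```python
# Illustrative sketch: generation kwargs fall back to config defaults,
# so call sites need not repeat values already stored on the config.
class GenConfig:
    num_beams = 4
    max_length = 512

def generate(config, **kwargs):
    num_beams = kwargs.get("num_beams", config.num_beams)
    max_length = kwargs.get("max_length", config.max_length)
    return num_beams, max_length

assert generate(GenConfig()) == (4, 512)               # defaults apply
assert generate(GenConfig(), num_beams=1) == (1, 512)  # explicit override wins
```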
The integration test was written by the author, so I'd prefer to leave it as is, to ensure the model behaves as originally expected by the author.
For xGLUE cross-lingual NLG tasks, xProphetNet is finetuned with English data, but inference is done with both English and other zero-shot-language data.

### Usage
A quick usage example:
The same comments as above apply to all model cards.
```python
    "microsoft/xprophetnet-large-wiki100-cased-xglue-ntg", use_cdn=False
)
model.to(torch_device)
model.config.max_length = 512
```
But you generate only about 30 tokens?
```python
@slow
def test_xprophetnet_ntg_inference(self):
    model = XLMProphetNetForConditionalGeneration.from_pretrained(
        "microsoft/xprophetnet-large-wiki100-cased-xglue-ntg", use_cdn=False
    )

    summary_ids_beam1 = model.generate(
        input_ids, num_beams=1, length_penalty=1.0, no_repeat_ngram_size=3, early_stopping=True
```
Suggested change:

```diff
-    input_ids, num_beams=1, length_penalty=1.0, no_repeat_ngram_size=3, early_stopping=True
+    input_ids, num_beams=1,
```

(assuming config defaults like BART)
```python
def test_is_whitespace(self):
    self.assertTrue(_is_whitespace(" "))
    self.assertTrue(_is_whitespace("\t"))
    self.assertTrue(_is_whitespace("\r"))
    self.assertTrue(_is_whitespace("\n"))
    self.assertTrue(_is_whitespace("\u00A0"))

    self.assertFalse(_is_whitespace("A"))
    self.assertFalse(_is_whitespace("-"))

def test_is_control(self):
    self.assertTrue(_is_control("\u0005"))

    self.assertFalse(_is_control("A"))
    self.assertFalse(_is_control(" "))
    self.assertFalse(_is_control("\t"))
    self.assertFalse(_is_control("\r"))

def test_is_punctuation(self):
    self.assertTrue(_is_punctuation("-"))
    self.assertTrue(_is_punctuation("$"))
    self.assertTrue(_is_punctuation("`"))
    self.assertTrue(_is_punctuation("."))

    self.assertFalse(_is_punctuation("A"))
    self.assertFalse(_is_punctuation(" "))
```
…nsformers into prophetnet_develop
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
…nsformers into prophetnet_develop
sgugger
left a comment
The master will be removed by the release master but should be there until then ;-)
LysandreJik
left a comment
Complicated model! Great job on the implementation and finishing touches!
Mostly nits about logging. Should wait for #7659 to be merged before merging.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Add ProphetNet.
This PR implements both ProphetNet and XLM-ProphetNet. The model architectures are identical, but each model uses a different tokenizer.
Description:
ProphetNet is a new pre-trained language model for sequence-to-sequence learning with a novel self-supervised objective called future n-gram prediction: it predicts several future tokens at once using an n-stream decoder. The original implementation is the Fairseq version in the original GitHub repo.
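As an illustration of the objective (a hedged sketch, not this PR's code): with future n-gram prediction for n = 2, each position is trained to predict the next two tokens, one per decoder stream, rather than only the next token as in standard causal LM training.

```python
# Illustrative sketch only: build future n-gram prediction targets.
# Near the sequence end, fewer than n future tokens remain.
def ngram_targets(tokens, n=2):
    return [tuple(tokens[t + 1 : t + 1 + n]) for t in range(len(tokens))]

assert ngram_targets([1, 2, 3, 4]) == [(2, 3), (3, 4), (4,), ()]
```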
xProphetNet shares the same model structure but is pretrained on the 100-language Wikipedia dataset described in xGLUE, a benchmark for cross-lingual NLU and NLG tasks. xProphetNet also serves as the baseline model for the cross-lingual generation tasks in xGLUE (NTG and QG).
Usage:
Take xGLUE NTG task as an example:
The cross-lingual pretrained model is finetuned with English news-title-generation data, but inference is performed on both English data and zero-shot data in other languages.
A quick usage example:
The model will generate news titles like:
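The usage snippet itself did not survive extraction; the following is a hedged reconstruction based on the integration test quoted earlier in this thread (checkpoint name from the PR; running this downloads a large model, and generation arguments are illustrative):

```python
def generate_title(article: str) -> str:
    # Hedged sketch based on the PR's integration test; not verbatim from the PR.
    from transformers import (
        XLMProphetNetForConditionalGeneration,
        XLMProphetNetTokenizer,
    )

    name = "microsoft/xprophetnet-large-wiki100-cased-xglue-ntg"
    tokenizer = XLMProphetNetTokenizer.from_pretrained(name)
    model = XLMProphetNetForConditionalGeneration.from_pretrained(name)

    inputs = tokenizer([article], return_tensors="pt")
    ids = model.generate(
        inputs["input_ids"], num_beams=4, no_repeat_ngram_size=3, early_stopping=True
    )
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```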
Released checkpoints:
pretrained:
fine-tuned:
Notes
The integration tests for ProphetNet check against the outputs of the original Fairseq implementation and include:
The model was implemented so that all of its parts can be used separately. This means that `ProphetNetEncoder` and `ProphetNetDecoder` can be used as stand-alone models, and `ProphetNetForCausalLM` can be instantiated easily from pretrained checkpoints and used within the EncoderDecoderModel framework.
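For example, the stand-alone causal LM can in principle be paired with any encoder via the generic `EncoderDecoderModel`; a hedged sketch (checkpoint names are illustrative, and running this downloads the checkpoints):

```python
def build_warm_started_seq2seq():
    # Hedged sketch: warm-start a seq2seq model from two pretrained checkpoints.
    # The decoder checkpoint is loaded as a causal LM with cross-attention.
    from transformers import EncoderDecoderModel

    return EncoderDecoderModel.from_encoder_decoder_pretrained(
        "bert-base-uncased", "microsoft/prophetnet-large-uncased"
    )
```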