
ProphetNet #7157

Merged
patrickvonplaten merged 79 commits into huggingface:master from qiweizhen:prophetnet_develop
Oct 19, 2020

Conversation

@qiweizhen
Contributor

@qiweizhen qiweizhen commented Sep 16, 2020

Add ProphetNet.

This PR implements both ProphetNet and XLM-ProphetNet. The model architectures are identical, but each model uses a different tokenizer.

Description:

ProphetNet is a new pre-trained language model for sequence-to-sequence learning with a novel self-supervised objective called future n-gram prediction. With its n-stream decoder, ProphetNet can predict several future tokens at each position instead of only the next one. The original implementation is the Fairseq version in the Microsoft GitHub repo.
xProphetNet has the same model structure but is pretrained on the 100-language Wikipedia dataset described in xGLUE, a benchmark for cross-lingual NLU and NLG tasks. xProphetNet also serves as a baseline model for the cross-lingual generation tasks in xGLUE, NTG and QG.
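The future n-gram objective can be illustrated with a small, self-contained sketch. This is illustrative only, not the PR's actual code: at each decoder position t, the model is trained to predict the next n tokens rather than only the next one.

```python
# Illustrative sketch of future n-gram target construction (not the
# actual ProphetNet code): at each position t the decoder is asked to
# predict the next n tokens y_{t+1}, ..., y_{t+n}, not just y_{t+1}.
def future_ngram_targets(tokens, n=2, pad=-100):
    """Return, for each position t, the window [y_{t+1}, ..., y_{t+n}]."""
    targets = []
    for t in range(len(tokens)):
        window = tokens[t + 1 : t + 1 + n]
        # pad windows that run past the end of the sequence
        window = window + [pad] * (n - len(window))
        targets.append(window)
    return targets

print(future_ngram_targets([10, 11, 12, 13], n=2))
# [[11, 12], [12, 13], [13, -100], [-100, -100]]
```

In the real model each of the n prediction streams has its own loss term; at inference time only the next-token stream is used, so generation is unchanged.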

Usage:

Take the xGLUE NTG task as an example:
The cross-lingual pretrained model is fine-tuned on English news-title-generation data, but runs inference on both English and other, zero-shot, languages.
A minimal usage example:

from transformers import ProphetNetTokenizer, ProphetNetForConditionalGeneration, ProphetNetConfig

model = ProphetNetForConditionalGeneration.from_pretrained('microsoft/xprophetnet-large-wiki100-cased-xglue-ntg')
tokenizer = ProphetNetTokenizer.from_pretrained('microsoft/xprophetnet-large-wiki100-cased-xglue-ntg')

EN_SENTENCE_TO_QUESTION = "Microsoft Corporation intends to officially end free support for the Windows 7 operating system after January 14, 2020, according to the official portal of the organization. From that day, users of this system will not be able to receive security updates, which could make their computers vulnerable to cyber attacks."
RU_SENTENCE_TO_QUESTION = "Корпорация Microsoft намерена официально прекратить бесплатную поддержку операционной системы Windows 7 после 14 января 2020 года, сообщается на официальном портале организации . С указанного дня пользователи этой системы не смогут получать обновления безопасности, из-за чего их компьютеры могут стать уязвимыми к кибератакам."
ZH_SENTENCE_TO_QUESTION = "根据该组织的官方门户网站,微软公司打算在2020年1月14日之后正式终止对Windows 7操作系统的免费支持。从那时起,该系统的用户将无法接收安全更新,这可能会使他们的计算机容易受到网络攻击。"
inputs = tokenizer([EN_SENTENCE_TO_QUESTION, RU_SENTENCE_TO_QUESTION, ZH_SENTENCE_TO_QUESTION], padding=True, max_length=256, truncation=True, return_tensors='pt')

summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=100, early_stopping=True)
print([tokenizer.decode(g) for g in summary_ids])  

The model will generate news titles like:

['[SEP] Microsoft to end Windows 7 free support after January 14, 2020[SEP][PAD][PAD][PAD][PAD]',
 '[SEP] Microsoft намерена прекратить бесплатную поддержку Windows 7 после 14 января 2020 года[SEP]',
 '[SEP]微软打算终止对Windows 7操作系统的免费支持[SEP][PAD][PAD][PAD][PAD][PAD][PAD]']
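The raw decode above keeps special tokens such as [SEP] and [PAD]. In transformers, passing skip_special_tokens=True to tokenizer.decode removes them; as a minimal stand-alone illustration of the same post-processing (the token list below is an assumed example set, not pulled from the ProphetNet tokenizer):

```python
# Minimal sketch: strip special tokens from a decoded string.
# The specials tuple is an assumed example set, not the actual
# special-token list of the ProphetNet tokenizer.
def strip_special_tokens(text, specials=("[SEP]", "[PAD]", "[CLS]", "[UNK]")):
    for tok in specials:
        text = text.replace(tok, "")
    return text.strip()

print(strip_special_tokens("[SEP] Microsoft to end Windows 7 free support[SEP][PAD][PAD]"))
# Microsoft to end Windows 7 free support
```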

Released checkpoints:

pretrained:

microsoft/prophetnet-large-uncased
microsoft/xprophetnet-large-wiki100-cased

fine-tuned:

microsoft/prophetnet-large-uncased-cnndm
microsoft/xprophetnet-large-wiki100-cased-xglue-ntg
microsoft/xprophetnet-large-wiki100-cased-xglue-qg

Notes

Matched against the outputs of the original Fairseq implementation, the integration tests for ProphetNet cover:

  1. Encoder, decoder, and model hidden states of the pretrained ProphetNet and xProphetNet checkpoints
  2. Model hidden states of the xProphetNet NTG fine-tuned model
  3. Cross-lingual outputs of the xProphetNet NTG fine-tuned model with different beam sizes
  4. CNN/DM outputs of the ProphetNet CNN/DM fine-tuned model with different input lengths

The model was implemented so that all of its parts can be used separately. This means that ProphetNetEncoder and ProphetNetDecoder can be used as stand-alone models. ProphetNetForCausalLM can be instantiated easily from pretrained checkpoints and can be used within the EncoderDecoderModel framework.

@julien-c julien-c added the model card Related to pretrained model cards label Sep 16, 2020
@qiweizhen
Contributor Author

I opened the wrong PR yesterday, please help me check this version, thanks!
@JetRunner @patrickvonplaten

@patrickvonplaten patrickvonplaten removed the model card Related to pretrained model cards label Sep 16, 2020
@patrickvonplaten
Contributor

@qiweizhen - this looks great! Is this the complete PR? Can we close the "old" PR: #6187 in favor of this one?

@julien-c julien-c added the model card Related to pretrained model cards label Sep 16, 2020
@patrickvonplaten
Contributor

@qiweizhen the integration tests look great! @JetRunner, I think we can take it from here :-)

I saw that there are models, such as "xprophetnet-large-wiki100-cased-xglue-ntg", that exist both under microsoft and under weizhen - @qiweizhen, are these models identical?

Comment thread src/transformers/modeling_prophetnet.py Outdated
@qiweizhen
Contributor Author

This PR is the complete version, as I rebased this branch onto the latest Hugging Face master under the direction of @JetRunner.

The models under Microsoft are the ones we actually used. Those under qiweizhen were used for debugging; I will delete them.

Thank you for your help @patrickvonplaten @JetRunner

Comment thread src/transformers/modeling_prophetnet.py Outdated
Comment thread src/transformers/modeling_prophetnet.py Outdated
Comment thread src/transformers/modeling_prophetnet.py Outdated
@patrickvonplaten
Contributor

This PR is the complete version, as I rebased this branch onto the latest Hugging Face master under the direction of @JetRunner.

The models under Microsoft are the ones we actually used. Those under qiweizhen were used for debugging; I will delete them.

Thank you for your help @patrickvonplaten @JetRunner

Awesome! Thanks a million for your work! We will take it from here :-)

Comment thread src/transformers/tokenization_prophetnet.py
Comment thread src/transformers/modeling_prophetnet.py Outdated
Comment thread src/transformers/modeling_prophetnet.py Outdated
@qiweizhen
Contributor Author

@patrickvonplaten Hi, may I ask when ProphetNet could be added to Transformers? Is there any work I can help with to get it integrated?

@patrickvonplaten
Contributor

Hey @qiweizhen ,

Sorry for the delay on this. ProphetNet is my number one priority next week; it should be merged by the end of next week. You have done your part - I might ping you with some further questions.

@patrickvonplaten
Contributor

@qiweizhen - the integration tests are awesome! Thanks to them, it should be quite straightforward to integrate the model.

@patrickvonplaten patrickvonplaten changed the title from "add Prophetnet IntegrationTest" to "[WIP] ProphetNet" Sep 28, 2020
@patrickvonplaten patrickvonplaten removed the model card Related to pretrained model cards label Sep 28, 2020
@julien-c julien-c added the model card Related to pretrained model cards label Sep 28, 2020
Comment thread src/transformers/tokenization_prophetnet.py Outdated
@patrickvonplaten
Contributor

@qiweizhen - would it be ok with you if we add a ProphetNetModel and an XLMProphetNetModel, each with its respective tokenizer? I think this would be cleaner and more in line with Roberta and XLMRoberta, for example. It should be quite easy to do. I can take care of it - it would just be great to have your approval :-)

@qiweizhen
Contributor Author

@qiweizhen - would it be ok with you if we add a ProphetNetModel and an XLMProphetNetModel, each with its respective tokenizer? I think this would be cleaner and more in line with Roberta and XLMRoberta, for example. It should be quite easy to do. I can take care of it - it would just be great to have your approval :-)

Sure! Thank you!

Collaborator

@sgugger sgugger left a comment


Thanks for all the work in the implementation! I'm not a fan of breaking the naming conventions that are in all our modeling files, the building blocks should be prefixed with ProphetNet in my opinion. I'm also wondering why ProphetNetForCausalLM is excluded from the common tests.

The rest is just nits.

Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread docs/source/index.rst Outdated
Comment thread src/transformers/configuration_xlm_prophetnet.py Outdated
Comment thread docs/source/model_doc/xlmprophetnet.rst
Comment thread src/transformers/modeling_prophetnet.py Outdated
Comment thread src/transformers/modeling_prophetnet.py Outdated
Comment thread src/transformers/modeling_prophetnet.py Outdated
Comment thread src/transformers/modeling_prophetnet.py Outdated
Comment thread utils/check_repo.py Outdated
Contributor

@sshleifer sshleifer left a comment


Excited to use this! Great contribution!

I wrote comments as if I were reviewing Patrick's code. If anything is unclear or insufficiently explained, I'd be happy to clarify.

I read config_, modeling_, and tests.

Things I noticed in PyCharm but didn't write up as inline comments (all related to modeling_prophetnet):

  • softmax onnx trace logic:
    deleted in bart without issue, but no strong preference.

  • NgramMultiheadAttention.forward: need_weights kwarg unused

  • ProphetNetDecoderLayer: output_attentions kwarg unused

  • Why is it called predict_attention_mask instead of decoder_attention_mask?
    I think main is also used instead of encoder.

  • There are two sets of logic for preparing causal masks
    prepare_attention_mask and prepare_predict_attention_mask.

  • I think these should both have docstrings/better names, but I don't understand their role well enough to suggest exactly what.

  • In prepare_predict_attention_mask, are we assuming that batches are padded to max_target_positions?

  • In prepare_predict_attention_mask, why do we expand predict_causal_mask to max_target_positions?

  • I would type hint that DecoderLayer returns Tuple

Comment thread docs/source/index.rst Outdated
Comment thread model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md Outdated
inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=100, return_tensors='pt')

# Generate Summary
summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=512, early_stopping=True)
Contributor


If num_beams=4 and max_length=512 are the config defaults (512 seems high), they should not be specified.
If 512 is meant to be the source max_length, as I suspect, tokenizer.model_max_length should be set to handle it by default.
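The point about config defaults can be sketched as follows; this is a simplified stand-in for how generate() falls back to values stored on the model config, with illustrative names rather than the actual transformers API:

```python
# Simplified sketch of the fallback being discussed (names are
# illustrative, not the transformers API): generation kwargs default to
# the values stored on the model config, so call sites need not repeat
# them and only true overrides appear at the call site.
def resolve_generation_args(config_defaults, **overrides):
    args = dict(config_defaults)
    args.update({k: v for k, v in overrides.items() if v is not None})
    return args

cfg = {"num_beams": 4, "max_length": 512}
print(resolve_generation_args(cfg))               # falls back to config
print(resolve_generation_args(cfg, num_beams=1))  # explicit override wins
```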

Contributor

@patrickvonplaten patrickvonplaten Oct 17, 2020


The integration test was written by the author, so I'd prefer to leave it as is to ensure the model behaves as originally expected by the author.

Comment thread model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md Outdated
For xGLUE cross-lingual NLG tasks, xProphetNet is finetuned with English data, but inference with both English and other zero-shot language data.
### Usage
A quick usage is like:
```
Contributor


same comments as ^^ apply to all model cards.

"microsoft/xprophetnet-large-wiki100-cased-xglue-ntg", use_cdn=False
)
model.to(torch_device)
model.config.max_length = 512
Contributor


but you generate like 30 tokens?

Contributor


great test otherwise!

Comment thread tests/test_modeling_xlm_prophetnet.py Outdated
@slow
def test_xprophetnet_ntg_inference(self):
model = XLMProphetNetForConditionalGeneration.from_pretrained(
"microsoft/xprophetnet-large-wiki100-cased-xglue-ntg", use_cdn=False
Contributor


remove use_cdn?

)

summary_ids_beam1 = model.generate(
input_ids, num_beams=1, length_penalty=1.0, no_repeat_ngram_size=3, early_stopping=True
Contributor


Suggested change
input_ids, num_beams=1, length_penalty=1.0, no_repeat_ngram_size=3, early_stopping=True
input_ids, num_beams=1,

Contributor


(assuming config defaults like BART)

Comment on lines +153 to +178
def test_is_whitespace(self):
self.assertTrue(_is_whitespace(" "))
self.assertTrue(_is_whitespace("\t"))
self.assertTrue(_is_whitespace("\r"))
self.assertTrue(_is_whitespace("\n"))
self.assertTrue(_is_whitespace("\u00A0"))

self.assertFalse(_is_whitespace("A"))
self.assertFalse(_is_whitespace("-"))

def test_is_control(self):
self.assertTrue(_is_control("\u0005"))

self.assertFalse(_is_control("A"))
self.assertFalse(_is_control(" "))
self.assertFalse(_is_control("\t"))
self.assertFalse(_is_control("\r"))

def test_is_punctuation(self):
self.assertTrue(_is_punctuation("-"))
self.assertTrue(_is_punctuation("$"))
self.assertTrue(_is_punctuation("`"))
self.assertTrue(_is_punctuation("."))

self.assertFalse(_is_punctuation("A"))
self.assertFalse(_is_punctuation(" "))
Contributor


great tests
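The helpers exercised above follow the familiar BERT-style tokenizer utilities. A sketch of what such a whitespace check typically does (mirroring the common pattern, not copied from this PR):

```python
import unicodedata

# Sketch of a BERT-style whitespace check (mirrors the common tokenizer
# utility pattern; not copied from this PR): explicit ASCII whitespace
# plus any character in Unicode category Zs, which covers \u00A0
# (no-break space) from the tests above.
def is_whitespace(char):
    if char in (" ", "\t", "\n", "\r"):
        return True
    return unicodedata.category(char) == "Zs"

print(is_whitespace("\u00A0"))  # True: no-break space is category Zs
print(is_whitespace("-"))       # False
```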

Comment thread utils/check_repo.py Outdated
Collaborator

@sgugger sgugger left a comment


The master will be removed by the release master but should be there until then ;-)

Comment thread README.md Outdated
Comment thread README.md Outdated
Member

@LysandreJik LysandreJik left a comment


Complicated model! Great job on the implementation and finishing touches!

Mostly nits about logging. Should wait for #7659 to be merged before merging.

Comment thread src/transformers/configuration_prophetnet.py Outdated
Comment thread src/transformers/configuration_prophetnet.py Outdated
Comment thread src/transformers/configuration_prophetnet.py Outdated
Comment thread src/transformers/configuration_xlm_prophetnet.py Outdated
Comment thread src/transformers/modeling_prophetnet.py Outdated
Comment thread src/transformers/modeling_xlm_prophetnet.py
Comment thread src/transformers/tokenization_prophetnet.py Outdated
Comment thread src/transformers/tokenization_prophetnet.py Outdated
Comment thread src/transformers/tokenization_xlm_prophetnet.py Outdated
Comment thread src/transformers/tokenization_xlm_prophetnet.py Outdated
patrickvonplaten and others added 9 commits October 18, 2020 13:39
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
@patrickvonplaten patrickvonplaten merged commit 2422cda into huggingface:master Oct 19, 2020

Labels

model card Related to pretrained model cards


7 participants