add multi-modal (vision + language) transformers #2936
Conversation
/black
wyli
left a comment
thanks, please see some initial comments inline
monai/networks/nets/vltransformer.py (outdated)

```python
        num_mixed_layers: number of mixed transformer layers.
    """
    super().__init__()
    bert_config = type(
```
please make it a module-level variable, like
MONAI/monai/networks/nets/efficientnet.py (line 28 in c50235c):

```python
efficientnet_params = {
```
One issue here is that module-level variable does not work properly when trying to initialize model weights from bert checkpoint in huggingface library.
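For context, the pattern being debated can be sketched as follows: a module-level dictionary converted into the attribute-style config object that the `type(...)` call in the snippet above produces. The names and values here are purely illustrative, not MONAI's actual configuration:

```python
# Hypothetical module-level default config (illustrative values, not the real defaults)
BERT_CONFIG = {
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "num_attention_heads": 12,
    "attention_probs_dropout_prob": 0.1,
}

def config_from_dict(cfg_dict):
    """Build an attribute-style config object from a plain dict,
    mimicking the `type(...)` trick in the snippet above."""
    return type("Config", (object,), dict(cfg_dict))()

cfg = config_from_dict(BERT_CONFIG)
print(cfg.hidden_size)  # 768
```

A dict kept at module level this way stays inspectable and overridable, while the model code still sees the object-style interface it expects.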
how to replicate this issue, could you please elaborate?
or you can make the dictionary (in lines 232-255) a global variable as the default:

```python
def __init__(
    self,
    num_language_layers: int = 2,
    num_vision_layers: int = 2,
    num_mixed_layers: int = 2,
    bert_config: Dict = BERT_CONFIG,
):
```

it'll leave some flexibility for tuning the parameters for the end users.
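That suggestion can be fleshed out as a minimal sketch (the `MultiModal` name, layer counts, and `BERT_CONFIG` contents are assumptions for illustration, not the actual MONAI class). Using `None` as the real default avoids sharing a mutable dict across instances:

```python
from typing import Dict, Optional

# Hypothetical module-level default (illustrative values only)
BERT_CONFIG: Dict = {
    "attention_probs_dropout_prob": 0.1,
    "hidden_size": 768,
}

class MultiModal:
    """Sketch of the suggested constructor signature; not the actual MONAI class."""

    def __init__(
        self,
        num_language_layers: int = 2,
        num_vision_layers: int = 2,
        num_mixed_layers: int = 2,
        bert_config: Optional[Dict] = None,
    ):
        # copy so per-instance tweaks never mutate the module-level default
        self.bert_config = dict(BERT_CONFIG if bert_config is None else bert_config)
        self.num_language_layers = num_language_layers
        self.num_vision_layers = num_vision_layers
        self.num_mixed_layers = num_mixed_layers

# e.g. a user fine-tuning with a higher dropout:
model = MultiModal(bert_config={**BERT_CONFIG, "attention_probs_dropout_prob": 0.2})
print(model.bert_config["attention_probs_dropout_prob"])  # 0.2
```

This keeps the pretrained-compatible defaults in one place while letting end users override individual fields without rewriting the class.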
One way is to simply change the object-based definition to a dictionary-based config for the BERT model, but the class BertPreTrainedModel is not compatible with this. I am using this class directly from Huggingface with very minor modifications.
For your suggestion to give more flexibility, I think we need to keep the original BERT configuration to be able to use the pre-trained BERT weights (at least for this release). Pre-training a BERT-type model on text data from scratch could be costly. Any suggestions?
are you sure that we need to keep exactly this version of the config in order to load the pretrained weights?
Basically, I first tried different configurations, but then the model is not initialized from the BERT pre-trained checkpoint, so it takes longer to train.
my concern is mainly about the hard-coded config. For example, as a user, if I want to set `'attention_probs_dropout_prob': 0.2` and `"transformers_version": "4.10.1"` for fine-tuning the pretrained model, how can I set up this MultiModal class without rewriting everything from scratch?
@wyli it seems like some check fails due to the following, which is not related to multi-modality?
yes, I'm merging a fix for that flake8 issue very soon #2940
Hi @wyli, another issue that comes up is with the optional import (or just import). For some reason
Thanks

Thanks much.
/black
how to replicate this issue? I tried and the following works:

```
Python 3.7.10 (default, Feb 26 2021, 10:16:00)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from monai.utils import optional_import
>>> bert_embeddings = optional_import("transformers.models.bert.modeling_bert", name="BertEmbeddings")[0]
>>> bert_embeddings
<class 'transformers.models.bert.modeling_bert.BertEmbeddings'>
```
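For readers unfamiliar with the helper, it behaves roughly like the following simplified sketch. This is an illustrative re-implementation, not the actual `monai.utils.optional_import`, which does more (e.g. minimum-version checks and deferred error reporting):

```python
import importlib

def optional_import(module, name=""):
    """Simplified sketch of an optional-import helper.

    Returns a tuple of (imported object or None, success flag) instead of
    raising at import time, so missing optional dependencies only fail
    when the object is actually needed.
    """
    try:
        mod = importlib.import_module(module)
        obj = getattr(mod, name) if name else mod
        return obj, True
    except (ImportError, AttributeError):
        return None, False

sqrt, ok = optional_import("math", name="sqrt")
print(ok, sqrt(4.0))  # True 2.0
missing, ok = optional_import("no_such_package_xyz")
print(ok)  # False
```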
/build
Thanks @wyli. I pushed a new PR based on this for optional imports.
Hi @wyli, it seems like this did not resolve the issue, as some of the checks failed.
Please add the test to this list to exclude it from the minimum environment tests (line 138 in 76dc5c2).
Sure.
Signed-off-by: Nic Ma <nma@nvidia.com>
Co-authored-by: Wenqi Li <wenqil@nvidia.com>
Signed-off-by: ahatamizadeh <ahatamizadeh@nvidia.com>

* [DLMED] enhance the pad mode
* [DLMED] update all the tensor pad related
* [DLMED] fix error tests
* [DLMED] fix GPU tests
* [DLMED] update according to comments
I am closing this pull request due to many conflicts between the two branches. Will push a new one.
Hi @ahatamiz, why the close? The PR looks nice; sorry for the confusion that my previous comments might have created.
the latest update with the flexible bert_config looks good to me.
Hi @wyli. Sorry about this. Dealing with so many conflicts in this PR was a bit time-consuming, so I had to submit another one.
Description
This pull request adds the full pipeline and support for multi-modal (vision + language) transformers. The transformer implementation follows the Huggingface repository.
Status
Ready
Types of changes
- Tests run locally with `./runtests.sh -f -u --net --coverage`.
- Quick tests run locally with `./runtests.sh --quick --unittests`.
- Documentation built with the `make html` command in the `docs/` folder.