[wip] [fsmt] possible support of iwslt14 in fsmt #8374
stas00 wants to merge 1 commit into huggingface:master
Conversation
```python
# model config
fsmt_model_config_file = os.path.join(pytorch_dump_folder_path, "config.json")

args = {
```
@lighteternal, also please check I got these defaults right.
wmt19 fairseq supplies all these in the checkpoint.
|
You were right, it was an EN to EL model, so the src/tgt were inverted. I changed the args and tried again, and hit an error which I fixed by changing one of the ignore keys (probably there due to a typo) on line 251 (see comment below). After that, the conversion script completes successfully. I guess the next thing is to try to load this model locally to test that it's working. |
|
Fantastic! Yes, load it up and let us know whether it works. If you want ready scripts to adapt from, perhaps try https://github.com/stas00/porting/blob/master/transformers/fairseq-wmt19/scripts/fsmt-translate.py |
I don't think that this solution works. While you made the conversion script complete, your ported model now has random weights for some of its layers. The error is:
For some reason it's creating a decoder with the size of the encoder dict. I think there might have been a bug introduced since I wrote and used this script. I will debug and get back to you. |
|
Hmm, it looks like fairseq made a backward-incompatible change. This is the breaking change: facebookresearch/fairseq@3b27ed7 They did away with the old way of storing the model args in the checkpoint. |
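To make a conversion script cope with both checkpoint layouts, one option is a small compatibility shim. This is a sketch under assumptions: the old layout stores an argparse `Namespace` under `"args"`, the new one a config mapping under `"cfg"]["model"]` (per the fairseq commit above); `extract_model_args` is a hypothetical helper name, and the checkpoint dict is assumed to have already been loaded (e.g. with `torch.load`):

```python
from argparse import Namespace

def extract_model_args(chkpt):
    """Return the model hyperparameters from an already-loaded fairseq
    checkpoint dict, handling both the pre- and post-3b27ed7 layouts.
    (Hypothetical helper; key names are based on the change above.)"""
    if chkpt.get("args") is not None:
        # old layout: an argparse Namespace stored under "args"
        args = chkpt["args"]
        return vars(args) if isinstance(args, Namespace) else dict(args)
    # new layout: the model config stored under cfg["model"]
    return dict(chkpt["cfg"]["model"])
```

This way the rest of the converter can work off a plain dict regardless of which fairseq version produced the checkpoint.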
|
OK, I updated the conversion script to support the latest fairseq and it now converts your model out of the box w/o needing any changes - please use the version in PR #8377 if it hasn't been merged yet. Please let me know whether the results are satisfactory and you get a good translation out of it - note it uses some default hparams (see the script), so you can adjust those to your liking. Once you're happy, you can upload the model to s3 as explained here: https://huggingface.co/transformers/model_sharing.html |
|
Resolved in #8377 |
|
This is great @stas00, I just tried it locally and it works as intended. Many thanks! One quick question, during training with fairseq, the tokenization also converted all letters to lower-case (to reduce the vocab I assume) so now in order to get correct translations the input text needs to be lowercase only. I can of course add a line to my script to do that automatically, but I was wondering how I can force the uploaded model to do that (so that anyone wanting to test it doesn't have to download it locally and add that additional line). Probably with a config argument...? I am uploading this to s3 soon, your help has been invaluable 👍 |
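Until such a config option exists, the lowercasing has to happen on the caller's side. A minimal sketch (the `tokenizer`/`model` usage in the comment is illustrative and their loading is omitted):

```python
def preprocess(text):
    """Caller-side workaround (sketch): the model was trained on
    lowercased text only, so lowercase inputs before tokenization."""
    return text.lower()

# Intended use (model/tokenizer loading omitted):
#   batch = tokenizer([preprocess("Some Mixed-Case Input")], return_tensors="pt")
#   generated = model.generate(**batch)
```

Note that Python's `str.lower()` handles Greek letters too, so the same preprocessing works for both translation directions.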
|
Excellent. I'm glad to hear we sorted it out. So have you validated that the translation works and the BLEU score evals are satisfactory? You can do it easily with
It's currently not supported, as none of the models I worked with had that restriction. I will add this functionality to the

Out of curiosity, if you don't mind sharing, is there a special reason why you chose to train an older, more restricted architecture and not one of the newer ones? Surely, losing normal casing would be a hurdle for practical use. |
I validated the BLEU and chrF scores on the fairseq equivalent of the model (before converting it to huggingface) on the Tatoeba test set, but now that there are additional evaluation scripts I will try these as well!
Thanks for this, sure I can wait if there's the option of adding that feature too!
Tbh, I was just following a fairseq guide that was suggesting this arch over the following possible choices: If you could indicate a more recent architecture that has about the same number of parameters (I have a constraint on complexity, as I am using a single RTX 2080 SUPER), I would be happy to re-train! |
My only concern here is the forced lower-casing, which won't match the references that the BLEU scores are evaled against.
I re-read the guide and I'm not sure what you mean when you said: "was suggesting this arch over the following possible choices" - I can't find any recommendation to use this particular model over the dozens of ones you listed. E.g., how did you know that it's a smaller model than some of the others? I'm gradually getting to know the fairseq models and have only dealt with wmt-variations of

When you did the training, was there an option not to force lowercase input, or did it come automatically with the |
|
True, the forced lower-casing may give a slightly higher BLEU score.
By "suggesting" I mean that I just used the pre-defined arch in the available script (see below). I didn't know it would lead to a smaller model before digging into it a bit and discovering that e.g. there's a difference in the FFN hidden layer dimension. I experimented with some of them (not all) and, since I am not an expert ofc, I ended up on that one based on a. the fact that it actually worked, b. the perplexity dropping quicker than in other cases, and c. more importantly, whether I would get an OOM after a while (in most of the cases) :P.
The lowercase command comes from the data preparation script provided in the guide: https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-iwslt14.sh |
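For context, if I read that prep script right, the lowercasing is a pipe through Moses' `lowercase.perl`, so dropping that pipe keeps the original casing. Its effect is roughly (ASCII-wise) equivalent to:

```shell
# Roughly what the prep script's lowercasing pass does to each corpus line
# (the perl script also handles non-ASCII letters, which `tr` in the
# C locale does not):
echo "Hello World" | tr '[:upper:]' '[:lower:]'
```

So to re-train a cased model, the corpus just needs to skip that step before BPE learning and tokenization.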
I think most of the recent ones are quite a bit bigger, but one that we ported that may warrant your attention is the distilled variation: https://github.com/jungokasai/deep-shallow/ |
I'm relatively new to this myself, so I haven't tried enough variations yet to make such recommendations. Perhaps asking at the forums stating your limitations would lead to some excellent recommendations - or perhaps what you have done is just fine - it all depends on your needs. Do check out the distilled approach I mentioned in the comment above.
Oh, I see, this is totally circumstantial - you just trained on lower-cased input so this is the world it knows. This makes total sense. Thank you for helping me understand this nuance. |
|
OK, I implemented the lowercase config and wanted to automate detecting when this option should be pre-set, but the latter didn't work - my detector code found uppercase letters. I looked at both vocabs you supplied and both have uppercase letters in them - a lot in the el one and some in the en one (merges/codes too). I tried this very simple heuristic: I suppose this has to be set manually then, because you know you trained mainly on lowercase - but perhaps there is a bug somewhere on the fairseq side? None of the 3 files should have any uppercase letters if the data were properly lower-cased. |
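A guess at what such a detector boils down to (the function name and exact shape are mine, not from the PR):

```python
def vocab_has_uppercase(vocab_tokens):
    """Very simple heuristic: report whether any vocab token contains an
    uppercase letter, which would suggest the training corpus was not
    fully lowercased. Works for Greek letters too, since str.isupper()
    is Unicode-aware."""
    return any(any(c.isupper() for c in tok) for tok in vocab_tokens)
```

On a properly lowercased corpus this should return `False` for both the source and target vocab files.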
|
The PR that adds lower-case support is here: #8389, but for the converter to work with the recent fairseq, #8377 needs to be merged first - or you can do just this over 8389: Or if you don't want to mess with these, we can wait until both are merged. In either case, before uploading to s3 you will need to manually set |
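For the manual step, a sketch of flipping such a flag in the exported `config.json` (note: `do_lower_case` is an assumed key name here - check the merged PR for the actual option the converter ends up using):

```python
import json

def set_lowercase_flag(config_path, key="do_lower_case"):
    """Set the lowercasing flag in an exported config.json in place.
    NOTE: `do_lower_case` is an assumption - verify the real key name
    against the merged lowercase-support PR."""
    with open(config_path) as f:
        config = json.load(f)
    config[key] = True
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
```

Editing the file by hand works just as well, of course; this is only for doing it across several exported models at once.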
I will try this for sure, although I remember that the
Well I couldn't wait, so I tried following your steps and it works perfectly! :D Kudos once again! |
I'm not sure whether fairseq has some doc that compares the different arch configs, but this piece of code seems to be pretty clear on the differences: https://github.com/pytorch/fairseq/blob/master/fairseq/models/transformer.py#L985 The default
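From memory (treat the exact numbers as assumptions and verify them against the `transformer.py` code linked above), the iwslt arch mainly shrinks the FFN dimension and the attention head count relative to the base transformer:

```python
# From-memory defaults for two of fairseq's registered architectures;
# verify against fairseq/models/transformer.py before relying on them.
base_transformer = {
    "encoder_embed_dim": 512,
    "encoder_ffn_embed_dim": 2048,
    "encoder_attention_heads": 8,
    "encoder_layers": 6,
}
transformer_iwslt_de_en = {
    "encoder_embed_dim": 512,
    "encoder_ffn_embed_dim": 1024,
    "encoder_attention_heads": 4,
    "encoder_layers": 6,
}

# Which hyperparameters differ between the two archs?
changed = sorted(k for k in base_transformer
                 if transformer_iwslt_de_en[k] != base_transformer[k])
print(changed)  # ['encoder_attention_heads', 'encoder_ffn_embed_dim']
```

Fewer heads and a halved FFN explain both the smaller parameter count and why it fits more comfortably on a single consumer GPU.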
Fantastic! I suppose you're not concerned with the uppercase letters in the dict/merges files of your pre-trained model. I'd have thought that the fairseq pre-processor would have lowercased all inputs. But if you think it's no problem, then all is good.
You have to first wait until the lowercasing PR is merged - probably Mon or early next week - and then, AFAIK, the online version doesn't get updated automatically (the models' code doesn't change often), so we will have to ask for that to happen. And once you see the demo working on the site, it's in the clear to share with others. |
|
Thank you for the code showcasing the differences between models. I couldn't find a doc with that info.
Probably the perl command included in the fairseq preparation script didn't catch all cases; I can't think of another explanation. In any case, I will modify this script to re-train without lower-casing and with a bigger number of BPE tokens, just to see if I get a more convenient model that doesn't need the lower-case argument (and hopefully without losing much BLEU).
OK so I'm waiting for the merge, then upload and probably come back in this thread to request an update on the online version if possible. Thanks for the help @stas00 ! |
|
FYI, the lower-casing PR has been merged, so please let me know whether you're waiting to re-train with mixed-casing or whether you want to upload the lower-case model and I will then ask to update the code on the models server. |
|
I already started the mixed-casing training and I was thinking I can upload all 4 of them (lower EN2EL, lower EL2EN, mixed EN2EL, mixed EL2EN) together. The mixed ones also have a bigger vocabulary (almost double) and BLEU scores are very similar to the older lower case ones. |
|
Great! I made the request to update the server - I will update you when this is done. It's good to know that there is not much difference with the bigger vocab. I'm curious how the scores would have differed if your original model had been truly lower-cased (since it isn't at the moment, if you check the vocab). (This is just for my learning, should you ever run this test.)
|
Hi @stas00, just pinging to check whether the code on the model hub has been updated. It returns an error when used online from the link above: I also noticed that the tag directly above the input field is incorrectly assigned as "text-generation". |
|
Hi, it seems your model card's metadata is defined in markdown format, not in YAML: https://huggingface.co/lighteternal/SSE-TUC-mt-en-el-cased leading to incorrect pipeline detection (hence the error you are seeing). Can you try setting the pipeline correctly? https://huggingface.co/docs#how-are-model-tags-determined Let us know if it works better. |
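For reference, a minimal YAML front-matter block at the top of the README would look something like this (the field values here are assumptions for this particular model; see the docs link above for the authoritative field list):

```yaml
---
language:
- en
- el
tags:
- translation
---
```

The `translation` tag is what steers the hub's pipeline detection away from the default `text-generation` widget.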
|
Thanks @Narsil I updated the model card, but it doesn't seem to have made any difference yet. Is it possible that it takes some time to change pipeline? |
|
It appears that editing the README from the browser doesn't work; after pulling, editing and pushing again, it worked! Many thanks! :) |
|
Indeed, I pushed a fix that will be deployed soon, next time you edit a readme or another file on the website the changes will reflect instantly @lighteternal, thanks for reporting this! |
This is an attempt to see whether fsmt can support older fairseq archs, based on this request: #8233
Currently it's just changing the conversion code directly to see if it can be converted.
@lighteternal, please have a look.
I made the model configuration `args` based on the arch configuration. The only issue at the moment is the sizes of the encoder/decoder - for some reason the vocab sizes seem to be reversed.

So this is English to Greek, correct? And not the other way around? So the source is 9932- and the target is 12892-long. Your issue mentioned "Greek<->English", but this model must be one way - which is it?
When I run the script: So it suggests that the encoder is 12896-long, which should be the other way around, no? Unless it was trained on Greek to English.

Well, you can also experiment with the conversion.