Skip to content

how to use EncoderDecoderModel to do en-de translation? #8944

@CharizardAcademy

Description

@CharizardAcademy

I have trained a EncoderDecoderModel from huggging face to do english-German translation task. I tried to overfit a small dataset (100 parallel sentences), and use model.generate() then tokenizer.decode() to perform the translation. However, the output seems to be proper German sentences, but it is definitely not the correct translation.

Here are the code for building the model

encoder_config = BertConfig()
decoder_config = BertConfig()
config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model = EncoderDecoderModel(config=config)

Here are the code for testing the model

model.eval()
input_ids = torch.tensor(tokenizer.encode(input_text)).unsqueeze(0)
output_ids = model.generate(input_ids.to('cuda'), decoder_start_token_id=model.config.decoder.pad_token_id)
output_text = tokenizer.decode(output_ids[0])

Example input: "iron cement is a ready for use paste which is laid as a fillet by putty knife or finger in the mould edges ( corners ) of the steel ingot mould ."

Ground truth translation: "iron cement ist eine gebrauchs ##AT##-##AT## fertige Paste , die mit einem Spachtel oder den Fingern als Hohlkehle in die Formecken ( Winkel ) der Stahlguss -Kokille aufgetragen wird ."

What the model outputs after trained 100 epochs: "[S] wenn sie den unten stehenden link anklicken, sehen sie ein video uber die erstellung ansprechender illustrationen in quarkxpress" which is totally nonesense.

Where is the problem?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions