
UnicodeDecodeError in tag method #106

@umoqnier


My code is currently based on this tutorial, and I have problems with the tag method after the training section. I catch the UnicodeDecodeError exception like this:

Y_pred = []
try:
    for xseq in X_test:
        Y_pred.append(tagger.tag(xseq))
except UnicodeDecodeError as e:
    print(e)         # the error message
    print(e.object)  # the raw bytes that could not be decoded
    print(e.object)

The output is:

'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
b'B-qu\xc3\xa9'
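The reported bytes can be reproduced in plain Python, independent of python-crfsuite: `0xc3 0xa9` is the UTF-8 encoding of `é`, so the value is valid UTF-8 but not valid ASCII. A minimal check:

```python
raw = b'B-qu\xc3\xa9'  # the bytes reported in e.object above

# Decoding as ASCII fails at the first non-ASCII byte (0xc3 at index 4).
ascii_error = None
try:
    raw.decode('ascii')
except UnicodeDecodeError as e:
    ascii_error = e  # keep a reference; `e` is cleared after the except block
    print(e)

# Decoding as UTF-8 succeeds and yields the original label text.
decoded = raw.decode('utf-8')
print(decoded)
```

This suggests the bytes themselves are fine and the problem is only which codec is used to turn them back into `str`.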

I tried decoding my X_test with decode('utf-8') before calling tag, but it does not seem to work.
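The failing value `b'B-qu\xc3\xa9'` looks like a BIO output label (`B-qué`) rather than an input feature, which would explain why decoding `X_test` changes nothing. One possible workaround, untested against python-crfsuite and not part of its API, is to keep the labels ASCII-only during training and restore them after tagging. This sketch assumes the labels are plain `str` values; `to_ascii_label` and `from_ascii_label` are hypothetical helper names, and Python's `unicode_escape` codec is a swapped-in technique:

```python
def to_ascii_label(label: str) -> str:
    """Escape non-ASCII characters so the label is pure ASCII."""
    return label.encode('unicode_escape').decode('ascii')


def from_ascii_label(label: str) -> str:
    """Undo to_ascii_label, turning escape sequences back into characters."""
    return label.encode('ascii').decode('unicode_escape')


# Hypothetical usage around the training and tagging loops:
#   trainer.append(xseq, [to_ascii_label(y) for y in yseq])
#   Y_pred.append([from_ascii_label(y) for y in tagger.tag(xseq)])

labels = ['O', 'B-qué', 'I-año']
escaped = [to_ascii_label(y) for y in labels]
restored = [from_ascii_label(y) for y in escaped]
print(escaped)
print(restored)
```

The round trip is lossless because `unicode_escape` also escapes literal backslashes, so an original `\` in a label comes back unchanged.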

In case it is relevant: I also had UnicodeEncodeError problems with the trainer object in the loop shown below, but encoding every substring with encode('utf-8') seems to fix them. With this approach I force the encoding manually before appending the sequences to the trainer. This issue is mentioned in #96, and that solution works for me.

# raised UnicodeEncodeError until every substring was encoded with encode('utf-8')
for xseq, yseq in zip(X_train, Y_train):
    trainer.append(xseq, yseq)
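The manual encoding described above can be factored into a small helper. This is a sketch assuming each `xseq` is a list of per-token lists of `str` features; `encode_seq` is a hypothetical name, not a python-crfsuite function:

```python
from typing import List


def encode_seq(xseq: List[List[str]]) -> List[List[bytes]]:
    """Encode every feature string of one sequence to UTF-8 bytes."""
    return [[feature.encode('utf-8') for feature in token] for token in xseq]


# Hypothetical usage in the training loop:
#   for xseq, yseq in zip(X_train, Y_train):
#       trainer.append(encode_seq(xseq), yseq)

example = [['bias', 'word.lower=qué'], ['bias', 'word.lower=año']]
print(encode_seq(example))
```

Keeping the conversion in one function makes it easy to apply the same encoding to X_test before tagging, so training and tagging see identical feature strings.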

NOTE: Sorry for my limited English. I hope I've been clear enough; if not, please tell me! :)
