Closed
Labels
- feat / doc: Feature: Doc, Span and Token objects
- feat / serialize: Feature: Serialization, saving and loading
- resolved: The issue was addressed / answered
Description
How to reproduce the behaviour
I am trying to train a sense2vec model from scratch on a corpus of around 5 million sentences (lines/docs). The first step is to parse the corpus and create a `.spacy` file, but when I run the parse script on this corpus, the code crashes at `doc_bin_bytes = doc_bin.to_bytes()` with a `ValueError` saying the bytes object is too large. Can somebody help me with this issue? Thanks
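For context, this error likely comes from the msgpack serializer that spaCy uses under the hood, which cannot encode a single payload beyond its size limit. A common workaround (a sketch, not from this issue) is to split the corpus into several smaller `DocBin` objects and serialize each one separately. The batching helper below is plain Python; the `DocBin` and `nlp.pipe` calls in the comments are real spaCy APIs, but the filenames and the batch size of 100,000 are arbitrary assumptions:

```python
from itertools import islice

def batched(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical use with spaCy (batch size and filenames are assumptions):
#
#   from spacy.tokens import DocBin
#   for i, lines in enumerate(batched(open("corpus.txt"), 100_000)):
#       doc_bin = DocBin(attrs=["POS", "TAG", "DEP", "HEAD"])
#       for doc in nlp.pipe(lines):
#           doc_bin.add(doc)
#       with open(f"corpus_{i}.spacy", "wb") as f:
#           f.write(doc_bin.to_bytes())  # each payload stays well under the limit

# Demo of the batching logic on plain data:
sentences = [f"sentence {i}" for i in range(10)]
chunks = list(batched(sentences, 4))
print([len(c) for c in chunks])  # → [4, 4, 2]
```

Each `.spacy` part can later be read back with `DocBin().from_bytes(...)` and the docs merged or processed batch by batch, so no single call ever has to serialize the whole 5M-sentence corpus at once.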
Your Environment
- Operating System: Ubuntu 16.04.6 LTS
- Python Version Used: 3.5.2
- spaCy Version Used: 2.2.4
- Environment Information: