DocBin.to_bytes fails with a "ValueError: bytes object is too large" #5219

@kuppulur

Description

How to reproduce the behaviour

I am trying to train a sense2vec model from scratch on a corpus of around 5 million sentences (lines/docs). Since the first step is to parse the corpus and create a `.spacy` file, I ran the parse script on it, and the code crashes at `doc_bin_bytes = doc_bin.to_bytes()` with a `ValueError` saying the bytes object is too large. Can somebody help me with this issue? Thanks
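One way to sidestep the size limit (a sketch, not an official fix) is to avoid serializing all 5 million docs into a single `DocBin` and instead shard the corpus into several smaller `.spacy` files. The batching helper below is plain Python; the commented usage assumes an `nlp` pipeline and a `texts` iterable, and the shard size of 100,000 is an arbitrary choice:

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Applying this in the parse script (sketch; `nlp` and `texts` are assumed):
#
#     from spacy.tokens import DocBin
#
#     for i, batch in enumerate(batched(texts, 100_000)):
#         doc_bin = DocBin()
#         for doc in nlp.pipe(batch):
#             doc_bin.add(doc)
#         with open(f"corpus_{i}.spacy", "wb") as f:
#             f.write(doc_bin.to_bytes())  # each shard stays far below the limit
```

Each shard can later be loaded with `DocBin().from_bytes(...)` individually, so no single bytes object ever needs to hold the whole corpus.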

Your Environment

  • Operating System: Ubuntu 16.04.6 LTS
  • Python Version Used: 3.5.2
  • spaCy Version Used: 2.2.4
  • Environment Information:

Metadata


Labels

  • feat / doc — Feature: Doc, Span and Token objects
  • feat / serialize — Feature: Serialization, saving and loading
  • resolved — The issue was addressed / answered
