
Performance is poor when lines are much longer than stream chunks #49

@jhiesey

Suppose we have lines that are very long (sometimes tens of megabytes of JSON each, in my case) and stream chunks of more normal length, say 16 kB. That means there can be over 1000 chunks per line. This comes up, for example, when fetching the changes stream from npm's CouchDB replication endpoint.

Each time a new chunk comes in, it gets appended to this[kLast], and then string.split() gets called on the resulting string. That means string.split() can be called over 1000 times (each time on an ever-longer string) just to find a single newline, so splitting effectively becomes O(n^2) in the length of the line.
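Roughly, the hot path looks like this as I understand it (a simplified paraphrase, not the actual source; mapper, maxLength/overflow and flush handling omitted):

```js
const { Transform } = require('stream')
const { StringDecoder } = require('string_decoder')

const kLast = Symbol('last')
const kDecoder = Symbol('decoder')

// Simplified paraphrase of the current behaviour: every chunk is appended to
// the buffered tail, and the *whole* tail is re-split, even though only the
// newly arrived bytes can possibly contain a delimiter.
function transform (chunk, enc, cb) {
  this[kLast] += this[kDecoder].write(chunk)   // tail keeps growing
  const list = this[kLast].split(this.matcher) // re-scans the entire tail on every chunk
  this[kLast] = list.pop()                     // trailing partial line is carried over
  for (const line of list) this.push(line)
  cb()
}

const splitter = new Transform({ readableObjectMode: true, transform })
splitter[kLast] = ''
splitter[kDecoder] = new StringDecoder('utf8')
splitter.matcher = /\r?\n/
```

With over 1000 chunks per line, that split re-scans on the order of half the line length per chunk, which is where the quadratic total work comes from.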

The fast way to do this is to search the incoming chunk for \n before appending it to this[kLast]. Unfortunately that doesn't work properly with a delimiter like the default /\r?\n/, which can cross chunk boundaries.

So I don't know whether this is something you can easily fix while keeping the same interface. A fast-path option that only works for a single-character delimiter, or perhaps for a fixed delimiter string (but not functions or regexes), would be handy.
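To make that concrete, here is a rough sketch of what a single-character fast path could look like (illustrative names only, not a proposed API): the incoming chunk is searched for the delimiter first, and the buffered tail is only concatenated and split when a delimiter is actually present.

```js
const { Transform } = require('stream')
const { StringDecoder } = require('string_decoder')

// Sketch of a fast path for a single-character delimiter (illustrative only).
// The buffered tail is only split when the new chunk actually contains a
// delimiter, so a long line is not re-scanned on every chunk.
class FastLineSplit extends Transform {
  constructor () {
    super({ readableObjectMode: true })
    this.decoder = new StringDecoder('utf8')
    this.last = ''
  }

  _transform (chunk, enc, cb) {
    const str = this.decoder.write(chunk)
    if (str.indexOf('\n') === -1) {
      this.last += str               // no delimiter: just grow the tail
      return cb()
    }
    const list = (this.last + str).split('\n')
    this.last = list.pop()           // trailing partial line becomes the new tail
    for (const line of list) this.push(line)
    cb()
  }

  _flush (cb) {
    const rest = this.last + this.decoder.end()
    if (rest) this.push(rest)
    cb()
  }
}
```

The += on the no-delimiter path could also be replaced by pushing chunks onto an array and joining lazily, but the main win is simply not re-splitting the whole buffer on every chunk.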

Do you have any better ideas? Or is this use case out of scope for this package? Or am I doing something else dumb?
