
Performance is poor when lines are much longer than stream chunks #49

@jhiesey

Suppose we have lines that are very long (sometimes tens of megabytes of JSON each, in my case) and stream chunks of more normal length, say 16 kB. That means there can be over 1000 chunks per line. This comes up, for example, when fetching the changes stream from npm's CouchDB replication endpoint.

Each time a new chunk comes in, it gets appended to this[kLast], and then string.split() gets called on the resulting string. That means string.split() can be called over 1000 times (each time on an ever-longer string) just to find a single newline, so splitting effectively becomes O(n^2) in the length of the line.
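Roughly, the hot path looks like this as I understand it (a simplified paraphrase, not the actual source; mapper, maxLength/overflow and flush handling omitted):

```js
const { Transform } = require('stream')
const { StringDecoder } = require('string_decoder')

const kLast = Symbol('last')
const kDecoder = Symbol('decoder')

// Simplified paraphrase of the current behaviour: every chunk is appended to
// the buffered tail, and the *whole* tail is re-split, even though only the
// newly arrived bytes can possibly contain a delimiter.
function transform (chunk, enc, cb) {
  this[kLast] += this[kDecoder].write(chunk)   // tail keeps growing
  const list = this[kLast].split(this.matcher) // re-scans the entire tail on every chunk
  this[kLast] = list.pop()                     // trailing partial line is carried over
  for (const line of list) this.push(line)
  cb()
}

const splitter = new Transform({ readableObjectMode: true, transform })
splitter[kLast] = ''
splitter[kDecoder] = new StringDecoder('utf8')
splitter.matcher = /\r?\n/
```

With over 1000 chunks per line, that split re-scans on the order of half the line length per chunk, which is where the quadratic total work comes from.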

The fast way to do this is to search the incoming chunk for \n before appending it to this[kLast]. Unfortunately that doesn't work properly with a delimiter like the default /\r?\n/, which can cross chunk boundaries.

So I don't know whether this is something you can easily fix while keeping the same interface. A fast-path option that only works for a single-character delimiter, or perhaps for a fixed delimiter string (but not functions or regexes), would be handy.
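To make that concrete, here is a rough sketch of what a single-character fast path could look like (illustrative names only, not a proposed API): the incoming chunk is searched for the delimiter first, and the buffered tail is only concatenated and split when a delimiter is actually present.

```js
const { Transform } = require('stream')
const { StringDecoder } = require('string_decoder')

// Sketch of a fast path for a single-character delimiter (illustrative only).
// The buffered tail is only split when the new chunk actually contains a
// delimiter, so a long line is not re-scanned on every chunk.
class FastLineSplit extends Transform {
  constructor () {
    super({ readableObjectMode: true })
    this.decoder = new StringDecoder('utf8')
    this.last = ''
  }

  _transform (chunk, enc, cb) {
    const str = this.decoder.write(chunk)
    if (str.indexOf('\n') === -1) {
      this.last += str               // no delimiter: just grow the tail
      return cb()
    }
    const list = (this.last + str).split('\n')
    this.last = list.pop()           // trailing partial line becomes the new tail
    for (const line of list) this.push(line)
    cb()
  }

  _flush (cb) {
    const rest = this.last + this.decoder.end()
    if (rest) this.push(rest)
    cb()
  }
}
```

The += on the no-delimiter path could also be replaced by pushing chunks onto an array and joining lazily, but the main win is simply not re-splitting the whole buffer on every chunk.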

Do you have any better ideas? Or is this use case out of scope for this package? Or am I doing something else dumb?
