Improve SIP message parsers #271

dennwc · 2025-12-02T16:31:39Z

This PR refactors and optimises the SIP message parsers.

Short list of changes:

Parser will now track the number of bytes consumed, removing the need for hacks in streaming mode.
Streaming parser now properly skips CRLF between messages (RFC 3261 - 7.5).
Content-Length is now required for streaming mode (RFC 3261 - 7.5).
Parser postpones the []byte -> string conversions for headers, improving performance by roughly 4% and reducing allocations by up to 10%.
Streaming parser reuses more code from the regular parser.
Parser exposes new API for parsing the message headers without the body.
Streaming parser exposes API for getting individual messages and how many bytes were read.
Streaming parser can now discard data to recover the connection state after malformed messages.
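The idea behind postponing the []byte -> string conversion can be sketched as follows. This is not the sipgo code: rawHeader, splitHeader and isHeader are hypothetical names, and compact header forms (e.g. `i` for Call-ID) are ignored. Name and value stay as subslices of the input line, and only the value that is actually needed is copied into a string.

```go
package main

import (
	"bytes"
	"fmt"
)

// rawHeader is a hypothetical type: Name and Value are subslices of the
// original line, so splitting a header does not copy any bytes.
type rawHeader struct {
	Name  []byte
	Value []byte
}

// splitHeader splits "Name: value" at the first colon and trims the
// surrounding whitespace. It reports false if the line has no colon.
func splitHeader(line []byte) (rawHeader, bool) {
	i := bytes.IndexByte(line, ':')
	if i < 0 {
		return rawHeader{}, false
	}
	return rawHeader{
		Name:  bytes.TrimSpace(line[:i]),
		Value: bytes.TrimSpace(line[i+1:]),
	}, true
}

// isHeader matches a header name case-insensitively (SIP header names are
// case-insensitive) without building a string from the line.
func isHeader(h rawHeader, name string) bool {
	return bytes.EqualFold(h.Name, []byte(name))
}

func main() {
	h, ok := splitHeader([]byte("call-id: abc@example.com"))
	if ok && isHeader(h, "Call-ID") {
		// The only string conversion happens here, for the value we use.
		fmt.Println(string(h.Value))
	}
}
```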

Benchmarking results

Roughly 4% faster parsing and up to 10% fewer allocations; the parallel benchmarks show no statistically significant change in sec/op.

                               │   old.txt   │               new.txt                │
                               │   sec/op    │    sec/op     vs base                │
ParserStream/NoChunks-16         2.500µ ± 3%   2.377µ ±  4%   -4.94% (p=0.004 n=10)
ParserStream/SingleRoutine-16    2.612µ ± 2%   2.518µ ±  1%   -3.62% (p=0.000 n=10)
ParserStream/Paralel-16          13.18µ ± 6%   13.33µ ±  6%        ~ (p=0.912 n=10)
Parser/SingleRoutine-16          2.843µ ± 4%   2.785µ ±  1%   -2.04% (p=0.000 n=10)
Parser/Paralel-16                14.21µ ± 5%   14.10µ ±  5%        ~ (p=0.239 n=10)
ParserNoData/New-16              2.120µ ± 1%   2.014µ ±  3%   -5.00% (p=0.000 n=10)

                               │    old.txt     │                new.txt                 │
                               │      B/op      │     B/op      vs base                  │
ParserStream/NoChunks-16         2.790Ki ± 0%     2.656Ki ± 0%   -4.80% (p=0.000 n=10)
ParserStream/SingleRoutine-16    2.845Ki ± 0%     2.656Ki ± 0%   -6.63% (p=0.000 n=10)
ParserStream/Paralel-16          3.021Ki ± 0%     2.661Ki ± 0%  -11.90% (p=0.000 n=10)
Parser/SingleRoutine-16          3.117Ki ± 0%     2.984Ki ± 0%   -4.26% (p=0.000 n=10)
Parser/Paralel-16                3.123Ki ± 0%     2.989Ki ± 0%   -4.28% (p=0.000 n=10)
ParserNoData/New-16              2.520Ki ± 0%     2.430Ki ± 0%   -3.57% (p=0.000 n=10)

                               │   old.txt    │              new.txt                │
                               │  allocs/op   │ allocs/op   vs base                 │
ParserStream/NoChunks-16         39.00 ± 0%     37.00 ± 0%  -5.13% (p=0.000 n=10)
ParserStream/SingleRoutine-16    41.00 ± 0%     37.00 ± 0%  -9.76% (p=0.000 n=10)
ParserStream/Paralel-16          41.00 ± 0%     37.00 ± 0%  -9.76% (p=0.000 n=10)
Parser/SingleRoutine-16          44.00 ± 0%     43.00 ± 0%  -2.27% (p=0.000 n=10)
Parser/Paralel-16                44.00 ± 0%     43.00 ± 0%  -2.27% (p=0.000 n=10)
ParserNoData/New-16              32.00 ± 0%     30.00 ± 0%  -6.25% (p=0.000 n=10)

Changes to Parser

Original parsing method is kept unchanged:

func (p *Parser) ParseSIP(data []byte) (Message, error)

Alongside it, a new method is introduced:

func (p *Parser) Parse(data []byte, stream bool) (Message, int, error)

This method returns the number of bytes consumed by the parsed message. On failure, it always stops at the beginning of the line that caused the failure, which lets the streaming mode reuse the same underlying code.

The stream flag adjusts the behaviour of the parser slightly: in streaming mode, Content-Length is required and CRLFs before the start line are silently skipped (RFC 3261 - 7.5).

The new method always returns io.ErrUnexpectedEOF if the message was not read completely and more data is required. The old ParseSIP is not affected and still returns the ParseEOF error.

Additionally, another new method is added for parsing the message without the body:

func (p *Parser) ParseHeaders(data []byte, stream bool) (Message, int, error)

This allows using the parser more efficiently in proxies that only act on the headers. The body does not have to be allocated separately; instead, the proxy may write the headers followed by the body from the original buffer.

Changes to ParserStream

Existing methods for ParserStream are kept unchanged:

func (p *ParserStream) ParseSIPStream(data []byte) ([]Message, error)
func (p *ParserStream) ParseSIPStreamEach(data []byte, cb func(msg Message)) error

New methods were added for reading messages one at a time:

func (p *ParserStream) Write(data []byte) (int, error)
func (p *ParserStream) ParseNext() (Message, int, error)

After using Write to append data to the internal buffer, ParseNext can be called multiple times to get SIP messages. As opposed to ParseSIPStream and ParseSIPStreamEach, the caller can decide when to stop reading the messages.
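The Write / ParseNext pattern can be shown with a self-contained toy that has the same method shapes. toyStream and drain are hypothetical, Message is replaced by string, and a "message" is simply everything up to an empty line (no bodies); it only demonstrates how a caller decides when to stop and how io.ErrUnexpectedEOF means "wait for the next Write".

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// toyStream is a hypothetical stand-in for ParserStream: Write appends to an
// internal buffer and ParseNext cuts one "message" (text up to an empty line)
// off the front of it.
type toyStream struct{ buf bytes.Buffer }

func (s *toyStream) Write(data []byte) (int, error) { return s.buf.Write(data) }

func (s *toyStream) ParseNext() (string, int, error) {
	i := bytes.Index(s.buf.Bytes(), []byte("\r\n\r\n"))
	if i < 0 {
		return "", 0, io.ErrUnexpectedEOF // need more data; keep the buffer
	}
	n := i + 4
	msg := string(s.buf.Next(n)[:i])
	return msg, n, nil
}

// drain reads up to max messages; it stops early when the buffered data
// holds no complete message.
func drain(s *toyStream, max int) ([]string, error) {
	var out []string
	for len(out) < max {
		msg, _, err := s.ParseNext()
		if errors.Is(err, io.ErrUnexpectedEOF) {
			break // wait for the next Write
		} else if err != nil {
			return out, err
		}
		out = append(out, msg)
	}
	return out, nil
}

func main() {
	var s toyStream
	s.Write([]byte("A\r\n\r\nB\r\n\r\nC"))
	first, _ := drain(&s, 1) // the caller stops after one message
	rest, _ := drain(&s, 10) // "B"; "C" is still incomplete
	s.Write([]byte("\r\n\r\n"))
	last, _ := drain(&s, 10)
	fmt.Println(first, rest, last) // [A] [B] [C]
}
```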

Under the hood, all ParserStream methods now reuse the methods for parsing headers and start line from the Parser, reducing code duplication and differences in behaviour.

Similar to the new Parse method, ParseNext returns the number of bytes that were used to read the message, or io.ErrUnexpectedEOF if the message was only partially parsed. Old methods like ParseSIPStreamEach still return ErrParseSipPartial instead.

Additionally, there are a few new methods to control the stream state:

func (p *ParserStream) Buffer() *bytes.Buffer
func (p *ParserStream) Discard(n int)
func (p *ParserStream) Reset()
func (p *ParserStream) Close()

Buffer allows the caller to examine the underlying parser state after an error. On error, ParseNext returns the offset into this buffer of the beginning of the failed line.

Discard allows the caller to drop the first n bytes and reset the parser to recover the remaining stream. This API assumes that the caller has some heuristic to decide how and when to recover.
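One possible recovery heuristic, assumed here rather than prescribed by the PR: starting from the offset of the failed line, skip to the next complete line that looks like a SIP start line. resyncOffset and looksLikeStartLine are hypothetical; with the PR's API the result would presumably be passed to Discard after inspecting the bytes from Buffer. A body containing such a line could fool this heuristic.

```go
package main

import (
	"bytes"
	"fmt"
)

// looksLikeStartLine is a hypothetical heuristic: a SIP request line ends
// with " SIP/2.0" and a status line starts with "SIP/2.0 ".
func looksLikeStartLine(line []byte) bool {
	return bytes.HasSuffix(line, []byte(" SIP/2.0")) || bytes.HasPrefix(line, []byte("SIP/2.0 "))
}

// resyncOffset returns how many bytes to discard so that the buffer starts at
// the first plausible start line after off (the offset of the failed line,
// which is itself skipped). It reports false if no such line is buffered yet.
func resyncOffset(buf []byte, off int) (int, bool) {
	pos := off
	for {
		i := bytes.Index(buf[pos:], []byte("\r\n"))
		if i < 0 {
			return 0, false
		}
		pos += i + 2 // start of the next line
		rest := buf[pos:]
		end := bytes.Index(rest, []byte("\r\n"))
		if end >= 0 && looksLikeStartLine(rest[:end]) {
			return pos, true
		}
	}
}

func main() {
	buf := []byte("INVITE sip:a@b SIP/2.0\r\nGarbage\r\nmore junk\r\nOPTIONS sip:c@d SIP/2.0\r\n")
	off := bytes.Index(buf, []byte("Garbage")) // assume the parser failed on this line
	if n, ok := resyncOffset(buf, off); ok {
		fmt.Printf("discard %d bytes, resume at %q\n", n, buf[n:])
	}
}
```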

Reset completely resets the internal state and the buffer, allowing reuse of the ParserStream.

Close is similar, but it also releases the underlying buffer so that other parsers can reuse it (via the Pool); calling it is now required for that. Previously, the parser always discarded the buffer after each message; now the caller controls this by calling Reset or Close.
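The Reset / Close split is essentially a buffer-ownership rule. A minimal sketch with the standard library's sync.Pool, using a hypothetical pooledStream type (the PR's own pool is not shown here): Reset keeps the buffer and its capacity for the same stream, while Close hands it back to the pool.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// pooledStream is a hypothetical parser-like type that borrows its buffer
// from bufPool.
type pooledStream struct{ buf *bytes.Buffer }

func newPooledStream() *pooledStream {
	return &pooledStream{buf: bufPool.Get().(*bytes.Buffer)}
}

// Reset drops buffered data but keeps the buffer for reuse by the same stream.
func (s *pooledStream) Reset() { s.buf.Reset() }

// Close returns the buffer to the pool; the stream must not be used afterwards.
func (s *pooledStream) Close() {
	if s.buf == nil {
		return
	}
	s.buf.Reset() // never hand out a buffer with stale data
	bufPool.Put(s.buf)
	s.buf = nil
}

func main() {
	s := newPooledStream()
	s.buf.WriteString("INVITE sip:bob@example.com SIP/2.0\r\n")
	s.Reset()
	fmt.Println(s.buf.Len()) // 0
	s.Close()
	fmt.Println(s.buf == nil) // true
}
```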

emiago · 2025-12-02T22:25:42Z

Like a bomb :) We are almost at 1.0.0, so I am not sure where to go with this.
I like the performance improvements and would like to merge, but...

So do I understand correctly that you need some exposure of the Parser here because you are doing some manual parsing?
I would like to see whether there is a need to expose all of this right now, and give it a chance of merging prior to 1.0.0.

I have to look more, but here are some first impressions.
I think this could be better as 2 separate functions. The reason is that the caller has to know this upfront, and
the error handling is determined by the flag anyway, but the rest can stay under the hood.

func (p *Parser) Parse(data []byte, stream bool) (Message, int, error)

Also, btw, the reason for ParseSIP and not Parse was historical, as we had interface usage :)
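The two-function suggestion could look like this: exported wrappers with fixed behaviour over one shared internal function, so callers never pass a bool. The names toyParser, ParsePacket and ParseStreamMessage and the toy parsing logic are hypothetical; the only behavioural difference modelled is that stream mode skips CRLFs before the start line.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

var errEmptyStartLine = errors.New("empty start line")

type toyParser struct{}

// parse is the shared implementation; stream only changes the framing rules.
// It returns the start line and the number of bytes consumed.
func (p *toyParser) parse(data []byte, stream bool) (string, int, error) {
	skip := 0
	if stream {
		// Only stream transports ignore CRLFs before the start line (RFC 3261 7.5).
		for bytes.HasPrefix(data[skip:], []byte("\r\n")) {
			skip += 2
		}
	}
	i := bytes.Index(data[skip:], []byte("\r\n"))
	if i < 0 {
		return "", 0, io.ErrUnexpectedEOF
	}
	if i == 0 {
		return "", 0, errEmptyStartLine
	}
	return string(data[skip : skip+i]), skip + i + 2, nil
}

// ParsePacket is for datagram transports such as UDP.
func (p *toyParser) ParsePacket(data []byte) (string, int, error) { return p.parse(data, false) }

// ParseStreamMessage is for stream transports such as TCP or TLS.
func (p *toyParser) ParseStreamMessage(data []byte) (string, int, error) { return p.parse(data, true) }

func main() {
	var p toyParser
	in := []byte("\r\nOPTIONS sip:a@b SIP/2.0\r\n")
	_, _, err := p.ParsePacket(in)
	line, n, _ := p.ParseStreamMessage(in)
	fmt.Println(err, line, n)
}
```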

emiago · 2025-12-02T22:28:45Z

sip/parse_address.go


 // headerParserTo generates ToHeader
-func headerParserTo(headerName string, headerText string) (header Header, err error) {
+func headerParserTo(headerName []byte, headerText string) (header Header, err error) {


I understand this comes from the line buffer, but the value is mostly unused, so do we need this change at all?

Ideally it would be great to switch values to byte slices - that should give even better performance. But that would be a very large change. So I only did header names for now.

Answering your question, tbh I'm not sure why it helps here specifically. As you mentioned, the string here is unused and is mostly generated from static strings in the lower case header names switch. So it should not matter if we do bytes here. But it does affect performance in the end.

Probably removing it is better, but we can deal with this later.

emiago · 2025-12-02T22:30:16Z

sip/parse_header.go

-
-	colonIdx := strings.Index(headerText, ":")
+// ParseHeader parses a SIP header from the line and appends it to out.
+func (headersParser HeadersParser) ParseHeader(out []Header, line []byte) ([]Header, error) {


I was avoiding exposing this, but I guess there is a use case?

I'm quite happy with the current API having ParseHeaders, which avoids the body, so we could unexport ParseHeader back. The reason I still exposed it is just in case we need something more exotic for parsing while still keeping compatibility with sipgo data structures; we would need to call this directly then. Maybe the answer is similar to the one for NextLine: move it to another package like parser. Would that work?

No. The parser is just a natural fit for the sip package, like json or any other.

emiago · 2025-12-03T08:39:11Z

sip/parser.go

+// It returns io.ErrUnexpectedEOF if there's no CRLF (\r\n) in the data.
+// If there's a CR (\r) which is not followed by LF (\n), an ErrParseLineNoCRLF is returned.
+// As a special case, it returns io.EOF if data is empty.
+func NextLine(data []byte) ([]byte, int, error) {


Let's not have this exposed from the package.

Btw, would it be more acceptable if I introduce a new parser package that will expose this method without polluting the sip package?

Hi @dennwc, can you please share why this function would benefit from being part of the API? I am not sure how you use it, but a user should be able to parse a message without knowing about or doing any extra traversal. I cannot see any sense in exposing this unless you are reimplementing parts of the Parser?

No, parser is just a generic name; it is not the Go way. sip is where things need to land. Still, I do not see how that would change things. I should also mention that I regret not putting some other things here, but they were hard to change.

Anyway, do not worry about naming, I can deal with that; I just want to keep the API exposure low for now and bring things up more slowly.

You are right, I was considering using this functionality to build a message segmenter, that is, something that only marks message boundaries without doing the full parsing. But for now it's unused, so we could unexport it back and consider this change separately.

dennwc · 2025-12-03T09:28:57Z

First, thank you for a quick reply!

So do I understand correctly that you need some exposure of the Parser here because you are doing some manual parsing?

That's correct. We now have a separate SIP proxy that needs the parser, but not the other layers. And we've hit a few issues with it. This is why we'd appreciate a bit more flexibility for the parser, which this PR attempts.

I would like to see whether there is a need to expose all of this right now, and give it a chance of merging prior to 1.0.0.

No rush at all! We are totally fine with running this code from our own branch for now, until you are ready to upstream these changes in one form or another (1.1.0 maybe?).

I think this could be better as 2 separate functions.

Do you mean having two separate parsers? The PR keeps the ParseSIP method intact, so it does use two different functions for the old and new signatures and error semantics. Under the hood it's the same code.

emiago · 2025-12-16T21:02:58Z

@dennwc, could you please rebase? If you hit any weird conflicts, let me know; I may have renamed some things. I want to consider merging this soon.

dennwc · 2025-12-18T15:04:00Z

@emiago rebase complete!

emiago · 2025-12-18T21:40:48Z

sip/parser_stream.go

+type parserState int

-var (
-	ParseMaxMessageLength = 65535


I wonder whether this was removed by your rebase. Will need to check.

It's still there, just in a different file.

dennwc · 2025-12-23T10:02:26Z

Hey @emiago! Is anything else blocking this? I do not want to rush it, just let me know if there's anything else remaining here.

dennwc force-pushed the stream-parser branch from eef130c to 846f256 on December 2, 2025 20:32

emiago reviewed Dec 2, 2025

emiago reviewed Dec 3, 2025

Improve SIP parsers.

e4ce8b9

dennwc force-pushed the stream-parser branch from 846f256 to e4ce8b9 on December 18, 2025 14:57

emiago reviewed Dec 18, 2025

Correctly enforce max message size for streams. Still allow to recover.

b2c9e42
