I have started working on this, but I wanted to post it as an issue so it could be discussed too.
In the WorldJam community, there has recently been a significant problem with audio quality between clients using Virgin Media as an ISP and the WorldJam servers at OVH. The audio sent from client to server is severely burbled, although the client hears everyone else clearly.
On Monday, @sthenos and others ran tests with a program that sent streams of serial-numbered UDP packets from the Virgin Media (VM) client to the OVH host. These tests showed that the UDP packets were arriving out of order. Our theory is that some kind of multipath load-balanced routing is in use, with one of the paths slower than the other.
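The kind of check such a test performs can be sketched as follows. Given the serial numbers in the order they were received, it counts how many arrived after a later-sent packet had already been seen. This is a minimal sketch, not the actual test program. The function name `countOutOfOrder` is hypothetical, and the code assumes serials start at 0 and never wrap.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts packets that arrived "late", i.e. whose serial
// number is lower than the highest serial already received.
// Assumes serials start at 0 and never wrap around.
std::size_t countOutOfOrder ( const std::vector<uint32_t>& vecSerials )
{
    std::size_t iLate    = 0;
    bool        bAnySeen = false;
    uint32_t    iHighest = 0;

    for ( const uint32_t iSerial : vecSerials )
    {
        if ( bAnySeen && iSerial < iHighest )
        {
            ++iLate; // overtaken in transit by a packet sent after it
        }
        else
        {
            iHighest = iSerial;
            bAnySeen = true;
        }
    }
    return iLate;
}
```

For example, the arrival order `0, 1, 3, 2, 4, 6, 5, 7` gives a count of 2, because packets 2 and 5 were each overtaken by a later packet.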
It is likely to be difficult to get either Virgin or OVH to investigate this deeply for us.
My own view is that out-of-order packet delivery is part and parcel of the unreliable nature of UDP.
Currently, the Jamulus jitter buffer is a simple ring buffer, and expects all audio packets to arrive in the order sent. In the protocol, there is no sequence numbering of audio packets to enable out-of-order packets to be re-ordered correctly.
My plan is to enhance the jitter buffer to re-order out-of-order packets instead of assuming they all arrive in the correct order. Protocols like RTP (used with SIP for VoIP calls) include sequence numbering for exactly that reason. I'm not thinking of actually using RTP, but of something much simpler: a 3-byte header, consisting of a 0xFF flag byte and a 2-byte sequence number (counting in bytes), prepended to the existing audio data.
The existing Jamulus data never has 0xFF as the first byte, so the server would treat only packets starting with 0xFF specially. All other traffic, e.g. from older clients, would be processed exactly as at present. If the server sees the 0xFF, it would read the sequence number and use it to determine where in the jitter buffer to put the remaining audio data in the packet (discarding it if it is too old). Also, once the server had seen the 0xFF flag from a client, it would know that it could send data back to that client in the same way. Otherwise (for older clients) it would send audio unsequenced, as at present.
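A minimal sketch of the proposed framing on both sides of the link. The names `addSeqHeader`, `parsePacket` and `ParsedPacket` are hypothetical, and the byte order of the sequence number (big-endian here) is an assumption, since the proposal does not specify it. Packets that do not start with 0xFF are passed through unchanged as legacy audio.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical layout of the proposed 3-byte header:
//   byte 0     : 0xFF marker (per the proposal, never the first byte of legacy audio)
//   bytes 1..2 : 16-bit sequence number, ASSUMED big-endian (network order)
constexpr uint8_t     SEQ_MARKER      = 0xFF;
constexpr std::size_t SEQ_HEADER_SIZE = 3;

// Sender side: prepend the header to an existing audio payload.
std::vector<uint8_t> addSeqHeader ( uint16_t iSeq, const std::vector<uint8_t>& vecAudio )
{
    std::vector<uint8_t> vecOut;
    vecOut.reserve ( SEQ_HEADER_SIZE + vecAudio.size() );
    vecOut.push_back ( SEQ_MARKER );
    vecOut.push_back ( static_cast<uint8_t> ( iSeq >> 8 ) );   // high byte
    vecOut.push_back ( static_cast<uint8_t> ( iSeq & 0xFF ) ); // low byte
    vecOut.insert ( vecOut.end(), vecAudio.begin(), vecAudio.end() );
    return vecOut;
}

struct ParsedPacket
{
    std::optional<uint16_t> oSeq;     // empty for legacy, unsequenced packets
    std::vector<uint8_t>    vecAudio; // audio data with any header stripped
};

// Receiver side: only packets starting with 0xFF are treated specially.
ParsedPacket parsePacket ( const std::vector<uint8_t>& vecIn )
{
    if ( vecIn.size() >= SEQ_HEADER_SIZE && vecIn[0] == SEQ_MARKER )
    {
        const uint16_t iSeq = static_cast<uint16_t> ( ( vecIn[1] << 8 ) | vecIn[2] );
        return { iSeq, std::vector<uint8_t> ( vecIn.begin() + SEQ_HEADER_SIZE, vecIn.end() ) };
    }
    return { std::nullopt, vecIn }; // legacy packet: process exactly as at present
}
```

Whether the server has ever seen `oSeq` populated for a given client is also what would decide whether it sends sequenced audio back to that client.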
If the sequence number in a received audio packet were higher than expected, that would indicate missing data. We would need to fill the skipped space with silence, in case the missing data never arrived in time. If it did arrive later, it could be placed at the correct position in the jitter buffer, overwriting the silence.
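The re-ordering behaviour described above (fill gaps with silence, let a late block overwrite its silence, discard blocks that have already been played out) could look roughly like the following. This is a hypothetical sketch, not the existing Jamulus jitter buffer. The class name `CSeqJitterBuffer` is invented, and it assumes the sequence number advances by one per audio block, rather than counting bytes, and that the slot count is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reordering jitter buffer. ASSUMES the sequence number advances
// by one per audio block and wraps from 65535 to 0.
class CSeqJitterBuffer
{
public:
    CSeqJitterBuffer ( std::size_t iNumBlocks, std::size_t iBlockLen )
        : vecSlots ( iNumBlocks, std::vector<int16_t> ( iBlockLen, 0 ) ), iBlockSize ( iBlockLen )
    {
        // Power of two, so the slot mapping stays consistent across the 16-bit wrap
        assert ( iNumBlocks > 0 && ( iNumBlocks & ( iNumBlocks - 1 ) ) == 0 );
    }

    // Returns false if the block is discarded (already played out, or too far ahead).
    bool Put ( uint16_t iSeq, const std::vector<int16_t>& vecBlock )
    {
        if ( !bStarted )
        {
            iReadSeq = iNextSeq = iSeq;
            bStarted = true;
        }
        if ( Diff ( iSeq, iReadSeq ) < 0 )
        {
            return false; // too old: its slot has already been played
        }
        if ( Diff ( iSeq, iReadSeq ) >= static_cast<int> ( vecSlots.size() ) )
        {
            return false; // would overwrite data not yet played
        }
        // Higher than expected: pre-fill the skipped blocks with silence
        while ( Diff ( iSeq, iNextSeq ) > 0 )
        {
            Slot ( iNextSeq ).assign ( iBlockSize, 0 );
            ++iNextSeq;
        }
        Slot ( iSeq ) = vecBlock; // a late block simply overwrites its silence
        if ( Diff ( iSeq, iNextSeq ) == 0 )
        {
            ++iNextSeq;
        }
        return true;
    }

    // Next block in sequence order, or silence on underrun.
    std::vector<int16_t> Get()
    {
        if ( !bStarted || Diff ( iNextSeq, iReadSeq ) <= 0 )
        {
            return std::vector<int16_t> ( iBlockSize, 0 );
        }
        return Slot ( iReadSeq++ );
    }

private:
    // Signed 16-bit difference: ordering survives wraparound within +/-32767
    static int Diff ( uint16_t a, uint16_t b )
    {
        return static_cast<int16_t> ( static_cast<uint16_t> ( a - b ) );
    }
    std::vector<int16_t>& Slot ( uint16_t iSeq ) { return vecSlots[iSeq % vecSlots.size()]; }

    std::vector<std::vector<int16_t>> vecSlots;
    std::size_t                       iBlockSize;
    bool                              bStarted = false;
    uint16_t                          iReadSeq = 0; // next block to play
    uint16_t                          iNextSeq = 0; // one past the highest block received
};
```

Comparing sequence numbers through a signed 16-bit difference, rather than with a plain `<`, keeps the ordering correct across the wrap from 65535 to 0, provided two packets in flight are never more than 32767 apart.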