Problem 1 — Multi-channel WAV files produce garbled audio
The WAV header parser (get_wav_header_size in rtpstream.cpp) only scans for the start of the data chunk and completely ignores the fmt chunk. As a result:
NumChannels, BitsPerSample, and SampleRate are never read from the file.
When a stereo or multi-channel WAV is supplied, the interleaved L/R samples are forwarded to the RTP sender as raw mono bytes, producing heavily distorted audio.
bytes_per_packet is set for mono (e.g. 160 bytes for PCMU/8000), so a 2-channel file is consumed at half the correct data rate and playback takes 2× the actual audio duration (4× for 16-bit stereo, relative to the 8-bit mono stream that the packet size assumes).
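For reference, the fields the old parser skips sit at fixed offsets inside the fmt chunk. The following is a minimal sketch of a chunk walk that picks them up. WavFormat and parse_wav_header_sketch are hypothetical names used for illustration only, not the code on the branch, and the sketch assumes the whole header is already in memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical result type; the fields mirror the WAV fmt chunk.
struct WavFormat {
    uint16_t audio_format = 0;     // 1 = integer PCM
    uint16_t num_channels = 0;
    uint32_t sample_rate = 0;
    uint16_t bits_per_sample = 0;
    size_t data_offset = 0;        // offset of the first sample byte
    uint32_t data_size = 0;        // size of the data chunk in bytes
};

// WAV stores all multi-byte fields little-endian.
static uint16_t read_le16(const unsigned char* p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Walks the RIFF chunks. Returns true once a fmt chunk and then a data chunk
// have been found; a file with data before fmt is rejected by this sketch.
bool parse_wav_header_sketch(const unsigned char* buf, size_t len, WavFormat* out) {
    if (len < 12 || std::memcmp(buf, "RIFF", 4) != 0 ||
        std::memcmp(buf + 8, "WAVE", 4) != 0) {
        return false;
    }
    bool have_fmt = false;
    size_t pos = 12;
    while (pos + 8 <= len) {
        const uint32_t chunk_size = read_le32(buf + pos + 4);
        const size_t body = pos + 8;
        if (std::memcmp(buf + pos, "fmt ", 4) == 0) {
            if (chunk_size < 16 || body + 16 > len) return false;
            out->audio_format = read_le16(buf + body);
            out->num_channels = read_le16(buf + body + 2);
            out->sample_rate = read_le32(buf + body + 4);
            out->bits_per_sample = read_le16(buf + body + 14);
            have_fmt = true;
        } else if (std::memcmp(buf + pos, "data", 4) == 0) {
            out->data_offset = body;
            out->data_size = chunk_size;
            return have_fmt;
        }
        // Chunks are word-aligned: an odd size is followed by one pad byte.
        const size_t advance = 8 + static_cast<size_t>(chunk_size) + (chunk_size & 1u);
        if (advance > len - pos) return false;
        pos += advance;
    }
    return false;
}
```

With NumChannels and BitsPerSample available, the caller can tell a mono 8-bit payload apart from one that needs conversion before it reaches the RTP sender.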
Problem 2 — Audio file size capped at ~2 GB
filesize in cached_file_t / cached_pattern_t, and the playback tracking fields in taskentry_t (audio_file_num_bytes, audio_file_bytes_left, new_audio_file_size, and video equivalents) are all declared as int (32-bit signed). This caps the supported file size at 2^31 − 1 bytes (about 2 GiB). For higher-rate or multi-channel formats, that byte limit corresponds to a much shorter playback duration, so it is reached much sooner.
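To put the limit in terms of playback time, here is a rough back-of-the-envelope helper. The function names are hypothetical and it assumes uncompressed PCM whose sample width is a whole number of bytes.

```cpp
#include <cstdint>
#include <limits>

// Bytes of uncompressed PCM produced per second of audio.
constexpr int64_t bytes_per_second(int64_t sample_rate, int64_t channels,
                                   int64_t bits_per_sample) {
    return sample_rate * channels * (bits_per_sample / 8);
}

// Whole seconds of audio that fit before a 32-bit signed byte counter overflows.
constexpr int64_t max_seconds_with_int32(int64_t sample_rate, int64_t channels,
                                         int64_t bits_per_sample) {
    return std::numeric_limits<int32_t>::max() /
           bytes_per_second(sample_rate, channels, bits_per_sample);
}
```

For 8 kHz mono 8-bit audio this gives 268435 seconds (about 74.6 hours), while 48 kHz stereo 16-bit audio reaches the limit after 11184 seconds (about 3.1 hours).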
Fix
Replaced get_wav_header_size() with parse_wav_header(), which reads the fmt chunk to extract NumChannels, BitsPerSample, and SampleRate.
Added wav_downmix_to_mono(), which averages all channels to mono at cache time (supports 8-bit unsigned and 16-bit signed PCM).
Added a wav_processed flag to cached_file_t so that rtpstream_play() skips any further header processing for already-converted files.
Changed filesize and all playback tracking fields from int to int64_t.
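The channel averaging described above can be sketched roughly as follows. This is an illustration of the technique, not the wav_downmix_to_mono() code on the branch: the function names and the std::vector interface are assumptions, and samples are taken as already decoded into host-order integers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Averages interleaved 16-bit signed PCM frames down to one channel.
std::vector<int16_t> downmix16_to_mono(const std::vector<int16_t>& interleaved,
                                       int channels) {
    std::vector<int16_t> mono;
    if (channels <= 0) return mono;
    const size_t ch = static_cast<size_t>(channels);
    const size_t frames = interleaved.size() / ch;  // a trailing partial frame is dropped
    mono.reserve(frames);
    for (size_t f = 0; f < frames; ++f) {
        int64_t sum = 0;  // wide accumulator so the per-frame sum cannot overflow
        for (size_t c = 0; c < ch; ++c) sum += interleaved[f * ch + c];
        mono.push_back(static_cast<int16_t>(sum / channels));
    }
    return mono;
}

// Same idea for 8-bit unsigned PCM. Silence is 128 in this format, and the
// plain average of unsigned values keeps that midpoint intact.
std::vector<uint8_t> downmix8_to_mono(const std::vector<uint8_t>& interleaved,
                                      int channels) {
    std::vector<uint8_t> mono;
    if (channels <= 0) return mono;
    const size_t ch = static_cast<size_t>(channels);
    const size_t frames = interleaved.size() / ch;
    mono.reserve(frames);
    for (size_t f = 0; f < frames; ++f) {
        uint32_t sum = 0;
        for (size_t c = 0; c < ch; ++c) sum += interleaved[f * ch + c];
        mono.push_back(static_cast<uint8_t>(sum / ch));
    }
    return mono;
}
```

Averaging rather than summing keeps the result inside the original sample range, so no clipping step is needed. Integer division truncates, which loses at most one least-significant step per sample.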
Note on contribution
I don't have push access to the main branch, so the fix has been pushed to a separate branch. Happy to open a PR from there for review.