Adds raw_read/raw_write to exposed API #182
+ adds support for reading/writing files directly from byte buffers for
additional 'dtype' formats:
- int24
- int8
- uint8
Notes:
+ raw_read/write only supports the dtype argument, since 'int24' has no native C
type
+ much of the functionality of _check_buffer/_cdata_io/_check_dtype has been
rewritten into these functions specifically because the current functions
aren't intended to handle int24 types
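For readers unfamiliar with the int24 problem noted above, here is a minimal sketch of what handling packed 24-bit samples involves once the bytes are in hand. It is not code from this pull request: the helper names `int24_to_ints` and `ints_to_int24` are hypothetical, and little-endian byte order is an assumption (as in WAV files).

```python
def int24_to_ints(raw):
    """Decode packed little-endian signed 24-bit samples into Python ints."""
    if len(raw) % 3:
        raise ValueError("length must be a multiple of 3 bytes")
    return [
        int.from_bytes(raw[i:i + 3], byteorder="little", signed=True)
        for i in range(0, len(raw), 3)
    ]


def ints_to_int24(samples):
    """Pack ints in the range [-2**23, 2**23 - 1] into 3 bytes each."""
    return b"".join(
        s.to_bytes(3, byteorder="little", signed=True) for s in samples
    )


packed = ints_to_int24([1, -1, -8388608])
decoded = int24_to_ints(packed)
```

Because no 3-byte integer type exists in C or NumPy, each sample has to be widened on the way in and narrowed on the way out, which is why the existing dtype-based helpers do not cover this case.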
out[frames:] = fill_value
return out

def read_raw(self, frames=-1, dtype=None):
I think that the dtype parameter is confusing here. Maybe instead of having both frames and dtype a simple numbytes would be preferable. At any rate, dtype can't be an argument, since the file itself has a non-mutable datatype.
That was my initial thought too but the raw_read/write functions actually require that you specify a number of bytes that is a multiple of the audio file frame size. From the libsndfile docs:
The number of bytes read or written must always be an integer multiple of the number of channels multiplied by the number of bytes required to represent one sample from one channel.
Hence, it made more sense to make the interface this way.
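The constraint quoted from the libsndfile docs can be sketched as a small size check. This is an illustration only: the `BYTES_PER_SAMPLE` table and both helper names are assumptions made for the example, not part of this pull request or of libsndfile.

```python
# Hypothetical mapping from dtype name to bytes per sample (assumption).
BYTES_PER_SAMPLE = {"int8": 1, "uint8": 1, "int16": 2, "int24": 3, "int32": 4}


def frames_to_bytes(frames, channels, dtype):
    """Byte count needed to hold `frames` frames of interleaved samples."""
    return frames * channels * BYTES_PER_SAMPLE[dtype]


def bytes_to_frames(numbytes, channels, dtype):
    """Return the frame count, rejecting sizes that split a frame."""
    frame_size = channels * BYTES_PER_SAMPLE[dtype]
    frames, remainder = divmod(numbytes, frame_size)
    if remainder:
        raise ValueError(
            "numbytes must be a multiple of %d bytes" % frame_size
        )
    return frames


stereo_int24 = frames_to_bytes(100, channels=2, dtype="int24")
```

Taking `frames` rather than a byte count means the caller can never request a size that violates the rule; a `numbytes` interface would need a check like `bytes_to_frames` on every call.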
But isn't the dtype given by the audio file?
Returns
-------
buffer
A buffer containing the read data.
I do not want pysoundfile to leak CFFI data structures. I would much prefer a bytes or bytearray object instead of a cffi buffer.
@bastibe I don't think there is reason for concern here. The repr() mentions some CFFI type, but it is really just like any other Python buffer object.
I think it's the right thing to use buffer objects, since those are the lowest level Python data structures and supported by many built-in and third-party libraries.
Many things might also work with bytearray(), but I see no advantage in wrapping the buffers in bytearrays.
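To make the buffer-versus-bytearray point concrete, here is a small sketch using only the standard library. A `memoryview` stands in for the buffer object a read would return (an assumption made so the example runs without CFFI); any object that supports the buffer protocol should behave the same way with these calls.

```python
import struct

# Four little-endian int16 samples held in a mutable backing store.
backing = bytearray(struct.pack("<4h", 1, -2, 3, -4))
buf = memoryview(backing)  # stand-in for the returned buffer object

# Buffer objects can be consumed directly, without wrapping them first.
samples = struct.unpack_from("<4h", buf)

# Converting to bytes or bytearray is possible, but each call copies.
as_bytes = bytes(buf)
as_bytearray = bytearray(buf)

# The view tracks the backing memory; the copies do not.
backing[0] = 0x7F
view_sees_change = buf[0] == 0x7F
copy_unchanged = as_bytearray[0] == 0x01
```

The trade-off is that a buffer avoids a copy but stays tied to the lifetime and contents of the underlying memory, while a `bytes` or `bytearray` result is independent at the cost of copying.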
Thank you for the pull request. What is your use case for this? Why would you ever want to read the raw binary data instead of numpy arrays?
assert written == len(data)
self._update_len(written)

def write_raw(self, data, dtype=None):

return _ffi.buffer(cdata)

def read_raw_into(self, buffer, dtype=None):
@bastibe My use case for this is really streaming audio data using sounddevice. The buffer_read/write functions are what I typically use; the only real reason for using raw_read/write is to be able to read/write 24-bit audio data directly (without upcasting/downcasting). It's definitely kind of a corner case, since I believe the OS will typically handle downcasting 32-bit data to 24-bit when writing to audio devices, but since the functionality was already there in libsndfile I thought it was worth exposing it. Said another way, the raw_read/write functions are the closest thing to accessing the bytes directly from an open sndfile object.
Just for reference, there has already been an issue about this topic: #25. I also tried to incorporate Regarding API, note that the word "raw" is quite ambiguous in the context of libsndfile, and since you are using buffers, there is also a high potential for confusion with Also note that Did you try using the What are your concerns regarding conversion from 24bit integer to 32bit float?
@mgeier Thanks for pointing out those discussions; I hadn't realized you'd tried to implement this before. I think I had the same idea as you, in that I thought it would be good to have a simple read_bytes kind of functionality. But as you've stated earlier (and as is mentioned in the sndfile docs), raw_read and raw_write only work for a subset of audio formats. There's also a dirty little caveat in the sndfile docs:
So in general it's not as 'user-friendly' as the other functions. On the other hand, it still provides a way to read bytes directly from several standard audio formats (which ones, I'm not entirely sure yet). I'd like to do some more testing with this and see how many formats are actually supported. I understand the concerns about adding something to the API which has incomplete functionality, though.
Wouldn't it be easier to stream the whole file over the network, and open the receiving socket with pysoundfile?
That makes sense, but I actually want to stream PCM audio to an audio device.
Why not open the file without SoundFile, and just skip the header before streaming? The point of SoundFile is to be able to decode audio files. If you want to explicitly not decode them, why use SoundFile? I'm sorry, but I am going to reject this pull request. A good library is a library that does one minimal job, and while this is certainly a worthwhile functionality, I don't think that it is a good fit for this library.
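The "skip the header" suggestion can be sketched for the WAV case with the standard library alone. This is a rough illustration rather than a general solution: the helper name `find_wav_data` is hypothetical, it assumes a plain RIFF/WAVE container holding uncompressed PCM, and other formats would each need their own parsing.

```python
import io
import struct


def find_wav_data(fileobj):
    """Return (offset, size) of the 'data' chunk in a RIFF/WAVE stream."""
    preamble = fileobj.read(12)
    if len(preamble) < 12:
        raise ValueError("file too short to be RIFF/WAVE")
    riff, _, wave_id = struct.unpack("<4sI4s", preamble)
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = fileobj.read(8)
        if len(header) < 8:
            raise ValueError("no data chunk found")
        chunk_id, size = struct.unpack("<4sI", header)
        if chunk_id == b"data":
            return fileobj.tell(), size
        # Skip this chunk; chunks are padded to an even number of bytes.
        fileobj.seek(size + (size & 1), io.SEEK_CUR)
```

After the call, the file position sits at the first sample byte, so `fileobj.read(n)` yields the stored PCM bytes unchanged, including packed 24-bit samples. Interpreting those bytes correctly still requires the channel count and sample width from the `fmt ` chunk, which this sketch skips over.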