Adds raw_read/raw_write to exposed API#182

Closed
tgarc wants to merge 5 commits into bastibe:master from tgarc:raw_readwrite

Conversation

@tgarc
Contributor

@tgarc tgarc commented Jan 8, 2017

Adds support for reading/writing files directly from byte buffers for additional 'dtype' formats:

  • int24
  • int8
  • uint8

Notes:

  • raw_read/write only support the dtype argument (no ctype), since 'int24' has no native C type
  • much of the functionality of _check_buffer/_cdata_io/_check_dtype has been rewritten into these functions specifically because the current functions aren't intended to handle int24 types
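
As a minimal sketch of what "no native C type" means in practice: a packed 24-bit sample occupies three bytes, so it has to be packed and unpacked by hand. The helper names `unpack_int24_le` and `pack_int24_le` are hypothetical (they are not part of this pull request or of SoundFile), and little-endian byte order is assumed.

```python
def unpack_int24_le(data):
    """Convert packed little-endian signed 24-bit samples to Python ints."""
    if len(data) % 3 != 0:
        raise ValueError("length must be a multiple of 3 bytes")
    return [
        int.from_bytes(data[i:i + 3], byteorder="little", signed=True)
        for i in range(0, len(data), 3)
    ]


def pack_int24_le(samples):
    """Convert ints in the signed 24-bit range to packed little-endian bytes."""
    return b"".join(
        sample.to_bytes(3, byteorder="little", signed=True) for sample in samples
    )
```

Values outside the signed 24-bit range make `int.to_bytes` raise `OverflowError`, which is arguably the behaviour one wants here.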

tgarc added 5 commits January 8, 2017 15:18
out[frames:] = fill_value
return out

def read_raw(self, frames=-1, dtype=None):
Owner

@bastibe bastibe Jan 9, 2017

I think that the dtype parameter is confusing here. Maybe instead of having both frames and dtype a simple numbytes would be preferable. At any rate, dtype can't be an argument, since the file itself has a non-mutable datatype.

Contributor Author

That was my initial thought too but the raw_read/write functions actually require that you specify a number of bytes that is a multiple of the audio file frame size. From the libsndfile docs:

The number of bytes read or written must always be an integer multiple of the number of channels multiplied by the number of bytes required to represent one sample from one channel.

Hence, it made more sense to make the interface this way.
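
The constraint quoted above can be sketched as a small size calculation. This only illustrates the arithmetic and is not code from the pull request: the names `BYTES_PER_SAMPLE`, `frames_to_bytes` and `check_raw_length` are hypothetical, and the dtype-to-size table is an assumption made for the example.

```python
# Bytes per single-channel sample for the dtype names used in this thread.
# The table is an assumption made for this sketch.
BYTES_PER_SAMPLE = {"int8": 1, "uint8": 1, "int16": 2, "int24": 3, "int32": 4}


def frames_to_bytes(frames, channels, dtype):
    """Number of bytes occupied by `frames` frames of interleaved audio."""
    return frames * channels * BYTES_PER_SAMPLE[dtype]


def check_raw_length(nbytes, channels, dtype):
    """Return the frame count for `nbytes`, or raise if it is not whole."""
    frame_size = channels * BYTES_PER_SAMPLE[dtype]
    if nbytes % frame_size != 0:
        raise ValueError(
            "byte count must be a multiple of the frame size (%d)" % frame_size
        )
    return nbytes // frame_size
```

Taking `frames` and `dtype` lets the caller stay in whole frames, whereas a plain byte count would need a check like `check_raw_length` on every call.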

Owner

But isn't the dtype given by the audio file?

Returns
-------
buffer
A buffer containing the read data.
Owner

I do not want pysoundfile to leak CFFI data structures. I would much prefer a bytes or bytearray object instead of a cffi buffer.

Contributor

@bastibe I don't think there is reason for concern here. The repr() mentions some CFFI type, but it is really just like any other Python buffer object.

I think it's the right thing to use buffer objects, since those are the lowest level Python data structures and supported by many built-in and third-party libraries.

Many things might also work with bytearray(), but I see no advantage in wrapping the buffers in bytearrays.
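
To illustrate the point about buffer objects, here is a small sketch using only the standard library. A `bytearray` stands in for the buffer a read call would return (that substitution is an assumption made so the example runs without CFFI); anything that supports the buffer protocol can be used the same way.

```python
import struct

# Stand-in for the buffer object a read call would return; any object that
# supports the buffer protocol behaves the same way in the lines below.
raw = bytearray(b"\x01\x00\x02\x00")

view = memoryview(raw)   # zero-copy view onto the same memory
copy = bytes(view)       # explicit copy into an immutable bytes object
left, right = struct.unpack_from("<hh", view)  # two little-endian int16 samples
```

A caller who prefers `bytes` or `bytearray` can make that copy in one call, while callers who want to avoid the copy can keep working with the buffer directly.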

@bastibe
Owner

bastibe commented Jan 9, 2017

Thank you for the pull request.

What is your use case for this? Why would you ever want to read the raw binary data instead of numpy arrays?

assert written == len(data)
self._update_len(written)

def write_raw(self, data, dtype=None):
Owner

doesn't need a dtype, see above.


return _ffi.buffer(cdata)

def read_raw_into(self, buffer, dtype=None):
Owner

doesn't need a dtype, see above.

@tgarc
Contributor Author

tgarc commented Jan 9, 2017

@bastibe My use case is streaming audio data using sounddevice. The buffer_read/write functions are what I typically use; the only real reason for using raw_read/write is to be able to read/write 24-bit audio data directly, without upcasting/downcasting. It's definitely a corner case, since I believe the OS will typically handle downcasting 32-bit data to 24-bit when writing to audio devices, but since the functionality was already there in libsndfile I thought it was worth exposing.

Said another way, the raw_read/write functions are the closest thing to accessing the bytes directly from an open sndfile object.

@mgeier
Contributor

mgeier commented Jan 9, 2017

Just for reference, there has already been an issue about this topic: #25.

I also tried to incorporate sf_read_raw() and sf_write_raw() in #72, but I didn't find a meaningful way to do this. So I skipped them.

Regarding API, note that the word "raw" is quite ambiguous in the context of libsndfile, and since you are using buffers, there is also a high potential for confusion with buffer_read() and buffer_write().

Also note that sf_read_raw() and sf_write_raw() work only on a subset of the supported file types (I don't know exactly, probably only the RIFF-based types?), therefore I don't think it's worth supporting it.

Did you try using the wave and/or aiff modules from the standard library?
They are quite limited, but if you need packed 24bit data in memory, they could actually be the right thing to use?
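
A minimal sketch of the `wave` suggestion, assuming an uncompressed PCM WAV file: `readframes()` returns the sample bytes as stored, so 24-bit data stays packed at three bytes per sample. The helper name `iter_pcm_blocks` and the in-memory demo file are hypothetical and exist only to make the example self-contained.

```python
import io
import wave


def iter_pcm_blocks(fileobj, block_frames=1024):
    """Yield raw PCM byte blocks from a WAV file opened via the wave module."""
    with wave.open(fileobj, "rb") as wav:
        while True:
            block = wav.readframes(block_frames)
            if not block:
                break
            yield block


# Build a tiny mono 24-bit WAV file in memory so the sketch is self-contained.
payload = bytes(range(12))  # 4 frames x 1 channel x 3 bytes per sample
demo = io.BytesIO()
with wave.open(demo, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(3)
    out.setframerate(48000)
    out.writeframes(payload)

demo.seek(0)
blocks = list(iter_pcm_blocks(demo, block_frames=2))
```

Each yielded block could then be handed to whatever consumes packed PCM bytes.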

What are your concerns regarding conversion from 24bit integer to 32bit float?
Speed, size or accuracy?
IIRC, the conversion is lossless, and I could imagine that the speed difference might be negligible.
And unless you are loading huge files into memory (instead of streaming them from disk), the size difference shouldn't matter that much either, right?
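
The losslessness claim can be checked with a short sketch: a 32-bit float has a 24-bit significand, so any signed 24-bit integer scaled by a power of two survives the round trip. The scale factor of 2**23 and the helper names are assumptions made for this example; the thread does not say which scaling is actually used.

```python
import struct

SCALE = 2 ** 23  # assumed full-scale factor for signed 24-bit samples


def through_float32(value):
    """Round a Python float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def int24_survives_float32(sample):
    """True if a 24-bit sample is unchanged after a float32 round trip."""
    as_float = through_float32(sample / SCALE)
    return int(as_float * SCALE) == sample
```

The same check fails for wider integers, for example 2**24 + 1 scaled by 2**31, which is why the argument is specific to 24-bit (and narrower) data.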

@tgarc
Contributor Author

tgarc commented Jan 10, 2017

@mgeier Thanks for pointing out those discussions; I hadn't realized you'd tried to implement this before. I think I had the same idea as you: I thought it would be good to have a simple read_bytes kind of functionality. But as you stated earlier (and as the sndfile docs mention), raw_read and raw_write only work for a subset of audio formats. There's also a dirty little caveat in the sndfile docs:

Note : The result of using of both regular reads/writes and raw reads/writes on compressed file formats other than SF_FORMAT_ALAW and SF_FORMAT_ULAW is undefined.

So in general it's not as 'user-friendly' as the other functions. On the other hand it still provides a way to read bytes directly from several standard audio formats (which ones I'm not entirely sure yet).

I'd like to do some more testing with this and see how many formats are actually supported. I understand the concerns about adding something to the API which has incomplete functionality though.

@bastibe
Owner

bastibe commented Jan 16, 2017

Wouldn't it be easier to stream the whole file over the network, and open the receiving socket with pysoundfile?

@tgarc
Contributor Author

tgarc commented Jan 22, 2017

That makes sense, but I actually want to stream PCM audio to an audio device.

@bastibe
Owner

bastibe commented Jan 22, 2017

Why not open the file without SoundFile, and just skip the header before streaming?

The point of SoundFile is to be able to decode audio files. If you want to explicitly not decode them, why use SoundFile?

I'm sorry, but I am going to reject this pull request. A good library is a library that does one minimal job, and while this is certainly a worthwhile functionality, I don't think that it is a good fit for this library.
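
For the "skip the header" suggestion above, here is a rough sketch assuming an uncompressed RIFF/WAVE file. The header is not a fixed size, so the code walks the chunk list until it finds the `data` chunk. The function name `find_wav_data` is hypothetical, and other containers (AIFF, FLAC, and so on) would need different handling.

```python
import io
import struct
import wave


def find_wav_data(fileobj):
    """Return (offset, size) of the 'data' chunk in a RIFF/WAVE file."""
    preamble = fileobj.read(12)
    if len(preamble) < 12 or preamble[:4] != b"RIFF" or preamble[8:] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = fileobj.read(8)
        if len(header) < 8:
            raise ValueError("no data chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"data":
            return fileobj.tell(), chunk_size
        # Skip this chunk; chunks are padded to an even number of bytes.
        fileobj.seek(chunk_size + (chunk_size & 1), io.SEEK_CUR)


# Demo on a small in-memory file written with the standard wave module.
payload = bytes(range(12))  # 2 frames x 2 channels x 3 bytes per sample
demo = io.BytesIO()
with wave.open(demo, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(3)
    out.setframerate(44100)
    out.writeframes(payload)

demo.seek(0)
offset, size = find_wav_data(demo)
pcm = demo.read(size)  # the stream is positioned at the first sample byte
```

After the call, the stream is positioned at the first sample byte, so the remaining `size` bytes can be read in blocks and streamed as-is.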

@bastibe bastibe closed this Jan 22, 2017
@mgeier mgeier mentioned this pull request Mar 7, 2017