Skip to content

Support libsndfile’s virtual file interface#1

Merged
bastibe merged 17 commits intobastibe:masterfrom
davidblewett:master
Nov 29, 2013
Merged

Support libsndfile’s virtual file interface#1
bastibe merged 17 commits intobastibe:masterfrom
davidblewett:master

Conversation

@davidblewett
Copy link
Contributor

Documented here: http://www.mega-nerd.com/libsndfile/api.html#open_virtual .

I have need of being able to decode FLAC files from memory, so I added support for the virtual file interface in libsndfile.

  • Pass a file-like object and set virtual_io=True in the constructor
  • If the file-like object represents a stream of data such that seeking would be prohibitive,
    set stream=True. The file-like object must implement its own __len__ in this case.

This allows you to do the following:

from StringIO import StringIO
from pysoundfile import SoundFile, flac_file
fname = '/Users/davidb/src/01.flac'
fObj = StringIO(open(fname).read())
flac = SoundFile(fObj, format=flac_file, virtual_io=True)

…/libsndfile/api.html#open_virtual .

* Pass a file-like object and set virtual_io=True in the constructor
* If the file-like object represents a stream of data such that seeking would be prohibitive,
  set stream=True. The file-like object must implement its own __len__ in this case.
@bastibe
Copy link
Owner

bastibe commented Nov 22, 2013

Awesome! This is very cool! Thank you so much for contributing this!

I have a few questions though, which I annotated in your commit. I would be grateful if you could comment on those.

@davidblewett
Copy link
Contributor Author

I will try to do a bit more on this tonight; this is my first foray into using cffi. I agree about the stream argument; I will adjust that logic.

@bastibe
Copy link
Owner

bastibe commented Nov 23, 2013

I tried this on my own machine, and there was a weird issue: It wouldn't call the fObj.__len__ function when running len(fObj) for an io.BytesIO object in Python 3.3. Changing that particular line to fObj.__len__() solved the issue. Still, this is very weird.

Do you have any idea what might be causing this behavior?

Apart from that, it works great though!

Would you be willing to update the README as well?

@bastibe
Copy link
Owner

bastibe commented Nov 23, 2013

By the way, Python3 requires quite a different example code than what you wrote:

# Python2, as in your pull request
from StringIO import StringIO
from pysoundfile import SoundFile, flac_file
fObj = StringIO(open('filename.flac').read())
flac = SoundFile(fObj, format=flac_file, virtual_io=True) # Actually, specifying format is not necessary here

# Python3
from io import BytesIO
from pysoundfile import SoundFile
fObj = BytesIO(open('filename.flac', 'rb').read())
flac = SoundFile(fObj, virtual_io=True)

@davidblewett
Copy link
Contributor Author

Abusing __len__ like that seemed a bit hackish to me, so I instead switched to a pre-calculated _length attribute.

I also updated the README with your example (since yours works on 2.x as well, I just used it). I included an example of an HTTP request as well.

@bastibe
Copy link
Owner

bastibe commented Nov 23, 2013

OK, I like it! If you want to, we can merge it like this!

However, I just had an idea:

In the if virtual_io branch, you build this SF_VIRTUAL_IO data structure that is populated by some global functions. These global functions refer back to fObj through a void pointer user_data that is passed to C and converted back to the Python object in the callbacks.

However, this is Python after all, so we don't actually need to use the user_data. Instead, we could populate the SF_VIRTUAL_IO structure with locally created functions that capture the fObj in their closure. This would eliminate all the new_handle and from_handle business and the non-user-facing global functions. Furthermore, those functions could be tailor-made so they do the right thing with the __len__ function without having to mutate fObj.

As I said though, I think the current solution is good enough for a merge. Just tell me if you want to merge now or try to implement the callbacks as dynamically created functions first.

Again, thank you very very much for your help! I truly appreciate it!

@davidblewett
Copy link
Contributor Author

I can look at converting to dynamic functions. I might have to rip out the streaming stuff; I just ran some tests, and it appears that libsndfile requires being able to seek in the files. It reads the first 12 bytes (to read the header I imagine), then tries to rewind to the beginning.

@davidblewett
Copy link
Contributor Author

@bastibe: Is this what you were thinking of?

@bastibe
Copy link
Owner

bastibe commented Nov 28, 2013

Yes, very much so! I like it!

See comments in the source code for more elaboration.

@bastibe
Copy link
Owner

bastibe commented Nov 28, 2013

If you want to just merge at any point, I can take it from there. I fear that I'm asking too much...

@davidblewett
Copy link
Contributor Author

Okay, I will take a look at splitting it out to a distinct method. Also, I think I'm going to remove the stream logic, as it appears that libsndfile needs to be able to rewind at least once.

@davidblewett
Copy link
Contributor Author

Unfortunately, with these changes I'm getting a segfault when using the 2nd example from the README (HTTP request decoding):

In [1]: from io import BytesIO

In [2]: from pysoundfile import SoundFile

In [3]: import requests

In [4]: fObj = BytesIO()

In [5]: response = requests.get('http://example.com/my.flac', stream=True)

In [6]: for data in response.iter_content(4096):
   ...:         if data:
   ...:                 fObj.write(data)
   ...:

In [7]: fObj.seek(0)
Out[7]: 0L
flac = SoundFile(fObj, virtual_io=True)
In [8]: flac = SoundFile(fObj, virtual_io=True)

In [9]: flac_frames = flac.read(1024)
zsh: segmentation fault  ipython

@davidblewett
Copy link
Contributor Author

Seems to be one of my test files. If I copy it locally and use the first example, I get the same segfault. However, it worked fine before these changes.

@bastibe
Copy link
Owner

bastibe commented Nov 28, 2013

I think the thing is that the CFFI does not increment the reference counts of objects, so if the only place you store something in is a CFFI object, it will get garbage collected.

In your code, this is currently happeing in two places: self._info (only POD, no problem) and self._vio. The latter is what is probably causing the crashes you are seeing. The functions you are storing in the vio struct are getting garbage collected.

I would propose to save _vio, but as a dict instead of a CFFI object. You can construct the CFFI object simply by passing the dict to cffi.new("type", dict). CFFI will auto-read the dict into the struct if the keys are the same.

The CFFI documentation has this example:

typedef struct { int x, y; } foo_t;

foo_t v = { 1, 2 };            // C syntax
v = ffi.new("foo_t *", [1, 2]) # CFFI equivalent

foo_t v = { .y=1, .x=2 };                // C99 syntax
v = ffi.new("foo_t *", {'y': 1, 'x': 2}) # CFFI equivalent

@davidblewett
Copy link
Contributor Author

Looks like you were right; I'm no longer seeing the segfaults with this version.

@davidblewett
Copy link
Contributor Author

I think this is looking merge ready. Thanks for helping me fumble through cffi.

@bastibe bastibe merged commit 5dc5dec into bastibe:master Nov 29, 2013
@bastibe
Copy link
Owner

bastibe commented Nov 29, 2013

Thank you very much for contributing this code! It is already up on PyPi and I will update the binary installers soon.

@mgeier mgeier mentioned this pull request Mar 10, 2014
mgeier added a commit that referenced this pull request May 18, 2014
mgeier added a commit that referenced this pull request May 20, 2014
mgeier added a commit that referenced this pull request Dec 17, 2014
bastibe pushed a commit that referenced this pull request Oct 10, 2016
Add the number of samples to the `info` class.
bastibe pushed a commit that referenced this pull request Apr 28, 2025
Add type hints to all public methods, functions and classes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants