Skip to content

Change order of arguments in read() function#54

Merged
mgeier merged 1 commit intomasterfrom
argument-order
Jul 10, 2014
Merged

Change order of arguments in read() function#54
mgeier merged 1 commit intomasterfrom
argument-order

Conversation

@mgeier
Copy link
Contributor

@mgeier mgeier commented Jul 2, 2014

I think this way the use cases for positional arguments are more meaningful (see commit message).

I'm not 100% sure, though ...

Any comments?

@bastibe
Copy link
Owner

bastibe commented Jul 3, 2014

I think that it is reasonable to expect the second argument of read to be the number of frames to read. I just had a quick search through the Python standard library, and pretty much all read functions have size as their first argument. I think we should stick to that.

That said, only reading the first few samples of an audio file is probably not all that useful. I would guess that the most common use cases would be either

read('filename.wav')

or one of

read('filename.wav', start=44100, stop=88200)
read('filename.wav')[44100:88200]  # ugh
open('filename.wav')[44100:88200]    

Although this might be a big contender, too:

read('filename.wav', start=88200, frames=1024)

@mgeier
Copy link
Contributor Author

mgeier commented Jul 3, 2014

That said, only reading the first few samples of an audio file is probably not all that useful.

That was exactly what I thought.
It's actually quite useless.

I think there will always be a range of 2 values start/stop or start/frames.
And I think it is a good idea to force users to use the named arguments, because having two unnamed numerical arguments will inevitably cause some bugs.

You're right that probably all Python-related read() methods have a number as their first argument.
But the objects we are reading from have some kind of state, namely a current read position.
So having frames as first argument totally makes sense in SoundFile.read(), but the module-level function read() has a quite different behavior.

@bastibe
Copy link
Owner

bastibe commented Jul 4, 2014

How about rearranging it to read(filename, start, stop, ...) then? This is probably a very reasonable order for the function arguments. read(filename, start, frames, ...) might be useful, too.

To avoid confusion between start, stop and start, frames, it might be preferable to delete frames altogether and only use start and stop.

This would enable things like this:

read('filename', 0, 1024) # first 1024 frames
read('filename', -1024) # last 1024 frames
read('filename', 1024, -1024) # all but the first and last 1024 frames

I like that!

On the other hand, this feels so much like indexing that this is probably more pythonic:

open('filename')[:1024] # first 1024 frames
open('filename')[-1024:] # last 1024 frames
open('filename')[1024:-1024] # all but the first and last 1024 frames

@mgeier
Copy link
Contributor Author

mgeier commented Jul 5, 2014

I think that start/frames can be very useful in addition to start/stop. I assume that when I want a part of a file, I'm normally interested in how long this part is rather than before which frame index it stops.
Therefore, I wouldn't want to remove the frames argument.
But I wouldn't remove stop either, because this has also valid use cases.

I also think that it's a really good idea to use named arguments in this case. This avoids confusion and makes the code much easier to read.
I think it's also a good idea to make them basically keyword-only arguments (by moving them further back the argument list).
If we keep the frames argument, I think it would be a bad idea to even allow positional arguments because of the frames/start/stop confusion.

Your examples are quite hard to read (without the comments), let me try to compare them with their keyword counterparts:

read('filename', 0, 1024) # first 1024 frames
# vs.
read('filename', frames=1024)
# vs.
read('filename', start=0, frames=1024)

The last alternative is probably the one that will be mostly used in real code (with changing start).

read('filename', -1024) # last 1024 frames
# vs.
read('filename', start=-1024)

Here the second one is definitely clearer.

read('filename', 1024, -1024) # all but the first and last 1024 frames
# vs.
read('filename', start=1024, stop=-1024)

Again, I like the second alternative more.

I would even be willing to type the argument names in an interactive session, because the 5 characters start= just make everything much clearer.
And I would probably load the whole file at once in most interactive use cases anyway.

Surprisingly many Python-Zen guidelines suggest (forced) named arguments:

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Readability counts.
  • In the face of ambiguity, refuse the temptation to guess.

Having said all that, I think that the order of read() arguments can still be improved by making all function signatures consistent:

f = sf.open('myfile.wav', 'w', 44100, 2, 'FLOAT')
data = sf.read('myfile.raw', 44100, 2, 'FLOAT')
sf.write(data, 'another_file.wav', 44100, 'FLOAT')
blocks = sf.blocks('newfile.wav', 'w', 44100, 2, 'FLOAT')

I think that's the most consistent it can get, or am I missing something?
Of course it's not perfect because read()/write() don't have mode and write() has an additional data argument and no channels, but I think it's still pretty consistent.

This would mean that dtype cannot be used anymore as positional argument, but probably that's not so bad after all, because it is quite idiomatic in NumPy to write something like dtype='float32'.

@mgeier
Copy link
Contributor Author

mgeier commented Jul 5, 2014

I think the situation is much more obvious in case of the blocks() function (that's also the reason why I suggested this PR). See #35.

BTW, I forgot to add the obligatory blocksize argument in my above example:

blocks = sf.blocks('newfile.wav', 'w', 44100, 2, 'FLOAT', blocksize=1024)

I think it's best to force blocksize to be a keyword argument, because this leads to much clearer user code.
And once blocksize is a keyword argument, it doesn't make sense to make any of start/stop/frames positional arguments.
And to make all of them positional arguments would be even more confusing than it would be with the read() function.
And of course making them positional in one function and keyword in another would add even more confusion.

@bastibe
Copy link
Owner

bastibe commented Jul 7, 2014

We should definitely strive for consistent function arguments.

In my work, I tend to open files more often for reading than for writing, and usually try to have PySoundFile auto-guess the file meta data (fs, channels, data types). I don't think I ever specified more than the sample rate and the number of channels.

Thus, realistically, I would expect a more typical use case for me to look something like this:

sf.write(data, 'myfile.wav', 44100)
data = sf.read('myfile.flac')
for block in sf.blocks('myfile.flac', 'rw'):
    block[:] = block*hann(len(block))

I think that these are the use cases we should focus most on. In particular, I would say that the data formats should go last. I have never had any use case for them -- I prefer MAT-files for float data, since WAV/FLAC files that are not playable in my audio players are pretty useless to me.

However, if I were to use a special format, I would most likely do something like

format = {'format':'RAW', 'subtype':'FLOAT', 'endian':'FILE'}
sf.write(data, 'myfile.raw', **format)
data = sf.read('myfile.raw', dtype='float32', **format)

This is largely tangential to the issue at hand though. In the absence of any agreeable order for start, stop, and frames, I agree that they should be keyword-only (i.e. late in the arguments). The downside to this is that they are somewhat less discoverable, since arguments are usually ordered by importance.

This way the order is more consistent with the open()/write() functions.

It is no longer possible to write this, however:

    data, fs = sf.read('myfile.wav', 1024)

This might be a somehow frequent use case, but it might be unclear what
the numeric argument means ('frames' or 'start' or ...?).
It will lead to clearer user code if we force a keyword argument, e.g.:

    data, fs = sf.read('myfile.wav', start=31542, frames=1024)

As a side effect, the settings for RAW files can now be specified as
positional arguments:

    data, fs = sf.read('myfile.raw', 44100, 2, 'FLOAT')
@mgeier
Copy link
Contributor Author

mgeier commented Jul 9, 2014

Your examples are indeed more realistic (except the blocks-thing ... and I guess you meant a for loop instead of a with statement ... but anyway ...).

I like the **format stuff, we should add this somewhere to the documentation.

I changed the order again as discussed.

Now the arguments of open() come first, then start/stop/frames, then the arguments of read().

Is this OK?

@bastibe
Copy link
Owner

bastibe commented Jul 10, 2014

except the blocks-thing

You are right of course, these should have been for-loops. Somehow, I make this error constantly in Github comments, yet never in actual code. Weird.

The argument order looks good to me!

@bastibe bastibe mentioned this pull request Jul 10, 2014
mgeier added a commit that referenced this pull request Jul 10, 2014
Change order of arguments in read() function
@mgeier mgeier merged commit a4cc2b8 into master Jul 10, 2014
@mgeier mgeier deleted the argument-order branch July 10, 2014 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants