Skip to content

Implement info function#58

Merged
bastibe merged 6 commits intomasterfrom
info-function
Oct 20, 2015
Merged

Implement info function#58
bastibe merged 6 commits intomasterfrom
info-function

Conversation

@bastibe
Copy link
Owner

@bastibe bastibe commented Oct 16, 2015

It is often desirable to get information about a sound file without reading all its data.

@mgeier
Copy link
Contributor

mgeier commented Aug 14, 2014

Would this information be meant for a human to read or for re-use in a Python program?

In the latter case I don't think a separate function is necessary, because it's easy to open the file in a context manager and get any information you want.

In the former case this might be useful, but we'll have to choose what information to present and how to format it.

I think the only way this makes sense, is to print() the information on the terminal, because returning a string with the information isn't convenient enough and returning some tuple of things even less.

We'll have to answer the following questions:

  1. how should this function be called?

sf.print_file_info(), sf.info(), sf.print_info(), ...?

  1. what information should be shown?

File name, sampling rate, channels, format string, subtype string, SFC_GET_LOG_INFO (see #23) ...?

  1. how should it be formatted?

The possibilities are basically infinite ...

@bastibe
Copy link
Owner Author

bastibe commented Aug 14, 2014

I was thinking of a top-level function much like sf.open or sf.read, that returns a dictionary of information.

Probably something like

import pysoundfile as sf
sf.info('myfile.wav')
>>> { "format": "WAV",
      "subtype": "PCM_16",
      "endian": "FILE",
      "samplerate": 44100,
      "channels": 2,
      "length": 88200 }

@mgeier mgeier modified the milestone: 0.7 Oct 26, 2014
@mgeier mgeier mentioned this pull request Dec 26, 2014
mgeier added a commit that referenced this pull request Jan 9, 2015
OK, I'm merging this.

> I don't see why we need `format` should require `subtype` and `endian`, but if you insist on it, we'll do it that way. I'd rather have always too much information than a different amount of information for different files.

If we get annoyed by the large amount of information, we can still remove the `format`/`subtype`/`endian` triple at a later point.

Also, let's see if the implementation of #58 sheds a different light on this.
@bastibe bastibe modified the milestones: 0.6.x, future Mar 4, 2015
@mgeier mgeier modified the milestone: future Mar 5, 2015
@mgeier
Copy link
Contributor

mgeier commented Apr 28, 2015

This should probably help people to get information about marker times and labels for WAV files.
See http://stackoverflow.com/q/29895360/500098

mgeier added a commit that referenced this pull request Oct 1, 2015
@bastibe
Copy link
Owner Author

bastibe commented Oct 2, 2015

I would prefer info to be machine-readable. There is a lot of interesting information in that information string, though.

@mgeier mgeier modified the milestones: 0.8.0, future Oct 2, 2015
mgeier added a commit that referenced this pull request Oct 2, 2015
@bastibe bastibe self-assigned this Oct 4, 2015
mgeier added a commit that referenced this pull request Oct 4, 2015
mgeier added a commit that referenced this pull request Oct 4, 2015
@bastibe
Copy link
Owner Author

bastibe commented Oct 16, 2015

What do you think about this implementation?

I also tried making it a dict subclass, but that was confusing in practice. The only issue I have with the current implementation is that the __repr__ is usually meant to be somewhat machine-readable, while __str__ is supposed to be human-readable. Still, the current implementation is awfully convenient.

@mgeier
Copy link
Contributor

mgeier commented Oct 17, 2015

The idea with __repr__() works really well, I like it!
Now it can be actually used by machines and by humans. I didn't think this was possible ...

Did you consider making a namedtuple?
This wouldn't make a huge difference, but it has some advantages: immutability, automatic conversion to OrderedDict.
Are there any disadvantages?

There are a few formatting things which I don't like ...

The indentation is annoying, please remove it.

The SoundFile() stuff isn't necessary. I think it should be a plain list, it can simply start with the name property, no need for any framing/decoration.

I don't really like the equals signs, but that's of course very much a matter of taste.
But if you keep them, you should add spaces on both sides.
I think I would prefer colons, but I find spaces before them ugly (as in extra_info).

The commas at the line endings are unnecessary and distracting.

The extra_info should also not be indented.
Probably we should have some separation line between the other (one-line) properties and the extra_info lines?

I think I would display the format strings as strings (e.g. {0.subtype!r}) and it would probably look better if you drop the parentheses around format_info and subtype_info.

For comparison, this is how the output of sndfile-info may look like:

$ sndfile-info myfile.wav 
========================================
File : myfile.wav
Length : 456748
RIFF : 456740
WAVE
fmt  : 16
  Format        : 0x1 => WAVE_FORMAT_PCM
  Channels      : 2
  Sample Rate   : 44100
  Block Align   : 4
  Bit Width     : 16
  Bytes/sec     : 176400
data : 456704
End

----------------------------------------
Sample Rate : 44100
Frames      : 114176
Channels    : 2
Format      : 0x00010002
Sections    : 1
Seekable    : TRUE
Duration    : 00:00:02.589
Signal Max  : 32630 (-0.04 dB)

__repr__ is usually meant to be somewhat machine-readable

I don't see this as a problem at all, since this is only the __repr__ of a helper class, not of our main SoundFile class.

soundfile.py Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'r' is the default, I would drop it here

soundfile.py Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about some kind of a HH:MM:SS.x display?

@bastibe
Copy link
Owner Author

bastibe commented Oct 19, 2015

I have implemented most of your suggestions:

Mann_short.wav
samplerate: 44100 Hz
channels: 1
duration: 2.14 s
format: WAV (Microsoft) [WAV]
subtype: Signed 16 bit PCM [PCM_16]endian: FILE
sections: 1
extra_info: """
    File : Mann_short.wav
    Length : 188540
    RIFF : 188532
    WAVE
    fmt  : 16
      Format        : 0x1 => WAVE_FORMAT_PCM
      Channels      : 1
      Sample Rate   : 44100
      Block Align   : 2
      Bit Width     : 16
      Bytes/sec     : 88200
    data : 188416
    LIST : 20
      INFO
        ISFT : Fission
    *** ID3  : 44 (unknown marker)
    End
    """

To be honest, I still prefer the original, indented style. But this is a workable compromise, and it doesn't hint at an actual SoundFile object existing.

I don't want verbose=True to be the default though. Especially the extra_info is almost never relevant, and can be of arbitrary length.

I also experimented with different styles for the duration. timedelta is too verbose for my taste, and datetime is more complex to use for our use case than simply writing my own formatter, which is what I ended up doing.

Lastly, I experimented with namedtuple, but I was not happy with the resulting code. I think the current solution works well enough.

@mgeier
Copy link
Contributor

mgeier commented Oct 19, 2015

I still prefer the original, indented style.

Then go ahead an use it!
This is a subjective decision.

I don't want verbose=True to be the default though.

Fair enough, I don't know any strong argument either way.

simply writing my own formatter, which is what I ended up doing.

Cool!
You could make the implementation a bit nicer by using divmod().

Is there a reason why you use exactly 2 digits after the decimal point?

I think it would be easier to differentiate between HH:MM:SS and MM:SS.ss with 3 digits, i.e. MM:SS.sss

I experimented with namedtuple, but I was not happy with the resulting code.

OK, I see now that this gets quite complicated because of the presence of the verbose argument.
Without it, it would be easy-peasy.

Given your example from above, let me show how I would "improve" it:

file: Mann_short.wav
samplerate: 44100 Hz
channels: 1
duration: 2.14 s
format: WAV (Microsoft) [WAV]
subtype: Signed 16 bit PCM [PCM_16]
endian: FILE
sections: 1
*** extra_info ***
File : Mann_short.wav
Length : 188540
RIFF : 188532
WAVE
fmt  : 16
  Format        : 0x1 => WAVE_FORMAT_PCM
  Channels      : 1
  Sample Rate   : 44100
  Block Align   : 2
  Bit Width     : 16
  Bytes/sec     : 88200
data : 188416
LIST : 20
  INFO
    ISFT : Fission
*** ID3  : 44 (unknown marker)
End

This is what I changed:

  • add "file" in the first line
  • remove triple quotes
  • un-indent extra_info and use a separation line (could of course be more creative)

I don't like the spaces before colons nor the initial capital letters, but probably we should use them for consistency?
Or as a tribute to libsndfile?

File : Mann_short.wav
Samplerate : 44100 Hz
Channels : 1
Duration : 2.14 s
Format : WAV (Microsoft) [WAV]
Subtype : Signed 16 bit PCM [PCM_16]
Endian : FILE
Sections : 1
*** extra_info ***
File : Mann_short.wav
Length : 188540
RIFF : 188532
WAVE
fmt  : 16
  Format        : 0x1 => WAVE_FORMAT_PCM
  Channels      : 1
  Sample Rate   : 44100
  Block Align   : 2
  Bit Width     : 16
  Bytes/sec     : 88200
data : 188416
LIST : 20
  INFO
    ISFT : Fission
*** ID3  : 44 (unknown marker)
End

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I'm reading this it seems strange to me that verbose only affects the __repr__.
But I guess that's the price we have to pay for combining human- and machine-readability, right?

Hmmm ...

Probably verbose should also affect the attributes of the _SoundFileInfo instance?
I guess this would be more consistent ...

@bastibe
Copy link
Owner Author

bastibe commented Oct 20, 2015

add "file" in the first line

I like the slight bookending by the file name. Just a plain list with no "header" looks strange to me.

remove triple quotes
un-indent extra_info and use a separation line (could of course be more creative)

The triple quotes (and to a lesser extent, the indentation of extra_info) serve a purpose: They make it clear that this is not "parametric" metadata, but just a string. I think that this is quite useful, especially since we really don't have any control over the contents of that string.

I would prefer to keep it the way it is now. Can we merge?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a newline missing before the endian line!

@mgeier
Copy link
Contributor

mgeier commented Oct 20, 2015

I like the slight bookending by the file name.

I don't. I find it very inconsistent and somewhat confusing.

Just a plain list with no "header" looks strange to me.

It doesn't to me, but I see your point.
What about adding some kind of fancy purely decorative header-line (without adding indentation afterwards)?

They make it clear that this is not "parametric" metadata, but just a string.

In my suggestion, this part is done by the separation line.
As I said, this can be more creative, what about adding an empty line and a hint that a string is following?

File : Mann_short.wav
Samplerate : 44100 Hz
Channels : 1
Duration : 2.14 s
Format : WAV (Microsoft) [WAV]
Subtype : Signed 16 bit PCM [PCM_16]
Endian : FILE
Sections : 1

##### extra_info string from libsndfile #####
File : Mann_short.wav
Length : 188540
RIFF : 188532
WAVE
fmt  : 16
  Format        : 0x1 => WAVE_FORMAT_PCM
  Channels      : 1
  Sample Rate   : 44100
  Block Align   : 2
  Bit Width     : 16
  Bytes/sec     : 88200
data : 188416
LIST : 20
  INFO
    ISFT : Fission
*** ID3  : 44 (unknown marker)
End

I would prefer to keep it the way it is now. Can we merge?

If find the indentation and the triple quotes butt-ugly, but that's only my unfounded personal subjective opinion.

So sure, go ahead and merge it, you have the last say on subjective decisions.

Except for the missing newline before endian, you should probably add this before merging.

@mgeier
Copy link
Contributor

mgeier commented Oct 20, 2015

Now that I'm seeing my previous example, I see a possible header:

##### SoundFile properties #####

But in this case we would have to make sure that those are actually properties of the SoundFile class, i.e. we'd have to add duration as a property (but I guess it's better to have it there as a number of seconds instead of the fancy formatted string).

And we should go back to small letters (like the actual property names) and more repr()-like display of format/subtype/endian:

##### SoundFile properties #####
file : 'Mann_short.wav'
samplerate : 44100 Hz
channels : 1
duration : 2.14 s
format : WAV (Microsoft) ['WAV']
subtype : Signed 16 bit PCM ['PCM_16']
endian : 'FILE'
sections : 1

##### extra_info string from libsndfile #####
File : Mann_short.wav
Length : 188540
RIFF : 188532
WAVE
fmt  : 16
  Format        : 0x1 => WAVE_FORMAT_PCM
  Channels      : 1
  Sample Rate   : 44100
  Block Align   : 2
  Bit Width     : 16
  Bytes/sec     : 88200
data : 188416
LIST : 20
  INFO
    ISFT : Fission
*** ID3  : 44 (unknown marker)
End

And then we might as well get rid of the spaces before colons again:

##### SoundFile properties #####
file: 'Mann_short.wav'
samplerate: 44100 Hz
channels: 1
duration: 2.14 s
format: WAV (Microsoft) ['WAV']
subtype: Signed 16 bit PCM ['PCM_16']
endian: 'FILE'
sections: 1

##### extra_info string from libsndfile #####
File : Mann_short.wav
Length : 188540
RIFF : 188532
WAVE
fmt  : 16
  Format        : 0x1 => WAVE_FORMAT_PCM
  Channels      : 1
  Sample Rate   : 44100
  Block Align   : 2
  Bit Width     : 16
  Bytes/sec     : 88200
data : 188416
LIST : 20
  INFO
    ISFT : Fission
*** ID3  : 44 (unknown marker)
End

BTW, why not add seekable?

@mgeier
Copy link
Contributor

mgeier commented Oct 20, 2015

We might even have a duration_hms property on the SoundFile ...?

bastibe added a commit that referenced this pull request Oct 20, 2015
@bastibe bastibe merged commit 10c22f3 into master Oct 20, 2015
@bastibe
Copy link
Owner Author

bastibe commented Oct 20, 2015

My priority right now is to get the next release out the door. I am still interested in a better way though. Maybe the idea with "SoundFile properties" is a good one, and we should add duration and seekable to the SoundFile object. At that point, we can also re-evaluate the layout of the output of the info function.

Feel free to continue this discussion in either a pull request to that effect, or here.

@mgeier
Copy link
Contributor

mgeier commented Oct 20, 2015

The right thing to do now would be:

  1. revert the merge
  2. re-open this issue
  3. remove it from the milestone 0.8.0
  4. release 0.8.0
  5. then, calmly continue discussing this

I'm not a fan of pushing half-baked stuff into a release in a hurry.
And there is no need for it, either.

@bastibe bastibe deleted the info-function branch April 25, 2016 08:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants