What happens when the data written is bigger than [-1, 1)? by mgeier · Pull Request #104 · bastibe/python-soundfile

mgeier · 2015-02-28T11:41:05Z

if a floating point format is chosen, the values get saved as is.
if an integer format is chosen, garbage is written.

The second point is unpleasant. We could

raise an error (this would be my choice)
clip and warn
scale and warn

mgeier · 2015-01-25T18:20:36Z

Wow, I wasn't aware of that!
I agree that that's bad default behavior.

Using SFC_SET_CLIPPING (see also #23) should solve the problem.
I don't see why anyone would ever not want to switch this on ...

Interestingly, the same thing was discussed repeatedly over the years:
http://audacity.238276.n2.nabble.com/libsndfile-adn-SFC-SET-CLIPPING-td528119.html
supercollider/supercollider#345
http://osdir.com/ml/linux.audio.mediaapi/2007-06/msg00005.html

It seems that this also changes the scaling behavior discussed in #20.

I'm not a fan of raising an error in this case (and I'm not a fan of warnings at all).
I think the behavior should clearly be stated in the documentation and that should be enough.
Apart from that, we would have to read the whole array to find minimum and maximum, which would add an non-negligible overhead when writing large arrays (since, as far as I know, this cannot be done within libsndfile).

mgeier · 2015-01-25T18:25:40Z

This is probably also a sign that we shouldn't rush 1.0 (see #102).
Who knows how many surprises of this kind are still waiting to be discovered ...

bastibe · 2015-01-25T21:02:29Z

Whatever solution we choose, I am a big fan of saving floating point data outside the [-1, 1) range, so this should definitely be preserved. This, might rule out using the sndfile flag.

As for integer normalization, I think that raising an error is the cleanest solution. In face of ambiguity, resist the temptation to guess. This thing is, I discovered this when I wrote some float data and expected it to just work. Only much later did I find out that I had written garbage to disk. If it had raised an error, I would have discovered the problem right away instead of hours too late. My files would have been useless with clipped or scaled data as well.

mgeier · 2015-01-26T20:13:00Z

Whatever solution we choose, I am a big fan of saving floating point data outside the [-1, 1) range, so this should definitely be preserved. This, might rule out using the sndfile flag.

The documentation states:

"Turn on/off automatic clipping when doing floating point to integer conversion."

So this shouldn't be a problem.
I also tried it, and it indeed isn't a problem.

What I find a bit annoying is that it also silently changes the scaling behavior.
With the default setting (SFC_SET_CLIPPING == SF_FALSE), the values +-1.0 are mapped to +-32767 (for 16-bit PCM, as an example), so a signal within +-1.0 is never clipped, but very slightly scaled down.
When setting SFC_SET_CLIPPING == SF_TRUE, in addition to enabling clipping (as the name suggests), it also changes the scaling behavior (which is totally un-obvious).
Suddenly, the value +1.0 is clipped to 1.0-2**-15, which is mapped to 32767 and the value -1.0 is directly mapped to -32768.
In other words, the signal is not scaled, but the value 1.0 cannot be represented anymore and is clipped.
Incidentally, this would be the behavior that @Whistler7 suggested in #20.

I would like to control the clipping behavior and the scaling behavior separately, but there seems to be no separate option for that in libsndfile.
Or am I missing something?

Anyway, there is no "right" solution for scaling, so we might as well choose the second one as default, even if I personally prefer the first one.

As for integer normalization, I think that raising an error is the cleanest solution. In face of ambiguity, resist the temptation to guess.

Sure, that sounds reasonable.
But what price are we willing to pay for it?
Shall we really penalize the common correct usage in order to raise an error if the user makes a mistake?
I don't want to wait longer for my hypothetical 1 GB file to be written, just to check if the values are within +-1.0, which I most probably already know beforehand.

bastibe · 2015-01-27T10:51:33Z

But what price are we willing to pay for it?

We can easily measure this. For very short signals, max(abs(...)) is about 1000 times faster than write. For longer signals (100s), it takes about 50% of write. For long signals (1h/350Mb), it takes about the same time as write.

If we can't find a faster solution than np.max(np.abs(...)), this clearly takes too much time.

In light of this, I am in favor of enabling clipping for integer formats by default.

mgeier · 2015-01-27T15:47:32Z

Thanks for the measurements, I guess that's settled, then!

The remaining question is if we should hard-code SFC_SET_CLIPPING in __init__() or if we should expose that to the user.
We can also hard-code it for now and re-visit the issue when working on #23.

bastibe · 2015-01-28T09:15:41Z

I don't see much utility in not clipping integer data. If unclipped, it's just garbage anyway.

See #104. Incidentally, this closes #20.

mgeier · 2015-02-28T11:43:15Z

I added a commit which enables clipping (7724a09).

What happens when the data written is bigger than [-1, 1)?

bastibe · 2015-02-28T13:11:11Z

Awesome!

bastibe changed the title ~~What happens when the data is bigger than [-1, 1)?~~ What happens when the data written is bigger than [-1, 1)? Jan 23, 2015

Switch on SFC_SET_CLIPPING in SoundFile.__init__()

7724a09

See #104. Incidentally, this closes #20.

bastibe added a commit that referenced this pull request Feb 28, 2015

Merge pull request #104 from bastibe/issue-104

d88aedf

What happens when the data written is bigger than [-1, 1)?

bastibe merged commit d88aedf into master Feb 28, 2015

mgeier deleted the issue-104 branch February 28, 2015 17:34

mgeier mentioned this pull request Mar 2, 2015

Support interface normalized to [-1,1) range? #20

Closed

mgeier mentioned this pull request Aug 16, 2019

What normalisation is used in sound file #241

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What happens when the data written is bigger than [-1, 1)?#104

What happens when the data written is bigger than [-1, 1)?#104
bastibe merged 1 commit intomasterfrom
issue-104

mgeier commented Feb 28, 2015

Uh oh!

mgeier commented Jan 25, 2015

Uh oh!

mgeier commented Jan 25, 2015

Uh oh!

bastibe commented Jan 25, 2015

Uh oh!

mgeier commented Jan 26, 2015

Uh oh!

bastibe commented Jan 27, 2015

Uh oh!

mgeier commented Jan 27, 2015

Uh oh!

bastibe commented Jan 28, 2015

Uh oh!

mgeier commented Feb 28, 2015

Uh oh!

bastibe commented Feb 28, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mgeier commented Feb 28, 2015

Uh oh!

mgeier commented Jan 25, 2015

Uh oh!

mgeier commented Jan 25, 2015

Uh oh!

bastibe commented Jan 25, 2015

Uh oh!

mgeier commented Jan 26, 2015

Uh oh!

bastibe commented Jan 27, 2015

Uh oh!

mgeier commented Jan 27, 2015

Uh oh!

bastibe commented Jan 28, 2015

Uh oh!

mgeier commented Feb 28, 2015

Uh oh!

bastibe commented Feb 28, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants