Skip to content

What happens when the data written is bigger than [-1, 1)?#104

Merged
bastibe merged 1 commit intomasterfrom
issue-104
Feb 28, 2015
Merged

What happens when the data written is bigger than [-1, 1)?#104
bastibe merged 1 commit intomasterfrom
issue-104

Conversation

@mgeier
Copy link
Contributor

@mgeier mgeier commented Feb 28, 2015

  • if a floating point format is chosen, the values get saved as is.
  • if an integer format is chosen, garbage is written.

The second point is unpleasant. We could

  • raise an error (this would be my choice)
  • clip and warn
  • scale and warn

@bastibe bastibe changed the title What happens when the data is bigger than [-1, 1)? What happens when the data written is bigger than [-1, 1)? Jan 23, 2015
@mgeier
Copy link
Contributor

mgeier commented Jan 25, 2015

Wow, I wasn't aware of that!
I agree that that's bad default behavior.

Using SFC_SET_CLIPPING (see also #23) should solve the problem.
I don't see why anyone would ever not want to switch this on ...

Interestingly, the same thing was discussed repeatedly over the years:
http://audacity.238276.n2.nabble.com/libsndfile-adn-SFC-SET-CLIPPING-td528119.html
supercollider/supercollider#345
http://osdir.com/ml/linux.audio.mediaapi/2007-06/msg00005.html

It seems that this also changes the scaling behavior discussed in #20.

I'm not a fan of raising an error in this case (and I'm not a fan of warnings at all).
I think the behavior should clearly be stated in the documentation and that should be enough.
Apart from that, we would have to read the whole array to find minimum and maximum, which would add an non-negligible overhead when writing large arrays (since, as far as I know, this cannot be done within libsndfile).

@mgeier
Copy link
Contributor

mgeier commented Jan 25, 2015

This is probably also a sign that we shouldn't rush 1.0 (see #102).
Who knows how many surprises of this kind are still waiting to be discovered ...

@bastibe
Copy link
Owner Author

bastibe commented Jan 25, 2015

Whatever solution we choose, I am a big fan of saving floating point data outside the [-1, 1) range, so this should definitely be preserved. This, might rule out using the sndfile flag.

As for integer normalization, I think that raising an error is the cleanest solution. In face of ambiguity, resist the temptation to guess. This thing is, I discovered this when I wrote some float data and expected it to just work. Only much later did I find out that I had written garbage to disk. If it had raised an error, I would have discovered the problem right away instead of hours too late. My files would have been useless with clipped or scaled data as well.

@mgeier
Copy link
Contributor

mgeier commented Jan 26, 2015

Whatever solution we choose, I am a big fan of saving floating point data outside the [-1, 1) range, so this should definitely be preserved. This, might rule out using the sndfile flag.

The documentation states:

"Turn on/off automatic clipping when doing floating point to integer conversion."

So this shouldn't be a problem.
I also tried it, and it indeed isn't a problem.

What I find a bit annoying is that it also silently changes the scaling behavior.
With the default setting (SFC_SET_CLIPPING == SF_FALSE), the values +-1.0 are mapped to +-32767 (for 16-bit PCM, as an example), so a signal within +-1.0 is never clipped, but very slightly scaled down.
When setting SFC_SET_CLIPPING == SF_TRUE, in addition to enabling clipping (as the name suggests), it also changes the scaling behavior (which is totally un-obvious).
Suddenly, the value +1.0 is clipped to 1.0-2**-15, which is mapped to 32767 and the value -1.0 is directly mapped to -32768.
In other words, the signal is not scaled, but the value 1.0 cannot be represented anymore and is clipped.
Incidentally, this would be the behavior that @Whistler7 suggested in #20.

I would like to control the clipping behavior and the scaling behavior separately, but there seems to be no separate option for that in libsndfile.
Or am I missing something?

Anyway, there is no "right" solution for scaling, so we might as well choose the second one as default, even if I personally prefer the first one.

As for integer normalization, I think that raising an error is the cleanest solution. In face of ambiguity, resist the temptation to guess.

Sure, that sounds reasonable.
But what price are we willing to pay for it?
Shall we really penalize the common correct usage in order to raise an error if the user makes a mistake?
I don't want to wait longer for my hypothetical 1 GB file to be written, just to check if the values are within +-1.0, which I most probably already know beforehand.

@bastibe
Copy link
Owner Author

bastibe commented Jan 27, 2015

But what price are we willing to pay for it?

We can easily measure this. For very short signals, max(abs(...)) is about 1000 times faster than write. For longer signals (100s), it takes about 50% of write. For long signals (1h/350Mb), it takes about the same time as write.

If we can't find a faster solution than np.max(np.abs(...)), this clearly takes too much time.

In light of this, I am in favor of enabling clipping for integer formats by default.

@mgeier
Copy link
Contributor

mgeier commented Jan 27, 2015

Thanks for the measurements, I guess that's settled, then!

The remaining question is if we should hard-code SFC_SET_CLIPPING in __init__() or if we should expose that to the user.
We can also hard-code it for now and re-visit the issue when working on #23.

@bastibe
Copy link
Owner Author

bastibe commented Jan 28, 2015

I don't see much utility in not clipping integer data. If unclipped, it's just garbage anyway.

@mgeier
Copy link
Contributor

mgeier commented Feb 28, 2015

I added a commit which enables clipping (7724a09).

bastibe added a commit that referenced this pull request Feb 28, 2015
What happens when the data written is bigger than [-1, 1)?
@bastibe bastibe merged commit d88aedf into master Feb 28, 2015
@bastibe
Copy link
Owner Author

bastibe commented Feb 28, 2015

Awesome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants