-
Notifications
You must be signed in to change notification settings - Fork 349
Audio: Fix pointer arithmetic mistake in ASRC initialize #3254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Audio: Fix pointer arithmetic mistake in ASRC initialize #3254
Conversation
|
I have not confirmed but previous code might have been OK in 32 bit xtensa. Testbench is 64 bits in 64 bit OS. |
|
Casting I don't really understand the code nor the comment but would something like Please also add a The https://sof-ci.01.org/sofpr/PR3254/build6755/devicetest/ error looks like this: |
|
Using this gives a segfault for test run The memory layout is like this
Would this be OK? Umm, I don't get how the devicetest is related. I doubt there's a test for ASRC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like a more serious fix is required. I have also noticed the cast from (int32_t **) to (int32_t *) which is a recipe for disaster. This PR simply doesn't fix that. What's the logic behind that?
I want a VERY CLEAR explanation on why this cast is needed before I allow merging. What's the structure of ring_buffers32/ring_buffers16?
|
@marc-hb @paulstelian97 Thanks for the attention! This is not originally my code and I would like to totally rewrite the data I/O part. It wastes RAM by writing each sample duplicated to FIR delay lines to avoid circular wrap checks when computing filters. Instead with xtensa the circular access SIMD are more efficient. ASRC is a memory hog so eventually it must be done. But I'd like to fix the currently known issues before doing that. If you look asrc_farrow_generic.c function asrc_fir_filter32() you see how each channel filter delay line is read. The current xtensa SIMD version code is similar, but this is easier to read: So there the start address is retrieved from C two dimensional array and stepped with (int32_t *) pointer arithmetic. It is initialized for varying dimensions by assumption of knowing how C internally stores two dimensional arrays. |
Wait, so ring_buffers32 is (int[something] *) or (int **)? These have different semantics in C so I still don't get it. I think the (int **) itself may be in the wrong though. When you say (int **) that is INCOMPATIBLE with the C arrangement of two-dimensional arrays. You instead get an array of pointers, each of the pointers itself leading to an array. But that is not a two-dimensional array, an array of arrays. Sounds extremely broken to me. |
|
@paulstelian97 Yep, this has been all the time hard for me debug. I'll rewrite this PR more with cleaner initialization of the two dimensional array. I will eventually replace the whole array with channel interleaved buffer like SOF normally uses but I want to make a working intermediate step as reference for RAM & more SIMD parallelized speed optimized implementation. I'll fix the stress test glitches issue first. Unfortunately this fix didn't help with that, there's still something else. |
|
The buffer length is always even (but I'll add check for it anyway because I will get rid of doubled data). For this part of code, is this acceptable? If yes, I will follow in other init code C99 convention of setting up the 2-dimensional arrays for 16 and 32 bit data. |
|
There's examples of this in https://www.geeksforgeeks.org/dynamically-allocate-2d-array-c/ . The current code like example 4, at least that's the intent. I feel you'd prefer example 2 but with single allocation. In doing that there's a problem in struct asrc_farrow in asrc_farrow.h. The would be two flexible array members. But I would be able to fix one dimension to PLATFORM_MAX_CHANNELS like this. The RAM for e.g. channels 3..8 would be not allocated if stream is stereo. Would such data stucture be OK @paulstelian97 ? |
Is this the existing structure or a new one? Either way it's fine. It's just that ring_buffers32 CANNOT correctly be cast to (int32_t **). Simple as that. ALTHOUGH you would automatically always allocate 8 channels by this allocation so you might want to reconsider. So it would be false that "the RAM for channels 3..8 would not be allocated if the stream is stereo" -- I get the intent but it won't work as such. I'd say: fine for a quick fix except for the memory-constrained platforms, but needs improvements in the future. |
|
Only the header of the array would be full array of 8. The pointers to columns (or rows?) would remain NULL for non-used channels. So the entire 8 * N matrix would not be allocated. But since I don't need cast to (int32_t **) wouldn't the above example be OK? That wouldn't increase RAM need from current. It compiled an ran without problems when defined as this (similarly as in example 4 in the above mentioned web page). |
OK, now that's a way to go. You can allocate a single large buffer and split it up into 8 distinct linear buffers, 1 per channel. Or fewer if fewer channels are required. Might want to put in some asserts just in case you accidentally try to use a channel that is unset (channel 4 in stereo, for example). Or would a null pointer dereference do the trick in that situation? Let's see the actual code now. |
3462ab2 to
de096f4
Compare
Anything cast-free and using array indexes |
src/audio/asrc/asrc_farrow.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is always even atm since computed as src_obj->filter_length * 2 by initialise_buffer(). Should this check be moved to initialise_buffer() instead, as an assert() with a comment that change to the formula would result in unaligned pointer later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have in plans to remove the duplicated data delay line so I added this check to not forget it. But you are right it's unnecessary now. I can remove it since you mentioned.
|
@mmaka That looks cleaner, thanks. Though there's an additional vector of pointers for dynamic 2D array initialize. I could also reorder the impulse response and ring buffers location to not have bit depth in the pointers calculate code. I'll add the check vs. 128 and 256, good point. I can later optimize the sizes. |
de096f4 to
dd18969
Compare
|
Thanks for all the feedback @marc-hb @lyakh @mmaka1 @paulstelian97 , here's a new version that works and passes valgrind. Is it like you recommended? |
src/audio/asrc/asrc_farrow.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, how does this work? There is a bit of a reserved region of size filter_length before the first of the buffers? Also, 2D array is incompatible with (int32_t **)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the filter coefficients array to be before buffers32/16. This way I don't need to take into account the variable word length of buffers when computing next array location. The buffers32/16 is the last part of the allocated blob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand a dynamic size 2D array needs to use (int32_t **) pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dynamic sized 2D array isn't a continuous memory region unlike statically allocated 2D arrays. Just to be aware of that again.
Of course, when the lower size varies you sure can only go the array-of-pointers with each pointer being an individual array way, and if you want allocate the individual arrays continuously in a big chunk -- but you still need a separate memory region to hold the pointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I know. The array of pointers to each row data allows that. However in this case the memory is allocated continuously for simplicity. I'm not aware of ways to do this in C in much cleaner way. We discussed the possibility to fix array first dimension to SOF max channels for simpler initialization but it would consume more RAM.
Now the channel # form the array rows and the channel samples are the columns for "src_obj->ring_buffers32[ch][frame]". The memory region to hold the pointers to column index zeros is in the beginning of the memory area for this "dynamic 2D array". This can be seen in function initialise_buffer().
Actually now I think there's a mistake in asrc_get_required_size(). When computing "size += ASRC_MAX_BUFFER_LENGTH * num_channels * (bit_depth / 8);" the array of pointers to colums data is missing. I think if should be num_channels * sizeof(int32 **). The bug has not triggered because the used filters are significantly shorter than MAX. Do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, no mistake, there's the "size += sizeof(int32_t *) * num_channels;". It holds the pointers in beginning of ring_buffers32/16 the pointers to columns data. So this proposal should be OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say the array of pointers should have MAX, but null out any channels between the actual available ones and the max. So for stereo, ring_buffers32[0] and ring_buffers32[1] are valid, while ring_buffers32[2] through ring_buffers32[7] are reliably null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would change the initialization and consume more allocated RAM. I don't think the stream channels count could change between prepare() and copy(). So such channels could increase should not happen. Also the channel count for ASRC is initialized only in prepare(), it's not monitored/updated in copy(). Access to more than initialized would require cd->asrc_obj corruption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing an allocation from 8 to 32 bytes? FWIW I don't think there are blocks in the heap smaller than 32 bytes in most memory configurations anyway. So for the array of pointers you can go ahead an allocate with [MAX] anyway, to protect against mistakes that escape review. Not gonna block on this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ASRC object allocation goes to 2 kB or 4 kB pool usually. This is relatively small in that amount yes. I'll do the next version with channels max fixed length for pointers if keep the channel specific buffers. If channel interleaved (as SRC), no need for such.
src/audio/asrc/asrc_farrow.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today I learned something new about C... And it's even in K&R and I wasn't aware of it. Here's the thing: &(a[1]) - &(a[0]) == 1 for both int a[2]; and int *a; and in both cases it's known at compile-time. However &(a[0][1]) - &(a[0][0]) is also valid for both int a[2][3]; and e.g. int *a[2]; but - for the former it's also known at compile time and is always 1 whereas for the latter it depends on values of a[0] and a[1] and consequently only can be calculated at run-time... Maybe it's obvious to everyone here except me, then we can certainly use this code, but if I'm not the only one who finds it... confusing - to put it mildly - maybe we want to make the code a bit clearer about what exactly we mean :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lyakh How does it depend on a[1] when both values are read from a[0]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paulstelian97 yes, sorry, see - it's that confusing (for me) :-) Make that &(a[1][0]) - &(a[0][0]);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For that one yeah, it is statically known for 2D arrays but determined as an actual runtime computation for jagged arrays (double pointer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please share your proposal how to make the code a bit clearer.
(I'd like to do a rewrite the data I/O part as this wastes some RAM but I'll do after I've managed to fix the stress test glitches issue. I would remove the "2D" array and use normal channel interleaved circular buffers for FIR as I use in other components like EQ and SRC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@singalsu as long as you use a dynamically allocated 2D array like int **a; or int *a[]; and you refer to individual elements using a[i][j] there doesn't seem to be much choice - C wants to have an array of pointers exactly like it's done now. If you can do that as an interleaved array and it would be easier to work with - that could be good, but if you're also fine with the current implementation - it's correct too, basically it's just me who wasn't aware of this C... peculiarity...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like get this simple fix merged unless there are still violations against good practices. I don't promise when but I will do the rewrite.
If it would be just for MCPS improve it would be low priority (current speed should be quite acceptable) but the RAM usage is much more than needed. It's not a large task after there's a stable version that passes the testing. I have now fix PR for 2 of 3 different type know bug. The third is under work. If that fixes remaining issues the rewrite can be done.
This patch fixes the issue where impulse response update can randomly corrupt stack in testbench environment. The error seen in freeing cd->asrc_obj was free(): invalid next size (normal). The violation was also confirmed with valgrind with error Invalid write of size 4. The order of impulse response storage and channel buffers are swapped to simplify the pointer initialize calculations. The default sizes for them are added as macro definitions into asrc_farrow.h. Checks for used sizes are added to buffer and filter initialize functions. The clear of buffer is done with memset instead of for loop. Fixes: 4c995d0 Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
dd18969 to
f93f519
Compare
paulstelian97
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the most part I think this one is alright and, unless it breaks something critical (does the CI work? I haven't checked...) this can be merged as-is.
|
I'll update the comment, don't merge yet. |
I myself don't have merge perms anyway :) Guess when the second approval comes through. |
The 80 characters split lines were hard to read. This change improves readability error codes enums. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
I added second commit with longer lines for those comment. Please review it too! |
|
Jenkins CI DUT failure. |
This pull request consists of two commits, see invidual descriptions