-
Notifications
You must be signed in to change notification settings - Fork 349
Minor fixes #4860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor fixes #4860
Conversation
eeaa77d to
5e86bce
Compare
paulstelian97
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First two look like they should be squashed. I'll accept this whether they're squashed or not though.
src/trace/dma-trace.c
Outdated
|
|
||
| /* return if already initialized */ | ||
| if (buffer->addr) | ||
| return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't you reset all the fields? The test on the address will not clear the rest, so you could be in a weird state....
buffer->w_ptr = buffer->addr;
buffer->r_ptr = buffer->addr;
buffer->end_addr = (char *)buffer->addr + buffer->size;
buffer->avail = 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't you reset all the fields? The test on the address will not clear the rest, so you could be in a weird state....
The intention of my change here is only to test and avoid multiple initializations , not to clear buffer, the clearing will be perform in dma_trace_buffer_free() e.g. "memset(buffer, 0, sizeof(*buffer));"
buffer->w_ptr = buffer->addr; buffer->r_ptr = buffer->addr; buffer->end_addr = (char *)buffer->addr + buffer->size; buffer->avail = 0;
src/trace/dma-trace.c
Outdated
| /* return if already initialized */ | ||
| if (buffer->addr) | ||
| return 0; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while I am at it, if rballoc initializes to zero, while if there a bzero() line 239?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while I am at it, if rballoc initializes to zero, while if there a bzero() line 239?
good point, @lyakh has an argument that we should not perform 0 initialization for rballoc() and leave the decision to be made by the caller. Maybe let me add a new rbzalloc() for 0-initialized allocation.
Thanks, I would leave them as they are, as I want the testbench part to be separated for easier bisection. |
src/ipc/ipc-common.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think SOF_MEM_ZONE_SYS_SHARED works the same way as SOF_MEM_ZONE_SYS in that it panics if allocation fails. No need to check return value, this is never done on purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think
SOF_MEM_ZONE_SYS_SHAREDworks the same way asSOF_MEM_ZONE_SYSin that it panics if allocation fails. No need to check return value, this is never done on purpose.
We should not depend on the panic inside the allocator, I will merge the _SYS_SHARED zone to BUFFER zone in PR #4747.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we rely on them in many locations and I have been reminded about that at least twice and since then I myself repeated that request to others. I don't find this particularly pretty, but I also see the logic behind that design. However, if you move those allocations to a non-system zone, we might have more work to do to clean those up. I currently count more than 15 allocations from SOF_MEM_ZONE_SYS_SHARED in my local tree. If you really change them all, I think a separate patch would be good for that, addressing them all.
Also, those allocations are of type "allocate once at init and never free" so, maybe we can switch most or all of them to use .data or .bss?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lyakh who corrected? this really seems to be a policy call. if SYS_SHARED has guaranteed allocations don't have to be checked (like glib in Linux, you panic if you fail to allocate), this should be followed universally. or if not, it should be the same all over the place.
This is basicly a fundamental design question for any reusable C code. Either you check allocs or you do not. Both are deployed successfully, but you have to stick to the plan. If you start mixing both approaches, then it gets really complicated.
Personally, I don't know the history, but I'm a bit skeptical of the usefulness of SYS/SYS_SHARED. Some code/checks is saved, but it seems non-trivial to enforce the policy when SYS/SYS_SHARED should be used. E..g IPC is really application code and one can argue whether it should use the SYS heap at all, or should it be limited to RTOS usage. OTOH, in most products using SOF, IPC is mandatory part of the system, so use of SYS makes sense, but this is now a product call. And so forth and forth, seems complex when scaled up to more and more products.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for filing #4865 to track the TODOs, @lyakh .
@kv2019i agree that most of the sys/sys_shared usage today are actually runtime usage for us, imagine that those secondary cores related should be freed at powering down a specific core, DMA trace and be disabled. On the other hand, we reserve relative big zones for sys/sys_runtime for APL today, that's one of the reason that we are hitting OOM issues there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do need to check now as Zephyr mode requires a check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kv2019i I'm sure this is the current official design: SYS/SYS_SHARED failures cause a panic, they must not be checked. And I haven't seen any violations of this in the code, if I had, I'd probably complain.
I'm not against changing that, but then we have to declare that change properly, not as a minor fix, and change it globally.
As I commented elsewhere in this PR - we need a formal API, once we have one, we can follow it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, we need to assume we are using Zephyr allocator and this could fail. No harm in xtos mode as we transition.
src/trace/dma-trace.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm starting to not like this one either. Does the description itself of the rballoc function state that the buffer is zeroed? If it's not stated in there it isn't happening from the point of view of the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @paulstelian97 , this caught my eye as well, but seems Keyon now added that to rballoc documentation in alloc.h.
OTOH, L236 is a bit different. @keyonjie , rballoc() doesn't guarantee the zeros are written to memory. At least the Zephyr version of rballoc doesn't guarantee it. Is this still intentional to remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paulstelian97 @kv2019i maybe let me to keep the L236 at this point, though writeback it without any writing here looks a bit odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking keep both 235 (bzero) and 236 (writeback). No need to remove the bzero when you don't have a guarantee (but instead an xtos implementation detail) that rballoc zeroes things. Just keep it around (is there such a big performance impact that it matters?)
Or remove both if there's no real need for the buffer to be zeroed. Just don't omit bzero but expect zeros, that is all. bzero and the writeback should work together (either both or neither in the code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need both bzero() and WB here, but can we also make sure Zephyr wrapper has the same behavior with buffer allocations (wrt to bzero())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, or @keyonjie just drop this patch from the series? This isn't performance critical code and this is not really making the code easier to understand or maintain, so not sure of the benefit.
Looking at other dma-trace.c , I start to wonder what is the benefit allocation the buffer from cached memory. DSP only writes to the memory once, and then DMA hw reads (uncached) so DSP always needs to writeback the data. GIven you always need to writeback, it's not clear there's any benefit to using a cached mapping. But it does make the code harder to read and more error prone.
FYI to @marc-hb and @lyakh who've been looking at trace issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, let's be consistent. Either rballoc() zeroes out the buffer or it doesn't. I don't see why this is done in a piecemeal operation related to dma-trace.
I have a really hard time btw with the notion of "minor fixes' for this PR. the moment you change the dma trace it's no longer minor but can have large consequences on the work of others. Can we please learn from the snaffus in the last 3 months and be careful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree with @plbossart. After you call calloc() / kzalloc() you do not immediately call memset(0) on the same memory.
kv2019i
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. A few questions (inline).
src/trace/dma-trace.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @paulstelian97 , this caught my eye as well, but seems Keyon now added that to rballoc documentation in alloc.h.
OTOH, L236 is a bit different. @keyonjie , rballoc() doesn't guarantee the zeros are written to memory. At least the Zephyr version of rballoc doesn't guarantee it. Is this still intentional to remove?
src/ipc/ipc-common.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lyakh who corrected? this really seems to be a policy call. if SYS_SHARED has guaranteed allocations don't have to be checked (like glib in Linux, you panic if you fail to allocate), this should be followed universally. or if not, it should be the same all over the place.
This is basicly a fundamental design question for any reusable C code. Either you check allocs or you do not. Both are deployed successfully, but you have to stick to the plan. If you start mixing both approaches, then it gets really complicated.
Personally, I don't know the history, but I'm a bit skeptical of the usefulness of SYS/SYS_SHARED. Some code/checks is saved, but it seems non-trivial to enforce the policy when SYS/SYS_SHARED should be used. E..g IPC is really application code and one can argue whether it should use the SYS heap at all, or should it be limited to RTOS usage. OTOH, in most products using SOF, IPC is mandatory part of the system, so use of SYS makes sense, but this is now a product call. And so forth and forth, seems complex when scaled up to more and more products.
lgirdwood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bzero() will help for buffers as this has caught out a few users already (one recently fixed just prior to v1.9)
src/ipc/ipc-common.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do need to check now as Zephyr mode requires a check.
src/trace/dma-trace.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need both bzero() and WB here, but can we also make sure Zephyr wrapper has the same behavior with buffer allocations (wrt to bzero())
marc-hb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a really hard time btw with the notion of "minor fixes' for this PR. the moment you change the dma trace it's no longer minor but can have large consequences on the work of others. Can we please learn from the snaffus in the last 3 months and be careful?
+1, a non-boilerplate and non-misleading PR title is really the least that can be expected. I found this PR only because I'm now running this pseudo-code / workaround but it's not sustainable:
if PR_subject ~= 'Minor changes' and author == Keyon:
PR_subject = 'Not Minor unknown'Also, please cc: me on any trace-related PR. Not because I can help (likely not) but to give me advance notice of potential regressions like d8a19a4 or f2c13f5
|
@marc-hb @plbossart the initial PR was intended to be minor. Anyway, let me move controversial ones out and take it back to minor again. @plbossart to unify the zeroing of the allocated buffer, I will make another PR to proceed that. |
Add checks and return failure if buffer allocation fails to ipc_init(). Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Check and return failure, in case of failure happens in ipc_init(). Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
The fuzzy IPC testing reports there is memory leak if dma_trace_config IPC is sent multiple times, we should just return if the DMA trace buffer is already allocated, to avoid allocating new buffers and the leakage of the old buffers. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
@lgirdwood shouldn't those users be fixed? If we're at this, shall we maybe take a step back and decide what we want our heap allocation API look like? If we're moving towards Zephyr, should we say, that we basically only have |
|
@keyonjie as a side-track: I think it would be worthwhile looking through all |
@lyakh there is one troublesome thing that if we want to use uncached buffer to sharing among cores, we have to use e.g. platform_shared_get() if the object is a static object one. |
marc-hb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even when they're actually minor, bug fixes deserve a more specific name than "Minor fixes". At the very least : "Minor IPC and DMA trace fixes" so this PR can be identified among others.
Small change requested for the trace fix.
PS: I would not call "minor" a memory leak issue, especially not in this sort of environment.
| int err; | ||
|
|
||
| /* the existence of the trace buffer means the dma trace already initialized */ | ||
| if (d->dmatb.addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a safer dma_trace_initialized()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi , @plbossart , others: there is no IPC to _disable() the trace, correct? I found some IPC only for trace_off() but none to unwind this.
Just making sure running dma_trace_enable() a second time is never a valid use case and that this fix is purely "defensive programming". If correct the comment should say that this is never expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi , @plbossart , others: there is no IPC to
_disable()the trace, correct?
It is under work by @ranj063.
I found some IPC only for
trace_off()but none to unwind this.Just making sure running
dma_trace_enable()a second time is never a valid use case and that this fix is purely "defensive programming". If correct the comment should say that this is never expected.
It is going to be valid with the SOF client support, but if we have the #4849 then yes, this is purely safe coding against broken or old host
|
|
||
| /* the existence of the trace buffer means the dma trace already initialized */ | ||
| if (d->dmatb.addr) | ||
| return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct, you must proceed to dma_trace_start() to get the new buffer parameters into use and reconfigure the channel.
if (!d->dmatb.addr) {
err = dma_trace_buffer_init(d);
if (err < 0) {
mtrace_printf(LOG_LEVEL_ERROR, "dma_trace_enable: buffer_init failed");
goto out;
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| int err; | ||
|
|
||
| /* the existence of the trace buffer means the dma trace already initialized */ | ||
| if (d->dmatb.addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi , @plbossart , others: there is no IPC to
_disable()the trace, correct?
It is under work by @ranj063.
I found some IPC only for
trace_off()but none to unwind this.Just making sure running
dma_trace_enable()a second time is never a valid use case and that this fix is purely "defensive programming". If correct the comment should say that this is never expected.
It is going to be valid with the SOF client support, but if we have the #4849 then yes, this is purely safe coding against broken or old host
Right, but only once. |
|
@keyonjie I've lost track of what's going on here, is there any rework required or is this no longer required ? |
@lgirdwood this was split from #4747 , don't have chance to proceed the refinement part4 PR, so is this. |
ujfalusi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dtrace fix is still not correct.
lgirdwood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First 2 patches seem fine, could be a separate PR whilst the final patch is being aligned.
@lgirdwood, fwiw PR #4879 deals with the memleak along with other problems which could happen if the release dtrace message is not sent. |
|
it doesn't seem like this PR is anywhere close to a consensus. @lgirdwood @keyonjie what's the plan here? |
The change is needed in case we want the #4747 get proceed, they are for memory zones refinement/simplification only and not mandatory needed. I will proceed them if have time before Christmas, otherwise will close them. |
|
I have no plan to work on this anymore, closing it. |
The series includes 2 fixes: