-
Notifications
You must be signed in to change notification settings - Fork 349
dma-trace: add check to avoid dereference from NULL #4678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@dbaluta @lgirdwood looks the PR #4636 (comment) introduce some stress test regression as reported in #4676, we might need the fix in urgent. |
src/include/sof/lib/dma.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overkill. If channel is not null, channel->dma and channel->dma->ops should not be null (you can make an assert() out of that), and I believe the copy operation may not be null either. So make an if (channel) { assert(channel->dma); assert(channel->dma->ops); assert(channel->dma->ops->copy); return ...} or so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also possibly feels like it's not the right solution in the first place anyway. So please do the check if (channel) at the caller rather than at the callee.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I haven't root caused yet which explicit item is NULL as the stress test issue is quite time consuming to reproduce, let me try to simplify it and put the check to the caller.
The DMA trace is not necessary enabled so the trace_data could be NULL, add check to avoid dereference from NULL pointer and panics. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
|
The CI for GLK BOB shows timeout but logs look fine. |
|
@paulstelian97 good now ? |
|
GLK BOB showing a timeout but logs look good afaict. |
|
I think it would have been much better to revert #4636 that fixed a bug without zero description, no test and no confirmed reproduction. |
This reverts commit 7df3674. This restores the ability to use CONFIG_TRACEM (copy everything to mailbox) without crashing, in other words it fixes thesofproject#4699 This also fixes the other DSP panic thesofproject#4676 and removes the need for logical changes in thesofproject#4678, which can be reverted too. commit 7df3674 ("trace: enable trace after it is ready") was meant to fix a crash when tr_xxx() was used early. However I've used very early tracing for months and it never caused any crash (see thesofproject#4334) I tried adding a tr_err() statement immediately after trace_init(sof) in primary_core_init() and it works just fine. primary_core_init() runs extremely early so I don't think it's too demanding not to use an tr_XXX() before the trace even exists. The reverted commits confused initializing and enabling. Reproduction thesofproject#4683 did not seem to demonstrate anything obvious, there's not even a link to a failed test run. I don't understand how playing with spin locks is relevant to this. Later, reproduction thesofproject#4759 finally demonstrated the real issue: through DEBUG_TRACE_PTR(), some tr_XXX() can indeed be called (in very unusal debug circumstances specific to the original author) before the trace is initialized. The previous commit in this series fixes that by simply guarding it with if(trace_get()) -------- I am _not_ pretending that these reverts make the tracing code bug-free and perfect again, absolutely not and very far from it. I'm merely saying that: - The first reverted commit caused at least two regressions: thesofproject#4676 and thesofproject#4699 - These two commits added yet another variable (time) in an already complex situation with an already existing combinatorial "explosion": compile-time Kconfigs, run-time settings, platform-specific bugs (thesofproject#4333, thesofproject#4573, ...), various races, mbox + DMA, different DMA engines, Zephyr vs XTOS, etc. - Last but not least, we don't want to invest in making the exist trace implementation better. We want to switch to the Zephyr implementation instead So let's go back to a previous known good state, I mean _relatively_ good and stay there if we can. Signed-off-by: Marc Herbert <marc.herbert@intel.com>
|
I submitted a revert in #4760 |
This reverts commit 7df3674. This restores the ability to use CONFIG_TRACEM (copy everything to mailbox) without crashing, in other words it fixes #4699 This also fixes the other DSP panic #4676 and removes the need for logical changes in #4678, which can be reverted too. commit 7df3674 ("trace: enable trace after it is ready") was meant to fix a crash when tr_xxx() was used early. However I've used very early tracing for months and it never caused any crash (see #4334) I tried adding a tr_err() statement immediately after trace_init(sof) in primary_core_init() and it works just fine. primary_core_init() runs extremely early so I don't think it's too demanding not to use an tr_XXX() before the trace even exists. The reverted commits confused initializing and enabling. Reproduction #4683 did not seem to demonstrate anything obvious, there's not even a link to a failed test run. I don't understand how playing with spin locks is relevant to this. Later, reproduction #4759 finally demonstrated the real issue: through DEBUG_TRACE_PTR(), some tr_XXX() can indeed be called (in very unusal debug circumstances specific to the original author) before the trace is initialized. The previous commit in this series fixes that by simply guarding it with if(trace_get()) -------- I am _not_ pretending that these reverts make the tracing code bug-free and perfect again, absolutely not and very far from it. I'm merely saying that: - The first reverted commit caused at least two regressions: #4676 and #4699 - These two commits added yet another variable (time) in an already complex situation with an already existing combinatorial "explosion": compile-time Kconfigs, run-time settings, platform-specific bugs (#4333, #4573, ...), various races, mbox + DMA, different DMA engines, Zephyr vs XTOS, etc. - Last but not least, we don't want to invest in making the exist trace implementation better. We want to switch to the Zephyr implementation instead So let's go back to a previous known good state, I mean _relatively_ good and stay there if we can. Signed-off-by: Marc Herbert <marc.herbert@intel.com>
This reverts commit 7df3674. This restores the ability to use CONFIG_TRACEM (copy everything to mailbox) without crashing, in other words it fixes thesofproject#4699 This also fixes the other DSP panic thesofproject#4676 and removes the need for logical changes in thesofproject#4678, which can be reverted too. commit 7df3674 ("trace: enable trace after it is ready") was meant to fix a crash when tr_xxx() was used early. However I've used very early tracing for months and it never caused any crash (see thesofproject#4334) I tried adding a tr_err() statement immediately after trace_init(sof) in primary_core_init() and it works just fine. primary_core_init() runs extremely early so I don't think it's too demanding not to use an tr_XXX() before the trace even exists. The reverted commits confused initializing and enabling. Reproduction thesofproject#4683 did not seem to demonstrate anything obvious, there's not even a link to a failed test run. I don't understand how playing with spin locks is relevant to this. Later, reproduction thesofproject#4759 finally demonstrated the real issue: through DEBUG_TRACE_PTR(), some tr_XXX() can indeed be called (in very unusal debug circumstances specific to the original author) before the trace is initialized. The previous commit in this series fixes that by simply guarding it with if(trace_get()) -------- I am _not_ pretending that these reverts make the tracing code bug-free and perfect again, absolutely not and very far from it. I'm merely saying that: - The first reverted commit caused at least two regressions: thesofproject#4676 and thesofproject#4699 - These two commits added yet another variable (time) in an already complex situation with an already existing combinatorial "explosion": compile-time Kconfigs, run-time settings, platform-specific bugs (thesofproject#4333, thesofproject#4573, ...), various races, mbox + DMA, different DMA engines, Zephyr vs XTOS, etc. - Last but not least, we don't want to invest in making the exist trace implementation better. We want to switch to the Zephyr implementation instead So let's go back to a previous known good state, I mean _relatively_ good and stay there if we can. Signed-off-by: Marc Herbert <marc.herbert@intel.com> (cherry picked from commit f2c13f5)
This reverts commit 7df3674. This restores the ability to use CONFIG_TRACEM (copy everything to mailbox) without crashing, in other words it fixes #4699 This also fixes the other DSP panic #4676 and removes the need for logical changes in #4678, which can be reverted too. commit 7df3674 ("trace: enable trace after it is ready") was meant to fix a crash when tr_xxx() was used early. However I've used very early tracing for months and it never caused any crash (see #4334) I tried adding a tr_err() statement immediately after trace_init(sof) in primary_core_init() and it works just fine. primary_core_init() runs extremely early so I don't think it's too demanding not to use an tr_XXX() before the trace even exists. The reverted commits confused initializing and enabling. Reproduction #4683 did not seem to demonstrate anything obvious, there's not even a link to a failed test run. I don't understand how playing with spin locks is relevant to this. Later, reproduction #4759 finally demonstrated the real issue: through DEBUG_TRACE_PTR(), some tr_XXX() can indeed be called (in very unusal debug circumstances specific to the original author) before the trace is initialized. The previous commit in this series fixes that by simply guarding it with if(trace_get()) -------- I am _not_ pretending that these reverts make the tracing code bug-free and perfect again, absolutely not and very far from it. I'm merely saying that: - The first reverted commit caused at least two regressions: #4676 and #4699 - These two commits added yet another variable (time) in an already complex situation with an already existing combinatorial "explosion": compile-time Kconfigs, run-time settings, platform-specific bugs (#4333, #4573, ...), various races, mbox + DMA, different DMA engines, Zephyr vs XTOS, etc. - Last but not least, we don't want to invest in making the exist trace implementation better. We want to switch to the Zephyr implementation instead So let's go back to a previous known good state, I mean _relatively_ good and stay there if we can. Signed-off-by: Marc Herbert <marc.herbert@intel.com> (cherry picked from commit f2c13f5)
The DMA trace is not necessary enabled so the trace_data could be NULL,
add check to avoid dereference from NULL pointer and panics.
Signed-off-by: Keyon Jie yang.jie@linux.intel.com