-
Notifications
You must be signed in to change notification settings - Fork 140
ASoC: SOF: force end-of-file for debugfs trace at suspend #781
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASoC: SOF: force end-of-file for debugfs trace at suspend #781
Conversation
|
@singalsu can you please give this a try and see if it fixes the problem you faced with missing traces with runtime PM enabled? |
|
@ranj063 I just tried this with the sof patch rebuilt tools but now I get empty trace output. I'll check tomorrow with Kai what I've missed. |
|
@ranj063 @kv2019i Correction, I didn't realize that despite the patched trace didn't show the prior content from boot until this moment when start it shows it later! Now when I had left sof-logger on and started music playback it traced out normally and I saw now the params() output that got dropped previously when started. Also the traces from boot are seen (component new() stuff) is seen when I scroll back the screen. So, this looks usable to me now and improvement from non-patched version. Though the start of showing the events should be tested with some lower amount of output test case. Maybe pressing volume up/down from alsamixer and see if it activates the trace. |
keyonjie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, it's good pre-work before we can go further to support retrieving logs belonged to stage before suspend, I have concern about reset sdev->host_offset at file open, we need figure out that.
b9ec601 to
bda0e5c
Compare
|
@keyonjie wrote:
Ok, I figured out a way to capture the early logs if sof-logger is started after FW has been active. Please see my comments above I wrote today, and the new patch I uploaded today. |
bda0e5c to
d5897a3
Compare
New patchset available. This tries to simplify the logic while still keeping the main feature of resyncing position at suspend/resume cycles. A new sof-logger patch was needed (see PR1312). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No real comments on the functionality proper but I find this PR a bit messy. It mixes renames, splitting code and addition of new things. It'd be nicer with separate patches.
d5897a3 to
45f0248
Compare
plbossart
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kv2019i, much simpler than the previous version but I am still missing a couple of points, see below.
plbossart
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @kv2019i still confused about the expected transitions for the dtrace_eof boolean. this might be a missing set of comments to explain how this might be used.
Move duplicated code in sof_wait_trace_avail() to a helper function. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
0e97ea4 to
54dacdc
Compare
Current trace implementation gets out of sync when sof device is put to suspend. The debugfs file handle is kept open, but firmware will reset its state. After resume, debugfs client's read offset will not be synchronized to firmware and this may result in traces read in incorrect order and/or stale data being read after resume. Add logic to signal end-of-file to read() when firmware tracing has ended, and all trace data has been read. This allows debugfs client to capture all trace data, and reopen the trace file to ensure proper synchronization with firmware after reopening the node. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
54dacdc to
891b86b
Compare
|
@plbossart wrote:
Eh, sorry, this round was waste of your time, I slipped a bug in the V4 patchset that further obfuscated intended usage of dtrace_eof variable (although functionally still ok and passed my tests). No excuses, should have been more careful. Even without my mistake, I agree 'dtrace_eof' is still not ideal, too many comments are needed to explain it. I went back and forth with different names and ended up to rename the variable to 'dtrace_draining' in V5 patchset and further iterated the comments (shorter, just focusing on WHY). This is still a kludge, I agree, but has a lot of practical utility for sof-logger usage, so I'll still update this PR. If you think this has no chance in upstream, then we might consider dropping altogether and wait until we have time to implement a bigger rework (current WA is to disable runtime PM when taking traces). |
plbossart
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kv2019i, I think I get it now and it looks self-explanatory enough.
I'll need a second approver to merge - yes I know I sound like a broken record but this is annoying to see so few reviewers.
|
@kv2019i something is not right with this methodology suggested in this PR. This is what I tried.
So everytime I stop the logger and start it again its output is empty. This is likely because you reset the host_offset to 0 in trace_release. I find this behaviour odd though. I think sof-logger should always be able to read the trace irrespective of whether it is opened for the first time after the DSP is suspended or closed & opened again. |
|
Acked on @ranj063, please make sure the PR really works as we expected before we can merge it, the debugfs trace is used to share FW logs for debugging, so the criteria here should be if it really helps for the debugging, we need show the logs as much as possible, resetting the host_offset and clearing the log output at each logger open sounds really bad idea to me. |
|
Ranjani wrote:
[...]
@ranj063 and @keyonjie , yes, this is exactly the expected behaviour. I sent an RFC about the change of behaviour to the SOF mailing list in April when I started to work on this: Note you will not lose traces. You'll always get the last (upto 64K) traces FW sent even if you did not have logger running. So with the patch, the DMA trace works like "dmesg -c" in kernel. You can get logs out reliably, but you can only read them out once. Until we have implemented offset trace (or trace double-buffering), we must reset the host_offset so SW/FW are resycnhronized. Currently you will often get duplicated traces, or not traces at all -> it's very undeterministic now. |
|
@keyonjie wrote:
Ack, but this is not done in the patch. We clear offset at close, specifically to allow user-space to read out the traces before we sync on the position. |
@kv2019i but this is quite inconvenient if you can only read the trace once after the DSP is suspended. We can live with it for a short while but we need the bigger change to implement trace offset and should be done soon. Users will start complaining about not being able to read the trace the second time around. |
|
@ranj063 wrote:
But in context of this PR, the question is which is now a bigger problem:
That's the call we need to make for this PR -> @ranj063 @keyonjie @plbossart Longer term, I'll reiterate my stance that the offset patch does not solve all issues. The trace buffer remains a scratch buffer and even with offset support implemented, user-space has no way to discover where the DSP is writing to. So only reliable way to use DMA trace is to run sof-logger all the time and store traces to user-space continuously. Offset support will not fix this. |
@kv2019i I personally feel we can't deal with either of them. A few months ago I'd have agreed with you and said (1) is the bigger problem but now we've got external folks using SOF and it is not easy to explain to them that the logger can only read once with PM enabled. |
|
@kv2019i I have a simple suggestion. I agree this PR is necessary but its annoying to not be able to read the data the second time around. Can we possibly change the logger application to read and save the data at first read (possibly in a file) and read from there instead? |
|
@ranj063 wrote:
But why the expectation of a "trace store"? Can you elaborate the usage model a bit. For debugging, you should be storing traces continuously to a file. If you want to debug an error after something strange happens, you should cp out the files (trace, exception, memory dumps) and analyze offlien (and attach to bug issue). What is the usage where you read the trace may times? I think the interface should be viewed as a "trace port" with some intermediate buffering, but it's not a trace store and you'll end with issues if you expect it to be a store (might work sometimes, but not reliable). |
|
@ranj063 wrote:
But this is already supported, right. You 'cp /sys/kernel/debug/sof/trace logfile' and run sof-logger with '-i logfile'. Having sof-logger cache data implicitly sounds a complicated. Where to write, how many traces to keep, potential confusion of combining invalid data from state file cache and valid data from a target data, etc, etc. If really, really need, I'd add a carbon-copy option where sof-logger copies the verbatim binary data all the data to a file, in addition to showing out the parsed log. |
@kv2019i sorry didnt realize we had the option of reading from a file with the logger. I am OK with it as long we shout out loud and clear that this is a handicap with the logger at the moment. |
|
Sorry @kv2019i didn't have much time to go to details of your commits and explaining, I am seeing DMA trace logs lost issue(sometimes no any DMA trace logs show even the FW is running and theoretically we should get logs e.g. trigger commands, hw_params commands) when doing KWD debugging these days. Just went through your commits, my understand to it is something like this:
Without suspend/resume, the typical flow is 1->2->3->4(first sof-logger run)->2->3->4...->2->3->4 (the Nth sof-logger run), this works fine as dtrace_draining is never set and we do nothing in 4) actually. dtrace_is_enabled is true, dtrace_draining is false, and host_offset is keep aligned with FW position update. When suspend happens, let's consider about 2 scenarios: a. suspend without sof-logger run, that is 1->[2->3->4]->5, after 5), dtrace_is_enabled is false, dtrace_draining is true, and host_offset is aligned with the last update from FW position update. Here, when we run sof-logger, 2->3(read out all logs, and then clear dtrace_draining, works fine here)->4(reset host_offset, that means if we try open sof-logger again, no logs will be showed at the 2nd iteration, this is also fine IMO). Then 1->[2->3->4] again, everything same with the first boot, we won't get logs that belong to the previous active stage, this also looks legit(so for first boot fail issues e.g. tplg IPC failure, we can run sof-logger before we turn the driver back to resume via any audio commands, to debug those issues). So in general, this case a is fine for me. b. suspend with sof-logger run, that is 1->2->3->5, after 5), dtrace_is_enabled is false, dtrace_draining is true, and host_offset is aligned with the last update from FW position update. Here, the last logs before suspend is already showed, so if So this case b is acceptable to me. So, after long analyzing and chatter here, I think this is good PR to be merged now, thank you so much. |
keyonjie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After long analyzing for different cases, and the test on my side shows this is good PR to be merged now, thank you so much.
|
Thank you @keyonjie for the very detailed response and review. Your analysis is correct. I only have one note for this:
In fact we can cover this case as well. With recent patches to sof-logger, when you run sof-logger in trace mode (-t) and this b_2 case happens, sof-logger will notice the end of draining phase (read returns once with 0 bytes available) and reopens the trace node. So when FW starts again, sof-logger will continue reading from 0. If you use an older sof-logger binary, then the result will be as per your analysis above and some stale traces will be output.
Great, thank you! @plbossart This would seem to be good to go! |
Great to know about this, thanks.
|
|
Thanks @keyonjie let's merge and test |
Running with KASAN on a VIM3L systems leads to the following splat when probing the Ethernet device: ================================================================== BUG: KASAN: global-out-of-bounds in _get_maxdiv+0x74/0xd8 Read of size 4 at addr ffffa000090615f4 by task systemd-udevd/139 CPU: 1 PID: 139 Comm: systemd-udevd Tainted: G E 5.7.0-rc1-00101-g8624b7577b9c #781 Hardware name: amlogic w400/w400, BIOS 2020.01-rc5 03/12/2020 Call trace: dump_backtrace+0x0/0x2a0 show_stack+0x20/0x30 dump_stack+0xec/0x148 print_address_description.isra.12+0x70/0x35c __kasan_report+0xfc/0x1d4 kasan_report+0x4c/0x68 __asan_load4+0x9c/0xd8 _get_maxdiv+0x74/0xd8 clk_divider_bestdiv+0x74/0x5e0 clk_divider_round_rate+0x80/0x1a8 clk_core_determine_round_nolock.part.9+0x9c/0xd0 clk_core_round_rate_nolock+0xf0/0x108 clk_hw_round_rate+0xac/0xf0 clk_factor_round_rate+0xb8/0xd0 clk_core_determine_round_nolock.part.9+0x9c/0xd0 clk_core_round_rate_nolock+0xf0/0x108 clk_core_round_rate_nolock+0xbc/0x108 clk_core_set_rate_nolock+0xc4/0x2e8 clk_set_rate+0x58/0xe0 meson8b_dwmac_probe+0x588/0x72c [dwmac_meson8b] platform_drv_probe+0x78/0xd8 really_probe+0x158/0x610 driver_probe_device+0x140/0x1b0 device_driver_attach+0xa4/0xb0 __driver_attach+0xcc/0x1c8 bus_for_each_dev+0xf4/0x168 driver_attach+0x3c/0x50 bus_add_driver+0x238/0x2e8 driver_register+0xc8/0x1e8 __platform_driver_register+0x88/0x98 meson8b_dwmac_driver_init+0x28/0x1000 [dwmac_meson8b] do_one_initcall+0xa8/0x328 do_init_module+0xe8/0x368 load_module+0x3300/0x36b0 __do_sys_finit_module+0x120/0x1a8 __arm64_sys_finit_module+0x4c/0x60 el0_svc_common.constprop.2+0xe4/0x268 do_el0_svc+0x98/0xa8 el0_svc+0x24/0x68 el0_sync_handler+0x12c/0x318 el0_sync+0x158/0x180 The buggy address belongs to the variable: div_table.63646+0x34/0xfffffffffffffa40 [dwmac_meson8b] Memory state around the buggy address: ffffa00009061480: fa fa fa fa 00 00 00 01 fa fa fa fa 00 00 00 00 ffffa00009061500: 05 fa fa fa fa fa fa fa 00 04 fa fa fa fa fa fa >ffffa00009061580: 00 03 fa fa fa fa fa fa 00 00 00 00 00 00 fa fa ^ ffffa00009061600: fa fa fa fa 00 01 fa fa fa fa fa fa 01 fa fa fa ffffa00009061680: fa fa fa fa 00 01 fa fa fa fa fa fa 04 fa fa fa ================================================================== Digging into this indeed shows that the clock divider array is lacking a final fence, and that the clock subsystems goes in the weeds. Oh well. Let's add the empty structure that indicates the end of the array. Fixes: bd6f485 ("net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit f0212a5 ] Running with KASAN on a VIM3L systems leads to the following splat when probing the Ethernet device: ================================================================== BUG: KASAN: global-out-of-bounds in _get_maxdiv+0x74/0xd8 Read of size 4 at addr ffffa000090615f4 by task systemd-udevd/139 CPU: 1 PID: 139 Comm: systemd-udevd Tainted: G E 5.7.0-rc1-00101-g8624b7577b9c thesofproject#781 Hardware name: amlogic w400/w400, BIOS 2020.01-rc5 03/12/2020 Call trace: dump_backtrace+0x0/0x2a0 show_stack+0x20/0x30 dump_stack+0xec/0x148 print_address_description.isra.12+0x70/0x35c __kasan_report+0xfc/0x1d4 kasan_report+0x4c/0x68 __asan_load4+0x9c/0xd8 _get_maxdiv+0x74/0xd8 clk_divider_bestdiv+0x74/0x5e0 clk_divider_round_rate+0x80/0x1a8 clk_core_determine_round_nolock.part.9+0x9c/0xd0 clk_core_round_rate_nolock+0xf0/0x108 clk_hw_round_rate+0xac/0xf0 clk_factor_round_rate+0xb8/0xd0 clk_core_determine_round_nolock.part.9+0x9c/0xd0 clk_core_round_rate_nolock+0xf0/0x108 clk_core_round_rate_nolock+0xbc/0x108 clk_core_set_rate_nolock+0xc4/0x2e8 clk_set_rate+0x58/0xe0 meson8b_dwmac_probe+0x588/0x72c [dwmac_meson8b] platform_drv_probe+0x78/0xd8 really_probe+0x158/0x610 driver_probe_device+0x140/0x1b0 device_driver_attach+0xa4/0xb0 __driver_attach+0xcc/0x1c8 bus_for_each_dev+0xf4/0x168 driver_attach+0x3c/0x50 bus_add_driver+0x238/0x2e8 driver_register+0xc8/0x1e8 __platform_driver_register+0x88/0x98 meson8b_dwmac_driver_init+0x28/0x1000 [dwmac_meson8b] do_one_initcall+0xa8/0x328 do_init_module+0xe8/0x368 load_module+0x3300/0x36b0 __do_sys_finit_module+0x120/0x1a8 __arm64_sys_finit_module+0x4c/0x60 el0_svc_common.constprop.2+0xe4/0x268 do_el0_svc+0x98/0xa8 el0_svc+0x24/0x68 el0_sync_handler+0x12c/0x318 el0_sync+0x158/0x180 The buggy address belongs to the variable: div_table.63646+0x34/0xfffffffffffffa40 [dwmac_meson8b] Memory state around the buggy address: ffffa00009061480: fa fa fa fa 00 00 00 01 fa fa fa fa 00 00 00 00 ffffa00009061500: 05 fa fa fa fa fa fa fa 00 04 fa fa fa fa fa fa >ffffa00009061580: 00 03 fa fa fa fa fa fa 00 00 00 00 00 00 fa fa ^ ffffa00009061600: fa fa fa fa 00 01 fa fa fa fa fa fa 01 fa fa fa ffffa00009061680: fa fa fa fa 00 01 fa fa fa fa fa fa 04 fa fa fa ================================================================== Digging into this indeed shows that the clock divider array is lacking a final fence, and that the clock subsystems goes in the weeds. Oh well. Let's add the empty structure that indicates the end of the array. Fixes: bd6f485 ("net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes checkpatch issues: CHECK: Alignment should match open parenthesis thesofproject#87: FILE: drivers/reset/core.c:87: +static int of_reset_simple_xlate(struct reset_controller_dev *rcdev, + const struct of_phandle_args *reset_spec) CHECK: Lines should not end with a '(' thesofproject#540: FILE: drivers/reset/core.c:540: +static struct reset_control *__reset_control_get_internal( CHECK: Alignment should match open parenthesis thesofproject#603: FILE: drivers/reset/core.c:603: +struct reset_control *__of_reset_control_get(struct device_node *node, + const char *id, int index, bool shared, CHECK: Alignment should match open parenthesis thesofproject#781: FILE: drivers/reset/core.c:781: +struct reset_control *__devm_reset_control_get(struct device *dev, + const char *id, int index, bool shared, Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Current trace implementation gets out of sync when sof device is
put to suspend. The debugfs file handle is kept open, but
firmware will reset its state. After resume, debugfs client's read
offset will not be synchronized to firmware and this may result
in traces read in incorrect order and/or stale data being read
after resume.
Changelog:
Related patches:
sof: logger: reopen trace file upon EOF sof#1312
~~ Issues handled in previous iterations~~:
this is not full alternative but has overlap with issue SOF DMA Trace: avoid logs clearance at runtime suspend/resume #306 (and related PR)-> this can be merged as incremental updatelocking needs to be revisited still (I'll go through this if we move forward)-> handled in PR800having multiple u32 entries for various single-bit pieces of dtrace info in sof-priv.h is ugly, I'd like to combine these-> sorted in 11th April push