Add "Test" ipc class #852
8422f7c to 0fdd76b
|
@ranj063 since we are modifying the ABI, we need to follow the process discussed last week, with the suggested header changes reviewed by the TSG. |
|
@ranj063 @plbossart ready tomorrow, been looking at IPC races Friday -> today. |
lgirdwood
left a comment
Minor ask for stats
sound/soc/sof/debug.c
Outdated
Can we add a read() for this debugfs entry that returns the count, min time, max time and avg time?
@lgirdwood does that mean a read would have to be followed by a write? I could modify the write to dump the stats into dmesg after the flood test is complete. Would that be OK?
@ranj063
I think we should dump the current test status, showing how many IPCs were sent and how many replies were received, from the same debugfs entry.
@xiulipan are you suggesting you want to maintain a history of all the times we've run this test?
@ranj063
So we will use dmesg for all the test logs?
@xiulipan it is the easiest method for now. If we read from the debugfs, it means I have to find a way to store the results of the last test and have a way to parse the output that the driver returns. It won't be as simple as cat ipc_test or hexdump ipc_test, which will give us no info regarding the stats.
@ranj063 read() could return a human-friendly ASCII summary of the statistics so far, i.e. like the various graphics debug files under /sys/kernel/debug/dri/0/. This could be cp/cat'ed out and attached to a bug filing, and still be machine-parsed for automated testing (less convenient if only outputting to dmesg).
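As an illustration of the read()-style summary discussed in this thread, here is a minimal user-space C sketch that accumulates count, min, max and average response times and formats them as plain text into a caller-supplied buffer. The `flood_stats` structure, the function names and the output layout are all hypothetical; they are not taken from the SOF driver, which would do the equivalent formatting in kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical accumulator for illustration; not the driver's structure. */
struct flood_stats {
	uint64_t count;    /* replies received so far */
	uint64_t min_ns;   /* fastest response */
	uint64_t max_ns;   /* slowest response */
	uint64_t total_ns; /* sum of all responses, used for the average */
};

static void flood_stats_init(struct flood_stats *s)
{
	s->count = 0;
	s->min_ns = UINT64_MAX;
	s->max_ns = 0;
	s->total_ns = 0;
}

/* Record one measured response time. */
static void flood_stats_add(struct flood_stats *s, uint64_t ns)
{
	s->count++;
	s->total_ns += ns;
	if (ns < s->min_ns)
		s->min_ns = ns;
	if (ns > s->max_ns)
		s->max_ns = ns;
}

/*
 * Format a plain-text summary into buf. Returns what snprintf() returns:
 * the length the full summary needs, which exceeds len - 1 on truncation.
 */
static int flood_stats_format(const struct flood_stats *s, char *buf, size_t len)
{
	/* Report zeros rather than UINT64_MAX when nothing was recorded. */
	uint64_t min = s->count ? s->min_ns : 0;
	uint64_t avg = s->count ? s->total_ns / s->count : 0;

	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"count %llu\nmin %llu ns\nmax %llu ns\navg %llu ns\n",
			(unsigned long long)s->count,
			(unsigned long long)min,
			(unsigned long long)s->max_ns,
			(unsigned long long)avg);
}
```

Keeping one `key value` pair per line is one way to satisfy both requests in the thread: the text stays readable when cat'ed into a bug report and is still trivial to parse in automated tests.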
|
FW part: thesofproject/sof#1290 |
sound/soc/sof/debug.c
Outdated
@ranj063
Will this block the write process? I think there will be a long hang after we write something.
@xiulipan yes, depending on how many times you want to send the IPCs. But dmesg will show you the progress in terms of the IPCs being sent and the replies received, won't it?
@ranj063
What if we write to the debugfs entry again while this test is running?
@xiulipan are you worried that the first flood test will be affected by the second flood test?
I tried to write while a test was in progress and didn't see anything unusual.
|
@ranj063 what is the status of the PR? last code push 11 days ago, no one approved, is this still relevant? |
|
@plbossart sorry been lagging behind this week. I'll push a new version by Monday |
@lgirdwood sorry for the delay on this, but what I am struggling with is that the flood test IPC seems to take no time at all. I tried to compute the response time from the return value of the wait_event_timeout() call in the IPC, which returns the remaining jiffies when the response to the IPC is received, and it is always 300, which is equal to the IPC timeout value. How should I go about this? |
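A likely explanation, offered here as an assumption and not confirmed in this thread, is that the reply arrives within a single jiffy, so the remaining timeout never visibly decreases. One way around that granularity is to timestamp before and after each round-trip with a high-resolution monotonic clock; kernel code would typically use ktime_get() for this. The user-space C sketch below shows the same pattern with clock_gettime(CLOCK_MONOTONIC). The names `timed_round_trip`, `ipc_round_trip_fn` and `fake_ipc` are hypothetical, and `fake_ipc` simply sleeps to stand in for a real IPC.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;
	int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(ret == 0);
	(void)ret;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical stand-in for "send one IPC and wait for the reply". */
typedef int (*ipc_round_trip_fn)(void *ctx);

/* Time one round-trip by timestamping before and after the call. */
static int timed_round_trip(ipc_round_trip_fn fn, void *ctx,
			    uint64_t *elapsed_ns)
{
	uint64_t start = now_ns();
	int ret = fn(ctx);

	*elapsed_ns = now_ns() - start;
	return ret;
}

/* Fake round-trip for demonstration: sleeps for *ctx milliseconds. */
static int fake_ipc(void *ctx)
{
	long ms = *(long *)ctx;
	struct timespec delay = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	return nanosleep(&delay, NULL);
}
```

Measuring elapsed time directly, instead of deriving it from the remaining timeout, keeps the resolution independent of the timer tick rate.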
|
@lgirdwood @xiulipan @plbossart PR updated with Flood test stats. The stats are printed in dmesg logs after the test completes. |
plbossart
left a comment
Thanks @ranj063. I did not pay attention to this PR so far but it's an interesting idea, see below suggestions to make it even more useful (and readable).
a659313 to 9f00f4e
|
> @plbossart sure, I could do that. Do we want to retain both types though, count and duration?

Does it hurt to have both? It's useful to have the exact count, to get an idea of the volume of IPC transactions. It's useful to have the duration to think more in terms of transients. I don't view them as exclusive, more as different types of tests. E.g. for QEMU/FPGA the notion of time is not meaningful, for a complete system it is.
|
|
> @plbossart I think it makes sense to cover it under the top-level DEBUG. If we enable top-level DEBUG, I think we should expose as many debugging features as possible. But if you disagree, I can make this specific. I don't mind either way.

The guidance is to have top-level selectors that enable/disable an entire menu, and then individual parts that are selected on a need basis. Enabling a maximum number of capabilities will lead to Heisenbugs. See e.g. the kernel debug/trace features: they are identified separately, and while we could have default options, selecting them all is not wise.
|
sound/soc/sof/debug.c
Outdated
@ranj063 hm, sorry, I didn't follow or re-read the complete discussion, what's preventing you from doing snprintf(dfse->cache_buf, IPC_FLOOD_TEST_RESULT_LEN, ...) directly?
sound/soc/sof/debug.c
Outdated
mmmh.... May I suggest getting rid of gotos?
	if (!strcmp(dfse->dfsentry->d_name.name, "ipc_flood_duration_ms"))
		flood_duration_test = true;
	else if (strcmp(dfse->dfsentry->d_name.name, "ipc_flood_count"))
		return -EINVAL;
sound/soc/sof/debug.c
Outdated
I'd remove this empty line
plbossart
left a comment
|
@lyakh doing a snprintf(dfse->cache_buf, IPC_FLOOD_TEST_RESULT_LEN, ...) would split the strings across multiple lines and the formatting gets messed up |
|
@plbossart @kv2019i @lyakh updated the PR now. @lyakh's suggestion to write directly into dfse->cache_buf was golden. Thanks! |
acb2c43 to 4926cc0
|
@plbossart let me know about the ABI. I feel like this can go in as is, without an ABI update to match the FW side update. |
plbossart
left a comment
nit pick on code location.
sound/soc/sof/loader.c
Outdated
nit-pick: do we need this code here? Couldn't it be moved to debug.c for consistency? This really has nothing to do with the loader, does it?
or could it be moved earlier in the function to have all the debugfs stuff in one place. Here it's in the middle of unrelated code, it's a bit odd.
@plbossart makes sense. I've moved it to the debugfs init function. I think the fw_version creation can also be moved there. But I'll leave it in as is for now
I've pinged @lgirdwood, no reply, so I will let him update the ABI classifier and use MINOR 7 as done for firmware. |
plbossart
left a comment
looks good, need 2nd approver. Thanks @ranj063
@plbossart I just checked with him and updated the ABI classifier myself. |
kv2019i
left a comment
Looks good now, thanks for the changes!
|
Gah, I added a conflict, @ranj063 can you fix and resubmit? |
Add mode parameter for snd_sof_debugfs_buf_item() to specify the mode while creating debugfs entries. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Add a new class of IPC command along with the first test type, IPC_FLOOD, which will be used for flooding the DSP with IPCs. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Add a couple of new debugfs entries "ipc_flood_count" and "ipc_flood_duration_ms" that can be used to execute the IPC flood test. "ipc_flood_count" floods the DSP with the number of test IPCs specified and "ipc_flood_duration_ms" floods the DSP with test IPCs for the duration (in ms) specified. The test stats such as average, min and max IPC response times are logged in dmesg and saved in the debugfs entry cache buffer. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
@plbossart done! |
This will be used for stress testing the IPC interface. The first type "IPC_FLOOD" is used for flooding the DSP with test IPC messages to check for failures.
We add a couple of new debugfs entries "ipc_flood_count" and "ipc_flood_duration_ms" that can be used to execute the IPC flood test. "ipc_flood_count" floods the DSP with the number of test IPCs specified and "ipc_flood_duration_ms" floods the DSP with test IPCs for the duration (in ms) specified. The test stats such as average, min and max IPC response times are logged in dmesg.
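To show how a test harness might trigger the flood test from user space, here is a small C sketch that writes a value to a file by path. The helper name `flood_write` is hypothetical, and both the debugfs location and the assumption that the entries accept a decimal string followed by a newline are not confirmed by this PR description.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a decimal value followed by a newline to the file at path.
 * Returns 0 on success, -1 on failure.
 */
static int flood_write(const char *path, unsigned long value)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%lu\n", value) > 0;
	/* fclose() flushes, so a failed write can surface here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Assuming debugfs is mounted at /sys/kernel/debug and the entries live in a `sof` directory there (an assumption), a call such as `flood_write("/sys/kernel/debug/sof/ipc_flood_count", 10000)` would request 10000 test IPCs, and the same helper with the "ipc_flood_duration_ms" entry would request a timed run.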