Skip to content

Conversation

@ranj063
Copy link
Collaborator

@ranj063 ranj063 commented Apr 22, 2019

This will be used for stress testing the IPC interface. The first type "IPC_FLOOD" is used for flooding the DSP with test IPC messages to check for failures.

We add a couple of new debugfs entries "ipc_flood_count" and "ipc_flood_duration_ms" that can be used to execute the IPC flood test. "ipc_flood_count" floods the DSP with the number of test IPCs specified and ipc_flood_duration_ms floods the DSP with test IPC's for the duration (in ms)specified. The test stats such as average, min and max IPC response times are logged in the dmesg.

@plbossart
Copy link
Member

@ranj063 since we are modifying the ABI, we need to follow the process discussed last week, with the suggested header changes reviewed by the TSG.
@lgirdwood you mentioned some experiments with GitHub, how do we move forward here? This is the first real case we need to review.

@lgirdwood
Copy link
Member

@ranj063 @plbossart ready tomorrow, been looking at IPC races Friday -> today.

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor ask for for stats

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a read() for this debugFS where it can return count, min time, max time and avg time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood does that mean a read would have to be followed by a write? I could modify the "write to dump the stats into dmesg after the flood test is complete. Would that be OK?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063
I think we should dump out what is the current test status. By show how many IPC send and how many get from the same debugfs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiulipan are you suggesting you want to maintain a history of all the times we've run this test?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063
So we will use demsg for all the test logs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiulipan it is the easiest method for now. If we read from the debugfs, it means I have to find a way to store the results of the last test and have a way to parse the output that the driver returns. It wont be as simple as cat ipc_test or hexdump ipc_test which will give us no info regardign the stats

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063 read() could return human-friendly ascii summary of statistics so far. I.e. like the various debug files of graphics -> /sys/kernel/debug/dri/0/ .. this cp/cat'ed out, attached to a bug filing, plus still machine parsed for automated testing (less convenient if only outputting to dmesg)

@xiulipan xiulipan added the ABI involves ABI changes, need extra attention label Apr 22, 2019
@xiulipan
Copy link

FW part: thesofproject/sof#1290
also add ABI tag for both PR

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063
I think we should dump out what is the current test status. By show how many IPC send and how many get from the same debugfs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063
Will this block the write process? I think there will be long time hang after we write something

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiulipan yes depending on how many time you want to send the IPC's. But the dmesg will show you the progress in terms of the ipc's being sent and replies received isnt it?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063
What if we rewrite something into the debugfs while this one is running?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiulipan are you worried that the first flood test will be affected by the second flood test?
I tried to write while a test was in progress and didnt see anything unusual

@plbossart
Copy link
Member

@ranj063 what is the status of the PR? last code push 11 days ago, no one approved, is this still relevant?

@ranj063
Copy link
Collaborator Author

ranj063 commented May 3, 2019

@plbossart sorry been lagging behind this week. I'll push a new version by Monday

@ranj063
Copy link
Collaborator Author

ranj063 commented May 7, 2019

Minor ask for for stats

@lgirdwood sorry for the delay on this but what I am struggling with is that the test flod IPC seems to take absolutely no time at all. I tried to compute the response time from the return value of the wait_event_timeout() call in the IPC which returns the remaining jiffies when the response for the IPC is received and it is always 300 which is equal to the IPC timeout value. How should I go about this?

@ranj063
Copy link
Collaborator Author

ranj063 commented May 8, 2019

@lgirdwood @xiulipan @plbossart PR updated with Flood test stats. The stats are printed in dmesg logs after the test completes.

@ranj063 ranj063 force-pushed the fix/add_test_ipc branch from 2adb385 to 706c988 Compare May 8, 2019 06:34
Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ranj063. I did not pay attention to this PR so far but it's an interesting idea, see below suggestions to make it even more useful (and readable).

@ranj063 ranj063 force-pushed the fix/add_test_ipc branch 3 times, most recently from a659313 to 9f00f4e Compare May 8, 2019 17:52
@plbossart
Copy link
Member

plbossart commented May 8, 2019 via email

@plbossart
Copy link
Member

plbossart commented May 8, 2019 via email

@ranj063 ranj063 force-pushed the fix/add_test_ipc branch from 9f00f4e to 83f935e Compare May 9, 2019 02:25
@ranj063 ranj063 requested a review from libinyang as a code owner May 9, 2019 02:25
@ranj063 ranj063 force-pushed the fix/add_test_ipc branch from 83f935e to 70e33e5 Compare May 9, 2019 02:29
@ranj063 ranj063 force-pushed the fix/add_test_ipc branch from d7610ed to afc2d2a Compare May 21, 2019 03:57
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063 hm, sorry, I didn't follow or re-read the complete discussion, what's preventing you from doing snprintf(dfse->cache_buf, IPC_FLOOD_TEST_RESULT_LEN, ...) directly?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmmh.... May I suggest getting rid of gotos?

if (!strcmp(dfse->dfsentry->d_name.name, "ipc_flood_duration_ms"))
	flood_duration_test = true;
else if (strcmp(dfse->dfsentry->d_name.name, "ipc_flood_duration_ms"))
	return -EINVAL;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove this empty line

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063 looks mostly good, if you can make the code nicer with @lyakh s comments we can merge today. Nice capability to add.

@ranj063
Copy link
Collaborator Author

ranj063 commented May 21, 2019

@lyakh doing a snprintf(dfse->cache_buf, IPC_FLOOD_TEST_RESULT_LEN, ...) would split the strings across multiple lines and the formatting gets messed up

@ranj063 ranj063 force-pushed the fix/add_test_ipc branch from afc2d2a to bc2e58e Compare May 21, 2019 16:34
@ranj063
Copy link
Collaborator Author

ranj063 commented May 21, 2019

@plbossart @kv2019i @lyakh updated the PR now. @lyakh suggestion to directly write into dfse->cache_buf was golden. Thanks!

@ranj063 ranj063 force-pushed the fix/add_test_ipc branch 2 times, most recently from acb2c43 to 4926cc0 Compare May 21, 2019 16:43
@ranj063
Copy link
Collaborator Author

ranj063 commented May 21, 2019

@plbossart let me know about the ABI. I feel like ths can go in as is without an ABI update to match the FW side update.

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit pick on code location.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-pick: do we need this code here? Couldn't it be moved for debug.c for consistency? This really has nothing to do with the loader, does it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or could it be moved earlier in the function to have all the debugfs stuff in one place. Here it's in the middle of unrelated code, it's a bit odd.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart makes sense. I've moved it to the debugfs init function. I think the fw_version creation can also be moved there. But I'll leave it in as is for now

@plbossart
Copy link
Member

@plbossart let me know about the ABI. I feel like ths can go in as is without an ABI update to match the FW side update.

I've pinged @lgirdwood, no reply, so I will let him update the ABI classifier and use MINOR 7 as done for firmware.

@ranj063 ranj063 force-pushed the fix/add_test_ipc branch from 4926cc0 to 7cae5a2 Compare May 21, 2019 18:31
plbossart
plbossart previously approved these changes May 21, 2019
Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, need 2nd approver. Thanks @ranj063

@ranj063
Copy link
Collaborator Author

ranj063 commented May 21, 2019

@plbossart let me know about the ABI. I feel like ths can go in as is without an ABI update to match the FW side update.

I've pinged @lgirdwood, no reply, so I will let him update the ABI classifier and use MINOR 7 as done for firmware.

@plbossart I just checked with him and updated the ABI classifier myself.

kv2019i
kv2019i previously approved these changes May 21, 2019
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now, thanks for the changes!

@plbossart
Copy link
Member

Gah, I added a conflict, @ranj063 can you fix and resubmit?

ranj063 added 3 commits May 21, 2019 14:23
Add mode parameter for snd_sof_debugfs_buf_item() to specify
the mode while creating debugfs entries.

Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Add a new class of IPC command along with the first
test type, IPC_FLOOD, which will be used for flooding the DSP
with IPCs.

Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Add a couple of new debugfs entries "ipc_flood_count"
and "ipc_flood_duration_ms" that can be used to
execute the IPC flood test. "ipc_flood_count" floods the DSP
with the number of test IPCs specified and ipc_flood_duration_ms
floods the DSP with test IPC's for the duration(in ms) specified.
The test stats such as average, min and max IPC response times
are logged in the dmesg and saved in the debugfs entry cache buffer.

Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
@ranj063 ranj063 dismissed stale reviews from kv2019i and plbossart via ea03361 May 21, 2019 21:27
@ranj063 ranj063 force-pushed the fix/add_test_ipc branch from 7cae5a2 to ea03361 Compare May 21, 2019 21:27
@ranj063
Copy link
Collaborator Author

ranj063 commented May 21, 2019

Gah, I added a conflict, @ranj063 can you fix and resubmit?

@plbossart done!

@ranj063 ranj063 requested review from kv2019i, lyakh and plbossart and removed request for plbossart May 21, 2019 21:48
@plbossart plbossart merged commit cab8ad5 into thesofproject:topic/sof-dev May 21, 2019
@ranj063 ranj063 deleted the fix/add_test_ipc branch July 18, 2019 01:18
plbossart pushed a commit that referenced this pull request Jan 28, 2020
On device re-insertion, the RDMA device driver crashes trying to set
up a new QP:

Nov 27 16:32:06 manet kernel: BUG: kernel NULL pointer dereference, address: 00000000000001c0
Nov 27 16:32:06 manet kernel: #PF: supervisor write access in kernel mode
Nov 27 16:32:06 manet kernel: #PF: error_code(0x0002) - not-present page
Nov 27 16:32:06 manet kernel: PGD 0 P4D 0
Nov 27 16:32:06 manet kernel: Oops: 0002 [#1] SMP
Nov 27 16:32:06 manet kernel: CPU: 1 PID: 345 Comm: kworker/u28:0 Tainted: G        W         5.4.0 #852
Nov 27 16:32:06 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
Nov 27 16:32:06 manet kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
Nov 27 16:32:06 manet kernel: RIP: 0010:atomic_try_cmpxchg+0x2/0x12
Nov 27 16:32:06 manet kernel: Code: ff ff 48 8b 04 24 5a c3 c6 07 00 0f 1f 40 00 c3 31 c0 48 81 ff 08 09 68 81 72 0c 31 c0 48 81 ff 83 0c 68 81 0f 92 c0 c3 8b 06 <f0> 0f b1 17 0f 94 c2 84 d2 75 02 89 06 88 d0 c3 53 ba 01 00 00 00
Nov 27 16:32:06 manet kernel: RSP: 0018:ffffc900035abbf0 EFLAGS: 00010046
Nov 27 16:32:06 manet kernel: RAX: 0000000000000000 RBX: 00000000000001c0 RCX: 0000000000000000
Nov 27 16:32:06 manet kernel: RDX: 0000000000000001 RSI: ffffc900035abbfc RDI: 00000000000001c0
Nov 27 16:32:06 manet kernel: RBP: ffffc900035abde0 R08: 000000000000000e R09: ffffffffffffc000
Nov 27 16:32:06 manet kernel: R10: 0000000000000000 R11: 000000000002e800 R12: ffff88886169d9f8
Nov 27 16:32:06 manet kernel: R13: ffff88886169d9f4 R14: 0000000000000246 R15: 0000000000000000
Nov 27 16:32:06 manet kernel: FS:  0000000000000000(0000) GS:ffff88846fa40000(0000) knlGS:0000000000000000
Nov 27 16:32:06 manet kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 27 16:32:06 manet kernel: CR2: 00000000000001c0 CR3: 0000000002009006 CR4: 00000000001606e0
Nov 27 16:32:06 manet kernel: Call Trace:
Nov 27 16:32:06 manet kernel: do_raw_spin_lock+0x2f/0x5a
Nov 27 16:32:06 manet kernel: create_qp_common.isra.47+0x856/0xadf [mlx4_ib]
Nov 27 16:32:06 manet kernel: ? slab_post_alloc_hook.isra.60+0xa/0x1a
Nov 27 16:32:06 manet kernel: ? __kmalloc+0x125/0x139
Nov 27 16:32:06 manet kernel: mlx4_ib_create_qp+0x57f/0x972 [mlx4_ib]

The fix is to copy the qp_init_attr struct that was just created by
rpcrdma_ep_create() instead of using the one from the previous
connection instance.

Fixes: 98ef77d ("xprtrdma: Send Queue size grows after a reconnect")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ABI involves ABI changes, need extra attention

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants