Skip to content

Conversation

@ktrzcinx
Copy link
Member

@ktrzcinx ktrzcinx commented Sep 21, 2020

Such an mechanism may be used to monitor memory utilization and memory leaks.

Possible memory leaks tracking was main motivation for this mechanism. To scan them, memory usage should be scanned before topology creation, then some functional test should be done, topology deletion and iterate whole process a few times - when memory usage grows, then some memory leak is present. To efficiently track memory leaks, memory utilization must be scanned at strictly defined moments in FW life-cycle, it's why scan should be done on host request instead regular memory utilization printing (eq. in trace system).

FW part: thesofproject/sof#3389

Each define value in series should be aligned and tabs should
be used instead of spaces to follow code-style.

Signed-off-by: Karol Trzcinski <karolx.trzcinski@linux.intel.com>
Section comment should be coherent with IPC prefix from define
names.

Signed-off-by: Karol Trzcinski <karolx.trzcinski@linux.intel.com>
Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea. I think we need to enable this mode by taking a reference to the device so as to prevent pm_runtime suspend, e.g.

echo 1 > /sys/bus/pci/..../mem_info_enable
cat /sys/bus/pci/..../mem_info
cat /sys/bus/pci/..../mem_info
cat /sys/bus/pci/..../mem_info
echo 0 > /sys/bus/pci/..../mem_info_enable

@ktrzcinx ktrzcinx force-pushed the mem-leak branch 2 times, most recently from d8f8baf to 638544a Compare September 22, 2020 16:01
@ktrzcinx ktrzcinx requested a review from lyakh September 22, 2020 16:07
lyakh
lyakh previously approved these changes Sep 24, 2020
Copy link
Collaborator

@lyakh lyakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks correct to me now

lyakh
lyakh previously approved these changes Sep 25, 2020
@fredoh9
Copy link
Collaborator

fredoh9 commented Oct 22, 2020

LGTM but one thing I'd like to make sure we agree upon this difference.
There is no LAST_ELEM in kernel header, which is different from SOF FW header. Hence the array size is remained variable struct config_elem elems[];

Updated: I read late further discussion in
thesofproject/sof#3527
If we can add comments in firmware header, actually I like this idea.

fredoh9
fredoh9 previously approved these changes Oct 22, 2020
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ktrzcinx , looks good now!

kv2019i
kv2019i previously approved these changes Oct 23, 2020
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, these checkpatch things would be nice-to-fix as well.

Values given in this dictionary describes used firmware configuration,
like feature availability, buffer size limits and similar properties.

Signed-off-by: Karol Trzcinski <karolx.trzcinski@linux.intel.com>
kv2019i
kv2019i previously approved these changes Oct 23, 2020
This file content describes memory allocation status
at run-time, typically to detect memory leaks.

Signed-off-by: Karol Trzcinski <karolx.trzcinski@linux.intel.com>
Copy link

@xiulipan xiulipan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. manually test pass on APL UP2 board.
IPC dmesg logs and debugfs output all look nice.

[   58.540955] sof-audio-pci 0000:00:0e.0: ipc tx: 0xd0010000: GLB_DEBUG: MEM_USAGE
[   58.541049] sof-audio-pci 0000:00:0e.0: ipc tx succeeded: 0xd0010000: GLB_DEBUG: MEM_USAGE
/sys/kernel/debug/sof# cat memory_info
zone 0.0 used   0x2340 free   0x5cc0
zone 1.0 used    0xf00 free   0x4100
zone 2.0 used   0x7500 free   0x6300
zone 3.0 used   0x9200 free   0x6e00
zone 3.1 used      0x0 free  0x20000

struct sof_ipc_reply rhdr; /**< generic IPC reply header */
uint32_t reserved[4]; /**< reserved for future use */
uint32_t num_elems; /**< elems[] counter */
struct sof_ipc_dbg_mem_usage_elem elems[]; /**< memory usage information */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this has been discussed and I know the reasons and I don't have a good universal solution, but having different (compatible) struct definitions in the kernel and the firmware still feels uncomfortable to me...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh I think that's a fair point. One one hand, we could have a more strict approach of defining the ABI with a separate formal description (we've started to documents some bits with doxygen notation in FW headers), but in practise the current model has been fairly effective. So for now, any exception needs to be agreed separately, but e.g. in this case, after a lot of back and forth, this seems to be best candidate.

@plbossart
Copy link
Member

@kv2019i I'll let you merge if you feel this is good enough.

@kv2019i
Copy link
Collaborator

kv2019i commented Nov 2, 2020

Thanks all, looks ready for merge. Checkpatch issues are related to uint32 usage which we do allow. One travis failure unrelated to the PR and one CI device test failure is the known DRM warning (already filtered in latest sof-test):
"[ 1017.783841] [drm:drm_dp_send_link_address [drm_kms_helper]] ERROR Sending link address failed with -5"
So all good. Thanks @ktrzcinx !

@kv2019i kv2019i merged commit 7183af0 into thesofproject:topic/sof-dev Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ABI involves ABI changes, need extra attention

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants