Skip to content

Conversation

@keyonjie
Copy link
Contributor

Merge zones SYSTEM_SHARED and RUNTIME_SHARED to the BUFFER zone, and change the corresponding callers.

src/lib/alloc.c Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if SOF_MEM_FLAG_COHERENT == 1, then we prefer uncache addr ? the flag name is confusing. Or named SOF_MEM_FLAG_UNCACHE

Kconfig.xtos-dbg Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

means uncached buffer ? I know cache coherence but what is coherence buffer ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so coherent (a generic term) means uncache (coherent) memory on Intel HW.

@keyonjie keyonjie force-pushed the heap_refinement branch 3 times, most recently from b7ad7cf to 00633db Compare September 15, 2021 07:43
@gkbldcig
Copy link
Collaborator

Can one of the admins verify this patch?

@keyonjie keyonjie mentioned this pull request Sep 22, 2021
@keyonjie keyonjie force-pushed the heap_refinement branch 8 times, most recently from 359dd45 to 029ad51 Compare September 28, 2021 09:22
@keyonjie keyonjie force-pushed the heap_refinement branch 2 times, most recently from a4fdaec to bf05d38 Compare October 9, 2021 05:40
@keyonjie keyonjie requested a review from yaochunhung as a code owner October 9, 2021 16:00
@keyonjie
Copy link
Contributor Author

@lgirdwood the CI result is becoming better now.
only fuzzy IPC and test bench issues looks valid to me, will fix them soon.

Update to initiaze the new allocated buffers to all 0s by
rballoc_align() helper.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
The rballoc_align() helper will initialize new allocated buffers with
0s, update comments to document this.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
The fuzzy IPC testing reports there is memory leak if dma_trace_config
IPC is sent multiple times, we should just return if the DMA trace
buffer is already allocated, to avoid allocating new buffers and the
leakage of the old buffers.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
When freeing blocks, we need to perform writeback and invalidate to the
dirty cache lines, otherwise, they will be evicted from the cache at
some point in the future, which will break the usage of the same memory
region from another DSP core.

Introduce a free_ptr to make sure the original 'ptr' is not changed, so
we can use it for this wb/inv operation.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Change the usage of the runtime_shared zone to the buffer zone, and the
calling of rmalloc/rzalloc() from runtime_shared zone to rballoc() with
SOF_MEM_ZONE_RUNTIME_SHARED flag.

Change the rballoc to zeroing allocated buffers to meet the rzalloc()
usage also.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Change to use rballoc(SOF_MEM_FLAG_COHERENT,) one now.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
The mux cd could be bigger than 1KB, change to use rballoc() for
allocating the buffer, to try to ensure the allocation is success.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
We are allocating more buffers from BUFFER zone now, change size of
ZONEs to reflect that.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Change the usage of the system_shared zone to the buffer zone, and the
calling of rmalloc/rzalloc() from system_shared zone to rballoc() with
SOF_MEM_ZONE_RUNTIME_SHARED flag.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
We are allocating more buffers from BUFFER zone now, change size of
ZONEs to reflect that.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Since there is no any usage of the runtime_shared zone now, we can
remove it thoroughly now.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Since there is no any usage of the system_shared zone now, we can
remove it thoroughly now.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Align to the new heap memory map and allocator, SYSTEM_SHARED and
RUNTIME_SHARED zones are removed.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Now buffer zone should have bigger size, and runtime zone is smaller.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Copy link
Collaborator

@lyakh lyakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the general idea of reducing the number of zones is good, but I don't think using rballoc() for all allocations is good. Could we do the opposite - remove rballoc() and use rmalloc() for all allocations?

uint32_t alignment)
{
return malloc(bytes);
return calloc(bytes, 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, why? I'm against this. The user can decide if they need to initialise the buffer. Often enough the whole of the buffer will be overwritten anyway. This would just waste cycles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, why? I'm against this. The user can decide if they need to initialise the buffer. Often enough the whole of the buffer will be overwritten anyway. This would just waste cycles.

this is good point, this is for alignment the testbench to the existed allocator only. In subsequent cleanup series, I will unify the allocation helpers, e.g. to remove the rballoc() thoroughly and use rmalloc()/rzalloc() correspondingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we align this with Zephyr, i.e. if Zephyr clears then we should too, if Zephyr does not clear then we clear in the caller code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood are you planning to keep wrappers around Zephyr heap API or you want to use them directly? I think we should have wrappers. Firstly, they only have two kinds of allocations: aligned and unaligned. Secondly, their allocators take a heap instance as an argument, which applications commonly have no idea about. So, no zeroing (calloc()), no cache-coherency. And if we do have wrappers - let's design them and move towards that design.


/* return if already initialized */
if (buffer->addr)
return 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't dma_trace_enable() return immediately in this case too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't dma_trace_enable() return immediately in this case too?

Tried that but if failed a lot according to CI result. This is only to fix the fuzzy test failure reported, may be it should be split out as a separated PR.

* future, on top of the memory region now being used for
* different purposes on another core.
*/
dcache_writeback_invalidate_region(ptr, block_map->block_size * hdr->size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why writing it back? If memory is freed, its contents aren't needed any more? But anyway I guess this is superseded by #4851

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why writing it back? If memory is freed, its contents aren't needed any more? But anyway I guess this is superseded by #4851

yes, it was inspired by Andy's #4851 but this is for XTOS, doing DHWBI (Data-cache hit writeback invalidate) looks like a better operation for the memory freeing to me, it follows the willing that the last changed content flushed back to memory.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, not following. "Better" must have an explanation. Why write it back? Nobody needs it.

uint8_t *dynamic_vectors;

dynamic_vectors = rzalloc(SOF_MEM_ZONE_RUNTIME_SHARED, 0, 0, SOF_DYNAMIC_VECTORS_SIZE);
dynamic_vectors = rballoc(SOF_MEM_FLAG_COHERENT, 0, SOF_DYNAMIC_VECTORS_SIZE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to explain why. Just because it's a relatively large allocation of 1K?


/* allocate new buffer */
buffer = rzalloc(SOF_MEM_ZONE_RUNTIME_SHARED, 0, SOF_MEM_CAPS_RAM,
buffer = rballoc(SOF_MEM_FLAG_COHERENT, SOF_MEM_CAPS_RAM,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I'm totally confused now. I thought rballoc() was only for audio buffers. Then some other uses cropped in. Now you're suggesting even more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I'm totally confused now. I thought rballoc() was only for audio buffers. Then some other uses cropped in. Now you're suggesting even more.

It is just replacement here, as we will remove the RUNTIME_SHARED zone thoroughly in the subsequent commits.

The naming is not so important to me, for the allocator, it doesn't know about what the required buffer will be used for, @lgirdwood asked me to do a "Part 5" to unify the rxxalloc() helpers, at that time we will have only e.g. rmalloc() and rzalloc(), and the allocator internal will decide where (which zone) it will allocate the required buffer on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh direction of travel is simplification of allocator to align with Zephyr

@keyonjie keyonjie mentioned this pull request Oct 12, 2021
Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets align the clear/no clear with Zephyr.

uint32_t alignment)
{
return malloc(bytes);
return calloc(bytes, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we align this with Zephyr, i.e. if Zephyr clears then we should too, if Zephyr does not clear then we clear in the caller code.

@lgirdwood
Copy link
Member

@keyonjie @lyakh lets simplify further in alloc.c/h and in wrapper.c and take us closer to POSIX here (which will also align with Zephyr as updates are merged). Can we consolidate our memory APIs to

/* reuse current caps */
#define SOF_MEM_CAPS_RAM			(1 << 0)
#define SOF_MEM_CAPS_ROM			(1 << 1)
#define SOF_MEM_CAPS_EXT			(1 << 2) /**< external */
#define SOF_MEM_CAPS_LP			(1 << 3) /**< low power */
#define SOF_MEM_CAPS_HP			(1 << 4) /**< high performance */
#define SOF_MEM_CAPS_DMA			(1 << 5) /**< DMA'able */
#define SOF_MEM_CAPS_CACHE			(1 << 6) /**< cacheable */
#define SOF_MEM_CAPS_EXEC			(1 << 7) /**< executable */

void *rmalloc(size_t size, uint32_t caps);
void rfree(void *ptr);
void *rcalloc(size_t nmemb, size_t size, uint32_t caps);
void *rrealloc(void *ptr, size_t size, uint32_t caps);

This means we can delete enum mem_zone

@keyonjie
Copy link
Contributor Author

keyonjie commented Oct 18, 2021

void *rmalloc(size_t size, uint32_t caps);
void rfree(void *ptr);
void *rcalloc(size_t nmemb, size_t size, uint32_t caps);
void *rrealloc(void *ptr, size_t size, uint32_t caps);


This means we can delete `enum mem_zone`

Sure, maybe "rcalloc" -> "rzalloc".

By the way, should the allocator always return uncached address as we don't have flag param for those APIs?

@lgirdwood
Copy link
Member

void *rmalloc(size_t size, uint32_t caps);
void rfree(void *ptr);
void *rcalloc(size_t nmemb, size_t size, uint32_t caps);
void *rrealloc(void *ptr, size_t size, uint32_t caps);


This means we can delete `enum mem_zone`

Sure, maybe "rcalloc" -> "rzalloc".

Yep, that's fine.

By the way, should the allocator always return uncached address as we don't have flag param for those APIs?

Look here

#define SOF_MEM_CAPS_CACHE			(1 << 6) /**< cacheable */

We already have a cached flag, so should use it. Btw, @kv2019i has now aligned Zephyr to use same cache/uncache mapping now as xtos so your updates should align on both xtos and zephyr :)

@keyonjie
Copy link
Contributor Author

CAPS_CACHE

Hi @lgirdwood let me try to clarify it.

Now the CAPS_CACHE is somewhat confusion, the flag doesn't mean anything, it is listed in all zones in memory.c.

And the usage of it from the topology don't tell any difference wrt this flag:

dnl Memory capabilities for different buffer types on Tigerlake
define(`PLATFORM_DAI_MEM_CAP',
        MEMCAPS(MEM_CAP_RAM, MEM_CAP_DMA, MEM_CAP_CACHE, MEM_CAP_HP))
define(`PLATFORM_HOST_MEM_CAP',
        MEMCAPS(MEM_CAP_RAM, MEM_CAP_DMA, MEM_CAP_CACHE, MEM_CAP_HP))
define(`PLATFORM_PASS_MEM_CAP',
        MEMCAPS(MEM_CAP_RAM, MEM_CAP_DMA, MEM_CAP_CACHE, MEM_CAP_HP))
define(`PLATFORM_COMP_MEM_CAP', MEMCAPS(MEM_CAP_RAM, MEM_CAP_CACHE))

So maybe we can add a new flag e.g. MEM_CAP_DIRECT_ADDRESS if we want to use it to denote if the zone is uncached address used only, but for this kind of zone, do we intend to return cached address/ptr to the user in allocation APIs?

@lgirdwood
Copy link
Member

And the usage of it from the topology don't tell any difference wrt this flag:

dnl Memory capabilities for different buffer types on Tigerlake
define(`PLATFORM_DAI_MEM_CAP',
        MEMCAPS(MEM_CAP_RAM, MEM_CAP_DMA, MEM_CAP_CACHE, MEM_CAP_HP))
define(`PLATFORM_HOST_MEM_CAP',
        MEMCAPS(MEM_CAP_RAM, MEM_CAP_DMA, MEM_CAP_CACHE, MEM_CAP_HP))
define(`PLATFORM_PASS_MEM_CAP',
        MEMCAPS(MEM_CAP_RAM, MEM_CAP_DMA, MEM_CAP_CACHE, MEM_CAP_HP))
define(`PLATFORM_COMP_MEM_CAP', MEMCAPS(MEM_CAP_RAM, MEM_CAP_CACHE))

So maybe we can add a new flag e.g. MEM_CAP_DIRECT_ADDRESS if we want to use it to denote if the zone is uncached address used only, but for this kind of zone, do we intend to return cached address/ptr to the user in allocation APIs?

The point here is that API callers who pass SOF_MEM_CAPS_CACHE will get the cached address and those who don't pass it will get uncached.

@keyonjie
Copy link
Contributor Author

keyonjie commented Oct 21, 2021

So maybe we can add a new flag e.g. MEM_CAP_DIRECT_ADDRESS if we want to use it to denote if the zone is uncached address used only, but for this kind of zone, do we intend to return cached address/ptr to the user in allocation APIs?

The point here is that API callers who pass SOF_MEM_CAPS_CACHE will get the cached address and those who don't pass it will get uncached.

That's doable, if we are aligned on that, we might need to revisit all API callers as most of them are not following that at the moment, almost none of the caller uses SOF_MEM_CAPS_CACHE explicitly when they are asking for cached address except the audio buffer from the topoloies:

src/arch/xtensa/schedule/task.c:86:		ctx->stack_base = rballoc(0, SOF_MEM_CAPS_RAM,
src/audio/drc/drc.c:76:	state->pre_delay_buffers[0] = rballoc(0, SOF_MEM_CAPS_RAM, bytes_total);
src/audio/codec_adapter/codec_adapter.c:517:		codec->runtime_params = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/codec_adapter/codec/generic.c:43:		dst->data = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/codec_adapter/codec/generic.c:49:		dst->data = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/codec_adapter/codec/generic.c:151:		ptr = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/codec_adapter/codec/passthrough.c:29:	codec->cpd.in_buff = rballoc(0, SOF_MEM_CAPS_RAM, cd->period_bytes);
src/audio/codec_adapter/codec/passthrough.c:36:	codec->cpd.out_buff = rballoc(0, SOF_MEM_CAPS_RAM, cd->period_bytes);
src/audio/dai.c.rej:8:+	dd = rballoc(SOF_MEM_FLAG_COHERENT, SOF_MEM_CAPS_RAM, sizeof(*dd));
src/audio/component.c:258:	blob_handler->data = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/component.c:321:		blob_handler->data_new = rballoc(0, SOF_MEM_CAPS_RAM,
src/audio/tdfb/tdfb.c:380:		cd->fir_delay = rballoc(0, SOF_MEM_CAPS_RAM, delay_size);
src/audio/eq_fir/eq_fir.c:296:	cd->fir_delay = rballoc(0, SOF_MEM_CAPS_RAM, delay_size);
src/audio/base_fw.c:172:	tuple_data = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/src/src.c:613:	cd->delay_lines = rballoc(0, SOF_MEM_CAPS_RAM, delay_lines_size);
src/audio/igo_nr/igo_nr.c:308:	cd->p_handle = rballoc(0, SOF_MEM_CAPS_RAM, cd->igo_lib_info.handle_size);
src/audio/smart_amp/smart_amp.c:84:	sad->mod_handle = rballoc(0, SOF_MEM_CAPS_RAM, mem_sz);
src/audio/smart_amp/smart_amp.c:93:	hspk->buf.frame_in = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/smart_amp/smart_amp.c:100:	hspk->buf.frame_out = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/smart_amp/smart_amp.c:107:	hspk->buf.frame_iv = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/smart_amp/smart_amp.c:114:	hspk->buf.input = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/smart_amp/smart_amp.c:121:	hspk->buf.output = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/smart_amp/smart_amp.c:128:	hspk->buf.voltage = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/smart_amp/smart_amp.c:135:	hspk->buf.current = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/smart_amp/smart_amp.c:142:	hspk->buf.ff.buf = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/smart_amp/smart_amp.c:149:	hspk->buf.ff_out.buf = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/smart_amp/smart_amp.c:156:	hspk->buf.fb.buf = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/smart_amp/smart_amp.c:163:	hspk->dsmhandle = rballoc(0, SOF_MEM_CAPS_RAM, size);
src/audio/smart_amp/smart_amp.c:199:	caldata->data = rballoc(0, SOF_MEM_CAPS_RAM, size);

src/drivers/intel/pmc-ipc.c:136:	_pmc = rmalloc(SOF_MEM_ZONE_SYS, 0, SOF_MEM_CAPS_RAM,
src/drivers/dw/ssi-spi.c:513:	spi_devices = rmalloc(SOF_MEM_ZONE_SYS, 0, SOF_MEM_CAPS_RAM,
src/platform/library/lib/alloc.c:19:void *rmalloc(enum mem_zone zone, uint32_t flags, uint32_t caps, size_t bytes)
src/schedule/zephyr_ll.c:496:	sch = rmalloc(SOF_MEM_ZONE_SYS, 0, SOF_MEM_CAPS_RAM, sizeof(*sch));

@lgirdwood
Copy link
Member

That's doable, if we are aligned on that, we might need to revisit all API callers as most of them are not following that at the moment, almost none of the caller uses SOF_MEM_CAPS_CACHE explicitly when they are asking for cached address except the audio buffer from the topoloies:

Great, that's what we need to do. Thanks !

@keyonjie
Copy link
Contributor Author

This is only improvement not mandatory, I have no more time to work on this, closing it.

@keyonjie keyonjie closed this Dec 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants