Contributor

@tmleman tmleman commented Oct 16, 2025

Add dcache_writeback_region() calls after copying library data to IMR (Isolated Memory Region) to ensure cache coherency in multicore scenarios.

Without these cache operations, when Core 0 loads a library into IMR via memcpy_s(), the data remains in Core 0's data cache and is not written back to main memory. When Core 1 later tries to create a module from this library, it reads uninitialized or stale data from IMR, causing either:

  • "Unsupported module API version" errors (reading garbage build info)
  • Fatal PIF data errors and crashes (accessing corrupted module data)

The fix adds cache writeback operations at two critical points:

  1. After copying the manifest (MAN_MAX_SIZE_V1_8 bytes)
  2. After copying the entire library (preload_size bytes)

This ensures library data written by Core 0 is flushed from its data cache to IMR memory before Core 1 attempts to read it. This is the usual software-managed coherency step on Xtensa, where the per-core data caches are not kept coherent by hardware.

Fixes multicore topology crashes on Intel MTL, LNL, PTL, NVL platforms when loading external libraries with modules instantiated on secondary cores.
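
The failure mode described above can be illustrated with a toy model. This is only a sketch and not SOF code: the names `model_reset()`, `core_write()`, `core_read()` and `model_writeback()` are hypothetical, the "cache" is tracked per byte rather than per cache line, and `shared_mem` merely stands in for IMR. It shows why a second core keeps seeing old memory contents until the writing core performs a writeback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model, not SOF code: one shared memory plus a private write-back
 * cache per core, with no hardware coherency between the caches.
 * Caching is tracked per byte here; real hardware works on cache lines.
 */
#define MEM_SIZE 64
#define NUM_CORES 2

uint8_t shared_mem[MEM_SIZE];			/* stands in for IMR */
uint8_t cache_data[NUM_CORES][MEM_SIZE];
uint8_t cache_valid[NUM_CORES][MEM_SIZE];
uint8_t cache_dirty[NUM_CORES][MEM_SIZE];

/* Fill memory with a recognizable "stale" pattern and empty all caches. */
void model_reset(void)
{
	memset(shared_mem, 0xFF, sizeof(shared_mem));
	memset(cache_valid, 0, sizeof(cache_valid));
	memset(cache_dirty, 0, sizeof(cache_dirty));
}

/* A store lands only in the writing core's cache (write-back policy). */
void core_write(int core, size_t off, const uint8_t *src, size_t len)
{
	assert(off + len <= MEM_SIZE);
	for (size_t i = 0; i < len; i++) {
		cache_data[core][off + i] = src[i];
		cache_valid[core][off + i] = 1;
		cache_dirty[core][off + i] = 1;
	}
}

/* A load hits the core's own cache if valid, else fetches from memory. */
uint8_t core_read(int core, size_t off)
{
	assert(off < MEM_SIZE);
	if (!cache_valid[core][off]) {
		cache_data[core][off] = shared_mem[off];
		cache_valid[core][off] = 1;
	}
	return cache_data[core][off];
}

/* Hypothetical analogue of dcache_writeback_region(): push dirty bytes
 * of one core's cache out to shared memory.
 */
void model_writeback(int core, size_t off, size_t len)
{
	assert(off + len <= MEM_SIZE);
	for (size_t i = 0; i < len; i++) {
		if (cache_dirty[core][off + i]) {
			shared_mem[off + i] = cache_data[core][off + i];
			cache_dirty[core][off + i] = 0;
		}
	}
}
```

In this model, after `core_write(0, ...)` a `core_read(1, ...)` of the same offset returns the 0xFF placeholder, because the data has not left core 0's cache. Calling `model_writeback(0, ...)` before core 1's first read makes the copied bytes visible to it.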

Copilot AI review requested due to automatic review settings October 16, 2025 11:31

Copilot AI left a comment

Pull Request Overview

This PR fixes a cache coherency issue in multicore scenarios where libraries loaded into IMR (Isolated Memory Region) by one core are not visible to other cores, causing crashes and errors on Intel MTL, LNL, PTL, and NVL platforms.

Key Changes:

  • Added cache writeback operation after copying the library manifest to IMR
  • Added cache writeback operation after copying the complete library data to IMR

@tmleman tmleman force-pushed the topic/upstream/pr/lib/fix_for_multicore branch from c3b6e0a to eb93619 Compare October 16, 2025 13:35
Collaborator

@kv2019i kv2019i left a comment

Thanks @tmleman ! Excellent bugfix and also a very good commit message. This really helps when reviewing!

Add dcache_writeback_region() call after copying library data to IMR
(Isolated Memory Region) to ensure cache coherency in multicore
scenarios.

Without this cache operation, when Core 0 loads a library into IMR via
memcpy_s(), the data remains in Core 0's data cache and is not written
back to main memory. When Core 1 later tries to create a module from
this library, it reads uninitialized or stale data from IMR, causing
either:
- "Unsupported module API version" errors (reading garbage build info)
- Fatal PIF data errors and crashes (accessing corrupted module data)

The fix adds a single cache writeback operation after all library data
(manifest + module code/data) has been copied to IMR. This ensures
library data written by Core 0 is flushed from cache to IMR memory
before Core 1 attempts to read it, following the standard cache
coherency protocol for non-coherent Harvard architecture (Xtensa).

Fixes multicore topology crashes on Intel MTL, LNL, PTL, NVL platforms
when loading external libraries with modules instantiated on secondary
cores.

Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
Collaborator

@lyakh lyakh left a comment

alright... My only question is - why aren't Jenkins multicore nocodec tests failing left and right?..

}

/* Writeback entire library to ensure it's visible to other cores */
dcache_writeback_region((__sparse_force void *)library_base_address, preload_size);
Collaborator

do we continue using this API or should we gradually switch over to calling sys_cache_data_flush_range() directly?
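
For context on the question above, here is a rough sketch of what the two options look like at a call site. It assumes that `dcache_writeback_region()` is, on Zephyr builds, a thin wrapper that forwards to `sys_cache_data_flush_range()`; the real SOF definition may differ (for example in its pointer annotations). To keep the sketch buildable off-target, `sys_cache_data_flush_range()` is replaced by a stub that only records its arguments, and `writeback_library()` is a hypothetical helper.

```c
#include <stddef.h>

/* Stand-in for Zephyr's sys_cache_data_flush_range() from <zephyr/cache.h>,
 * stubbed so the sketch builds off-target. It only records its arguments.
 */
void *last_flush_addr;
size_t last_flush_size;
int flush_calls;

int sys_cache_data_flush_range(void *addr, size_t size)
{
	last_flush_addr = addr;
	last_flush_size = size;
	flush_calls++;
	return 0;
}

/* Assumed shape of a thin compatibility wrapper; the real SOF
 * definition may differ.
 */
static inline void dcache_writeback_region(void *addr, size_t size)
{
	(void)sys_cache_data_flush_range(addr, size);
}

/* Hypothetical call site mirroring the snippet under review. */
void writeback_library(void *library_base_address, size_t preload_size)
{
	dcache_writeback_region(library_base_address, preload_size);
}
```

If the wrapper really is this thin, switching call sites to the Zephyr function directly would not change behavior. It would mainly expose the return value to the caller and drop one layer of indirection.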

@kv2019i kv2019i merged commit 2c248ee into thesofproject:main Oct 17, 2025
39 of 45 checks passed
@kv2019i kv2019i added the backport-to-stable PRs that should be backported to stable branches label Oct 17, 2025
5 participants