Invalidate instruction cache when restoring#115
Conversation
When restoring executable pages, invalidate the instruction cache for that address range. This ensures that a thread executing in a different CPU on restore will execute the newly-restored instructions. We do that in an arch-independent fashion by using __builtin___clear_cache (which is a no-op on e.g. X86). On e.g. ARMv7, this needs to execute the cacheflush(..., ICACHE) syscall, so we only do that for VMAs that are marked as executable. For the same reason, we now need to to link the parasite against libgcc.a, so switch to using the compiler driver (gcc) for linking rather than invoking ld.bfd directly, so that it can resolve the path to libgcc.a for us.
|
Thanks for the patch - this is an interesting case. Could you please share a bit more context on how this surfaced? Which CPU(s) did you see this on? Under what conditions? Anything special about the process being frozen? Are you able to reproduce it? Would be ideal if you could share an isolated test case. I tried several approaches to test this cache invalidation, and it always behaves correctly on all platforms I have. |
|
|
||
| read(cd, (void *)req.u.mem.vmr.addr, req.u.mem.vmr.len); | ||
| read(cd, (void *)req.u.mem.vmr.addr, req.u.mem.vmr.len); | ||
| /* Invalidate the instruction cache for the region we just overwrote. */ |
There was a problem hiding this comment.
Oops, will tabify, thanks.
| read(cd, (void *)req.u.mem.vmr.addr, req.u.mem.vmr.len); | ||
| /* Invalidate the instruction cache for the region we just overwrote. */ | ||
| if (req.u.mem.vmr.prot & PROT_EXEC) | ||
| __builtin___clear_cache((void *)req.u.mem.vmr.addr, (char *)req.u.mem.vmr.addr + req.u.mem.vmr.len); |
There was a problem hiding this comment.
this fails to link on riscv:
/usr/bin/ld.bfd: ./parasite.o: in function .L206': parasite.c:(.text+0x43a): undefined reference to __riscv_flush_icache'
There was a problem hiding this comment.
Yah, just saw that in your CI. Whereas the ARMv7 version of __builtin___clear_cache expands to a call to a helper in libgcc.a (which simply invokes the cacheflush() syscall), it looks like on RISC-V it expands to a call to __riscv_flush_icache which the manpage says is implemented in libc.
So unfortunately fixing this on RISC-V would mean having a dependency on the availability of a static version of libc. It may be more practical to use syscall(2) to invoke cacheflush there then, what do you think?
There was a problem hiding this comment.
Looks like you're supposed to invoke the riscv_flush_icache syscall on RISC-V: https://www.kernel.org/doc/html/latest/arch/riscv/cmodx.html#cmodx-in-the-user-space.
There was a problem hiding this comment.
yes, on RISC-V syscall is fine - we could wrap this in something like arch_clear_icache() that would hide arch details: syscall on RISC-V, __builtin___clear_cache on ARM, etc.
static libc is a no-go for parasite code that is injected into a target process
|
Hi @mkozlowski, I can't confirm I'm seeing this in production yet, but it would be an issue with a process that does JIT compilation and is multi-threaded. Conceptually, the issue seems clear: the instruction and data caches are not consistent on all architectures. They are on I can cook up a test case to demonstrate this if that would be helpful, but this won't ever trigger on x86 systems. |
Please do. |
When restoring executable pages, invalidate the instruction cache for that address range. This ensures that a thread executing in a different CPU on restore will execute the newly-restored instructions.
We do that in an arch-independent fashion by using __builtin___clear_cache (which is a no-op on e.g. X86). On e.g. ARMv7, this needs to execute the cacheflush(..., ICACHE) syscall, so we only do that for VMAs that are marked as executable.
For the same reason, we now need to to link the parasite against libgcc.a, so switch to using the compiler driver (gcc) for linking rather than invoking ld.bfd directly, so that it can resolve the path to libgcc.a for us.