Lurking in the lower layers of liballocs is a run-time service that sits somewhere between librunt (ELF-level introspection) and liballocs (a typed allocation graph). That intermediate level is roughly a rebindable ELF runtime. It might be good to factor it out and build liballocs on top of it. This relates to at least the following issues:
- Early introspection #98
- Do binary instrumentation of allocation functions #11
- Binary instrumentation of inline allocation functions and alloca #15
We use various techniques for interposing on bindings: system call trapping, Detours-style binary rewriting with a trampoline, batch-link-time interposition (formerly --wrap, now using my xwrap linker plugin) and load-time interposition (LD_PRELOAD). It would be great if our ELF runtime supported explicit interposition and rebinding at run time.
Run-time rebinding also overlaps with Shiva and similar binary-patching tools, which support both "interposing" (at symbol-definition granularity, expressible as a rebinding) and "splicing" (changes within a definition).
Rebinding also overlaps with symbol versioning. Indeed, I suspect there is a publishable research contribution in doing this right, if we can find a way to unify symbol versioning, symbol interposition and binary-patching techniques.
Stretching things a bit further, system calls arguably should be marked by relocations (we do this! #35) and be rebindable analogously to a call site. That is doable if we can identify a local write to %rax that dominates the syscall instruction and is not used elsewhere. In other cases, tricks like "zpoline" can be used, although I dislike the idea of mapping stuff at virtual address zero. And then there is always instruction punning.
All of this should be transparent to tools. Shiva and other tools currently are not. Using libdlbind we can get some traction on added code, e.g. trampolines. What about code modified in place, e.g. by a detour? We get a wrong backtrace at the replaced instruction. Do we get wrong breakpoint hits? Do we get wrong dlsym behaviour?
There is the lurking idea of "pushing the detour back across the incoming call edges". But it works only if the old entry point has not been address-taken and spread unknowably elsewhere.