
Do binary instrumentation of allocation functions #11

@stephenrkell

Rather than doing a lot of hairy link-time stuff (see tools/allocscompilerwrapper.py) to interpose on allocation functions, it would be better to do it at run time. This should be less fragile, and may benefit from access to run-time type information. It will also avoid the need to relink the target binary whenever we want to change our list of allocation functions, making the system more convenient to use. That also opens up the possibility of inferring a "good guess" at the allocation functions themselves by looking at the dynamic call tree, removing some of the developer effort currently needed.

Allocation functions that are accessed via call/return are the easy case, so I propose to investigate a solution for those first. (The harder cases are alloca and inlined allocation functions.)

The idea is to trampoline-rewrite the entry and exit paths of allocation functions so that they call out into stubs. These stubs should simply do the same things that our current ones do (as generated by stubgen.h). Since we have xed in our dependencies anyway, it may be possible to hand-roll a solution. Most allocation functions have at least five bytes' worth of prologue (five bytes being the size of an x86 jmp rel32), into which there are no inward jumps. In such a case, all we really need is to identify a "launch pad" of five or more bytes at the start of the prologue, displace those instructions elsewhere (re-relocating them as necessary), replace them with a jump, and append to the displaced copy an instruction returning control to the remainder of the function. To handle the return path, rewriting the on-stack return address at run time should be sufficient.
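As a rough illustration of the launch-pad idea, here is a minimal sketch in C that works purely on byte buffers and never patches live code. It makes several assumptions not stated above: the target is x86-64, the displaced prologue instructions are position-independent (so no re-relocation is shown), the displaced copy ends with a jmp rel32 back to the rest of the function, and the return path uses a single global shadow stack where a real implementation would presumably want one per thread. All function and macro names are hypothetical and are not taken from stubgen.h or any existing code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_REL32_LEN 5   /* opcode 0xE9 followed by a 32-bit signed displacement */
#define SHADOW_DEPTH 64   /* arbitrary limit for this sketch */

/* Encode "jmp rel32" into buf, as if buf lived at address `from`,
 * jumping to address `to`. Returns 0 on success, or -1 if the target
 * is out of rel32 range. */
static int encode_jmp_rel32(unsigned char *buf, uint64_t from, uint64_t to)
{
    /* The displacement is relative to the end of the jump instruction. */
    int64_t disp = (int64_t)(to - (from + JMP_REL32_LEN));
    if (disp < INT32_MIN || disp > INT32_MAX) return -1;
    uint32_t d = (uint32_t)(int32_t)disp;
    buf[0] = 0xE9;
    buf[1] = (unsigned char)(d & 0xff);          /* little-endian, as x86 expects */
    buf[2] = (unsigned char)((d >> 8) & 0xff);
    buf[3] = (unsigned char)((d >> 16) & 0xff);
    buf[4] = (unsigned char)((d >> 24) & 0xff);
    return 0;
}

/* Build the displaced copy: pad_len bytes of prologue followed by a jump
 * back to entry_addr + pad_len. Assumes pad_len ends on an instruction
 * boundary and that the copied instructions need no re-relocation.
 * tramp must have room for pad_len + JMP_REL32_LEN bytes. */
static int build_trampoline(unsigned char *tramp, uint64_t tramp_addr,
                            const unsigned char *entry_bytes,
                            uint64_t entry_addr, size_t pad_len)
{
    if (pad_len < JMP_REL32_LEN) return -1;      /* launch pad too small */
    memcpy(tramp, entry_bytes, pad_len);
    return encode_jmp_rel32(tramp + pad_len, tramp_addr + pad_len,
                            entry_addr + pad_len);
}

/* Return path: remember real return addresses on a shadow stack.
 * A single global stack is used here only for brevity. */
static uint64_t shadow[SHADOW_DEPTH];
static size_t shadow_top;

/* Entry-stub side: save the real return address found in *ret_slot and
 * redirect the return to exit_stub. Returns -1 if the shadow stack is full. */
static int hook_return(uint64_t *ret_slot, uint64_t exit_stub)
{
    if (shadow_top == SHADOW_DEPTH) return -1;
    shadow[shadow_top++] = *ret_slot;
    *ret_slot = exit_stub;
    return 0;
}

/* Exit-stub side: recover the real return address (0 if none is saved). */
static uint64_t unhook_return(void)
{
    return shadow_top ? shadow[--shadow_top] : 0;
}
```

In a real implementation the launch pad itself would be overwritten with the output of encode_jmp_rel32 targeting the entry stub, after making the page writable, and instruction boundaries within the prologue would be found with a decoder such as xed rather than assumed.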

If it gets hairier (e.g. we have to get involved with diverting branches back into the 5-byte displaced chunk), DynInst (https://dyninst.org/) will look more worthwhile. Importing DynInst as a dependency has pros and cons. There is quite a lot of overlap with what we do. My preference, if possible, would be to build it under contrib/ in the form of an archive, from which we can pick only a few functions and hopefully not pull in too much code.
