
RFC: Segment aware vm addressing #480

Open
rjmansfield wants to merge 1 commit into google:main from rjmansfield:segment-aware-vm-addressing


@rjmansfield
Contributor

Bloaty currently cannot correctly analyze Mach-O universal binaries. When processing a universal binary containing multiple architectures, e.g. arm64 and x86_64, each architecture slice should have its own virtual address space. However, Bloaty's current implementation uses a single flat address space, so the slices' overlapping addresses conflict and produce incorrect results. As suggested in #153 (comment), this adds a segment id and then updates the logic to handle multiple address spaces.

The bulk of the changes introduces a VMAddr structure, which contains a segment identifier and an address. The more challenging and complex changes are to ComputeRollup, which required relaxing the logic for secondary maps to accommodate things like padding or gaps (otherwise these maps trigger asserts in the previous code). With the segment infrastructure changes in place, the Mach-O changes were fairly straightforward.
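For readers skimming the description, here is a minimal sketch of what a segment-qualified address could look like. The field names, the ordering, and the slice-to-segment numbering are assumptions made for illustration and are not taken from the patch itself.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Hypothetical sketch: field names are assumptions, not the PR's definition.
struct VMAddr {
  uint32_t segment_id;  // which address space (e.g. one Mach-O slice)
  uint64_t addr;        // address within that space

  // Order by segment first, then by address, so that each segment's
  // addresses stay grouped together in an ordered container.
  friend bool operator<(const VMAddr& a, const VMAddr& b) {
    return std::tie(a.segment_id, a.addr) < std::tie(b.segment_id, b.addr);
  }
  friend bool operator==(const VMAddr& a, const VMAddr& b) {
    return a.segment_id == b.segment_id && a.addr == b.addr;
  }
};

// Two slices can claim the same numeric address without colliding,
// because the segment id is part of the key.
inline std::map<VMAddr, std::string> TwoSlicesSameAddress() {
  std::map<VMAddr, std::string> labels;
  labels[VMAddr{0, 0x100000000}] = "arm64 __TEXT";
  labels[VMAddr{1, 0x100000000}] = "x86_64 __TEXT";
  return labels;
}
```

With a plain uint64_t key the second assignment would overwrite the first, which is the kind of conflict described above.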

@haberman Does this approach seem reasonable to you? I hope my understanding of the rollup algorithm and the changes are correct. All of the existing tests are passing for me locally but it's possible I've missed something, so any feedback would be appreciated.

src/range_map.h Outdated
// RangeMap maps
//
-// [uint64_t, uint64_t) -> std::string, [optional other range base]
+// [VMAddr, uint64_t) -> std::string, [optional other range base]
Member


I think it would be simpler to make RangeMap mostly unaware of segments, and instead make the container holding the RangeMap contain a map of segment_id -> RangeMap.

I think there is only one place that RangeMap needs to know about segment, and that is other_start, which will indeed need to know which segment the other start belongs to (if the "other" domain is the file, then the segment will always be 0).
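As a rough illustration of the other_start point, the sketch below uses a simplified stand-in for a range map whose entries record which segment the translated address belongs to. SimpleRangeMap, its Entry fields, and the example offsets are all hypothetical; Bloaty's real RangeMap is more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a range map; not Bloaty's RangeMap.
struct SimpleRangeMap {
  struct Entry {
    uint64_t size;
    std::string label;
    int other_segment;     // segment of the "other" domain (0 if it is the file)
    uint64_t other_start;  // start of the corresponding range in that domain
  };
  std::map<uint64_t, Entry> entries;  // keyed by range start

  // Returns {other_segment, translated address} if addr falls inside a range.
  std::optional<std::pair<int, uint64_t>> Translate(uint64_t addr) const {
    auto it = entries.upper_bound(addr);
    if (it == entries.begin()) return std::nullopt;  // before the first range
    --it;
    if (addr - it->first >= it->second.size) return std::nullopt;  // in a gap
    return std::make_pair(it->second.other_segment,
                          it->second.other_start + (addr - it->first));
  }
};

// A file-domain map for a two-slice binary: both slices map to the same VM
// address, distinguished only by other_segment. Offsets are made up.
inline SimpleRangeMap TwoSliceFileMap() {
  SimpleRangeMap file_map;
  file_map.entries[0x4000] = {0x1000, "arm64 __TEXT", 0, 0x100000000};
  file_map.entries[0x8000] = {0x1000, "x86_64 __TEXT", 1, 0x100000000};
  return file_map;
}
```

Here the map itself is keyed by plain offsets and knows nothing about segments; only the translation target carries a segment id.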

Introduces per-segment VM address spaces to allow multiple Mach-O
architecture slices to coexist without address conflicts. RangeSink
gains a segment_id field which gets passed to all ForEachLoadCommand
calls. DualMap holds a map<int, RangeMap> for VM space per segment_id
while the file map remains shared.
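
A toy sketch of the layout this commit message describes, assuming each slice is assigned its own segment_id. ToyRangeMap, ToyDualMap, AddRange, and the example offsets are hypothetical names and values chosen for illustration, not the code in this branch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in: range start -> label. Not Bloaty's RangeMap.
using ToyRangeMap = std::map<uint64_t, std::string>;

// Sketch of the described layout; member names are assumptions.
struct ToyDualMap {
  std::map<int, ToyRangeMap> vm_maps;  // one VM address space per segment_id
  ToyRangeMap file_map;                // single file-offset space, shared

  void AddRange(int segment_id, uint64_t vmaddr, uint64_t fileoff,
                const std::string& label) {
    vm_maps[segment_id][vmaddr] = label;  // isolated per segment
    file_map[fileoff] = label;            // file offsets never overlap
  }
};

// Two slices with identical VM addresses but distinct file offsets.
inline ToyDualMap BuildUniversalExample() {
  ToyDualMap dm;
  dm.AddRange(0, 0x100000000, 0x4000, "arm64 __TEXT");
  dm.AddRange(1, 0x100000000, 0x8000, "x86_64 __TEXT");
  return dm;
}
```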
rjmansfield force-pushed the segment-aware-vm-addressing branch from 5e26996 to 95f1d0e on March 2, 2026 19:34
@rjmansfield
Contributor Author

Updated based on the previous feedback, which simplifies things. Note that the segment/slice names depend on changes added in #481, so they're currently stubbed out. I can rebase or fix them if/when it lands.
