Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
241 changes: 154 additions & 87 deletions Documentation/project-docs/AssemblyStores.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,16 @@

- [Assembly Store format and purpose](#assembly-store-format-and-purpose)
- [Rationale](#rationale)
- [Store kinds and locations](#store-kinds-and-locations)
- [Store locations](#store-locations)
- [Store format](#store-format)
- [Common header](#common-header)
- [Assembly descriptor table](#assembly-descriptor-table)
- [Index store](#index-store)
- [Hash table format](#hash-table-format)
- [Native Struct Documentation](#native-struct-documentation)
- [AssemblyStoreHeader](#assemblystoreheader)
- [AssemblyStoreIndexEntry](#assemblystoreindexentry)
- [AssemblyStoreEntryDescriptor](#assemblystoreentrydescriptor)

<!-- markdown-toc end -->

Expand All @@ -19,11 +23,16 @@ assemblies, their debug data (optionally) and the associated config
file (optionally). They are placed inside the Android APK/AAB
archives, replacing individual assemblies/pdb/config files.

Assembly stores are an optional form of assembly storage in the
Assembly stores are a form of assembly storage in the
archive, they can be used in all build configurations **except** when
Fast Deployment is in effect (in which case assemblies aren't placed
in the archives at all, they are instead synchronized from the host to
the device/emulator filesystem)
the device/emulator filesystem).

Applications using MonoVM runtime have the option to turn assembly
stores off in favor of individual assemblies, but CoreCLR applications
(with the exception of FastDev, as mentioned above) support only
this form of assembly storage.

## Rationale

Expand All @@ -50,40 +59,71 @@ An assembly store, however, needs to be mapped only once and any
further operations are merely pointer arithmetic, making the process
not only faster but also reducing the algorithm complexity to O(1).

# Store kinds and locations

Each application will contain at least a single assembly store, with
assemblies that are architecture-agnostics and any number of
architecture-specific stores. dotnet ships with a handful of
assemblies that **are** architecture-specific - those assemblies are
placed in an architecture specific store, one per architecture
supported by and enabled for the application. On the execution time,
the .NET for Android runtime will always map the architecture-agnostic
store and one, and **only** one, of the architecture-specific stores.
# Store locations

Stores are placed in the same location in the APK/AAB archive where the
individual assemblies traditionally live, the `assemblies/` (for APK)
and `base/root/assemblies/` (for AAB) folders.
There exists only one Assembly Store per architecture. Each application will contain
architecture-specific assembly stores, with one store per architecture supported by
and enabled for the application. On the execution time, the .NET for Android runtime
will map one, and **only** one, of the architecture-specific stores based on the
current device architecture.

The architecture agnostic store is always named `assemblies.blob` while
the architecture-specific one is called `assemblies.[ARCH].blob`.
Assembly Store files are placed in the architecture-specific `lib/` directory in the
APK or AAB archives. The Assembly Store file in the APK or AAB archive is found
inside an ELF shared library.

Each APK in the application (e.g. the future Feature APKs) **may**
contain the above two assembly store files (some APKs may contain only
contain assembly store files (some APKs may contain only
resources, other may contain only native libraries etc)

Currently, .NET for Android applications will produce only one set of
stores but when .NET for Android adds support for Android Features, each
feature APK will contain its own set of stores. All of the APKs will
follow the location, format and naming conventions described above.

# Store format

Each target ABI/architecture has a single assembly store file, composed of the following parts:

- **[HEADER]** - Fixed size assembly store header
- **[INDEX]** - Variable size index for assembly name lookups
- **[ASSEMBLY_DESCRIPTORS]** - Assembly descriptor entries
- **[ASSEMBLY_NAMES]** - Assembly name strings
- **[ASSEMBLY DATA]** - The actual assembly data

Each store is a structured binary file, using little-endian byte order
and aligned to a byte boundary. Each store consists of a header, an
assembly descriptor table and, optionally (see below), two tables with
assembly name hashes. All the stores are assigned a unique ID, with
the store having ID equal to `0` being the [Index store](#index-store)
and aligned to a byte boundary.

## [HEADER]

The header is a fixed-size structure at the beginning of each assembly store file:

- **MAGIC** (`uint32_t`) - Magic value `0x41424158` ("XABA" in little-endian)
- **FORMAT_VERSION** (`uint32_t`) - Store format version number (includes ABI and 64-bit flags)
- **ENTRY_COUNT** (`uint32_t`) - Number of assemblies in the store
- **INDEX_ENTRY_COUNT** (`uint32_t`) - Number of entries in the index (typically `ENTRY_COUNT * 2`)
- **INDEX_SIZE** (`uint32_t`) - Index size in bytes

## [INDEX]

Variable-size section containing hash-based lookup entries for assembly names. Contains `INDEX_ENTRY_COUNT` entries (typically `ENTRY_COUNT * 2` entries to handle assembly names both with and without file extensions):

- **NAME_HASH** (`hash_t`) - xxHash of the assembly name (`uint32_t` on 32-bit platforms, `uint64_t` on 64-bit platforms)
- **DESCRIPTOR_INDEX** (`uint32_t`) - Index into the assembly descriptor array
- **IGNORE** (`uint8_t`) - If set to any value other than 0, the assembly should be ignored during loading

## [ASSEMBLY_DESCRIPTORS]

Variable-size section with `ENTRY_COUNT` entries, each describing one assembly:

- **MAPPING_INDEX** (`uint32_t`) - Index into runtime array where assembly data pointers are stored
- **DATA_OFFSET** (`uint32_t`) - Offset from store beginning to assembly data start
- **DATA_SIZE** (`uint32_t`) - Size of the stored assembly data
- **DEBUG_DATA_OFFSET** (`uint32_t`) - Offset to assembly PDB data start (0 if absent)
- **DEBUG_DATA_SIZE** (`uint32_t`) - Size of assembly PDB data (0 if absent)
- **CONFIG_DATA_OFFSET** (`uint32_t`) - Offset to assembly .config file content start (0 if absent)
- **CONFIG_DATA_SIZE** (`uint32_t`) - Size of assembly .config file content (0 if absent)

## [ASSEMBLY_NAMES]

Variable-size section with `ENTRY_COUNT` entries containing assembly name strings:

- **NAME_LENGTH** (`uint32_t`) - Length of assembly name in bytes
- **NAME** (variable length) - UTF-8 encoded assembly name bytes (without NUL terminator)

Assemblies are stored as adjacent byte streams:

Expand All @@ -98,7 +138,7 @@ Assemblies are stored as adjacent byte streams:
make runtime code slightly more efficient.

All the structures described here are defined in the
[`xamarin-app.hh`](../../src/monodroid/jni/xamarin-app.hh) file.
[`xamarin-app.hh`](../../src/native/clr/include/xamarin-app.hh) file.
Should there be any difference between this document and the
structures in the header file, the information from the header is the
one that should be trusted.
Expand All @@ -111,33 +151,30 @@ All kinds of stores share the following header format:
{
uint32_t magic;
uint32_t version;
uint32_t local_entry_count;
uint32_t global_entry_count;
uint32_t store_id;
;
uint32_t entry_count;
uint32_t index_entry_count;
uint32_t index_size; // index size in bytes
};

Individual fields have the following meanings:

- `magic`: has the value of 0x41424158 (`XABA`)
- `version`: a value increased every time assembly store format changes.
- `local_entry_count`: number of assemblies stored in this assembly
- `entry_count`: number of assemblies stored in this assembly
store (also the number of entries in the assembly descriptor
table, see below)
- `global_entry_count`: number of entries in the index store's (see
below) hash tables and, thus, the number of assemblies stored in
**all** of the assembly stores across **all** of the application's
APK files, all the other assembly stores have `0` in this field
since they do **not** have the hash tables.
- `store_id`: a unique ID of this store.
- `index_entry_count`: number of entries in the index
- `index_size`: index size in bytes

## Assembly descriptor table

Each store header is followed by a table of
`AssemblyStoreHeader.local_entry_count` entries, each entry
`AssemblyStoreHeader.entry_count` entries, each entry
defined by the following structure:

struct AssemblyStoreAssemblyDescriptor
struct AssemblyStoreEntryDescriptor
{
uint32_t mapping_index;
uint32_t data_offset;
uint32_t data_size;
uint32_t debug_data_offset;
Expand All @@ -151,8 +188,8 @@ value, other fields describe optional data and can be set to `0`.

Individual fields have the following meanings:

- `data_offset`: offset of the assembly image data from the
beginning of the store file
- `mapping_index`: index into a runtime array where assembly data pointers are stored
- `data_offset`: offset of the assembly image data from the beginning of the store file
- `data_size`: number of bytes of the image data
- `debug_data_offset`: offset of the assembly's debug data from the
beginning of the store file. A value of `0` indicates there's no
Expand All @@ -167,58 +204,88 @@ Individual fields have the following meanings:

## Index store

Each application will contain exactly one store with a global index -
two tables with assembly name hashes. All the other stores **do not**
contain these tables. Two hash tables are necessary because hashes
for 32-bit and 64-bit devices are different.

The hash tables follow the [Assembly descriptor
table](#assembly-descriptor-table) and precede the individual assembly
streams.

Placing the hash tables in a single index store, while "wasting" a
certain amount of memory (since 32-bit devices won't use the 64-bit
table and vice versa), makes for simpler and faster runtime
implementation and the amount of memory wasted isn't big (1000
two tables which are 8kb long each, this being the amount of memory
wasted)
The Assembly Store contains an index section that follows the header and precedes the assembly descriptors.
The index contains entries for assembly name lookups, with each entry formatted according to the `AssemblyStoreIndexEntry` structure.

### Hash table format

Both tables share the same format, despite the hashes themselves being
of different sizes. This is done to make handling of the tables
easier on the runtime.

Each entry contains, among other fields, the assembly name hash. In
case of satellite assemblies, the assembly culture (e.g. `en/` or
`fr/`) is treated as part of the assembly name, thus resulting in a
unique hash. The hash value is obtained using the
Each entry contains the assembly name hash. In case of satellite assemblies,
the assembly culture (e.g. `en/` or `fr/`) is treated as part of the assembly
name, thus resulting in a unique hash. The hash value is obtained using the
[xxHash](https://cyan4973.github.io/xxHash/) algorithm and is
calculated **without** including the `.dll` extension. This is done
for runtime efficiency as the vast majority of Mono requests to load
an assembly does not include the `.dll` suffix, thus saving us time of
calculated **without** including the `.dll` extension. This is done
for runtime efficiency as the vast majority of runtime requests to load
an assembly do not include the `.dll` suffix, thus saving us time of
appending it in order to generate the hash for index lookup.

Each entry is represented by the following structure:

struct AssemblyStoreHashEntry
{
union {
uint64_t hash64;
uint32_t hash32;
};
uint32_t mapping_index;
uint32_t local_store_index;
uint32_t store_id;
};
```cpp
struct AssemblyStoreIndexEntry
{
xamarin::android::hash_t name_hash;
uint32_t descriptor_index;
uint8_t ignore;
};
```

Individual fields have the following meanings:

- `hash64`/`hash32`: the 32-bit or 64-bit hash of the assembly's name
**without** the `.dll` suffix
- `mapping_index`: index into a compile-time generated array of
assembly data pointers. This is a global index, unique across
**all** the APK files comprising the application.
- `local_store_index`: index into assembly store [Assembly descriptor table](#assembly-descriptor-table)
- `name_hash`: the platform-specific hash of the assembly's name
**without** the `.dll` suffix (32-bit hash on 32-bit platforms,
64-bit hash on 64-bit platforms)
- `descriptor_index`: index into assembly store [Assembly descriptor table](#assembly-descriptor-table)
describing the assembly.
- `store_id`: ID of the assembly store containing the assembly
- `ignore`: if set to anything other than 0, the assembly should be ignored when loading

# Native Struct Documentation

This section documents the native C++ structures used in the Assembly Store format, as defined in [`xamarin-app.hh`](../../src/native/clr/include/xamarin-app.hh).

## AssemblyStoreHeader

```cpp
struct [[gnu::packed]] AssemblyStoreHeader final
{
uint32_t magic;
uint32_t version;
uint32_t entry_count;
uint32_t index_entry_count;
uint32_t index_size; // index size in bytes
};
```

This structure defines the header of each Assembly Store file. The `[[gnu::packed]]` attribute ensures that the structure is stored without padding, which is crucial for binary file format compatibility.

## AssemblyStoreIndexEntry

```cpp
struct [[gnu::packed]] AssemblyStoreIndexEntry final
{
xamarin::android::hash_t name_hash;
uint32_t descriptor_index;
uint8_t ignore; // Assembly should be ignored when loading, its data isn't actually there
};
```

This structure represents an entry in the Assembly Store index. The `name_hash` field is either a 32-bit or 64-bit hash depending on the target platform architecture (`xamarin::android::hash_t` is `uint32_t` on 32-bit platforms and `XXH64_hash_t` on 64-bit platforms).

## AssemblyStoreEntryDescriptor

```cpp
struct [[gnu::packed]] AssemblyStoreEntryDescriptor final
{
uint32_t mapping_index;

uint32_t data_offset;
uint32_t data_size;

uint32_t debug_data_offset;
uint32_t debug_data_size;

uint32_t config_data_offset;
uint32_t config_data_size;
};
```

This structure describes an individual assembly within the Assembly Store, including offsets and sizes for the assembly data, debug data (PDB files), and configuration data (.config files).
Loading