diff --git a/Sources/IndexStore/Index Store.md b/Sources/IndexStore/Index Store.md new file mode 100644 index 0000000..dcd0485 --- /dev/null +++ b/Sources/IndexStore/Index Store.md @@ -0,0 +1,192 @@ +# Index Store + +## Overview + +The Swift and Apple Clang compilers are able to emit references between symbols during compilation as an *Index Store*. Conceptually, the Index Store can be thought of as a dump of source locations mapped to unique identifiers of the symbols that are declared or referenced at those locations. Despite its name, it is not efficiently queryable; other technologies like indexstore-db are required to efficiently find occurrences of a given symbol. + +### Example + +Consider the following source file: + +```swift +func fib(_ n: Int) -> Int { + if n == 0 || n == 1 { + return 0 + } + return fib(n - 1) + fib(n - 2) +} +``` + +Compiling it can generate the following Index Store: + +``` +function/Swift | fib(_:) | s:4test3fibyS2iF | | Def,Ref,Call,RelCall,RelCont - RelChild,RelCall,RelCont +param(local)/Swift | n | s:4test3fibyS2iF1nL_Sivp | | Def,RelChild - +struct/Swift | Int | s:Si | | Ref,RelCont - +static-method/infix-operator/Swift | ==(_:_:) | s:Si2eeoiySbSi_SitFZ | | Ref,Call,RelCall,RelCont - +static-method/infix-operator/Swift | ||(_:_:) | s:Sb2oooiyS2b_SbyKXKtKFZ | | Ref,Call,RelCall,RelCont - +static-method/infix-operator/Swift | -(_:_:) | s:Si1soiyS2i_SitFZ | | Ref,Call,RelCall,RelCont - +static-method/infix-operator/Swift | +(_:_:) | s:Si1poiyS2i_SitFZ | | Ref,Call,RelCall,RelCont - +------------ +1:6 | function/Swift | s:4test3fibyS2iF | Def | rel: 0 +1:12 | param(local)/Swift | s:4test3fibyS2iF1nL_Sivp | Def,RelChild | rel: 1 + RelChild | s:4test3fibyS2iF +1:15 | struct/Swift | s:Si | Ref,RelCont | rel: 1 + RelCont | s:4test3fibyS2iF +1:23 | struct/Swift | s:Si | Ref,RelCont | rel: 1 + RelCont | s:4test3fibyS2iF +2:10 | static-method/infix-operator/Swift | s:Si2eeoiySbSi_SitFZ | Ref,Call,RelCall,RelCont | rel: 1 + RelCall,RelCont | 
s:4test3fibyS2iF +2:15 | static-method/infix-operator/Swift | s:Sb2oooiyS2b_SbyKXKtKFZ | Ref,Call,RelCall,RelCont | rel: 1 + RelCall,RelCont | s:4test3fibyS2iF +2:20 | static-method/infix-operator/Swift | s:Si2eeoiySbSi_SitFZ | Ref,Call,RelCall,RelCont | rel: 1 + RelCall,RelCont | s:4test3fibyS2iF +5:12 | function/Swift | s:4test3fibyS2iF | Ref,Call,RelCall,RelCont | rel: 1 + RelCall,RelCont | s:4test3fibyS2iF +5:18 | static-method/infix-operator/Swift | s:Si1soiyS2i_SitFZ | Ref,Call,RelCall,RelCont | rel: 1 + RelCall,RelCont | s:4test3fibyS2iF +5:23 | static-method/infix-operator/Swift | s:Si1poiyS2i_SitFZ | Ref,Call,RelCall,RelCont | rel: 1 + RelCall,RelCont | s:4test3fibyS2iF +5:25 | function/Swift | s:4test3fibyS2iF | Ref,Call,RelCall,RelCont | rel: 1 + RelCall,RelCont | s:4test3fibyS2iF +5:31 | static-method/infix-operator/Swift | s:Si1soiyS2i_SitFZ | Ref,Call,RelCall,RelCont | rel: 1 + RelCall,RelCont | s:4test3fibyS2iF +``` + +The dump consists of two sections. The first section lists all symbols that occur within this dump. In the context of the Index Store, a symbol is a declaration that can be referenced by other parts of the source code, like types, functions, properties, global variables etc. Each symbol is uniquely identified by a *USR (Unified Symbol Resolution)*, eg. `s:4test3fibyS2iF`. A USR contains all information needed to uniquely identify a symbol. For Swift symbols, a USR is very similar to the symbol's mangled name. For C functions, it is the function's base name since C doesn’t allow overloading based on parameters; for C++ it contains namespace information etc. + +The second section lists all occurrences of these symbols. For example, we can see that +- At 1:6 a Swift function is defined (role `Def`) and it has the USR `s:4test3fibyS2iF`. We can look up the name of this USR in the first section to see that it is `fib(_:)`. +- At 1:12, a parameter is defined with the USR `s:4test3fibyS2iF1nL_Sivp`; a lookup in the symbols section shows that its name is `n`. 
This parameter has a single relation: It is the child (`RelChild`) of the function declaration `s:4test3fibyS2iF`. +- At 1:15 we have a reference to the struct `s:Si`, ie. `Int`. This reference is contained in (`RelCont`) our function `s:4test3fibyS2iF`. + +## Generating the Index Store + +The Index Store is generated by passing the `-index-store-path` flag to a swiftc or clang invocation. Usually developers don’t need to deal with this themselves because the build system takes care of adding the flag. For example, SwiftPM has the `--enable-index-store/--disable-index-store` flags to control whether the Index Store is created or not. + +Additionally, an Index Store may be updated in the background using background indexing, eg. by SourceKit-LSP. In a nutshell, it invokes the compiler with additional flags that avoid work not needed for Index Store generation, such as CodeGen. [SourceKit-LSP’s documentation](https://github.com/swiftlang/sourcekit-lsp/blob/main/Contributor%20Documentation/Background%20Indexing.md) contains more implementation details. + +## Format of the Index Store + +The Index Store is stored as a collection of binary files in a directory structure. To read the data in the Index Store, each Swift toolchain contains a `libIndexStore` dynamic library, eg. at `usr/lib/libIndexStore.dylib` inside macOS Swift toolchains. The `IndexStore` library wraps the low-level `libIndexStore.dylib` reader and provides an ergonomic Swift API on top. While only the `libIndexStore` from the toolchain that produced the Index Store is guaranteed to be able to read it, in practice `libIndexStore` should also be able to read older Index Stores. 
+ +Let us consider the directory structure of an Index Store generated by a package with two source files: `test.swift` and `other.swift`: + +``` +v5 +├── records +│   ├── 0Y +│   │   └── arm64e-apple-macos.swiftinterface_Reflection-18WOU8H896L0Y +│   ├── 12 +│   │   └── arm64e-apple-macos.swiftinterface_Assert-2ORWYHBJ1VA12 +│   ├── 17 +│   │   └── arm64e-apple-macos.swiftinterface_Protocols-14TBUBXQLKV17 +│   ├── 2C +│   │   └── other.swift-2GDE3ZEL99U2C +│   ├── 4Y +│   │   └── arm64e-apple-macos.swiftinterface_Math-8CIA6M3B8X4Y +│   ├── 53 +│   │   └── arm64e-apple-macos.swiftinterface_Collection_Lazy_Views-LPA2ZHOSK53 +│   ├── 7J +│   │   └── arm64e-apple-macos.swiftinterface_Result-3DRT6J7QMI07J +│   ├── AV +│   │   └── arm64e-apple-macos.swiftinterface_C-35DU15S4PWNAV +│   ├── DH +│   │   └── _SwiftConcurrency.h-3TLE8ZVCLBTDH +│   ├── E4 +│   │   └── test.swift-3AGR279RFNIE4 +⋮ ⋮ +└── units + ├── _SwiftConcurrencyShims-3EE9MJCB8LQ6I.pcm-2GRRIA0ZOUJTH + ├── arm64e-apple-macos.swiftinterface-1DTAHJ3J2D26N + ├── arm64e-apple-macos.swiftinterface-2YCQFVE5WOSLZ + ├── arm64e-apple-macos.swiftinterface-BSSPO6V0PB1L + ├── other.swift.o-1SJNUR2KLQFKN + └── test.swift.o-7DHNZXNL535L +``` + +The top level folder is called `v5`. This allows future evolution of the Index Store’s layout. Inside of it, there are two types of files: Units and records. At a high level, a record represents the contents of a source file, as seen during compilation. Unit files correspond to a single compilation unit, which may consist of multiple source files in the case of header files being included by a C-based file. 
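
In the listing above, each record sits in a subdirectory named after the last two characters of its file name (for example `test.swift-3AGR279RFNIE4` under `E4`). A minimal sketch of that mapping follows, assuming the pattern seen in the listing holds in general. `recordFileURL` is a hypothetical helper, not part of the `IndexStore` library, and the store path is a placeholder.

```swift
import Foundation

/// Hypothetical helper: builds the expected on-disk location of a record file,
/// assuming records live in `v5/records/<last two characters of the name>/<name>`,
/// as observed in the directory listing above.
func recordFileURL(named recordName: String, inIndexStoreAt storeRoot: URL) -> URL {
    // The subdirectory name is taken from the final two characters of the record name.
    let subdirectory = String(recordName.suffix(2))
    return storeRoot
        .appendingPathComponent("v5")
        .appendingPathComponent("records")
        .appendingPathComponent(subdirectory)
        .appendingPathComponent(recordName)
}

// Placeholder Index Store location; replace with a real path.
let storeRoot = URL(fileURLWithPath: "/path/to/index/store")
let url = recordFileURL(named: "test.swift-3AGR279RFNIE4", inIndexStoreAt: storeRoot)
print(url.path)
```

This is only useful for locating files, for example when inspecting disk usage. Since the files are binary, their contents should still be read through `libIndexStore`.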
+ +### Unit files + +Let us look at a unit contained in the Index Store above: + +`test.swift.o-7DHNZXNL535L` has the following contents + +``` +provider: swift- +is-system: 0 +is-module: 0 +module-name: test +has-main: 1 +main-path: /private/tmp/test/Sources/test/test.swift +work-dir: /tmp/test +out-file: /private/tmp/test/.build/arm64-apple-macosx/debug/test.build/test.swift.o +target: arm64-apple-macosx10.13 +is-debug: 1 +DEPEND START +Unit | system | Swift | /Applications/Xcode-WE.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX26.4.sdk/usr/lib/swift/Swift.swiftmodule/arm64e-apple-macos.swiftinterface | arm64e-apple-macos.swiftinterface-BSSPO6V0PB1L +Unit | system | Swift | /Applications/Xcode-WE.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX26.4.sdk/usr/lib/swift/_StringProcessing.swiftmodule/arm64e-apple-macos.swiftinterface | arm64e-apple-macos.swiftinterface-2YCQFVE5WOSLZ +Unit | system | _SwiftConcurrencyShims | /private/tmp/test/.build/arm64-apple-macosx/debug/ModuleCache/2AJGORME06Q68/_SwiftConcurrencyShims-3EE9MJCB8LQ6I.pcm | _SwiftConcurrencyShims-3EE9MJCB8LQ6I.pcm-2GRRIA0ZOUJTH +Unit | system | Swift | /Applications/Xcode-WE.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX26.4.sdk/usr/lib/swift/_Concurrency.swiftmodule/arm64e-apple-macos.swiftinterface | arm64e-apple-macos.swiftinterface-1DTAHJ3J2D26N +Record | user | /private/tmp/test/Sources/test/test.swift | test.swift-3AGR279RFNIE4 +DEPEND END (5) +INCLUDE START +INCLUDE END (0) +``` + +A couple things to highlight here are: +- The source file that this unit was generated for was `/private/tmp/test/Sources/test/test.swift`. For a C-based project, the main files will be `.c`, `.m`, `.cpp` and similar files, not header files. +- The output path is set to `/private/tmp/test/.build/arm64-apple-macosx/debug/test.build/test.swift.o`. 
We identify unit files by their output path and the output path is also what determines the hash at the end of the unit’s filename. See more on this below. +- The unit declares a dependency on the record file `test.swift-3AGR279RFNIE4` and thus the record file with this name contains the symbol occurrences of the source file as it was viewed during this compilation. +- The unit depends on four other units because their modules were imported: The interfaces of the `Swift` standard library module with its peers `_StringProcessing` and `_Concurrency` as well as the `_SwiftConcurrencyShims` precompiled clang module (pcm). + +As we just saw, the Index Store not only indexes the project’s source files but also the SDK modules. For example, the `arm64e-apple-macos.swiftinterface_Result-3DRT6J7QMI07J` record will contain the definition of the standard library’s `Result` type. + +#### Output path + +Using the output path instead of the source path to identify a unit allows us to maintain an index for two different compilation units of the same source file. For example, consider a Swift file that is built for both iOS and watchOS within the same project and which contains `#if os(watchOS)` conditional compilation directives. The Index Store of this file will look different for iOS and watchOS because any declarations within the `#if` will only be present in the watchOS compilation of the file. Since the compile jobs will put object files in different locations, we can maintain two unit files with different output paths, referencing the same source file. Since the contents of the record files will be different in that case, two record files with different hashes will be created and the unit files will each reference a different record file. + +It also has to be noted that while the output path used to correspond to actual object files on disk generated during compilation, it has been generalized to be a unique identifier for a particular build configuration. 
It may not correspond to any path on disk. It is thus advisable to treat this value as an opaque string. + +### Record files + +We already saw an example of a record file’s contents in the *Overview* section. We will thus not dive any deeper into its contents. + +While unit files represent an entire compilation unit, record files represent the contents of individual source files. For the most part these are `.swift`, `.c`, `.h` and similar files, but inside the SDK `.swiftinterface` files may also be indexed and thus generate a record file. + +Each record file contains a hash of its contents at the end of the file name. Consider a particular record file as the interpretation of its contents during a compilation. As long as all declarations and references remain the same and at the same location, a source file’s record doesn’t change. This means that eg. adding content to an end-of-line comment doesn’t change the record. If, however, a declaration is modified, an overload resolves differently, or any other significant modification is made, then the source file will have a different record. + +## Miscellaneous + +### Incremental builds + +At the moment, the Index Store is only ever appended to and no unit or record files are ever removed. Apart from using disk space, this does not pose an issue for record files if only those record files referenced by unit files are iterated. The Index Store may, however, contain stale unit files: +1. A source file was deleted from the project but the unit file of its last build is still present in the Index Store. This can be easily checked for by ensuring a unit’s main file exists on disk. +2. A source file used to be part of both an iOS and a watchOS target but was removed from the watchOS target and then modified. The outdated unit file of the last build in the watchOS target still exists in the Index Store and the check in (1) will not catch this because the source file also still exists. 
SourceKit-LSP solves this issue by querying the build system for the output paths it produces and filtering unit files based on their output path. Depending on the complexity of your project, it may be acceptable to not consider this niche use case. Otherwise, a similar integration with the build system is necessary. + +None of this is a consideration if the Index Store was created by a clean build because no old unit or record files exist in this case. + +### Multiple definitions of the same symbol + +Note that a symbol may be defined more than once, eg. in the following examples. + +- `foo()` is defined in line 2 when compiling for Windows but in line 4 for all other platforms +```swift +#if os(Windows) +func foo() { ... } +#else +func foo() { ... } +#endif +``` +- Different targets that have a `main` function will all have separate definitions of those `main` functions with the same USR. + +### Role of indexstore-db + +As mentioned in the overview, the Index Store is not efficiently queryable. indexstore-db serves this purpose by effectively providing an index on top of the Index Store and eg. storing which record files contain certain USRs. + +### Dumping an Index Store + +The dumps of the unit and record files were generated by running `c-index-test core -print-unit path/to/unit` or `-print-record`. To build `c-index-test`, you need to build [llvm-project](https://github.com/swiftlang/llvm-project). + +Writing an Index Store dumper inside this repository based on the `IndexStore` Swift library may be a good and useful exercise. diff --git a/Sources/IndexStore/README.md b/Sources/IndexStore/README.md new file mode 100644 index 0000000..d2db704 --- /dev/null +++ b/Sources/IndexStore/README.md @@ -0,0 +1,42 @@ +# IndexStore Swift library + +The `IndexStore` Swift library is a wrapper around `libIndexStore.dylib` to iterate through an Index Store. Using this library assumes an understanding of the Index Store’s structure. 
Read [Index Store.md](Index%20Store.md) for an overview. + +One thing to note is that the `IndexStore` library does not provide efficient query access into the Index Store; its purpose is to iterate through an Index Store. + +Furthermore, the `IndexStore` library is designed to provide type-safe access to an Index Store with little to no overhead over the C API provided by `libIndexStore.dylib`. Speed is generally favored over the most convenient APIs, which, in particular, manifests in non-Escapable types and the existence of `forEach` methods to iterate over collections. + +## Example + +The following iterates through all records inside an Index Store to print the definitions within them. + +```swift +// Get a reference to the IndexStore library that is used to read Index Stores +let indexStoreLibrary = try await IndexStoreLibrary.at(dylibPath: URL(filePath: "/path/to/usr/lib/libIndexStore.dylib")) + +// Open an Index Store +let indexStore = try indexStoreLibrary.indexStore(at: "/path/to/index/store") + +// Iterate through all unit names and retrieve the record names within the Index Store +let recordNames = try indexStore.unitNames(sorted: false).map { unit in + let unit = try indexStore.unit(named: unit) + return unit.dependencies.compactMap { dependency in + if dependency.kind == .record { + return dependency.name.string + } + return nil + } +}.flatMap(\.self) + +// Iterate through all record names and print the definitions within them. +for recordName in recordNames { + let record = try indexStore.record(named: recordName) + record.occurrences.forEach { occurrence in + guard occurrence.roles.contains(.definition) else { + return .continue + } + print("\(occurrence.position.line):\(occurrence.position.column): \(occurrence.symbol.name.string)") + return .continue + } +} +```