[WIP] Caching exposed modules for Hackage packages in Pantry DB #4624
Fixes #4536
@snoyberg could you take a look at this variant? On my machine for
From the logs, it looks like lock files could give a bit more improvement, and we could probably get something like a 2x slowdown compared to the current stable version. I also have an idea to store the module list as a blob (maybe even in
snoyberg
left a comment
We can discuss this if the overall idea isn't clear here. But: instead of performing so much logic on the Haskell side, what about pushing the lookup logic into the SQLite database? I'm picturing something like this:
SnapshotModuleCache
    someUniqueSnapshotIdentifier Text
SnapshotPackage
    snapshot SnapshotModuleCacheId
    cabal BlobId
    UniqueSnapshotPackage snapshot cabal
PackageExposedModule
    cabal BlobId
    module ModuleId
    UniquePackageExposedModule
Then on the Haskell side:
-- Ensures that the tables are filled in for a snapshot
populateSnapshotModuleCache :: Snapshot -> RIO env ()
-- Find all packages which contain the given module name
findPackagesWithModule :: Snapshot -> ModuleName -> RIO env [PackageName]
I have the details very wrong above, but I think with this you could:
- Have a relatively slow populate action, with a sticky log message explaining "populating module name cache." This will happen once, and the first usage of a snapshot is already expected to be a bit slow, what with downloading the snapshot file and such.
- Have a much smaller surface area for data transfer between Haskell and SQLite for each module name lookup
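To make the populate-once, look-up-cheaply idea concrete, here is a minimal in-memory sketch. It is not Pantry code: `SnapshotId`, `ModuleCache`, the example package and module names, and the pure signatures are all assumptions for illustration. Nested maps stand in for the SQLite tables, and the real functions would run in `RIO env` against the database.

```haskell
module Main (main) where

import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stand-ins for Pantry's real types.
type SnapshotId  = String
type PackageName = String
type ModuleName  = String

-- Per-snapshot index from module name to the packages exposing it.
-- This plays the role of the SnapshotPackage / PackageExposedModule tables.
type ModuleCache = Map.Map SnapshotId (Map.Map ModuleName (Set.Set PackageName))

-- Build the index for a snapshot, unless it has already been populated.
populateSnapshotModuleCache
  :: SnapshotId
  -> [(PackageName, [ModuleName])]
  -> ModuleCache
  -> ModuleCache
populateSnapshotModuleCache snap pkgs cache
  | snap `Map.member` cache = cache
  | otherwise = Map.insert snap index cache
  where
    index = Map.fromListWith Set.union
      [ (m, Set.singleton p) | (p, ms) <- pkgs, m <- ms ]

-- Find all packages in the snapshot which expose the given module name.
findPackagesWithModule :: SnapshotId -> ModuleName -> ModuleCache -> [PackageName]
findPackagesWithModule snap m cache =
  maybe [] Set.toAscList (Map.lookup snap cache >>= Map.lookup m)

-- Made-up snapshot contents, for illustration only.
exampleCache :: ModuleCache
exampleCache = populateSnapshotModuleCache "example-snapshot"
  [ ("pkg-a", ["Data.Foo", "Data.Bar"])
  , ("pkg-b", ["Data.Foo"])
  ]
  Map.empty

main :: IO ()
main = do
  print (findPackagesWithModule "example-snapshot" "Data.Foo" exampleCache)
  print (findPackagesWithModule "example-snapshot" "Data.Baz" exampleCache)
```

Only the module name goes in and a short list of package names comes out, which is the "smaller surface area" point: in the database version the equivalent lookup would be a single query rather than a transfer of every module list to the Haskell side.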
- resourcet
- rio-prettyprint
- mtl
- extra
I think we've been trying to avoid adding dependencies on these kinds of packages.
-- Cache of modules exposed by a Hackage package
HackageExposedModules
    cabal BlobId
    moduleName P.ModuleNameP
Since there are likely to be a lot of duplicated module names, this would probably benefit from normalization, the same way we normalize package names. How about:
    moduleName ModuleId
Module
    name P.ModuleName
    UniqueModule name
Or something like that?
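As a rough illustration of what that normalization buys, the sketch below interns module names into integer ids so that each distinct name is stored once, however many packages expose it. `ModuleTable`, `internModule`, and `internAll` are hypothetical names rather than Pantry functions, and the map stands in for a `Module` table with a uniqueness constraint on `name`.

```haskell
module Main (main) where

import Data.List (mapAccumL)
import qualified Data.Map.Strict as Map

type ModuleName = String

newtype ModuleId = ModuleId Int
  deriving (Eq, Ord, Show)

-- Stand-in for the Module table: every distinct name is stored exactly once.
data ModuleTable = ModuleTable
  { nextId    :: Int
  , idsByName :: Map.Map ModuleName ModuleId
  }

emptyModuleTable :: ModuleTable
emptyModuleTable = ModuleTable 1 Map.empty

-- Return the existing id for a name, or allocate a fresh one.
-- This mirrors an insert guarded by a unique constraint on the name.
internModule :: ModuleName -> ModuleTable -> (ModuleId, ModuleTable)
internModule name table =
  case Map.lookup name (idsByName table) of
    Just existing -> (existing, table)
    Nothing ->
      let fresh = ModuleId (nextId table)
      in ( fresh
         , ModuleTable (nextId table + 1) (Map.insert name fresh (idsByName table))
         )

-- Intern a whole list of names, threading the table through left to right.
internAll :: [ModuleName] -> ModuleTable -> ([ModuleId], ModuleTable)
internAll names table0 = (ids, table1)
  where
    (table1, ids) = mapAccumL step table0 names
    step t n = let (i, t') = internModule n t in (t', i)

main :: IO ()
main = do
  let (ids, table) = internAll ["Data.Foo", "Data.Bar", "Data.Foo"] emptyModuleTable
  print ids                            -- [ModuleId 1,ModuleId 2,ModuleId 1]
  print (Map.size (idsByName table))   -- 2
```

The per-package rows then carry only a `ModuleId`, so a name repeated across many cabal files costs one small integer per occurrence instead of a copy of the text.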
Tested manually.