Skip to content
This repository was archived by the owner on Sep 12, 2018. It is now read-only.
This repository was archived by the owner on Sep 12, 2018. It is now read-only.

[db] Optimize bulk insertions using fixed-size statement caches and base expansion #263

@ncalexan

Description

@ncalexan

This ticket tracks making bulk insertions (like at https://github.com/ncalexan/mentat/blob/db-tests/db/src/db.rs#L631) more efficient. This is follow-up to #214 (comment).

I don't know the best way to do this, so there's some experimentation needed. I can think of the following considerations:

  • average transaction size is > 1 assertion (since there's always a :db/tx :db/txInstant MILLISECONDS datom)
  • most transactions will be small, say < 5 assertions
  • some transactions will be huge, say 10,000 or 100,000 assertions
  • SQLite will accept only a limited number of bindings per statement handle (usually 1000, but possibly as many 500,000 on some platforms!)
  • there's a finite store of SQLite statement handles available

So we want to cache a certain number of insertion statements for some number of assertions, and then chunk our input into those insertion statements. If we had unlimited bindings per statement and unlimited statement handles, we might do a base-2 expansion and lazily cache insertions of 1, 2, 4, 8, 16, assertions. However, given our considerations, it might be better to cache insertions for 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, (999 / bindings-per-insertion) assertions, effectively dividing the parameter space into "little" and "huge" regimes.

If we wanted to be really fancy, we could determine our caching strategy dynamically at initialization time, or we could generate different strategies at compile time for different hardware profiles (like most FFI implementations do).

This seems to me similar to "Montgomery ladders" for fast multiplication and "base phi expansions" for doing arithmetic in algebraic groups that have non-trivial computable morphisms, with some additional restrictions thrown in.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions