This ticket tracks making bulk insertions (like at https://github.com/ncalexan/mentat/blob/db-tests/db/src/db.rs#L631) more efficient. This is follow-up to #214 (comment).
I don't know the best way to do this, so there's some experimentation needed. I can think of the following considerations:
- average transaction size is > 1 assertion (since there's always a `:db/tx :db/txInstant MILLISECONDS` datom)
- most transactions will be small, say < 5 assertions
- some transactions will be huge, say 10,000 or 100,000 assertions
- SQLite will accept only a limited number of bindings per statement handle (usually 999, but possibly as many as 500,000 on some platforms!)
- there's a finite store of SQLite statement handles available
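To make the binding limit concrete, here is a sketch of generating a multi-row `INSERT` whose placeholder count stays under that limit. The table and column names `datoms (e, a, v, tx)` are illustrative placeholders, not Mentat's actual schema:

```rust
/// Build a multi-row INSERT with `rows` value tuples of `cols` placeholders
/// each, e.g. rows = 2, cols = 4 yields
/// "INSERT INTO datoms (e, a, v, tx) VALUES (?, ?, ?, ?), (?, ?, ?, ?)".
fn insert_sql(rows: usize, cols: usize) -> String {
    let tuple = format!("({})", vec!["?"; cols].join(", "));
    let values = vec![tuple; rows].join(", ");
    format!("INSERT INTO datoms (e, a, v, tx) VALUES {}", values)
}

fn main() {
    // With a limit of 999 bindings and 4 columns per datom, at most
    // 999 / 4 = 249 rows fit in a single prepared statement.
    const MAX_BINDINGS: usize = 999;
    let cols = 4;
    let max_rows = MAX_BINDINGS / cols;
    assert_eq!(max_rows, 249);
    println!("{}", insert_sql(2, cols));
}
```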
So we want to cache a certain number of insertion statements for some number of assertions, and then chunk our input into those insertion statements. If we had unlimited bindings per statement and unlimited statement handles, we might do a base-2 expansion and lazily cache insertions of 1, 2, 4, 8, 16, … assertions. However, given our considerations, it might be better to cache insertions for 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and (999 / bindings-per-insertion) assertions, effectively dividing the parameter space into "little" and "huge" regimes.
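One way to realize that two-regime scheme is to greedily emit the largest cached statement size that still fits the remaining input. A sketch, assuming the sizes above (1 through 10, plus one "huge" statement sized by the binding limit):

```rust
/// Cached statement sizes: the "little" regime (1..=10 rows) plus one
/// "huge" statement holding as many rows as the binding limit allows.
fn cached_sizes(max_bindings: usize, cols: usize) -> Vec<usize> {
    let mut sizes: Vec<usize> = (1..=10).collect();
    sizes.push(max_bindings / cols);
    sizes
}

/// Greedily chunk `n` assertions across the cached statement sizes,
/// always picking the largest size that still fits the remainder.
fn chunk(mut n: usize, sizes: &[usize]) -> Vec<usize> {
    let mut chunks = Vec::new();
    while n > 0 {
        let size = *sizes.iter().filter(|&&s| s <= n).max().unwrap();
        chunks.push(size);
        n -= size;
    }
    chunks
}

fn main() {
    let sizes = cached_sizes(999, 4); // 1..=10 and 249
    // A small transaction maps onto a single "little" statement...
    assert_eq!(chunk(5, &sizes), vec![5]);
    // ...while a huge one repeats the "huge" statement, then mops up.
    assert_eq!(chunk(507, &sizes), vec![249, 249, 9]);
}
```

Note that the greedy mop-up can still take several statements for remainders between 11 and the huge size (e.g. 30 becomes 10 + 10 + 10), which is one of the things worth measuring experimentally.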
If we wanted to be really fancy, we could determine our caching strategy dynamically at initialization time, or we could generate different strategies at compile time for different hardware profiles (like most FFT implementations do).
This seems to me similar to "Montgomery ladders" for fast multiplication and "base phi expansions" for doing arithmetic in algebraic groups that have non-trivial computable morphisms, with some additional restrictions thrown in.