This ticket tracks making bulk insertions (like at https://github.com/ncalexan/mentat/blob/db-tests/db/src/db.rs#L631) more efficient. This is follow-up to #214 (comment).
I don't know the best way to do this, so there's some experimentation needed. I can think of the following considerations:
- average transaction size is > 1 assertion (since there's always a `:db/tx :db/txInstant MILLISECONDS` datom)
- most transactions will be small, say < 5 assertions
- some transactions will be huge, say 10,000 or 100,000 assertions
- SQLite will accept only a limited number of bindings per statement handle (usually 999, but possibly as many as 500,000 on some platforms!)
- there's a finite store of SQLite statement handles available
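To make the binding limit concrete, here is a sketch of generating a multi-row `INSERT` whose placeholder count stays under that limit. The table and column names `datoms (e, a, v, tx)` are illustrative placeholders, not Mentat's actual schema:

```rust
/// Build a multi-row INSERT with `rows` value tuples of `cols` placeholders
/// each, e.g. rows = 2, cols = 4 yields
/// "INSERT INTO datoms (e, a, v, tx) VALUES (?, ?, ?, ?), (?, ?, ?, ?)".
fn insert_sql(rows: usize, cols: usize) -> String {
    let tuple = format!("({})", vec!["?"; cols].join(", "));
    let values = vec![tuple; rows].join(", ");
    format!("INSERT INTO datoms (e, a, v, tx) VALUES {}", values)
}

fn main() {
    // With a limit of 999 bindings and 4 columns per datom, at most
    // 999 / 4 = 249 rows fit in a single prepared statement.
    const MAX_BINDINGS: usize = 999;
    let cols = 4;
    let max_rows = MAX_BINDINGS / cols;
    assert_eq!(max_rows, 249);
    println!("{}", insert_sql(2, cols));
}
```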
So we want to cache a certain number of insertion statements for some number of assertions, and then chunk our input into those insertion statements. If we had unlimited bindings per statement and unlimited statement handles, we might do a base-2 expansion and lazily cache insertions of 1, 2, 4, 8, 16, … assertions. However, given our considerations, it might be better to cache insertions for 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and (999 / bindings-per-insertion) assertions, effectively dividing the parameter space into "little" and "huge" regimes.
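One way to realize that two-regime scheme is to greedily emit the largest cached statement size that still fits the remaining input. A sketch, assuming the sizes above (1 through 10, plus one "huge" statement sized by the binding limit):

```rust
/// Cached statement sizes: the "little" regime (1..=10 rows) plus one
/// "huge" statement holding as many rows as the binding limit allows.
fn cached_sizes(max_bindings: usize, cols: usize) -> Vec<usize> {
    let mut sizes: Vec<usize> = (1..=10).collect();
    sizes.push(max_bindings / cols);
    sizes
}

/// Greedily chunk `n` assertions across the cached statement sizes,
/// always picking the largest size that still fits the remainder.
fn chunk(mut n: usize, sizes: &[usize]) -> Vec<usize> {
    let mut chunks = Vec::new();
    while n > 0 {
        let size = *sizes.iter().filter(|&&s| s <= n).max().unwrap();
        chunks.push(size);
        n -= size;
    }
    chunks
}

fn main() {
    let sizes = cached_sizes(999, 4); // 1..=10 and 249
    // A small transaction maps onto a single "little" statement...
    assert_eq!(chunk(5, &sizes), vec![5]);
    // ...while a huge one repeats the "huge" statement, then mops up.
    assert_eq!(chunk(507, &sizes), vec![249, 249, 9]);
}
```

Note that the greedy mop-up can still take several statements for remainders between 11 and the huge size (e.g. 30 becomes 10 + 10 + 10), which is one of the things worth measuring experimentally.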
If we wanted to be really fancy, we could determine our caching strategy dynamically at initialization time, or we could generate different strategies at compile time for different hardware profiles (like most FFT implementations do).
This seems to me similar to "Montgomery ladders" for fast multiplication and "base phi expansions" for doing arithmetic in algebraic groups that have non-trivial computable morphisms, with some additional restrictions thrown in.