Consider alternative disk representations to minimize database fragmentation

Our write approach goes something like this:

- Do some reads to figure out upserts etc.
- Do some writes to temporary tables and `fulltext_values`.
- Flush changes into the `transactions` table.
- Flush changes into the `datoms` table.
- Drop the temporary tables.
- Read back from `transactions` to build a transaction report.
- Apply changes — parts, idents, and attributes — to the `schema`, `idents`, and `parts` tables.

All of this happens in one database transaction.

One concern about all of this is that there will be significant fragmentation: in each `transact` we write to `datoms` and `transactions`, interleaving changes (and to other tables, too). Even with WAL, this is a long way from an optimized append-only workflow

SQLite allows for [attached databases](https://www.sqlite.org/lang_attach.html), and writes to attached databases are atomic if not using WAL, and atomic but not crash-safe otherwise.

We could thus split our data, _literally_ making our materialized `datoms` table and its indices a separate materialized store. (And we could do this via our transaction listener if we weren't so concerned about atomicity and speed.)

So long as the transaction to the transaction log database completes first, we can replay transacted datoms into the datoms database in the event of necessary crash recovery.

This approach would allow each database to be tuned independently, would prevent the writer's readback of the transaction log from impacting the datoms-oriented page cache used by readers, and potentially allow us to scale better.

The transactor would have a connection to both DBs, attached. Readers would typically only use a connection to the datoms database.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider alternative disk representations to minimize database fragmentation #129

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Consider alternative disk representations to minimize database fragmentation #129

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions