Skip to content
This repository was archived by the owner on Sep 12, 2018. It is now read-only.
This repository was archived by the owner on Sep 12, 2018. It is now read-only.

Consider alternative disk representations to minimize database fragmentation #129

@rnewman

Description

@rnewman

Our write approach goes something like this:

  • Do some reads to figure out upserts etc.
  • Do some writes to temporary tables and fulltext_values.
  • Flush changes into the transactions table.
  • Flush changes into the datoms table.
  • Drop the temporary tables.
  • Read back from transactions to build a transaction report.
  • Apply changes — parts, idents, and attributes — to the schema, idents, and parts tables.

All of this happens in one database transaction.

One concern about all of this is that there will be significant fragmentation: in each transact we write to datoms and transactions, interleaving changes (and to other tables, too). Even with WAL, this is a long way from an optimized append-only workflow

SQLite allows for attached databases, and writes to attached databases are atomic if not using WAL, and atomic but not crash-safe otherwise.

We could thus split our data, literally making our materialized datoms table and its indices a separate materialized store. (And we could do this via our transaction listener if we weren't so concerned about atomicity and speed.)

So long as the transaction to the transaction log database completes first, we can replay transacted datoms into the datoms database in the event of necessary crash recovery.

This approach would allow each database to be tuned independently, would prevent the writer's readback of the transaction log from impacting the datoms-oriented page cache used by readers, and potentially allow us to scale better.

The transactor would have a connection to both DBs, attached. Readers would typically only use a connection to the datoms database.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions