This repository was archived by the owner on Sep 12, 2018. It is now read-only.

Basic timelines support #783

Merged
grigoryk merged 7 commits into master from grisha/moving-timelines
Jul 27, 2018

Conversation

@grigoryk
Contributor

@grigoryk grigoryk commented Jul 12, 2018

This PR adds support for basic timelines to the transactor, along with the ability to move ranges of transactions off of the main timeline.

  • The default timeline for all local transactions is the main timeline, 0.
  • The timeline of a transaction is indicated by an additional single field (same type as e).
  • Moving transactions off of the main timeline is implemented as a "rewind": assertions are reversed and transacted in descending order to achieve the correct state of the materialized views.
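The rewind described in the last bullet can be sketched as follows; the tuple type here is an illustrative stand-in, not Mentat's actual Term machinery:

```rust
// A row is (e, a, v, tx, added) — simplified stand-ins for Mentat's types.
type Row = (i64, i64, &'static str, i64, bool);

/// Flip assertions into retractions (and vice versa) and order them by
/// descending tx, so that later state is unwound before earlier state.
fn rewind_rows(mut rows: Vec<Row>) -> Vec<Row> {
    // Later transactions must be reversed first.
    rows.sort_by(|a, b| b.3.cmp(&a.3));
    rows.into_iter()
        .map(|(e, a, v, tx, added)| (e, a, v, tx, !added))
        .collect()
}
```

Transacting the flipped rows in this order leaves the materialized views as they were before the moved range was applied.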

@grigoryk grigoryk requested a review from ncalexan July 12, 2018 22:28
Member

@ncalexan ncalexan left a comment

Parts of this are looking good, but there's a fundamental issue that needs to be addressed first. We'll work on that together. Can you spin off an issue for it?

Comment thread db/src/db.rs Outdated
fn test_timeline_move() {
let mut conn = TestConn::default();

assert_transact!(conn, r#"[
Member

You can make this pass by manually allocating in the :db.part/user partition, like:

        assert_eq!((65536..65539),
                   conn.partition_map.allocate_entids(":db.part/user", 3));

(Well, pass up until the actual broken part. See below.)

Comment thread db/src/timelines.rs Outdated
}

let reversed_terms = reversed_terms_after(conn, smallest_tx - 1)?;
let lowest_e = match reversed_terms.last() {
Member

Fundamentally, it's not possible to determine the set of known partitions, or the current "next index" of a partition, by examining datoms or any fixed-length set of transactions. You could have:

  • tx 1 could allocate 1000 entids
  • txs 2 ... N only reference the first, say, 10 entids
  • tx N + 1 retracts entid 1000

You'll never "discover" the real range of valid entids.

Thus, the transactor needs to be able to tell you exactly the state of the partition map at each transaction. I've started working on this and will push something up when I can, but you can also start thinking about this, if you'd like.
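The scenario above can be made concrete with a toy calculation; all numbers and names here are illustrative, not Mentat's actual partition logic:

```rust
// If a partition's "next index" were reconstructed by scanning the entids
// still visible in datoms, it would be the max visible entid plus one
// (or the partition start, if nothing is visible).
fn next_index_from_datoms(visible_entids: &[i64], partition_start: i64) -> i64 {
    visible_entids.iter().copied().max().map_or(partition_start, |m| m + 1)
}
```

Suppose tx 1 allocated 1000 entids starting at 65536 but only the first ten are still asserted in datoms: the scan yields 65546, while the allocator's true next index is 66536, so fresh allocations would collide with entids already handed out.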

@grigoryk grigoryk force-pushed the grisha/moving-timelines branch 6 times, most recently from 2160923 to 7427106 Compare July 18, 2018 01:03
@grigoryk grigoryk force-pushed the grisha/moving-timelines branch 5 times, most recently from 68ca62d to 8f981eb Compare July 19, 2018 00:24
@grigoryk grigoryk changed the title WIP moving transactions off of main timeline Basic timelines support Jul 19, 2018
@grigoryk grigoryk force-pushed the grisha/moving-timelines branch from 8f981eb to 4df29da Compare July 19, 2018 00:29
@grigoryk
Contributor Author

Tests are failing only on Rust 1.25.0, since I'm pattern matching a reference w/ a non-reference pattern.

Member

@ncalexan ncalexan left a comment

This is a review of Part 0 (nits) and Part 1 (fundamental questions).

I see what you have done, and I think it solves your problem. That's good! But I'm not sure it's the right way to think about this problem. Perhaps the input to this part of the transactor is the transaction to apply in some temporary table. For rewinding, that means reversing transactions WHERE tx = ? into temp.transaction. For regular transactions, that means turning entities into searches and populating temp.transaction. Then we figure out how to use temp.transaction to update the materialized datoms view.

Does that make sense? It's a subtle shift in world view. For historical reasons, we (I) focused on updating datoms and then bolted on writing transactions after that was done. We could turn things around.

One thing that I think might be hard with this approach is retracting from datoms: that really needs rowids to be efficient. So we might need to have temp.transaction have richer information, and that might be too expensive to "reverse". (That's something that might not be true if we managed EAVT, AEVT, etc as separate tables ourselves. Trade-offs.)

I think it would be helpful to talk this through in person, and I'll try to play more directly with the approach this afternoon.

Comment thread db/src/db.rs
// Can retract all of the characteristics of an installed attribute in one go.
assert_transact!(conn,
"[[:db/retract 100 :db/cardinality :db.cardinality/many]
[:db/retract 100 :db/valueType :db.type/long]
Member

nit: indentation.

Contributor Author

Everything looks aligned on my end.

Comment thread db/src/metadata.rs Outdated
@@ -105,7 +105,38 @@ impl MetadataReport {
/// Returns a report summarizing the mutations that were applied.
pub fn update_attribute_map_from_entid_triples<A, R>(attribute_map: &mut AttributeMap, assertions: A, retractions: R) -> Result<MetadataReport>
Member

Are these always Vec, or could they always be Vec?

Comment thread db/src/metadata.rs Outdated
// Process retractions of schema attributes first. It's allowed to retract a schema attribute
// if all of the schema-defining schema attributes are being retracted.
// A defining set of attributes is :db/ident, :db/valueType, :db/cardinality.
let retractions: Vec<(Entid, Entid, TypedValue)> = retractions.collect();
Member

Let's put this into the code, next to SCHEMA_SQL_LIST -- say, SCHEMA_RETRACTION_SQL_LIST.

Member

Here you're saying :db/ident, but you're not considering :db/ident below. Why not?

Comment thread db/src/metadata.rs
R: IntoIterator<Item=(Entid, Entid, TypedValue)> {
R: Iterator<Item=(Entid, Entid, TypedValue)> {

// Process retractions of schema attributes first. It's allowed to retract a schema attribute
Member

For excision, I built a little EAV trie -- see 03dd4a3#diff-929f073706d06052e936ffa3c71e523aR217. I'd like to standardize these types of filtering and searching operations around AEVTrie and EAVTrie. Take a look. I think it would make this a little more pleasant.

Contributor Author

I tried going down that path initially, but couldn't make the ownership and lifetimes work.

Comment thread db/src/db.rs Outdated
"[[:db/retract 100 :db/cardinality :db.cardinality/many]]",
Err("bad schema assertion: Retracting attribute 8 for entity 100 not permitted."));

// Cannot retract a characteristic of an installed attribute.
Member

nit: "a single characteristic", or "a lone characteristic".

Comment thread db/src/db.rs Outdated
Ok(())
}

// TODO it's not clear that we need this distinction at all (other than 'comitted' is faster).
Member

nit: co_m_mitted.

Comment thread db/src/db.rs Outdated

// TODO: use concat! to avoid creating String instances.
let mut stmt = conn.prepare_cached(&sql_stmt)?;
let m: Result<Vec<_>> = stmt.query_and_then(&params[..], |row| -> Result<(Entid, Entid, TypedValue, bool)> {
Member

Let's extract this to a helper function -- db::row_to_transaction_assertion. Maybe add db::row_to_datom as well (without added).

Comment thread db/src/tx.rs
TransactWatcher,
};

pub(crate) enum TransactorAction {
Member

This is where the nice comment explaining what is happening in the two cases goes :)

Comment thread db/src/db.rs Outdated
Ok(())
}

fn metadata_assertions(conn: &rusqlite::Connection, tx_id: Entid, comitted: bool) -> Result<Vec<(Entid, Entid, TypedValue, bool)>> {
Member

This isn't a function with a bool parameter, it's two functions.
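The suggestion can be sketched with hypothetical names (the real functions run queries against SQLite; the second table name here is purely illustrative):

```rust
// A `committed: bool` parameter multiplexes two distinct operations behind
// one name. Splitting them makes each call site say which one it means.
fn committed_metadata_assertions_sql() -> &'static str {
    "SELECT e, a, v, value_type_tag, added FROM transactions WHERE tx = ?"
}

fn proposed_metadata_assertions_sql() -> &'static str {
    // Hypothetical staging table name, for illustration only.
    "SELECT e, a, v, value_type_tag, added FROM temp.search_results WHERE tx = ?"
}
```

Each function then owns exactly one query, rather than branching on a flag inside a shared body.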

Contributor Author

Splitting functions around queries (where db operation is a common bit) makes more sense than the current approach, I think.

Comment thread db/src/db.rs Outdated
format!(r#"
SELECT e, a, v, value_type_tag, added
FROM transactions
WHERE tx = ? AND a IN {}
Member

This is a table walk, since we don't have an index on transactions.tx. (Or maybe we do?) I have been thinking of using a different table format, but need to benchmark some things first, so I'm okay with this for now. (It'll get better in the newer format.)

Comment thread db/src/lib.rs
USER0,
};

pub static TIMELINE_MAIN: i64 = 0;
Member

Lift this to the earlier patch.

Contributor Author

Yeah, this stuff needs some cleanup. I was moving code across commits and things got lost in the noise.

Comment thread db/src/db.rs

// TODO: store entid instead of ident for partition name.
r#"CREATE TABLE parts (part TEXT NOT NULL PRIMARY KEY, start INTEGER NOT NULL, end INTEGER NOT NULL, idx INTEGER NOT NULL, allow_excision SMALLINT NOT NULL)"#,
r#"CREATE TABLE known_parts (part TEXT NOT NULL PRIMARY KEY, start INTEGER NOT NULL, end INTEGER NOT NULL, allow_excision SMALLINT NOT NULL)"#,
Member

If you transacted :db.part/{start,end} and, say, :db/part :db.part/{name} (and hard-coded excision, since I don't think we want to make it settable) then you could really pull this from datoms, just like we can materialize the schema on demand.

Contributor Author

Yeah, I thought of doing that and agree that this is what we'd like ideally - but it seemed just complicated enough (given my understanding of the internals) to punt on that work right now.

This would open up a rabbit hole down the "user-defined partitions" path which I'm not ready to follow yet.

Comment thread db/src/db.rs

/// Creates a partition map view for the main timeline based on partitions
/// defined in 'known_parts'.
fn create_current_partition_view(conn: &rusqlite::Connection) -> Result<()> {
Member

I don't understand what you're trying to achieve with this patch. Why do we need the parts view at all? All we care about is materializing the partition map at read time (and potentially after rewinding). Is this better than just running the relevant query at those times? Why?

Contributor Author

I'm using it simply as an alias for that query. The nice thing about having an actual view is that we won't need to do the work of constructing the query every time. The downside is that whenever partitions change (if they ever do, in the future), we'd need to change the view as well.

Comment thread db/src/timelines.rs Outdated

pub static MAIN_TIMELINE: Entid = 0;

fn ensure_on_tail_of(conn: &rusqlite::Connection, sorted_tx_ids: &[Entid], timeline: Entid) -> Result<()> {
Member

Why not use Range<Entid>? Non-contiguous timeline moving is probably not a good idea.
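A sketch of the suggested signature change, with illustrative names; taking `Range<Entid>` makes a non-contiguous set of transactions unrepresentable at the interface:

```rust
use std::ops::Range;

type Entid = i64;

// Hypothetical front door for the move operation: callers can only hand us
// a contiguous, ascending range of tx ids.
fn txs_to_move(txs: Range<Entid>) -> Result<Vec<Entid>, String> {
    if txs.is_empty() {
        return Err("no transactions to move".into());
    }
    // Materialize the contiguous tx ids in ascending order.
    Ok(txs.collect())
}
```

The contiguity invariant then holds by construction instead of needing a runtime check on a `&[Entid]`.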

Contributor Author

Great suggestion! Enforcing that at an interface layer is good.

Comment thread db/src/timelines.rs Outdated
NullWatcher,
};

pub static MAIN_TIMELINE: Entid = 0;
Member

Use the global defined in earlier patches, please.

Comment thread db/src/timelines.rs Outdated
"UPDATE timelined_transactions SET timeline = {} WHERE tx IN ({})",
new_timeline,
::repeat_values(tx_ids.len(), 1)
), &(tx_ids.iter().map(|x| x as &rusqlite::types::ToSql).collect::<Vec<_>>())
Member

Ranges would make this nicer too.
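For instance, with a contiguous range the statement needs only fixed bound parameters instead of `repeat_values` over an IN list (a sketch, not the actual Mentat code):

```rust
use std::ops::Range;

// Three bound parameters, regardless of how many transactions move.
fn retimeline_stmt(new_timeline: i64, txs: &Range<i64>) -> (&'static str, [i64; 3]) {
    ("UPDATE timelined_transactions SET timeline = ? WHERE tx >= ? AND tx < ?",
     [new_timeline, txs.start, txs.end])
}
```

This also keeps the SQL text constant, so a prepared-statement cache always gets a hit.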

Comment thread db/src/internal_types.rs Outdated
}
}

// TODO this seems like a sign that I'm doing something wrong...
Member

Not really. I think this is just the Into implementation chaining TermWithoutTempIds -> TermWithTempIds -> TermWithTempIdsAndLookupRefs. Break that out as a little Pre:, please.
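The chaining referred to here can be illustrated with toy stand-ins for the real term types:

```rust
// Toy versions of the conversion chain
// TermWithoutTempIds -> TermWithTempIds -> TermWithTempIdsAndLookupRefs.
#[derive(Debug, PartialEq)]
struct TermWithoutTempIds(i64);
#[derive(Debug, PartialEq)]
struct TermWithTempIds(i64);
#[derive(Debug, PartialEq)]
struct TermWithTempIdsAndLookupRefs(i64);

impl From<TermWithoutTempIds> for TermWithTempIds {
    fn from(t: TermWithoutTempIds) -> Self { TermWithTempIds(t.0) }
}
impl From<TermWithTempIds> for TermWithTempIdsAndLookupRefs {
    fn from(t: TermWithTempIds) -> Self { TermWithTempIdsAndLookupRefs(t.0) }
}
```

Each `From` impl is trivial on its own; it's only the two-step composition that looks odd at the call site.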

Comment thread db/src/timelines.rs Outdated

let mut terms = vec![];

while let Some(row) = rows.next() {
Member

rows.collect() will let you turn Vec<Result> into Result<Vec>.
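For reference, this works because `Result` implements `FromIterator`: an iterator of `Result<T, E>` collects into `Result<Vec<T>, E>`, short-circuiting on the first `Err`. A minimal illustration:

```rust
// Collect a sequence of fallible rows into one fallible Vec.
fn collect_terms(rows: Vec<Result<i64, String>>) -> Result<Vec<i64>, String> {
    rows.into_iter().collect()
}
```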

Comment thread db/src/timelines.rs Outdated
[?e2 :db/cardinality :db.cardinality/many]]
"#);
assert_matches!(conn.transactions(), r#"
[[[?e1 :db/ident :test/one ?tx1 true]
Member

nit: indentation looks off here as well.

Comment thread db/src/timelines.rs Outdated

let partition_map_after_bootstrap = conn.partition_map.clone();

assert_eq!((65536..65538),
Member

It's a shame to have to do this allocation in these tests. I have a plan to improve this, however: our matcher is rich, and knows what it matched. We just need to grow assert_matches! to also match the matcher, so that we can feed the tempids into the assert_matches! and verify that we really got what we expected. Then it's all symbolic :)

@grigoryk
Contributor Author

> One thing that might be a bit dicey is making sure transactions stack up well on the target timeline.

Given the way we're currently thinking of this - as, essentially, a one-way scratch pad for the purposes of sync - this isn't necessary. In fact, it's probably not possible to "make sure transactions stack up well", since we're moving just partial slices of the transaction log. What does it mean for one slice to be compatible (i.e. non-conflicting) with another?

Perhaps a good follow-up would be to ensure we can't move transactions onto a non-empty timeline, since there's a good chance that would be quite meaningless. That change would firm this up a little.

I'll address this in the work that will start using timelines.

@grigoryk
Contributor Author

@ncalexan I've addressed your feedback in a few separate "review" commits; take a look when you get a chance.

@grigoryk grigoryk force-pushed the grisha/moving-timelines branch from e75805c to ea3a063 Compare July 25, 2018 22:07
@grigoryk grigoryk mentioned this pull request Jul 25, 2018
@grigoryk
Contributor Author

Err, tests are passing; the failure on Rust beta seems to be a fluke of sorts.

@grigoryk grigoryk force-pushed the grisha/moving-timelines branch from ea3a063 to e589650 Compare July 26, 2018 21:29
@ncalexan ncalexan self-requested a review July 26, 2018 22:02
Member

@ncalexan ncalexan left a comment

OK, this is looking really good. I think some things will evolve but I'm happy with this basically as is. Please fold your patches, incorporating my micro patch that I pushed up; and please change the file you added (db/src/timelines.rs) to have UNIX line endings (right now, it has DOS line endings). After that, bombs away!

Comment thread db/src/types.rs Outdated
pub type AVPair = (Entid, TypedValue);

/// Used to represent assertions and retractions.
pub type EAV = (Entid, Entid, TypedValue);
Member

This shouldn't be pub, right?

@grigoryk grigoryk force-pushed the grisha/moving-timelines branch from f865b07 to eb875eb Compare July 26, 2018 23:31
Grisha Kruglov added 5 commits July 26, 2018 17:13
This is necessary for the timelines work ahead. When schema is being
moved off of the main timeline, we need to be able to retract it cleanly.

Retractions are only processed if the whole defining attribute set
is being retracted at once (:db/ident, :db/valueType, :db/cardinality).
Normally we want to both materialize our changes (into 'datoms')
as well as commit source transactions into 'transactions' table.

However, when moving transactions from timeline to timeline
we don't want to persist artifacts (rewind assertions), just their
materializations.

This patch expands the 'db' interface to allow for this split,
and changes transactor's functions to take a crate-private 'action'
which defines desired behaviour.
This records transactions onto a default timeline (0).
Being able to derive partition map from partition definitions and current
state of the world (transactions), segmented by timelines, is useful
because it lets us not worry about keeping materialized partition maps
up-to-date - since there's no need for materialized partition maps at that point.

This comes in very handy when we start moving chunks of transactions off of our mainline.
An alternative to this work would be materializing partition maps per timeline,
growing support for incremental "backwards update" of the materialized maps, etc.

Our core partitions are defined in 'known_parts' table during bootstrap,
and what used to be 'parts' table is a generated view that operates over
transactions to figure out partition index.

'parts' is defined for the main timeline. Querying parts for other timelines
or for particular timeline+tx combinations will look similar.
