This repository was archived by the owner on Sep 12, 2018. It is now read-only.

Basic timelines support #783

Merged
grigoryk merged 7 commits into master from grisha/moving-timelines
Jul 27, 2018

Conversation

@grigoryk
Contributor

@grigoryk grigoryk commented Jul 12, 2018

This PR adds support for basic timelines to the transactor, along with the ability to move ranges of transactions off of the main timeline.

  • The default timeline for all local transactions is the main timeline, 0.
  • The timeline of a transaction is indicated by an additional single field (same type as e).
  • Moving transactions off of the main timeline is implemented as a "rewind": assertions are reversed and transacted in descending order to achieve the correct state of the materialized views.
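The rewind described in the last bullet can be sketched as follows; the tuple type here is an illustrative stand-in, not Mentat's actual Term machinery:

```rust
// A row is (e, a, v, tx, added) — simplified stand-ins for Mentat's types.
type Row = (i64, i64, &'static str, i64, bool);

/// Flip assertions into retractions (and vice versa) and order them by
/// descending tx, so that later state is unwound before earlier state.
fn rewind_rows(mut rows: Vec<Row>) -> Vec<Row> {
    // Later transactions must be reversed first.
    rows.sort_by(|a, b| b.3.cmp(&a.3));
    rows.into_iter()
        .map(|(e, a, v, tx, added)| (e, a, v, tx, !added))
        .collect()
}
```

Transacting the flipped rows in this order leaves the materialized views as they were before the moved range was applied.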

@grigoryk grigoryk requested a review from ncalexan July 12, 2018 22:28
Member

@ncalexan ncalexan left a comment

Parts of this are looking good, but there's a fundamental issue that needs to be addressed first. We'll work on that together. Can you spin off an issue for it?

Comment thread db/src/db.rs Outdated
fn test_timeline_move() {
let mut conn = TestConn::default();

assert_transact!(conn, r#"[
Member

You can make this pass by manually allocating in the :db.part/user partition, like:

        assert_eq!((65536..65539),
                   conn.partition_map.allocate_entids(":db.part/user", 3));

(Well, pass up until the actual broken part. See below.)

Comment thread db/src/timelines.rs Outdated
}

let reversed_terms = reversed_terms_after(conn, smallest_tx - 1)?;
let lowest_e = match reversed_terms.last() {
Member

Fundamentally, it's not possible to determine the set of known partitions, or the current "next index" of a partition, by examining datoms or any fixed-length set of transactions. You could have:

  • tx 1 could allocate 1000 entids
  • txs 2 ... N only reference the first, say, 10 entids
  • tx N + 1 retracts entid 1000

You'll never "discover" the real range of valid entids.

Thus, the transactor needs to be able to tell you exactly the state of the partition map at each transaction. I've started working on this and will push something up when I can, but you can also start thinking about this, if you'd like.
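The scenario above can be made concrete with a toy calculation; all numbers and names here are illustrative, not Mentat's actual partition logic:

```rust
// If a partition's "next index" were reconstructed by scanning the entids
// still visible in datoms, it would be the max visible entid plus one
// (or the partition start, if nothing is visible).
fn next_index_from_datoms(visible_entids: &[i64], partition_start: i64) -> i64 {
    visible_entids.iter().copied().max().map_or(partition_start, |m| m + 1)
}
```

Suppose tx 1 allocated 1000 entids starting at 65536 but only the first ten are still asserted in datoms: the scan yields 65546, while the allocator's true next index is 66536, so fresh allocations would collide with entids already handed out.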

@grigoryk grigoryk force-pushed the grisha/moving-timelines branch 6 times, most recently from 2160923 to 7427106 Compare July 18, 2018 01:03
@grigoryk grigoryk force-pushed the grisha/moving-timelines branch 5 times, most recently from 68ca62d to 8f981eb Compare July 19, 2018 00:24
@grigoryk grigoryk changed the title WIP moving transactions off of main timeline Basic timelines support Jul 19, 2018
@grigoryk grigoryk force-pushed the grisha/moving-timelines branch from 8f981eb to 4df29da Compare July 19, 2018 00:29
@grigoryk
Contributor Author

Tests are failing only on Rust 1.25.0, since I'm pattern matching a reference w/ a non-reference pattern.

Member

@ncalexan ncalexan left a comment

This is a review of Part 0 (nits) and Part 1 (fundamental questions).

I see what you have done, and I think it solves your problem. That's good! But I'm not sure it's the right way to think about this problem. Perhaps the input to this part of the transactor is the transaction to apply in some temporary table. For rewinding, that means reversing transactions WHERE tx = ? into temp.transaction. For regular transactions, that means turning entities into searches and populating temp.transaction. Then we figure out how to use temp.transaction to update the materialized datoms view.

Does that make sense? It's a subtle shift in world view. For historical reasons, we (I) focused on updating datoms and then bolted on writing transactions after that was done. We could turn things around.

One thing that I think might be hard with this approach is retracting from datoms: that really needs rowids to be efficient. So we might need to have temp.transaction have richer information, and that might be too expensive to "reverse". (That's something that might not be true if we managed EAVT, AEVT, etc as separate tables ourselves. Trade-offs.)

I think it would be helpful to talk this through in person, and I'll try to play more directly with the approach this afternoon.

Comment thread db/src/db.rs
// Can retract all of the characteristics of an installed attribute in one go.
assert_transact!(conn,
"[[:db/retract 100 :db/cardinality :db.cardinality/many]
[:db/retract 100 :db/valueType :db.type/long]
Member

nit: indentation.

Contributor Author

Everything looks aligned on my end.

Comment thread db/src/metadata.rs Outdated
@@ -105,7 +105,38 @@ impl MetadataReport {
/// Returns a report summarizing the mutations that were applied.
pub fn update_attribute_map_from_entid_triples<A, R>(attribute_map: &mut AttributeMap, assertions: A, retractions: R) -> Result<MetadataReport>
Member

Are these always Vec, or could they always be Vec?

Comment thread db/src/metadata.rs Outdated
// Process retractions of schema attributes first. It's allowed to retract a schema attribute
// if all of the schema-defining schema attributes are being retracted.
// A defining set of attributes is :db/ident, :db/valueType, :db/cardinality.
let retractions: Vec<(Entid, Entid, TypedValue)> = retractions.collect();
Member

Let's put this into the code, next to SCHEMA_SQL_LIST -- say, SCHEMA_RETRACTION_SQL_LIST.

Member

Here you're saying :db/ident, but you're not considering :db/ident below. Why not?

Comment thread db/src/metadata.rs
R: IntoIterator<Item=(Entid, Entid, TypedValue)> {
R: Iterator<Item=(Entid, Entid, TypedValue)> {

// Process retractions of schema attributes first. It's allowed to retract a schema attribute
Member

For excision, I built a little EAV trie -- see 03dd4a3#diff-929f073706d06052e936ffa3c71e523aR217. I'd like to standardize these types of filtering and searching operations around AEVTrie and EAVTrie. Take a look. I think it would make this a little more pleasant.

Contributor Author

I tried going down that path initially, but couldn't make the ownership and lifetimes work.

Comment thread db/src/db.rs Outdated
"[[:db/retract 100 :db/cardinality :db.cardinality/many]]",
Err("bad schema assertion: Retracting attribute 8 for entity 100 not permitted."));

// Cannot retract a characteristic of an installed attribute.
Member

nit: "a single characteristic", or "a lone characteristic".

Comment thread db/src/db.rs Outdated
Ok(())
}

// TODO it's not clear that we need this distinction at all (other than 'comitted' is faster).
Member

nit: co_m_mitted.

Comment thread db/src/db.rs Outdated

// TODO: use concat! to avoid creating String instances.
let mut stmt = conn.prepare_cached(&sql_stmt)?;
let m: Result<Vec<_>> = stmt.query_and_then(&params[..], |row| -> Result<(Entid, Entid, TypedValue, bool)> {
Member

Let's extract this to a helper function -- db::row_to_transaction_assertion. Maybe add db::row_to_datom as well (without added).

Comment thread db/src/tx.rs
TransactWatcher,
};

pub(crate) enum TransactorAction {
Member

This is where the nice comment explaining what is happening in the two cases goes :)

Comment thread db/src/db.rs Outdated
Ok(())
}

fn metadata_assertions(conn: &rusqlite::Connection, tx_id: Entid, comitted: bool) -> Result<Vec<(Entid, Entid, TypedValue, bool)>> {
Member

This isn't a function with a bool parameter, it's two functions.
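The suggestion can be sketched with hypothetical names (the real functions run queries against SQLite; the second table name here is purely illustrative):

```rust
// A `committed: bool` parameter multiplexes two distinct operations behind
// one name. Splitting them makes each call site say which one it means.
fn committed_metadata_assertions_sql() -> &'static str {
    "SELECT e, a, v, value_type_tag, added FROM transactions WHERE tx = ?"
}

fn proposed_metadata_assertions_sql() -> &'static str {
    // Hypothetical staging table name, for illustration only.
    "SELECT e, a, v, value_type_tag, added FROM temp.search_results WHERE tx = ?"
}
```

Each function then owns exactly one query, rather than branching on a flag inside a shared body.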

Contributor Author

Splitting functions around queries (where db operation is a common bit) makes more sense than the current approach, I think.

Comment thread db/src/db.rs Outdated
format!(r#"
SELECT e, a, v, value_type_tag, added
FROM transactions
WHERE tx = ? AND a IN {}
Member

This is a table walk, since we don't have an index on transactions.tx. (Or maybe we do?) I have been thinking of using a different table format, but need to benchmark some things first, so I'm okay with this for now. (It'll get better in the newer format.)

Comment thread db/src/lib.rs
USER0,
};

pub static TIMELINE_MAIN: i64 = 0;
Member

Lift this to the earlier patch.

Contributor Author

Yeah, this stuff needs some cleanup. I was moving code across commits and things got lost in the noise.

Comment thread db/src/db.rs

// TODO: store entid instead of ident for partition name.
r#"CREATE TABLE parts (part TEXT NOT NULL PRIMARY KEY, start INTEGER NOT NULL, end INTEGER NOT NULL, idx INTEGER NOT NULL, allow_excision SMALLINT NOT NULL)"#,
r#"CREATE TABLE known_parts (part TEXT NOT NULL PRIMARY KEY, start INTEGER NOT NULL, end INTEGER NOT NULL, allow_excision SMALLINT NOT NULL)"#,
Member

If you transacted :db.part/{start,end} and, say, :db/part :db.part/{name} (and hard-coded excision, since I don't think we want to make it settable) then you could really pull this from datoms, just like we can materialize the schema on demand.

Contributor Author

Yeah, I thought of doing that and agree that this is what we'd like ideally - but it seemed just complicated enough (given my understanding of the internals) to punt on that work right now.

This would open up a rabbit hole down the "user-defined partitions" path which I'm not ready to follow yet.

Comment thread db/src/db.rs

/// Creates a partition map view for the main timeline based on partitions
/// defined in 'known_parts'.
fn create_current_partition_view(conn: &rusqlite::Connection) -> Result<()> {
Member

I don't understand what you're trying to achieve with this patch. Why do we need the parts view at all? All we care about is materializing the partition map at read time (and potentially after rewinding). Is this better than just running the relevant query at those times? Why?

Contributor Author

I'm using it simply as an alias for that query. The nice thing about having an actual view is that we won't need to do the work of constructing the query every time. The downside is that whenever partitions change (if they ever do, in the future), we'd need to change the view as well.

Comment thread db/src/timelines.rs Outdated

pub static MAIN_TIMELINE: Entid = 0;

fn ensure_on_tail_of(conn: &rusqlite::Connection, sorted_tx_ids: &[Entid], timeline: Entid) -> Result<()> {
Member

Why not use Range<Entid>? Non-contiguous timeline moving is probably not a good idea.
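A sketch of the suggested signature change, with illustrative names; taking `Range<Entid>` makes a non-contiguous set of transactions unrepresentable at the interface:

```rust
use std::ops::Range;

type Entid = i64;

// Hypothetical front door for the move operation: callers can only hand us
// a contiguous, ascending range of tx ids.
fn txs_to_move(txs: Range<Entid>) -> Result<Vec<Entid>, String> {
    if txs.is_empty() {
        return Err("no transactions to move".into());
    }
    // Materialize the contiguous tx ids in ascending order.
    Ok(txs.collect())
}
```

The contiguity invariant then holds by construction instead of needing a runtime check on a `&[Entid]`.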

Contributor Author

Great suggestion! Enforcing that at an interface layer is good.

Comment thread db/src/timelines.rs Outdated
NullWatcher,
};

pub static MAIN_TIMELINE: Entid = 0;
Member

Use the global defined in earlier patches, please.

Comment thread db/src/timelines.rs Outdated
"UPDATE timelined_transactions SET timeline = {} WHERE tx IN ({})",
new_timeline,
::repeat_values(tx_ids.len(), 1)
), &(tx_ids.iter().map(|x| x as &rusqlite::types::ToSql).collect::<Vec<_>>())
Member

Ranges would make this nicer too.
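For instance, with a contiguous range the statement needs only fixed bound parameters instead of `repeat_values` over an IN list (a sketch, not the actual Mentat code):

```rust
use std::ops::Range;

// Three bound parameters, regardless of how many transactions move.
fn retimeline_stmt(new_timeline: i64, txs: &Range<i64>) -> (&'static str, [i64; 3]) {
    ("UPDATE timelined_transactions SET timeline = ? WHERE tx >= ? AND tx < ?",
     [new_timeline, txs.start, txs.end])
}
```

This also keeps the SQL text constant, so a prepared-statement cache always gets a hit.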

Comment thread db/src/internal_types.rs Outdated
}
}

// TODO this seems like a sign that I'm doing something wrong...
Member

Not really. I think this is just the Into implementation chaining TermWithoutTempIds -> TermWithTempIds -> TermWithTempIdsAndLookupRefs. Break that out as a little Pre:, please.
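The chaining referred to here can be illustrated with toy stand-ins for the real term types:

```rust
// Toy versions of the conversion chain
// TermWithoutTempIds -> TermWithTempIds -> TermWithTempIdsAndLookupRefs.
#[derive(Debug, PartialEq)]
struct TermWithoutTempIds(i64);
#[derive(Debug, PartialEq)]
struct TermWithTempIds(i64);
#[derive(Debug, PartialEq)]
struct TermWithTempIdsAndLookupRefs(i64);

impl From<TermWithoutTempIds> for TermWithTempIds {
    fn from(t: TermWithoutTempIds) -> Self { TermWithTempIds(t.0) }
}
impl From<TermWithTempIds> for TermWithTempIdsAndLookupRefs {
    fn from(t: TermWithTempIds) -> Self { TermWithTempIdsAndLookupRefs(t.0) }
}
```

Each `From` impl is trivial on its own; it's only the two-step composition that looks odd at the call site.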

Comment thread db/src/timelines.rs Outdated

let mut terms = vec![];

while let Some(row) = rows.next() {
Member

rows.collect() will let you turn Vec<Result> into Result<Vec>.
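For reference, this works because `Result` implements `FromIterator`: an iterator of `Result<T, E>` collects into `Result<Vec<T>, E>`, short-circuiting on the first `Err`. A minimal illustration:

```rust
// Collect a sequence of fallible rows into one fallible Vec.
fn collect_terms(rows: Vec<Result<i64, String>>) -> Result<Vec<i64>, String> {
    rows.into_iter().collect()
}
```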

Comment thread db/src/timelines.rs Outdated
[?e2 :db/cardinality :db.cardinality/many]]
"#);
assert_matches!(conn.transactions(), r#"
[[[?e1 :db/ident :test/one ?tx1 true]
Member

nit: indentation looks off here as well.

Comment thread db/src/timelines.rs Outdated

let partition_map_after_bootstrap = conn.partition_map.clone();

assert_eq!((65536..65538),
Member

It's a shame to have to do this allocation in these tests. I have a plan to improve this, however: our matcher is rich, and knows what it matched. We just need to grow assert_matches! to also match the matcher, so that we can feed the tempids into the assert_matches! and verify that we really got what we expected. Then it's all symbolic :)

@grigoryk
Contributor Author

> One thing that might be a bit dicey is making sure transactions stack up well on the target timeline.

Given the way we're currently thinking of this - as, essentially, a one-way scratch pad for the purposes of sync - this isn't necessary. In fact, it's probably not possible to "make sure transactions stack up well", since we're moving just partial slices of the transaction log. What does it mean for one slice to be compatible (i.e. non-conflicting) with another?

Perhaps a good follow-up would be to ensure we can't move transactions onto a non-empty timeline, since there's a good chance that would be quite meaningless. That change would firm this up a little.

I'll address this in the work that will start using timelines.

@grigoryk
Contributor Author

@ncalexan I've addressed your feedback in a few separate "review" commits; take a look when you get a chance.

@grigoryk grigoryk force-pushed the grisha/moving-timelines branch from e75805c to ea3a063 Compare July 25, 2018 22:07
@grigoryk grigoryk mentioned this pull request Jul 25, 2018
@grigoryk
Contributor Author

Err, tests are passing; the failure on Rust beta seems to be a fluke of sorts.

@grigoryk grigoryk force-pushed the grisha/moving-timelines branch from ea3a063 to e589650 Compare July 26, 2018 21:29
@ncalexan ncalexan self-requested a review July 26, 2018 22:02
Member

@ncalexan ncalexan left a comment

OK, this is looking really good. I think some things will evolve but I'm happy with this basically as is. Please fold your patches, incorporating my micro patch that I pushed up; and please change the file you added (db/src/timelines.rs) to have UNIX line endings (right now, it has DOS line endings). After that, bombs away!

Comment thread db/src/types.rs Outdated
pub type AVPair = (Entid, TypedValue);

/// Used to represent assertions and retractions.
pub type EAV = (Entid, Entid, TypedValue);
Member

This shouldn't be pub, right?

@grigoryk grigoryk force-pushed the grisha/moving-timelines branch from f865b07 to eb875eb Compare July 26, 2018 23:31
Grisha Kruglov added 5 commits July 26, 2018 17:13
This is necessary for the timelines work ahead. When schema is being
moved off of the main timeline, we need to be able to retract it cleanly.

Retractions are only processed if the whole defining attribute set
is being retracted at once (:db/ident, :db/valueType, :db/cardinality).
Normally we want to both materialize our changes (into 'datoms')
as well as commit source transactions into 'transactions' table.

However, when moving transactions from timeline to timeline
we don't want to persist artifacts (rewind assertions), just their
materializations.

This patch expands the 'db' interface to allow for this split,
and changes transactor's functions to take a crate-private 'action'
which defines desired behaviour.
This records transactions onto a default timeline (0).
Being able to derive partition map from partition definitions and current
state of the world (transactions), segmented by timelines, is useful
because it lets us not worry about keeping materialized partition maps
up-to-date - since there's no need for materialized partition maps at that point.

This comes in very handy when we start moving chunks of transactions off of our mainline.
An alternative to this work would be materializing partition maps per timeline,
growing support for incremental "backwards update" of the materialized maps, etc.

Our core partitions are defined in 'known_parts' table during bootstrap,
and what used to be 'parts' table is a generated view that operates over
transactions to figure out partition index.

'parts' is defined for the main timeline. Querying parts for other timelines
or for particular timeline+tx combinations will look similar.
