Basic timelines support #783
Conversation
ncalexan
left a comment
Parts of this are looking good, but there's a fundamental issue that needs to be addressed first. We'll work on that together. Can you spin off an issue for it?
| fn test_timeline_move() {
|     let mut conn = TestConn::default();
|
|     assert_transact!(conn, r#"[
You can make this pass by manually allocating in the :db.part/user partition, like:

assert_eq!((65536..65539),
           conn.partition_map.allocate_entids(":db.part/user", 3));

(Well, pass up until the actual broken part. See below.)
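For readers unfamiliar with the partition map, here is a simplified model of what `allocate_entids` does (hypothetical struct; the real implementation lives in Mentat's db crate and is keyed by partition name):

```rust
use std::ops::Range;

// Hypothetical, simplified stand-in for one entry of Mentat's partition map.
struct Partition {
    start: i64, // first entid owned by this partition
    index: i64, // next entid to hand out
}

impl Partition {
    // Allocate `n` consecutive entids, advancing the partition's index.
    fn allocate_entids(&mut self, n: i64) -> Range<i64> {
        let allocated = self.index..self.index + n;
        self.index += n;
        allocated
    }
}

fn main() {
    // :db.part/user conventionally starts at 65536 (0x10000).
    let mut user = Partition { start: 65536, index: 65536 };
    assert_eq!(65536..65539, user.allocate_entids(3));
    assert!(user.index >= user.start);
    println!("next index: {}", user.index);
}
```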
| }
|
| let reversed_terms = reversed_terms_after(conn, smallest_tx - 1)?;
| let lowest_e = match reversed_terms.last() {
Fundamentally, it's not possible to determine the set of known partitions, or the current "next index" of a partition, by examining datoms or any fixed-length set of transactions. You could have:
- tx 1 allocates 1000 entids
- txs 2 through N reference only, say, the first 10 entids
- tx N + 1 retracts entid 1000
You'll never "discover" the real range of valid entids.
Thus, the transactor needs to be able to tell you exactly the state of the partition map at each transaction. I've started working on this and will push something up when I can, but you can also start thinking about this, if you'd like.
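The scenario above can be made concrete with a toy model (plain Rust, illustrative numbers; not Mentat code):

```rust
// tx 1 allocates 1000 entids; later txs keep only the first 10 in datoms;
// scanning datoms then under-reports the partition's true next index.
fn main() {
    let allocated: Vec<i64> = (65536..66536).collect(); // tx 1 allocates 1000 entids
    let datoms: Vec<i64> = allocated[..10].to_vec();    // only 10 remain referenced

    let actual_next = allocated.last().unwrap() + 1;     // what the transactor knows
    let guessed_next = datoms.iter().max().unwrap() + 1; // what datoms reveal

    assert!(guessed_next < actual_next);
    println!("guessed {}, actual {}", guessed_next, actual_next);
}
```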
Tests are failing only on Rust 1.25.0, since I'm pattern matching a reference with a non-reference pattern.
ncalexan
left a comment
This is a review of Part 0 (nits) and Part 1 (fundamental questions).
I see what you have done, and I think it solves your problem. That's good! But I'm not sure it's the right way to think about this problem. Perhaps the input to this part of the transactor is the transaction to apply in some temporary table. For rewinding, that means reversing transactions WHERE tx = ? into temp.transaction. For regular transactions, that means turning entities into searches and populating temp.transaction. Then we figure out how to use temp.transaction to update the materialized datoms view.
Does that make sense? It's a subtle shift in world view. For historical reasons, we (I) focused on updating datoms and then bolted on writing transactions after that was done. We could turn things around.
One thing that I think might be hard with this approach is retracting from datoms: that really needs rowids to be efficient. So we might need to have temp.transaction have richer information, and that might be too expensive to "reverse". (That's something that might not be true if we managed EAVT, AEVT, etc as separate tables ourselves. Trade-offs.)
I think it would be helpful to talk this through in person, and I'll try to play more directly with the approach this afternoon.
| // Can retract all of the characteristics of an installed attribute in one go.
| assert_transact!(conn,
|     "[[:db/retract 100 :db/cardinality :db.cardinality/many]
|       [:db/retract 100 :db/valueType :db.type/long]
Everything looks aligned on my end.
| @@ -105,7 +105,38 @@ impl MetadataReport {
| /// Returns a report summarizing the mutations that were applied.
| pub fn update_attribute_map_from_entid_triples<A, R>(attribute_map: &mut AttributeMap, assertions: A, retractions: R) -> Result<MetadataReport>
Are these always `Vec`, or could they be arbitrary iterators?
| // Process retractions of schema attributes first. It's allowed to retract a schema attribute
| // if all of the schema-defining schema attributes are being retracted.
| // A defining set of attributes is :db/ident, :db/valueType, :db/cardinality.
| let retractions: Vec<(Entid, Entid, TypedValue)> = retractions.collect();
Let's put this into the code, next to SCHEMA_SQL_LIST -- say, SCHEMA_RETRACTION_SQL_LIST.

Here you're saying :db/ident, but you're not considering :db/ident below. Why not?
| R: IntoIterator<Item=(Entid, Entid, TypedValue)> {
| R: Iterator<Item=(Entid, Entid, TypedValue)> {
|
| // Process retractions of schema attributes first. It's allowed to retract a schema attribute
For excision, I built a little EAV trie -- see 03dd4a3#diff-929f073706d06052e936ffa3c71e523aR217. I'd like to standardize these types of filtering and searching operations around AEVTrie and EAVTrie. Take a look. I think it would make this a little more pleasant.

I tried going down that path initially, but couldn't make the ownership and lifetimes work.
| "[[:db/retract 100 :db/cardinality :db.cardinality/many]]",
| Err("bad schema assertion: Retracting attribute 8 for entity 100 not permitted."));
|
| // Cannot retract a characteristic of an installed attribute.
nit: "a single characteristic", or "a lone characteristic".
| Ok(())
| }
|
| // TODO it's not clear that we need this distinction at all (other than 'comitted' is faster).
|
| // TODO: use concat! to avoid creating String instances.
| let mut stmt = conn.prepare_cached(&sql_stmt)?;
| let m: Result<Vec<_>> = stmt.query_and_then(&params[..], |row| -> Result<(Entid, Entid, TypedValue, bool)> {
Let's extract this to a helper function -- db::row_to_transaction_assertion. Maybe add db::row_to_datom as well (without added).
| TransactWatcher,
| };
|
| pub(crate) enum TransactorAction {
This is where the nice comment explaining what is happening in the two cases goes :)
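A hedged sketch of what such a comment might document, based on the split described in this PR (materialize-only moves vs. the normal transact path); the variant names are assumptions, and the PR's actual enum may differ:

```rust
// Sketch of the two transactor actions (assumed variant names).
enum TransactorAction {
    // Update the materialized views (datoms, schema, partition map) only.
    // Used when moving transactions between timelines, where we don't want
    // to persist the rewind artifacts themselves.
    Materialize,
    // Materialize *and* record the source transaction into the
    // transactions table -- the normal transact path.
    MaterializeAndCommit,
}

fn main() {
    let action = TransactorAction::MaterializeAndCommit;
    // Only the commit variant writes to the transactions table.
    let commits = matches!(action, TransactorAction::MaterializeAndCommit);
    assert!(commits);
}
```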
| Ok(())
| }
|
| fn metadata_assertions(conn: &rusqlite::Connection, tx_id: Entid, comitted: bool) -> Result<Vec<(Entid, Entid, TypedValue, bool)>> {
This isn't a function with a bool parameter, it's two functions.

Splitting functions around queries (where the db operation is the common bit) makes more sense than the current approach, I think.
| format!(r#"
|     SELECT e, a, v, value_type_tag, added
|     FROM transactions
|     WHERE tx = ? AND a IN {}
This is a table walk, since we don't have an index on transactions.tx. (Or maybe we do?) I have been thinking of using a different table format, but need to benchmark some things first, so I'm okay with this for now. (It'll get better in the newer format.)
| USER0,
| };
|
| pub static TIMELINE_MAIN: i64 = 0;
Lift this to the earlier patch.

Yeah, this stuff needs some cleanup. I was moving code across commits and things got lost in the noise.
| // TODO: store entid instead of ident for partition name.
| r#"CREATE TABLE parts (part TEXT NOT NULL PRIMARY KEY, start INTEGER NOT NULL, end INTEGER NOT NULL, idx INTEGER NOT NULL, allow_excision SMALLINT NOT NULL)"#,
| r#"CREATE TABLE known_parts (part TEXT NOT NULL PRIMARY KEY, start INTEGER NOT NULL, end INTEGER NOT NULL, allow_excision SMALLINT NOT NULL)"#,
If you transacted :db.part/{start,end} and, say, :db/part :db.part/{name} (and hard-coded excision, since I don't think we want to make it settable) then you could really pull this from datoms, just like we can materialize the schema on demand.

Yeah, I thought of doing that and agree that this is what we'd like ideally - but it seemed just complicated enough (given my understanding of the internals) to punt on that work right now.
This would open up a rabbit hole down the "user-defined partitions" path which I'm not ready to follow yet.
| /// Creates a partition map view for the main timeline based on partitions
| /// defined in 'known_parts'.
| fn create_current_partition_view(conn: &rusqlite::Connection) -> Result<()> {
I don't understand what you're trying to achieve with this patch. Why do we need the parts view at all? All we care about is materializing the partition map at read time (and potentially after rewinding). Is this better than just running the relevant query at those times? Why?

I'm using it simply as an alias for that query. The nice thing about having an actual view is that we won't need to do the work of building the query every time. The downside is that whenever partitions change (if they ever do, in the future), we'd need to change the view as well.
| pub static MAIN_TIMELINE: Entid = 0;
|
| fn ensure_on_tail_of(conn: &rusqlite::Connection, sorted_tx_ids: &[Entid], timeline: Entid) -> Result<()> {
Why not use Range<Entid>? Non-contiguous timeline moving is probably not a good idea.

Great suggestion! Enforcing that at an interface layer is good.
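A small sketch of the suggestion (the helper name `move_predicate` is hypothetical; Mentat's real code binds parameters rather than inlining values): a contiguous `Range<Entid>` makes gaps unrepresentable and lets the SQL predicate become a simple bounds check instead of `tx IN (?, ?, ...)` built via `repeat_values`.

```rust
use std::ops::Range;

type Entid = i64;

// Hypothetical helper: turn a contiguous tx range into a SQL predicate.
// (Illustrative only; real code should use bound parameters.)
fn move_predicate(txs: &Range<Entid>) -> String {
    format!("tx >= {} AND tx < {}", txs.start, txs.end)
}

fn main() {
    let txs: Range<Entid> = 268435457..268435460;
    assert_eq!("tx >= 268435457 AND tx < 268435460", move_predicate(&txs));
    // Contiguity is enforced by construction: a Range cannot represent gaps.
}
```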
| NullWatcher,
| };
|
| pub static MAIN_TIMELINE: Entid = 0;
Use the global defined in earlier patches, please.
| "UPDATE timelined_transactions SET timeline = {} WHERE tx IN ({})",
| new_timeline,
| ::repeat_values(tx_ids.len(), 1)
| ), &(tx_ids.iter().map(|x| x as &rusqlite::types::ToSql).collect::<Vec<_>>())
Ranges would make this nicer too.
| }
| }
|
| // TODO this seems like a sign that I'm doing something wrong...
Not really. I think this is just the Into implementation chaining TermWithoutTempIds -> TermWithTempIds -> TermWithTempIdsAndLookupRefs. Break that out as a little Pre:, please.
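The chaining in question, reduced to a toy example with stand-in types (the real Term types carry much more structure): Rust won't chain two `Into` conversions implicitly, so each hop has to be written out, which is why the code can feel more awkward than it is.

```rust
// Stand-ins for TermWithoutTempIds -> TermWithTempIds -> TermWithTempIdsAndLookupRefs.
struct Plain(i64);
struct WithTempIds(i64);
struct WithTempIdsAndLookupRefs(i64);

impl From<Plain> for WithTempIds {
    fn from(t: Plain) -> Self { WithTempIds(t.0) }
}

impl From<WithTempIds> for WithTempIdsAndLookupRefs {
    fn from(t: WithTempIds) -> Self { WithTempIdsAndLookupRefs(t.0) }
}

fn main() {
    // Each conversion step is explicit; the compiler won't compose them.
    let plain = Plain(42);
    let mid: WithTempIds = plain.into();
    let full: WithTempIdsAndLookupRefs = mid.into();
    assert_eq!(42, full.0);
}
```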
| let mut terms = vec![];
|
| while let Some(row) = rows.next() {
rows.collect() will let you turn Vec<Result> into Result<Vec>.
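The suggestion in plain Rust (a standalone analogue; rusqlite's own row iteration differs in detail): collecting an iterator of `Result`s into `Result<Vec<_>, _>` short-circuits on the first error, replacing the manual `while let` push loop.

```rust
// Collecting Iterator<Item = Result<T, E>> into Result<Vec<T>, E>
// stops at the first Err instead of accumulating in a loop.
fn main() {
    let rows: Vec<Result<i64, String>> = vec![Ok(1), Ok(2), Ok(3)];
    let collected: Result<Vec<i64>, String> = rows.into_iter().collect();
    assert_eq!(Ok(vec![1, 2, 3]), collected);

    let with_err: Vec<Result<i64, String>> = vec![Ok(1), Err("boom".into())];
    let collected: Result<Vec<i64>, String> = with_err.into_iter().collect();
    assert_eq!(Err("boom".to_string()), collected);
}
```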
|     [?e2 :db/cardinality :db.cardinality/many]]
| "#);
| assert_matches!(conn.transactions(), r#"
| [[[?e1 :db/ident :test/one ?tx1 true]
nit: indentation looks off here as well.
| let partition_map_after_bootstrap = conn.partition_map.clone();
|
| assert_eq!((65536..65538),
It's a shame to have to do this allocation in these tests. I have a plan to improve this, however: our matcher is rich, and knows what it matched. We just need to grow assert_matches! to also match the matcher, so that we can feed the tempids into the assert_matches! and verify that we really got what we expected. Then it's all symbolic :)
One thing that might be a bit dicey is making sure transactions stack up well on the target timeline. The way we're currently thinking of this - essentially as a one-way scratch pad for sync purposes - that isn't necessary. In fact, it's probably not possible to "make sure transactions stack up well", since we're moving just partial slices of the transaction log. What does it mean for one slice to be compatible (i.e. non-conflicting) with another? Perhaps a good follow-up would be to ensure we can't move transactions onto a non-empty timeline, since there's a good chance that would be quite meaningless. That change would firm this up a little. I'll address this in the work that will start using timelines.
@ncalexan I've addressed your feedback in a few separate "review" commits; take a look when you get a chance.
Err, tests are passing, failure on
ncalexan
left a comment
OK, this is looking really good. I think some things will evolve but I'm happy with this basically as is. Please fold your patches, incorporating my micro patch that I pushed up; and please change the file you added (db/src/timelines.rs) to have UNIX line endings (right now, it has DOS line endings). After that, bombs away!
| pub type AVPair = (Entid, TypedValue);
|
| /// Used to represent assertions and retractions.
| pub type EAV = (Entid, Entid, TypedValue);
This is necessary for the timelines work ahead. When schema is being moved off of the main timeline, we need to be able to retract it cleanly. Retractions are only processed if the whole defining attribute set is retracted at once (:db/ident, :db/valueType, :db/cardinality).
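The defining-set rule can be sketched in plain Rust (the attribute entids here are hypothetical placeholders; the real values come from Mentat's bootstrap schema):

```rust
use std::collections::HashSet;

type Entid = i64;

// Hypothetical entids for the defining attributes (placeholders only).
const DB_IDENT: Entid = 1;
const DB_VALUE_TYPE: Entid = 7;
const DB_CARDINALITY: Entid = 8;

// A schema attribute may be retracted only if the whole defining set
// (:db/ident, :db/valueType, :db/cardinality) is retracted for that
// entity in the same transaction.
fn retraction_permitted(retracted_attrs: &HashSet<Entid>) -> bool {
    [DB_IDENT, DB_VALUE_TYPE, DB_CARDINALITY]
        .iter()
        .all(|a| retracted_attrs.contains(a))
}

fn main() {
    let partial: HashSet<Entid> = [DB_CARDINALITY].iter().cloned().collect();
    assert!(!retraction_permitted(&partial)); // lone retraction is rejected

    let full: HashSet<Entid> =
        [DB_IDENT, DB_VALUE_TYPE, DB_CARDINALITY].iter().cloned().collect();
    assert!(retraction_permitted(&full)); // whole defining set is allowed
}
```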
Normally we want to both materialize our changes (into 'datoms') and commit source transactions into the 'transactions' table. However, when moving transactions from timeline to timeline, we don't want to persist artifacts (rewind assertions), just their materializations. This patch expands the 'db' interface to allow for this split, and changes the transactor's functions to take a crate-private 'action' which defines the desired behaviour.
This records transactions onto a default timeline (0).
Being able to derive the partition map from partition definitions and the current state of the world (transactions), segmented by timelines, is useful because it lets us not worry about keeping materialized partition maps up-to-date - there's no need for materialized partition maps at that point. This comes in very handy when we start moving chunks of transactions off of our mainline. The alternative to this work would look like materializing partition maps per timeline, growing support for incremental "backwards update" of the materialized maps, etc.

Our core partitions are defined in the 'known_parts' table during bootstrap, and what used to be the 'parts' table is a generated view that operates over transactions to figure out the partition index. 'parts' is defined for the main timeline. Querying parts for other timelines, or for particular timeline+tx combinations, will look similar.
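A toy model of what such a derivation computes, in plain Rust rather than SQL (function name and numbers are illustrative only): the next index of a partition is one past the largest entid in its range that appears anywhere in the timeline's transaction log, falling back to the partition's start when nothing has been allocated.

```rust
use std::ops::Range;

type Entid = i64;

// Illustrative: derive a partition's next index from entids seen in the
// (timeline-filtered) transaction log -- not from datoms, which can
// under-report after retractions.
fn derive_index(part: &Range<Entid>, tx_entids: &[Entid]) -> Entid {
    tx_entids
        .iter()
        .cloned()
        .filter(|e| part.contains(e))
        .max()
        .map(|e| e + 1)
        .unwrap_or(part.start)
}

fn main() {
    let user_part: Range<Entid> = 65536..268435456;
    // Entids mentioned across the timeline's transactions:
    let seen = vec![65536, 65537, 65540, 3];
    assert_eq!(65541, derive_index(&user_part, &seen));
    // A fresh partition reports its start:
    assert_eq!(65536, derive_index(&user_part, &[]));
}
```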
This PR adds support for basic timelines to the transactor, along with the ability to move ranges of transactions off of the main timeline. The main timeline is `0`; a transaction's timeline is indicated by an additional single field (same type as `e`).