Implement upsert resolution algorithm. by ncalexan · Pull Request #283 · mozilla/mentat

ncalexan · 2017-02-10T17:27:18Z

This PR will end up implementing #184. Right now it's just a placeholder for some issues I want to file.

ncalexan · 2017-02-12T22:25:49Z

@rnewman, @jsantell this is ready for initial review. There are some loose ends, like collecting tempids (and testing them), and several follow-up tickets (supporting lookup refs, map notation). But it's best to get some review and hopefully get this stuff landed so that other people can get involved.

ncalexan · 2017-02-12T22:36:02Z

It's worth mentioning that I do have a vision for the relationship between DB, Tx, and (future) Conn, but I'm not close enough to the implementation to worry too much about getting it right just yet. So Tx and DB remain inter-twingled for now; the division of responsibilities will become clearer in time.

rnewman

In-progress.

rnewman · 2017-02-13T18:53:51Z

 use mentat_tx::entities::{Entity, OpType};
 use errors::{ErrorKind, Result, ResultExt};
-use types::{Attribute, AttributeBitFlags, DB, Entid, IdentMap, Partition, PartitionMap, Schema, TypedValue, ValueType};
+use types::{AVMap, AVPair, Attribute, AttributeBitFlags, DB, Entid, IdentMap, Partition, PartitionMap, Schema, TypedValue, TxReport, ValueType};


Perhaps

{
    Break,
    This,
    Line,
};

?

Doesn't that just replace a "too wide" problem with a "too high" problem? Or maybe I'm taking you too literally?

use stuff::{
    Long,
    List,
    Of,
    Things,
    To,
    Import,
    Split,
    Over,
    Several,
    Lines,
};

?

It leaves us in Java's import situation: a sorted line-list with one diff line per insert or removal. Not the most compact representation, but good at minimally and clearly representing change over time.

I'm happy to push these onto multiple lines for clean diffs. I'll do it throughout.
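For reference, a minimal sketch of the one-name-per-line layout discussed here. It uses standard-library names rather than Mentat's own types, and the helpers `count_words` and `distinct_words` are hypothetical, present only so the imports are used.

```rust
// One imported name per line, sorted, with a trailing comma:
// adding or removing a name changes exactly one line in a diff.
use std::collections::{
    BTreeMap,
    BTreeSet,
};

/// Hypothetical helper: count how often each word occurs.
pub fn count_words(words: &[&str]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for w in words {
        *counts.entry(w.to_string()).or_insert(0) += 1;
    }
    counts
}

/// Hypothetical helper: collect the distinct words.
pub fn distinct_words(words: &[&str]) -> BTreeSet<String> {
    words.iter().map(|w| w.to_string()).collect()
}
```

Importing a third collection later would show up as a single added line inside the braces, which is the "clean diff" property the thread is after.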

rnewman · 2017-02-13T18:54:28Z


-    let bootstrap_db = DB::new(bootstrap_partition_map, bootstrap::bootstrap_schema());
-    bootstrap_db.transact_internal(&tx, &bootstrap::bootstrap_entities()[..], bootstrap::TX0, now())?;
+    // TODO: return to transact_internal to self manage the encompassing SQLite transaction.


Nit: "self-manage"

rnewman · 2017-02-13T18:55:16Z

+        }).collect();
+
+        // TODO: only cache the latest of these statements.  Changing the set of partitions isn't
+        // supported in the Clojure implementationat all, and might not be supported in Mentat soon,


Nit: "implementation at"

rnewman · 2017-02-13T18:58:37Z

+    }
+
+    /// Allocate a single fresh entid in the given `partition`.
+    fn allocate_entid(&mut self, partition: String) -> i64 {


Can you use a &str here instead of a String? After all, you can't allocate an entid in a partition that isn't already in the partition map…
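A rough sketch of what borrowing the partition name could look like. `Db`, `Partition`, and the `Option` return type are stand-ins assumed for illustration, not Mentat's actual definitions; the point is only that a `&str` is enough to look up a key in a map keyed by `String`.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Stand-in for a partition: a start entid and the next free entid.
#[derive(Debug)]
pub struct Partition {
    pub start: i64,
    pub index: i64,
}

/// Stand-in for Mentat's `DB`; only the partition map is modelled.
pub struct Db {
    pub partition_map: BTreeMap<String, Partition>,
}

impl Db {
    /// Allocate `n` fresh entids in `partition`, borrowing the name.
    /// Returning `None` for an unknown partition is an assumption of this sketch.
    pub fn allocate_entids(&mut self, partition: &str, n: usize) -> Option<Range<i64>> {
        // `BTreeMap<String, _>` accepts a `&str` key because `String: Borrow<str>`.
        let p = self.partition_map.get_mut(partition)?;
        let first = p.index;
        p.index += n as i64;
        Some(first..p.index)
    }

    /// Allocate a single fresh entid in `partition`.
    pub fn allocate_entid(&mut self, partition: &str) -> Option<i64> {
        self.allocate_entids(partition, 1).map(|r| r.start)
    }
}
```

With this shape a call site can pass a string literal such as `":db.part/tx"` directly, without `.to_string()`.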

rnewman · 2017-02-13T18:58:47Z

+    }
+
+    /// Allocate `n` fresh entids in the given `partition`.
+    fn allocate_entids(&mut self, partition: String, n: usize) -> Range<i64> {


rnewman · 2017-02-13T18:59:04Z

+    // TODO: move this to the transactor layer.
+    pub fn transact(&mut self, conn: &rusqlite::Connection, entities: &[Entity]) -> Result<TxReport> {
+        let tx_instant = now(); // Label the transaction with the timestamp when we first see it: leading edge.
+        let tx = self.allocate_entid(":db.part/tx".to_string());


rnewman · 2017-02-13T19:09:59Z

    /// This approach is explained in https://github.com/mozilla/mentat/wiki/Transacting.
    // TODO: move this to the transactor layer.
-    pub fn transact_internal(&self, conn: &rusqlite::Connection, entities: &[Entity], tx_id: Entid, tx_instant: i64) -> Result<TxReport> {
+    pub fn transact_internal(&mut self, conn: &rusqlite::Connection, entities: &[Entity], tx_id: Entid, tx_instant: i64) -> Result<TxReport> {


This is indeed starting to solidify on a wonky abstraction: transact_internal should take a reference to a DB as input, and return a report that refers to a (potentially changed) DB, no? Or, rather, altering the current internal DB of the conn is a side-effect.

Its contract is that the DB must match the contents of the database, so it can't take just any DB…

I see you commented further, but I think there's a duality in many Rust expressions:

&mut self -> () and caller owns coordination;

&self -> Self and callee owns coordination.

I opted for the former since I expect to build a Conn to manage the coordination in #296.
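The two shapes can be put side by side in a small sketch. `Db` here is a hypothetical stand-in that only tracks the next transaction ID, and the method names are invented for illustration; neither is taken from the PR.

```rust
/// Stand-in for a database value; only a transaction counter is modelled.
#[derive(Clone, Debug, PartialEq)]
pub struct Db {
    pub next_tx: i64,
}

impl Db {
    /// First shape: take `&mut self` and update in place.
    /// Whoever holds the `Db` must arrange exclusive access before calling.
    pub fn transact_in_place(&mut self) -> i64 {
        let tx = self.next_tx;
        self.next_tx += 1;
        tx
    }

    /// Second shape: take `&self` and return a new value.
    /// The original is left untouched and the updated `Db` is handed back.
    pub fn with_transaction(&self) -> (Db, i64) {
        let tx = self.next_tx;
        (Db { next_tx: tx + 1 }, tx)
    }
}
```

In the first shape a wrapper such as the planned Conn would own the single mutable `Db`; in the second, the wrapper would decide whether to swap in the returned value.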

rnewman · 2017-02-13T19:15:07Z

+/// A transaction on its way to being applied.
+#[derive(Debug)]
+pub struct Tx<'conn> {
+    /// The metadata to use to interpret the transaction entities with.


Ah, gotcha. I just didn't get this far yet.

rnewman

I'd like to see the domain concepts firm up sooner rather than later; having a mutating transact operation on a db seems super wrong…

rnewman · 2017-02-13T19:15:30Z

+    /// The transaction ID of the transaction.
+    pub tx_id: Entid,
+
+    /// The timestamp when the transaction began to be commited.


"committed"

Fixed, here and another place.

rnewman · 2017-02-13T19:15:47Z

+
+impl<'conn> Tx<'conn> {
+    pub fn new(db: &'conn mut DB, conn: &'conn rusqlite::Connection, tx_id: Entid, tx_instant: i64) -> Tx<'conn> {
+        Tx { db: db,


Newline after {

rnewman · 2017-02-13T19:15:59Z

+        Tx { db: db,
+             conn: conn,
+             tx_id: tx_id,
+             tx_instant: tx_instant }


Trailing comma and } on its own line
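Applying both formatting requests to the constructor would give roughly the following. `Db` and `Connection` are empty stand-ins for `DB` and `rusqlite::Connection`, assumed here only so the sketch compiles on its own.

```rust
/// Stand-in for Mentat's `DB` (assumption of this sketch).
#[derive(Debug)]
pub struct Db;

/// Stand-in for `rusqlite::Connection` (assumption of this sketch).
#[derive(Debug)]
pub struct Connection;

pub type Entid = i64;

/// A transaction on its way to being applied.
#[derive(Debug)]
pub struct Tx<'conn> {
    pub db: &'conn mut Db,
    pub conn: &'conn Connection,
    pub tx_id: Entid,
    pub tx_instant: i64,
}

impl<'conn> Tx<'conn> {
    pub fn new(db: &'conn mut Db, conn: &'conn Connection, tx_id: Entid, tx_instant: i64) -> Tx<'conn> {
        // Newline after `{`, one field per line, trailing comma,
        // and the closing `}` on its own line.
        Tx {
            db: db,
            conn: conn,
            tx_id: tx_id,
            tx_instant: tx_instant,
        }
    }
}
```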

rnewman · 2017-02-13T19:18:18Z

-                    let added = false;
+    /// Update the current partition map materialized view.
+    // TODO: only update changed partitions.
+    pub fn update_partition_map(&self, conn: &rusqlite::Connection) -> Result<()> {


Is it sensible to use &mut rusqlite::Connection here as a signal that this is a mutating operation?

Indeed, only one thread should ever be using the same Connection, so perhaps we want type SQLite = &mut rusqlite::Connection;…?

I haven't really worked out the details, but I think there's an entirely different division of responsibilities in the future. Right now, the data structures DB and Tx "own" the work of updating the SQLite store and processing an incoming transaction. My vision is that there's a split between the "updating the SQLite gubbins" and "term rewriting, resolution, etc" on top.

Following your suggestion, we might indeed define a type

type SQLiteConnection<'a> = &'a mut rusqlite::Connection;

and a trait that defines the "low-level interface", like

trait ConcreteRepresentation { fn lookup_avs(...); fn insert_non_fts_one(...); ... }

Then, we might

impl<'a> ConcreteRepresentation for SQLiteConnection<'a> { fn lookup_avs(...) { /* Using SQLite-specific features, etc. */ ... } }

That would separate the DB type from the implementation details that are specific to SQLite. You could imagine that some ConcreteRepresentation implementations might not support fulltext indexing at all; or that there's an implementation that uses Postgres or some other backing store.

Does that seem sensible?
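One way this split might look, sketched with an in-memory backend in place of SQLite. The method names `lookup_avs` and `insert_non_fts_one` come from the comment above; their signatures, the `InMemory` type, and the use of `String` for values are assumptions made purely for illustration.

```rust
use std::collections::HashMap;

pub type Entid = i64;

/// Low-level storage interface. The signatures are assumptions of this sketch.
pub trait ConcreteRepresentation {
    /// For each `(a, v)` pair, return the entid already asserting it, if any.
    fn lookup_avs(&self, avs: &[(Entid, String)]) -> Vec<Option<Entid>>;

    /// Insert a single non-fulltext datom `[e a v]`.
    fn insert_non_fts_one(&mut self, e: Entid, a: Entid, v: String);
}

/// Hypothetical in-memory backend, standing in for SQLite or another store.
#[derive(Default)]
pub struct InMemory {
    by_av: HashMap<(Entid, String), Entid>,
}

impl ConcreteRepresentation for InMemory {
    fn lookup_avs(&self, avs: &[(Entid, String)]) -> Vec<Option<Entid>> {
        avs.iter().map(|av| self.by_av.get(av).cloned()).collect()
    }

    fn insert_non_fts_one(&mut self, e: Entid, a: Entid, v: String) {
        self.by_av.insert((a, v), e);
    }
}
```

Term rewriting and resolution code could then be written against `ConcreteRepresentation` alone, and the mutating methods taking `&mut self` would carry the "this writes" signal raised earlier in the thread.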

I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported.

This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two-element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor.
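To make the ambiguity concrete, here is a small sketch of how a schema could settle whether a string in value position is a temporary ID or a plain string. The `Schema` alias, the two-variant `ValueType`, and `resolve_value` are hypothetical simplifications, not the transactor's actual types.

```rust
use std::collections::HashMap;

/// Simplified value types; the real schema has more.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ValueType {
    Ref,
    String,
}

/// What a raw string in value position turns out to mean.
#[derive(Debug, PartialEq)]
pub enum Resolved {
    TempId(String),
    Text(String),
}

/// Hypothetical schema: attribute ident to value type.
pub type Schema = HashMap<String, ValueType>;

/// Interpret `raw` using the value type of `attribute`.
/// Returns `None` when the attribute is not in the schema.
pub fn resolve_value(schema: &Schema, attribute: &str, raw: &str) -> Option<Resolved> {
    match schema.get(attribute)? {
        // A string where a ref is expected can only be a temporary ID.
        ValueType::Ref => Some(Resolved::TempId(raw.to_string())),
        // A string where a string is expected is an ordinary value.
        ValueType::String => Some(Resolved::Text(raw.to_string())),
    }
}
```

The same raw string `"t1"` resolves differently depending on the attribute, which is why the parser alone cannot decide.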

This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193.

This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet.

jaredhirsch added the in progress label Feb 10, 2017

ncalexan mentioned this pull request Feb 10, 2017

[tx] Accept vector values for :db/cardinality :db.cardinality/many attributes #284

Closed

ncalexan force-pushed the upsert-resolution branch from 23ea4c9 to 6831925 February 12, 2017 22:22

ncalexan requested review from jsantell and rnewman February 12, 2017 22:23

rnewman reviewed Feb 13, 2017

rnewman approved these changes Feb 13, 2017

ncalexan added 13 commits February 14, 2017 14:42

Pre: Implement batch [a v] pair lookup.

0eaa822

Pre: Add InternSet for sharing ref-counted handles to large values.

e5e3779

Pre: Derive more for Entity.

4fd7760

Pre: Return DB from creating; return TxReport from transact.

8ae2b4a

Persist partitions to SQL store; allocate transaction ID. (mozilla#186)

73eead0

Implement tempid upsert resolution algorithm. (mozilla#184)

2ca0fa5

Post: Test upserting with vectors.

96fd119

Post: Separate Tx out of DB.

32c9f31

Post: Comment on implementation choices in the transactor.

558d419

Review comment: Put long use lists on separate lines.

f2c56cb

Review comment: Accept String: Borrow<S> instead of just String.

97a647b

Review comment: Address nits.

f715a27

ncalexan force-pushed the upsert-resolution branch from 67cb18f to f715a27 February 15, 2017 00:49

ncalexan merged commit 16e9740 into mozilla:rust Feb 15, 2017

jaredhirsch removed the in progress label Feb 15, 2017

ncalexan mentioned this pull request Feb 15, 2017

[tx] Test collecting tempids after upsert resolution #299

Closed

ncalexan deleted the upsert-resolution branch February 15, 2017 00:54

ncalexan mentioned this pull request Feb 15, 2017

[doc] Simplify the upsert resolution Wiki page to agree with the actual implementation #313

Open

gburd mentioned this pull request Aug 6, 2020

[doc] Simplify the upsert resolution Wiki page to agree with the actual implementation qpdb/mentat#128

Open


4 participants
