Start installing SQL schema. #171
|
|
||
| const V2_STATEMENTS: [&'static str; 19] = V1_STATEMENTS; | ||
|
|
||
| fn set_user_version(conn: &rusqlite::Connection, version: i32) -> Result<i32> { |
There was a problem hiding this comment.
This is a good idea, so I've filed #190 to track it.
| conn.execute(&format!("PRAGMA user_version = {}", version), &[]) | ||
| } | ||
|
|
||
| fn get_user_version(conn: &rusqlite::Connection) -> Result<i32> { |
80c1838 to 691192a
|
@rnewman, @joewalker perhaps you two could be the official reviewers for this patch bomb. @victorporof, @jsantell if one or both of you would care to review this, please steal. Sorry for the giant code bomb, but it's not worth slicing and dicing this further. There's lots of TODOs and unimplemented pieces, but I want to get the types and database reading code out there, and circulate some patterns around using |
|
I'll take a look tomorrow and Monday. I'm keen to get bits on disk! |
|
|
||
| /edn/target/ | ||
| /fixtures/*.db-shm | ||
| /fixtures/*.db-wal |
There was a problem hiding this comment.
Hah, I just landed this on rust.
| :db/noHistory {:db/valueType :db.type/boolean | ||
| :db/cardinality :db.cardinality/one}}"#; | ||
| edn::parse::value(s) | ||
| .map_err(|_| ErrorKind::BadBootstrapDefinition("Unable to parse V1_SYMBOLIC_SCHEMA".into())) |
There was a problem hiding this comment.
Does this have a runtime or code-size penalty? Seems like we could just unwrap and rely on the line number in the panic to tell us what went wrong in the unlikely event that this stops working, no?
There was a problem hiding this comment.
I imagine it must have both a runtime and a code-size penalty, since there are instructions involved.
However, Rust in general (and unwrap in particular) provides almost no backtrace help, so I think it's worth including until we prove otherwise.
There was a problem hiding this comment.
Would `expect` be the right thing here? It's basically "unwrap with a better error message".
There was a problem hiding this comment.
I think so, particularly in the case of `Option<T>` -> `Result<T>`, which I do (awkwardly) a lot. Much appreciated.
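The options discussed in this thread can be compared using only the standard library. This is a minimal sketch: `BootstrapError`, `parse_entid`, and `lookup` are hypothetical stand-ins, not the types or functions in this patch. It shows `map_err` (a descriptive error the caller can handle), `expect` (a panic with a readable message), and `ok_or_else` for the `Option<T>` -> `Result<T>` case.

```rust
use std::collections::BTreeMap;

/// Hypothetical error type standing in for the patch's `ErrorKind`.
#[derive(Debug, PartialEq)]
enum BootstrapError {
    BadBootstrapDefinition(String),
    UnrecognizedIdent(String),
}

/// Stand-in for a fallible parse of a compiled-in definition.
fn parse_entid(s: &str) -> Result<i64, std::num::ParseIntError> {
    s.trim().parse::<i64>()
}

/// Option 1: `map_err` turns the low-level failure into a descriptive
/// domain error that the caller can propagate with `?`.
fn parse_with_map_err(s: &str) -> Result<i64, BootstrapError> {
    parse_entid(s)
        .map_err(|_| BootstrapError::BadBootstrapDefinition(format!("Unable to parse {:?}", s)))
}

/// Option 2: `expect` panics on failure, but with a message that says
/// what was being attempted.
fn parse_with_expect(s: &str) -> i64 {
    parse_entid(s).expect("compiled-in entid should always parse")
}

/// `Option<T>` -> `Result<T>`: `ok_or_else` builds the error only on a miss.
fn lookup(idents: &BTreeMap<String, i64>, ident: &str) -> Result<i64, BootstrapError> {
    idents
        .get(ident)
        .copied()
        .ok_or_else(|| BootstrapError::UnrecognizedIdent(ident.to_string()))
}
```

`ok_or_else` is used rather than `ok_or` so that the error string is only allocated when the lookup fails.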
| .into_iter() | ||
| .map(|&(ident, _)| { | ||
| let value = Value::NamespacedKeyword(to_namespaced_keyword(&ident).unwrap()); | ||
| Value::Vector(vec![values::DB_ADD.clone(), value.clone(), values::DB_IDENT.clone(), value.clone()]) |
There was a problem hiding this comment.
This might be worth discussing.
`DB_ADD` is a lazy static instance of a `NamespacedKeyword`. That itself owns two strings.
These cloned instances aren't immediately used and discarded, so I don't think the compiler can determine that it doesn't need to clone them.
What's the idiomatic way to avoid having to do this cloning?
I can think of:
- Using `Rc` to track multiple uses of a single instance of a keyword.
- Seeing if you can use `Into<&NamespacedKeyword>` in the signature somehow.
- Using an enum or other sentinel instead of `DB_ADD` directly. That is, the only allowable values here are `DB_ADD` and `DB_RETRACT`, so we don't need to represent these in `Value` as keywords, only as a simple enum (which the compiler will turn into `0`/`1`). This (as with some of the other solutions) doesn't help for duplication of other keywords.
- If you can guarantee the scope in which this function and its consumers run, you can use `&NamespacedKeyword` instead. The `Vec<Value>` could never escape the context of the parsed representation.
There was a problem hiding this comment.
I raise this because a big import might end up with 100,000 copies of "db" and 100,000 copies of "add" on the heap, which is obviously not an ideal use of half a meg of RAM.
There was a problem hiding this comment.
I suppose servo/string-cache might help here…
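Two of the ideas from this thread, an enum for the operation and `Rc` for shared keywords, can be combined in a small standard-library sketch. The `Op`, `Assertion`, and `ident_assertions` names are hypothetical and the `NamespacedKeyword` here is a simplified stand-in for the `edn` type; this is an illustration of the technique, not what the patch does.

```rust
use std::rc::Rc;

/// Simplified stand-in for `edn`'s `NamespacedKeyword`: owns two strings.
#[derive(Debug, PartialEq, Eq)]
struct NamespacedKeyword {
    namespace: String,
    name: String,
}

/// Only two operations are allowed, so a two-variant enum replaces a
/// heap-allocated `:db/add` or `:db/retract` keyword per assertion.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    Add,
    Retract,
}

/// Hypothetical assertion shape: keywords are shared via `Rc`, so cloning
/// one bumps a reference count instead of copying two strings.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Assertion {
    op: Op,
    entity: Rc<NamespacedKeyword>,
    attribute: Rc<NamespacedKeyword>,
    value: Rc<NamespacedKeyword>,
}

/// Builds `[:db/add ident :db/ident ident]`-style assertions, reusing a
/// single `:db/ident` allocation for every row.
fn ident_assertions(db_ident: &Rc<NamespacedKeyword>, idents: &[(&str, &str)]) -> Vec<Assertion> {
    idents
        .iter()
        .map(|&(namespace, name)| {
            let keyword = Rc::new(NamespacedKeyword {
                namespace: namespace.to_string(),
                name: name.to_string(),
            });
            Assertion {
                op: Op::Add,
                entity: Rc::clone(&keyword),
                attribute: Rc::clone(db_ident),
                value: keyword,
            }
        })
        .collect()
}
```

One caveat with this approach: `Rc` is not `Sync`, so a keyword stored in a `lazy_static` would need `Arc` instead.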
| Value::Map(ref m) => { | ||
| for (ident, mp) in m { | ||
| assertions.push(Value::Vector(vec![values::DB_ADD.clone(), | ||
| values::DB_PART_DB.clone(), |
There was a problem hiding this comment.
Similarly…
I wonder if there's an over-arching architectural way to avoid all this.
|
|
||
| pub fn bootstrap_partition_map() -> PartitionMap { | ||
| V1_PARTS[..].into_iter() | ||
| .chain(V2_PARTS[..].into_iter()) |
There was a problem hiding this comment.
Can you not just make V2_PARTS be a superset of V1_PARTS? That's less complected; only a v1->v2 migrator needs to know what the difference is.
There was a problem hiding this comment.
Sure, I'll duplicate this. I had it that way originally, and did this to be similar to what happened elsewhere; I'll instead decomplect V1 from V2 in the code, and make the variables encode the extensions. (Assuming I can figure out the correct expression of that idea.)
| /// *Unique-value* means that there is at most one assertion with the attribute and a | ||
| /// particular value in the datom store. | ||
| pub unique_value: bool, | ||
| /// `true` if this attribute is unique-identity, i.e., it is `:db/unique :db.unique/identity`. |
There was a problem hiding this comment.
I'd prefer newlines between code and doc blocks.
There was a problem hiding this comment.
Me too, but Rust standard seems split on this. I've added the newlines.
| /// :db.cardinality/many`. `false` if this attribute is single-valued (the default), i.e., it | ||
| /// is `:db/cardinality :db.cardinality/one`. | ||
| pub multival: bool, | ||
| /// `true` if this attribute is unique-value, i.e., it is `:db/unique :db.unique/value`. |
There was a problem hiding this comment.
Can you add an `is_valid` method to `Attribute`?
assert!(!self.multival && self.unique_value));
if self.unique_identity {
assert!(self.value_type == ValueType::Ref);
…
etc.
(Code won't compile, but you get my drift.)
There was a problem hiding this comment.
This is what John Regehr calls checkRep in
There was a problem hiding this comment.
Then you can call `check_rep` from inside the struct `::new` method I suggest you define, ensuring that only valid attributes can be represented without the system panicking.
There was a problem hiding this comment.
There needs to be a lazy approach here, since one can specify `:db/fulltext true` before `:db/valueType :db.type/string`. I can think of lots of ways of doing this, but few that are better than manually doing `check_rep` when we create the `Schema` from the `SchemaMap`. One possibility would be to encapsulate all the changes to the `Attribute` into a builder pattern, sort the builders before application, and then do incremental changes. It's a lot of work for a little payoff.
For this ticket, I'll type-check on Schema construction, and if we need to do more later we can do it.
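A minimal sketch of the "check on `Schema` construction" approach might look like the following. The struct is a cut-down stand-in for the patch's `Attribute`, `validate_schema` is a hypothetical helper, and the two invariants are illustrative assumptions taken loosely from this thread (fulltext requires a string value type; a unique-value attribute is not multi-valued), not a confirmed list of Mentat's rules.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ValueType {
    Ref,
    Boolean,
    Long,
    String,
}

/// Cut-down stand-in for the patch's `Attribute`.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Attribute {
    value_type: ValueType,
    multival: bool,
    unique_value: bool,
    unique_identity: bool,
    fulltext: bool,
}

impl Attribute {
    /// Returns a description of the first violated invariant, if any.
    /// Both rules below are assumptions made for illustration.
    fn check_rep(&self) -> Result<(), String> {
        if self.fulltext && self.value_type != ValueType::String {
            return Err("fulltext attribute must have a string value type".to_string());
        }
        if self.multival && (self.unique_value || self.unique_identity) {
            return Err("unique attribute cannot be multi-valued".to_string());
        }
        Ok(())
    }
}

/// Lazy validation: individual attributes may pass through invalid
/// intermediate states while being assembled, so the check runs once,
/// over the finished map.
fn validate_schema(schema: &BTreeMap<i64, Attribute>) -> Result<(), String> {
    for (entid, attribute) in schema {
        attribute
            .check_rep()
            .map_err(|e| format!("attribute {}: {}", entid, e))?;
    }
    Ok(())
}
```

Returning a `Result` instead of asserting keeps a malformed user-supplied schema from panicking the process, at the cost of a slightly noisier call site.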
| } | ||
|
|
||
| /// Map `String` idents (`:db/ident`) to positive integer entids (`1`). | ||
| pub type IdentMap = BTreeMap<String, Entid>; |
There was a problem hiding this comment.
Do we care about these being ordered?
If not, I think you should use HashMap, per https://doc.rust-lang.org/std/collections/index.html#when-should-you-use-which-collection
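To make the ordering question concrete, here is a small sketch of what each map offers. The helper names are hypothetical; `Entid` is assumed to be an `i64` alias for the purposes of the example.

```rust
use std::collections::{BTreeMap, HashMap};

/// Assumed alias for illustration.
type Entid = i64;

/// With a `BTreeMap`, iteration order is the sorted key order, which
/// matters if idents are ever printed, diffed, or serialized.
fn ordered_idents(pairs: &[(&str, Entid)]) -> Vec<String> {
    let map: BTreeMap<String, Entid> = pairs
        .iter()
        .map(|&(ident, entid)| (ident.to_string(), entid))
        .collect();
    map.keys().cloned().collect()
}

/// With a `HashMap`, point lookups work the same way, but iteration
/// order is unspecified, so nothing should depend on it.
fn lookup_only(pairs: &[(&str, Entid)], ident: &str) -> Option<Entid> {
    let map: HashMap<String, Entid> = pairs
        .iter()
        .map(|&(ident, entid)| (ident.to_string(), entid))
        .collect();
    map.get(ident).copied()
}
```

If the maps are only ever used for lookups, the two are interchangeable at the call site, so the choice can be revisited later behind the `IdentMap` type alias.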
| /// Map `String` idents (`:db/ident`) to positive integer entids (`1`). | ||
| pub type IdentMap = BTreeMap<String, Entid>; | ||
|
|
||
| /// Map positive integer entids (`1`) to `String` idents (`:db/ident`). |
There was a problem hiding this comment.
There's a better representation for this, too — a sparse array map — but that's fine to leave as a TODO!
There was a problem hiding this comment.
I decided to use BTreeMap simply because we're using it elsewhere (in edn). Let's handle this in follow-up: #192
| } | ||
|
|
||
| /// Represents the metadata required to query from, or apply transactions to, a Mentat store. | ||
| #[derive(Clone,Debug,Default,Eq,Hash,Ord,PartialOrd,PartialEq)] |
There was a problem hiding this comment.
Perhaps leave a pointer to https://github.com/mozilla/mentat/wiki/Thoughts:-modeling-db-conn-in-Rust here, because eventually people will wonder why DB looks the way it does…
|
Three things I really care about:
Excellent work, and lots of painful edges apparently hit! The more knowledge we can pull out and share from this, the better. |
|
After a little reading and thinking, I think I can summarize my position on lots of cloning. We have five main scopes:
Now, we typically parse our query input from strings (creating new instances with new That makes me think that within 2 and 3 — that is, when transacting or querying — we should be able to exclusively work with refs to keywords and (Even if we didn't create new The way back out — query results and the tx-report — might be different. The values will come from the database, and we don't want to impose on callers in 4 and 5 the requirement to clone everything when some of the values will be new and originally owned. To fix that we probably need some kind of interning/shared data/Rc. Perhaps a query result or a Thoughts? |
|
OBTW, I found that http://xion.io/post/code/rust-borrowchk-tricks.html clarified my understanding of this set of tradeoffs still further. |
|
I missed the review ping, I'll go through this as well, but mostly hoping to learn stuff rather than offer any valuable criticism. |
888dcbd to 7bf8d99
The paths that clone are only in the bootstrap code path, which is both bounded size and infrequently hit. The hot-path through the transactor isn't really present in this patch (and will require additional work to not clone). We'll handle this as I bone out the approach. There's definitely a good ticket here to not go via EDN for bootstrapping at all -- I did that simply 'cuz I got tired of trying to define static structures in Rust's verbose language. I've filed #193 to do better.
I've dropped V1 support entirely, and filed #194 to fill it in.
Your point is well taken. We have an injection
Thank you. |
I started to dig further into this, and have opened https://github.com/jgallagher/rusqlite/issues/211 to try to understand our situation more clearly. I know how to address our case without changing |
| set_user_version(&tx, CURRENT_VERSION)?; | ||
| let user_version = get_user_version(&tx)?; | ||
|
|
||
| // TODO: use the drop semantics to do this automagically? |
There was a problem hiding this comment.
I wouldn't recommend this (i.e., I'd keep it explicit as written). By default dropping `tx` would attempt to perform a rollback. You could change that with `set_drop_behavior`, but then you don't get to see if the commit succeeded or failed.
There was a problem hiding this comment.
Thanks for the guidance. I'll drop the TODO instead :)
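The reasoning behind keeping the commit explicit can be shown with a toy type. `ToyTx` below is a hypothetical standard-library model, not rusqlite's `Transaction`; it only mimics the behaviour described in this thread (roll back on drop unless committed) to show why a commit performed inside `drop` has no way to report failure.

```rust
use std::cell::RefCell;

/// Toy transaction that records what happened in a shared log.
struct ToyTx<'a> {
    log: &'a RefCell<Vec<&'static str>>,
    finished: bool,
}

impl<'a> ToyTx<'a> {
    fn begin(log: &'a RefCell<Vec<&'static str>>) -> ToyTx<'a> {
        log.borrow_mut().push("begin");
        ToyTx { log, finished: false }
    }

    /// Explicit commit: consumes the transaction and returns a `Result`,
    /// so the caller sees whether it succeeded.
    fn commit(mut self) -> Result<(), String> {
        self.finished = true;
        self.log.borrow_mut().push("commit");
        Ok(())
    }
}

impl<'a> Drop for ToyTx<'a> {
    fn drop(&mut self) {
        if !self.finished {
            // `drop` cannot return a `Result`; any failure here would be
            // invisible to the caller. Rolling back is the safe default.
            self.log.borrow_mut().push("rollback");
        }
    }
}
```

A transaction that goes out of scope without `commit` logs `begin` then `rollback`, while an explicit `tx.commit()?` surfaces the outcome at the call site.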
In general, I have been preferring |
|
@rnewman OK, this is ready for another pass. I think the introduction of I realize now that I didn't move the documentation notes to the Wiki; I'll do that either before or just after landing. @victorporof @jsantell Let me know what y'all think! There's lots of tests waiting to be written for this work; I'll file a few tickets to track areas that need love later today. |
| [dependencies] | ||
| error-chain = "0.8.0" | ||
| lazy_static = "0.2.2" | ||
| # TODO: don't depend on num and ordered-float; expose helpers in edn abstracting necessary constructors. |
There was a problem hiding this comment.
Issue, please! This affects multiple consumers, I think.
There was a problem hiding this comment.
Yep, I filed #198 while reviewing your code to track this.
| // bootstrap symbolic schema, or by representing the initial bootstrap | ||
| // schema directly as Rust data. | ||
| let typed_value = match TypedValue::from_edn_value(value) { | ||
| Some(TypedValue::Keyword(ref s)) => TypedValue::Ref(*ident_map.get(s).ok_or(ErrorKind::UnrecognizedIdent(s.clone()))?), |
There was a problem hiding this comment.
let typed_value = TypedValue::from_edn_value(value).map(|x|
    if let TypedValue::Keyword(ref s) = x {
        …
    } else {
        x
    }).expect(…);
is perhaps neater?
There was a problem hiding this comment.
I think it could be made neater, but I want to keep the error handling I've written. Next time!
| // share a tag. | ||
| (5, &rusqlite::types::Value::Integer(ref x)) => Ok(TypedValue::Long(*x)), | ||
| (5, &rusqlite::types::Value::Real(ref x)) => Ok(TypedValue::Double((*x).into())), | ||
| (10, &rusqlite::types::Value::Text(ref x)) => Ok(TypedValue::String(x.clone())), |
There was a problem hiding this comment.
I think we probably want move semantics for this, no? That is: if you want to keep a copy of the EDN value, pass a clone to `from_sql_value_pair`.
There was a problem hiding this comment.
Maybe. I thought about this and I'm okay with into_sql_value_pair; let me try it on for size.
There was a problem hiding this comment.
Oh, wait, `into` is the wrong direction.
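A sketch of the move-semantics variant, using only the standard library: `SqlValue` is a hypothetical stand-in for `rusqlite::types::Value`, and `TypedValue::Double` holds a plain `f64` here rather than the ordered float the patch uses. The tag numbers (0, 1, 5, 10, 13) are the ones visible in the diff above.

```rust
/// Stand-in for `rusqlite::types::Value`, reduced to what the example needs.
#[derive(Clone, Debug, PartialEq)]
enum SqlValue {
    Integer(i64),
    Real(f64),
    Text(String),
}

/// Simplified `TypedValue`; `Double` is a bare `f64` for brevity.
#[derive(Clone, Debug, PartialEq)]
enum TypedValue {
    Ref(i64),
    Boolean(bool),
    Long(i64),
    Double(f64),
    String(String),
    Keyword(String),
}

/// Takes the SQL value by value, so text is moved into the result
/// instead of being cloned.
fn from_sql_value_pair(value: SqlValue, value_type_tag: i32) -> Result<TypedValue, String> {
    match (value_type_tag, value) {
        (0, SqlValue::Integer(x)) => Ok(TypedValue::Ref(x)),
        (1, SqlValue::Integer(x)) => Ok(TypedValue::Boolean(0 != x)),
        // SQLite distinguishes integral from decimal types, so long and
        // double can share a tag.
        (5, SqlValue::Integer(x)) => Ok(TypedValue::Long(x)),
        (5, SqlValue::Real(x)) => Ok(TypedValue::Double(x)),
        (10, SqlValue::Text(x)) => Ok(TypedValue::String(x)),
        (13, SqlValue::Text(x)) => Ok(TypedValue::Keyword(x)),
        (tag, value) => Err(format!("unexpected value {:?} for tag {}", value, tag)),
    }
}
```

A caller that still needs the original value passes `value.clone()`, which makes the copy visible at the one place that wants it.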
| (0, &rusqlite::types::Value::Integer(ref x)) => Ok(TypedValue::Ref(*x)), | ||
| (1, &rusqlite::types::Value::Integer(ref x)) => Ok(TypedValue::Boolean(0 != *x)), | ||
| // SQLite distinguishes integral from decimal types, allowing long and double to | ||
| // share a tag. |
There was a problem hiding this comment.
TODO: 4 = date, 11 = UUID.
| &Value::Integer(x) => Some(TypedValue::Long(x)), | ||
| &Value::Float(ref x) => Some(TypedValue::Double(x.clone())), | ||
| &Value::Text(ref x) => Some(TypedValue::String(x.clone())), | ||
| &Value::NamespacedKeyword(ref x) => Some(TypedValue::Keyword(x.to_string())), |
There was a problem hiding this comment.
I think TypedValue::Keyword should contain a Keyword, preserving the namespace/name distinction and avoiding consing up a new string. This might mean you need to extract a keyword crate out of edn for reuse. Please file an issue.
There was a problem hiding this comment.
I think there will be a grand reckoning where we do a lot of String to &str rewriting, so I've filed #203 to keep this around.
| &TypedValue::Long(x) => (rusqlite::types::Value::Integer(x).into(), 5), | ||
| &TypedValue::Double(x) => (rusqlite::types::Value::Real(x.into_inner()).into(), 5), | ||
| &TypedValue::String(ref x) => (rusqlite::types::ValueRef::Text(x.as_str()).into(), 10), | ||
| &TypedValue::Keyword(ref x) => (rusqlite::types::ValueRef::Text(x.as_str()).into(), 13), |
There was a problem hiding this comment.
This is the point at which we'd produce a str directly from a Keyword.
There was a problem hiding this comment.
By keeping a String in the Keyword, this doesn't copy data until it hits SQLite. If we keep a Keyword (really a NamespacedKeyword, since I'm being strict in my parser for now) then we have to cons strings and can't use the ValueRef in this way.
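The trade-off in this exchange can be made visible with `Cow`. In this sketch, `KeywordRepr` and `as_sql_text` are hypothetical names, and the `:namespace/name` rendering is an assumption about how a structured keyword would be printed. A flat `String` can be lent to SQLite as a borrowed `&str`, while a structured keyword has to allocate a new string first.

```rust
use std::borrow::Cow;

/// Simplified structured keyword: namespace and name kept separate.
struct NamespacedKeyword {
    namespace: String,
    name: String,
}

/// Two candidate representations for the keyword case of `TypedValue`.
enum KeywordRepr {
    /// The keyword already rendered as one string.
    Flat(String),
    /// The keyword with the namespace/name distinction preserved.
    Structured(NamespacedKeyword),
}

impl KeywordRepr {
    /// Text to bind as a SQL parameter. `Cow::Borrowed` means no copy is
    /// made; `Cow::Owned` means a fresh string had to be built.
    fn as_sql_text(&self) -> Cow<'_, str> {
        match self {
            KeywordRepr::Flat(s) => Cow::Borrowed(s.as_str()),
            KeywordRepr::Structured(kw) => {
                Cow::Owned(format!(":{}/{}", kw.namespace, kw.name))
            }
        }
    }
}
```

Both variants produce the same text; the difference is only whether producing it allocates.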
| pub const DB_PART_TX: Entid = 17; | ||
| pub const DB_EXCISE: Entid = 18; | ||
| pub const DB_EXCISE_ATTRS: Entid = 19; | ||
| pub const DB_EXCISE_BEFORET: Entid = 20; |
There was a problem hiding this comment.
I think this should be DB_EXCISE_BEFORE_T for clarity.
There was a problem hiding this comment.
Agreed, this was an oversight.
ced04f0 to 1fe9041
* Start installing the SQLite store and bootstrapping the datom store.
* Review comment: Decomplect V2_IDENTS.
* Review comment: Decomplect V2_PARTS.
* Review comment: Pre: Expose Clojure's merge on Value instances.
* Review comment: Decomplect V2_SYMBOLIC_SCHEMA.
* Review comment: Decomplect V1_STATEMENTS.
* Review comment: Prefer ? to try!.
* Review comment: Fix typos; format; add TODOs.
* Review comment: Assert that Mentat `Schema` is valid upon creation.
* Review comment: Improve conversion to and from SQL values.
This patch factors the fundamental SQL conversion maps
between (rusqlite::Value, value_type_tag) and (edn::Value, ValueType)
through a new Mentat TypedValue. (A future patch might rename this
fundamental type mentat::Value.)
To make certain conversion functions infallible, I removed
placeholders for :db.type/{instant,uuid,uri}. (We could panic
instead, but there's no need to do that right now.)
* Review comment: Always uses bundled SQLite in rusqlite.
This avoids (runtime) failures in Travis CI due to old SQLite
versions. See jgallagher/rusqlite@432966a.
* Review comment: Move semantics in `from_sql_value_pair`.
* Review comment: DB_EXCISE_BEFORE_T instead of ...BEFORET (no underscore).
* Review comment: Move overview notes to the Wiki.
WIP in progress branch for #170.