Skip to content
This repository was archived by the owner on Sep 12, 2018. It is now read-only.

Start installing SQL schema.#171

Merged
ncalexan merged 14 commits into
mozilla:rustfrom
ncalexan:rusqlite-schema
Jan 26, 2017
Merged

Start installing SQL schema.#171
ncalexan merged 14 commits into
mozilla:rustfrom
ncalexan:rusqlite-schema

Conversation

@ncalexan
Copy link
Copy Markdown
Member

WIP in progress branch for #170.

Comment thread db/src/db.rs Outdated

const V2_STATEMENTS: [&'static str; 19] = V1_STATEMENTS;

fn set_user_version(conn: &rusqlite::Connection, version: i32) -> Result<i32> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsigned int?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea, so I've filed #190 to track it.

Comment thread db/src/db.rs
conn.execute(&format!("PRAGMA user_version = {}", version), &[])
}

fn get_user_version(conn: &rusqlite::Connection) -> Result<i32> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

@ncalexan ncalexan force-pushed the rusqlite-schema branch 5 times, most recently from 80c1838 to 691192a Compare January 20, 2017 23:22
@ncalexan ncalexan requested a review from joewalker January 20, 2017 23:23
@ncalexan
Copy link
Copy Markdown
Member Author

@rnewman, @joewalker perhaps you two could be the official reviewers for this patch bomb.

@victorporof, @jsantell if one or both of you would care to review this, please steal.

Sorry for the giant code bomb, but it's not worth slicing and dicing this further. There's lots of TODOs and unimplemented pieces, but I want to get the types and database reading code out there, and circulate some patterns around using error_chain and rusqlite. We can fill things in and improve things as we iterate.

@rnewman
Copy link
Copy Markdown
Collaborator

rnewman commented Jan 21, 2017

I'll take a look tomorrow and Monday. I'm keen to get bits on disk!

Comment thread .gitignore

/edn/target/
/fixtures/*.db-shm
/fixtures/*.db-wal
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, I just landed this on rust.

Comment thread db/src/bootstrap.rs
:db/noHistory {:db/valueType :db.type/boolean
:db/cardinality :db.cardinality/one}}"#;
edn::parse::value(s)
.map_err(|_| ErrorKind::BadBootstrapDefinition("Unable to parse V1_SYMBOLIC_SCHEMA".into()))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have a runtime or code-size penalty? Seems like we could just unwrap and rely on the line number in the panic to tell us what went wrong in the unlikely event that this stops working, no?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine it must have both a runtime and a code-size penalty, since there are instructions involved.

However, Rust in general (and unwrap in particular) provides almost no backtrace help, so I think it's worth including until we prove otherwise.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would expect be the right thing here? It's basically "unwrap with a better error message".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, particularly in the case of Option<T> -> Result<T>, which I do (awkwardly) a lot. Much appreciated.

Comment thread db/src/bootstrap.rs
.into_iter()
.map(|&(ident, _)| {
let value = Value::NamespacedKeyword(to_namespaced_keyword(&ident).unwrap());
Value::Vector(vec![values::DB_ADD.clone(), value.clone(), values::DB_IDENT.clone(), value.clone()])
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be worth discussing.

DB_ADD is a lazy static instance of a NamespacedKeyword. That itself owns two strings.

These cloned instances aren't immediately used and discarded, so I don't think the compiler can determine that it doesn't need to clone them.

What's the idiomatic way to avoid having to do this cloning?

I can think of:

  • Using Rc to track multiple uses of a single instance of a keyword.
  • Seeing if you can use Into<&NamespacedKeyword> in the signature somehow.
  • Using an enum or other sentinel instead of DB_ADD directly — that is, the only allowable values here are DB_ADD and DB_RETRACT, so we don't need to represent these in Value as keywords, only as a simple enum (which the compiler will turn into 0/1). This (as with some of the other solutions) doesn't help for duplication of other keywords.
  • If you can guarantee the scope in which this function and its consumers runs, you can use &NamespacedKeyword instead. The Vec<Value> could never escape the context of the parsed representation.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I raise this because a big import might end up with 100,000 copies of "db" and 100,000 copies of "add" on the heap, which is obviously not an ideal use of half a meg of RAM.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose servo/string-cache might help here…

Comment thread db/src/bootstrap.rs
Value::Map(ref m) => {
for (ident, mp) in m {
assertions.push(Value::Vector(vec![values::DB_ADD.clone(),
values::DB_PART_DB.clone(),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly…

I wonder if there's an over-arching architectural way to avoid all this.

Comment thread db/src/bootstrap.rs Outdated

pub fn bootstrap_partition_map() -> PartitionMap {
V1_PARTS[..].into_iter()
.chain(V2_PARTS[..].into_iter())
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you not just make V2_PARTS be a superset of V1_PARTS? That's less complected; only a v1->v2 migrator needs to know what the difference is.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll duplicate this. I had it that way originally, and did this to be similar to what happened elsewhere; I'll instead decomplect V1 from V2 in the code, and make the variables encode the extensions. (Assuming I can figure out the correct expression of that idea.)

Comment thread db/src/types.rs
/// *Unique-value* means that there is at most one assertion with the attribute and a
/// particular value in the datom store.
pub unique_value: bool,
/// `true` if this attribute is unique-identity, i.e., it is `:db/unique :db.unique/identity`.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer newlines between code and doc blocks.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too, but Rust standard seems split on this. I've added the newlines.

Comment thread db/src/types.rs
/// :db.cardinality/many`. `false` if this attribute is single-valued (the default), i.e., it
/// is `:db/cardinality :db.cardinality/one`.
pub multival: bool,
/// `true` if this attribute is unique-value, i.e., it is `:db/unique :db.unique/value`.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an is_valid method to Attribute?

assert!(!self.multival && self.unique_value));
if self.unique_identity {
  assert!(self.value_type == ValueType::Ref);
  …

etc.

(Code won't compile, but you get my drift.)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what John Regehr calls checkRep in

http://blog.regehr.org/archives/1091

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you can call check_rep from inside the struct ::new method I suggest you define, ensuring that only valid attributes can be represented without the system panicking.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There needs to be a lazy approach here, since one can specify :db/fulltext true before :db/valueType :db.type/string. I can think of lots of ways of doing this, but few that are better than manually doing check_rep when we create the Schema from the SchemaMap. One possibility would be to encapsulate all the changes to the Attribute into a builder pattern, sort the builders before application, and then do incremental changes. It's a lot of work for a little pay off.

For this ticket, I'll type-check on Schema construction, and if we need to do more later we can do it.

Comment thread db/src/types.rs
}

/// Map `String` idents (`:db/ident`) to positive integer entids (`1`).
pub type IdentMap = BTreeMap<String, Entid>;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care about these being ordered?

If not, I think you should use HashMap, per https://doc.rust-lang.org/std/collections/index.html#when-should-you-use-which-collection

Comment thread db/src/types.rs
/// Map `String` idents (`:db/ident`) to positive integer entids (`1`).
pub type IdentMap = BTreeMap<String, Entid>;

/// Map positive integer entids (`1`) to `String` idents (`:db/ident`).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a better representation for this, too — a sparse array map — but that's fine to leave as a TODO!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to use BTreeMap simply because we're using it elsewhere (in edn). Let's handle this in follow-up: #192

Comment thread db/src/types.rs
}

/// Represents the metadata required to query from, or apply transactions to, a Mentat store.
#[derive(Clone,Debug,Default,Eq,Hash,Ord,PartialOrd,PartialEq)]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps leave a pointer to https://github.com/mozilla/mentat/wiki/Thoughts:-modeling-db-conn-in-Rust here, because eventually people will wonder why DB looks the way it does…

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@rnewman
Copy link
Copy Markdown
Collaborator

rnewman commented Jan 22, 2017

Three things I really care about:

  • Whether we're doing the wrong thing with having a linearly increasing number of clones of simple things hanging around in memory while we process input. This doesn't seem right.
  • V2 vs V1. I'd be fine seeing only V2 and the very straightforward upgrade from V1 to V2, which you can copy out of master. Simpler.
  • Seeing if we can get the valueType conversion slightly more correct from the start. I'm a week or four away from starting to hit this, and I can see myself getting mired down if I have to juggle more than just my own code. (I'm going very slowly in Rust so far.)

Excellent work, and lots of painful edges apparently hit! The more knowledge we can pull out and share from this, the better.

@rnewman
Copy link
Copy Markdown
Collaborator

rnewman commented Jan 22, 2017

After a little reading and thinking, I think I can summarize my position on lots of cloning.

We have five main scopes:

  1. static. We have all of our default keywords here as lazy_static.
  2. Within a call to q.
  3. Within a call to transact.
  4. The scope that receives query results. This is larger than 2.
  5. The scope that receives a tx-report. This is larger than 3.

Now, we typically parse our query input from strings (creating new instances with new Strings), or we're given programmatic input from scope 4 or scope 5.

That makes me think that within 2 and 3 — that is, when transacting or querying — we should be able to exclusively work with refs to keywords and strs. The enclosing scope, or the parsed input, provides the real data, and we only have to work with references all the way down to the database.

(Even if we didn't create new Strings, we could require that the string input sticks around so we could refer to bits of it, even within the parser… but that doesn't work so well with a work queue and arbitrary threads and prepared statements. So let's say that the query or the transact input stick around until we're done pushing them into the database. Parsing results in a new, owned structure.)

The way back out — query results and the tx-report — might be different. The values will come from the database, and we don't want to impose on callers in 4 and 5 the requirement to clone everything when some of the values will be new and originally owned.

To fix that we probably need some kind of interning/shared data/Rc. Perhaps a query result or a tx-report is automatically interning, using Rc for all keywords…

Thoughts?

@rnewman
Copy link
Copy Markdown
Collaborator

rnewman commented Jan 22, 2017

OBTW, I found that http://xion.io/post/code/rust-borrowchk-tricks.html clarified my understanding of this set of tradeoffs still further.

@victorporof
Copy link
Copy Markdown
Contributor

victorporof commented Jan 23, 2017

I missed the review ping, I'll go through this as well, but mostly hoping to learn stuff rather than offer any valuable criticism.

@ncalexan
Copy link
Copy Markdown
Member Author

Three things I really care about:

Whether we're doing the wrong thing with having a linearly increasing number of clones of simple things hanging around in memory while we process input. This doesn't seem right.

The paths that clone are only in the bootstrap code path, which is both bounded size and infrequently hit. The hot-path through the transactor isn't really present in this patch (and will require additional work to not clone). We'll handle this as I bone out the approach.

There's definitely a good ticket here to not go via EDN for bootstrapping at all -- I did that simply 'cuz I got tired of trying to define static structures in Rust's verbose language. I've filed #193 to do better.

V2 vs V1. I'd be fine seeing only V2 and the very straightforward upgrade from V1 to V2, which you can copy out of master. Simpler.

I've dropped V1 support entirely, and filed #194 to fill it in.

Seeing if we can get the valueType conversion slightly more correct from the start. I'm a week or four away from starting to hit this, and I can see myself getting mired down if I have to juggle more than just my own code. (I'm going very slowly in Rust so far.)

Your point is well taken. We have an injection (ValueType, edn::Value) -> (i32, rusqlite::Value), but we cannot factor further in the first component. Let me see what the encoding feel of that feels like.

Excellent work, and lots of painful edges apparently hit! The more knowledge we can pull out and share from this, the better.

Thank you.

@ncalexan
Copy link
Copy Markdown
Member Author

Your point is well taken. We have an injection (ValueType, edn::Value) -> (i32, rusqlite::Value), but we cannot factor further in the first component. Let me see what the encoding feel of that feels like.

I started to dig further into this, and have opened https://github.com/jgallagher/rusqlite/issues/211 to try to understand our situation more clearly. I know how to address our case without changing rusqlite, but it may just be a library oversight that we should try to upstream, in which case I'll put the effort there.

Comment thread db/src/db.rs
set_user_version(&tx, CURRENT_VERSION)?;
let user_version = get_user_version(&tx)?;

// TODO: use the drop semantics to do this automagically?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't recommend this (i.e., I'd keep it explicit as written). By default dropping tx would attempt to perform a rollback. You could change that with set_drop_behavior, but then you don't get to see if the commit succeeded or failed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the guidance. I'll drop the TODO instead :)

@ncalexan
Copy link
Copy Markdown
Member Author

FWIW I just fixed this in rusqlite (PR); the get here will now panic instead of silently truncating. get_checked is also available if you want a Result instead.

In general, I have been preferring get_checked(...)? to avoid the "bare panic" (but not specifically to avoid truncating). I read the code and must have mis-interpreted; I thought they would panic rather than truncate.

@ncalexan
Copy link
Copy Markdown
Member Author

@rnewman OK, this is ready for another pass. I think the introduction of TypedValue really improved the code.

I realize now that I didn't move the documentation notes to the Wiki; I'll do that either before or just after landing.

@victorporof @jsantell Let me know what y'all think! There's lots of tests waiting to be written for this work; I'll file a few tickets to track areas that need love later today.

Comment thread db/Cargo.toml
[dependencies]
error-chain = "0.8.0"
lazy_static = "0.2.2"
# TODO: don't depend on num and ordered-float; expose helpers in edn abstracting necessary constructors.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue, please! This affects multiple consumers, I think.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I filed #198 while reviewing your code to track this.

Comment thread db/src/bootstrap.rs
// bootstrap symbolic schema, or by representing the initial bootstrap
// schema directly as Rust data.
let typed_value = match TypedValue::from_edn_value(value) {
Some(TypedValue::Keyword(ref s)) => TypedValue::Ref(*ident_map.get(s).ok_or(ErrorKind::UnrecognizedIdent(s.clone()))?),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let typed_value = TypedValue::from_edn_value(value).map(|x|
  if let TypedValue::Keyword(ref s) {
    …
  } else {
    x
  }).expect(…);

is perhaps neater?

Copy link
Copy Markdown
Member Author

@ncalexan ncalexan Jan 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be made neater, but I want to keep the error handling I've written. Next time!

Comment thread db/src/db.rs Outdated
// share a tag.
(5, &rusqlite::types::Value::Integer(ref x)) => Ok(TypedValue::Long(*x)),
(5, &rusqlite::types::Value::Real(ref x)) => Ok(TypedValue::Double((*x).into())),
(10, &rusqlite::types::Value::Text(ref x)) => Ok(TypedValue::String(x.clone())),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we probably want move semantics for this, no? That is: if you want to keep a copy of the EDN value, pass a clone to from_sql_value_pair.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I thought about this and I'm okay with into_sql_value_pair; let me try it on for size.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wait, into is the wrong direction.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done this.

Comment thread db/src/db.rs
(0, &rusqlite::types::Value::Integer(ref x)) => Ok(TypedValue::Ref(*x)),
(1, &rusqlite::types::Value::Integer(ref x)) => Ok(TypedValue::Boolean(0 != *x)),
// SQLite distinguishes integral from decimal types, allowing long and double to
// share a tag.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: 4 = date, 11 = UUID.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread db/src/db.rs
&Value::Integer(x) => Some(TypedValue::Long(x)),
&Value::Float(ref x) => Some(TypedValue::Double(x.clone())),
&Value::Text(ref x) => Some(TypedValue::String(x.clone())),
&Value::NamespacedKeyword(ref x) => Some(TypedValue::Keyword(x.to_string())),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think TypedValue::Keyword should contain a Keyword, preserving the namespace/name distinction and avoiding consing up a new string. This might mean you need to extract a keyword crate out of edn for reuse. Please file an issue.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there will be a grand reckoning where we do a lot of String to &str rewriting, so I've filed #203 to keep this around.

Comment thread db/src/db.rs
&TypedValue::Long(x) => (rusqlite::types::Value::Integer(x).into(), 5),
&TypedValue::Double(x) => (rusqlite::types::Value::Real(x.into_inner()).into(), 5),
&TypedValue::String(ref x) => (rusqlite::types::ValueRef::Text(x.as_str()).into(), 10),
&TypedValue::Keyword(ref x) => (rusqlite::types::ValueRef::Text(x.as_str()).into(), 13),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the point at which we'd produce a str directly from a Keyword.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By keeping a String in the Keyword, this doesn't copy data until it hits SQLite. If we keep a Keyword (really a NamespacedKeyword, since I'm being strict in my parser for now) then we have to cons strings and can't use the ValueRef in this way.

Comment thread db/src/entids.rs Outdated
pub const DB_PART_TX: Entid = 17;
pub const DB_EXCISE: Entid = 18;
pub const DB_EXCISE_ATTRS: Entid = 19;
pub const DB_EXCISE_BEFORET: Entid = 20;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be DB_EXCISE_BEFORE_T for clarity.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this was an oversight.

This patch factors the fundamental SQL conversion maps
between (rusqlite::Value, value_type_tag) and (edn::Value, ValueType)
through a new Mentat TypedValue.  (A future patch might rename this
fundamental type mentat::Value.)

To make certain conversion functions infallible, I removed
placeholders for :db.type/{instant,uuid,uri}.  (We could panic
instead, but there's no need to do that right now.)
This avoids (runtime) failures in Travis CI due to old SQLite
versions.  See jgallagher/rusqlite@432966a.
@ncalexan ncalexan merged commit 81af295 into mozilla:rust Jan 26, 2017
victorporof pushed a commit that referenced this pull request Feb 2, 2017
* Start installing the SQLite store and bootstrapping the datom store.

* Review comment: Decomplect V2_IDENTS.

* Review comment: Decomplect V2_PARTS.

* Review comment: Pre: Expose Clojure's merge on Value instances.

* Review comment: Decomplect V2_SYMBOLIC_SCHEMA.

* Review comment: Decomplect V1_STATEMENTS.

* Review comment: Prefer ? to try!.

* Review comment: Fix typos; format; add TODOs.

* Review comment: Assert that Mentat `Schema` is valid upon creation.

* Review comment: Improve conversion to and from SQL values.

This patch factors the fundamental SQL conversion maps
between (rusqlite::Value, value_type_tag) and (edn::Value, ValueType)
through a new Mentat TypedValue.  (A future patch might rename this
fundamental type mentat::Value.)

To make certain conversion functions infallible, I removed
placeholders for :db.type/{instant,uuid,uri}.  (We could panic
instead, but there's no need to do that right now.)

* Review comment: Always uses bundled SQLite in rusqlite.

This avoids (runtime) failures in Travis CI due to old SQLite
versions.  See jgallagher/rusqlite@432966a.

* Review comment: Move semantics in `from_sql_value_pair`.

* Review comment: DB_EXCISE_BEFORE_T instead of ...BEFORET (no underscore).

* Review comment: Move overview notes to the Wiki.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants