Skip to content
This repository was archived by the owner on Sep 12, 2018. It is now read-only.

Add tx{-parser} crates; start parsing transactions.#164

Merged
ncalexan merged 2 commits into
mozilla:rustfrom
ncalexan:parse-tx
Jan 13, 2017
Merged

Add tx{-parser} crates; start parsing transactions.#164
ncalexan merged 2 commits into
mozilla:rustfrom
ncalexan:parse-tx

Conversation

@ncalexan
Copy link
Copy Markdown
Member

This depends on edn and uses the combine parser combinator library.

@rnewman, you can see my progress here. There's an abstraction around parsing the internals of vector to be worked out, but it's actually quite nice to work with (once you've absorbed some combine magic). I think some of the constructors mappings might be able to be shortened, but it's not worth waiting for.

@ncalexan
Copy link
Copy Markdown
Member Author

I don't see these errors with

⋊> ~/M/mentat on parse-tx  rustc --version                                                                                                                    16:53:55
rustc 1.16.0-nightly (468227129 2017-01-03)

Will try with stable.

@rnewman
Copy link
Copy Markdown
Collaborator

rnewman commented Jan 12, 2017

Builds for me locally with stable.

@rnewman
Copy link
Copy Markdown
Collaborator

rnewman commented Jan 12, 2017

Oh, no it doesn't. mentat_tx is fine; mentat_tx_parser fails.

@rnewman
Copy link
Copy Markdown
Collaborator

rnewman commented Jan 12, 2017

Copy link
Copy Markdown
Collaborator

@rnewman rnewman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This'll need rebasing on top of keywords (please, so I don't have to 'rebase' this when I land!), but otherwise it looks excellent.

Comment thread tx-parser/src/lib.rs Outdated

use combine::{any, eof, many, optional, parser, satisfy_map, token, Parser, ParseResult, Stream};
use combine::combinator::{Expected, FnParser};
// TODO: understand why this is self::edn rather than just edn.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this in my branch.

Comment thread tx-parser/src/lib.rs
}

fn integer_(input: I) -> ParseResult<i64, I> {
return satisfy_map(|x: Value| if let Value::Integer(y) = x {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read combinator.rs, and do not claim to understand it. But this is a nice abstraction for allowing failure to arbitrary depth.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

combinator.rs is tricky; it took me three days of digging to really understand what is happening :)

Comment thread tx-parser/src/lib.rs Outdated
fn entid_(input: I) -> ParseResult<EntId, I> {
let p = Tx::<I>::integer()
.map(|x| EntId::EntId(x))
.or(Tx::<I>::keyword().map(|x| EntId::Ident(x)))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting — you're unwrapping the keyword, right?

This will need to change a little on top of #163, but in a good way — you'll still be unwrapping, but you'll get e.g., a NamespacedKeyword here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am unwrapping, although there's no real reason for this.

Comment thread tx-parser/src/lib.rs Outdated
}

fn keyword_(input: I) -> ParseResult<String, I> {
return satisfy_map(|x: Value| if let Value::Keyword(y) = x {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really makes you want ??, doesn't it?

Comment thread tx-parser/src/lib.rs

fn lookup_ref_(input: I) -> ParseResult<LookupRef, I> {
return satisfy_map(|x: Value| if let Value::Vector(y) = x {
let mut p = (Tx::<&[Value]>::entid(), any(), eof())
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the parser library memoize the construction of these parsers? Should I be worried about multiple invocations of fn_parser?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, and there's a subtle thing happening with consuming Parser instances that I don't yet understand. You'll note all these parsers are mutable; that's because they're consumed by parse and friends. So you can't avoid the invocations and allocations. It's unclear how this impacts performance overall.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because FnParsers (and probably other parsers, too) are implicitly stateful — they wrap a function, and there's no concept in Rust of a pure function?

(Indeed, with 'try' for lookahead, one expects LL(n) parser implementations to be stateful.)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the parser library memoize the construction of these parsers? Should I be worried about multiple invocations of fn_parser?

Author of combine here. Constructing parsers should be free/very cheap since Parser follow the same model as Iterator so its all stack allocations and since most parsers are either zero-sized or just a few bytes for the function or parameters they take to construct them. As long as LLVM inlines properly there should be zero overhead.

(Indeed, with 'try' for lookahead, one expects LL(n) parser implementations to be stateful.)

That is almost true but the try parser has no state itself, that is all contained in Parser::Input (try, the only thing it contains is the parser it wraps).

Comment thread tx/src/entities.rs Outdated
use self::edn::types::Value;

#[derive(Clone, Debug, PartialEq)]
pub enum EntId {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you steal/clean up the EntId stuff in the top-level repo?

Comment thread tx/src/entities.rs Outdated

extern crate edn;

// TODO: understand why this is self::edn rather than just edn.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See earlier note.

Also try ::edn::types::Value, see how that differs…

Comment thread tx/src/entities.rs
#[derive(Clone, Debug, PartialEq)]
pub struct LookupRef {
pub a: EntId,
// TODO: consider boxing to allow recursive lookup refs.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You crazy.

Comment thread tx/src/entities.rs Outdated
#[derive(Clone, Debug, PartialEq)]
pub enum EntId {
EntId(i64),
Ident(String),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This'll eventually be Ident(NamespacedOrPlainKeyword), I expect.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might just ban PlainKeyword, since it's bad form in general.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, fair. That's one reason I split, actually: the 'syntax' of the query parser is the main place I expect Keywords (:find, :in), with all of the 'data' being namespaced.

Comment thread tx/src/entities.rs
Add {
e: EntIdOrLookupRef,
a: EntId,
v: ValueOrLookupRef,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly typed cat approves of your strongly typed format.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a strongly typed cat meme generator, stat!

@ncalexan
Copy link
Copy Markdown
Member Author

An identical function exists in the std lib.

Yes, I eventually found it. Sadly, it requires a type annotation in this case, which I couldn't quite make work. (There are TODO's later in the commit). If you can work this out, please do! Thanks!

@ncalexan
Copy link
Copy Markdown
Member Author

Worth calling out that the combine dependency should be safe: combine is MIT and its dependency ascii is MIT/Apache and it's dev dependency byteorder is Unlicense/MIT. (As of combine version 2.1.1, from https://crates.io/crates/combine and GitHub.)

@ncalexan
Copy link
Copy Markdown
Member Author

@rnewman final stamp before I fold? I'm going to punt on the entid simplification until we have a motivation to fix it. There are some decisions about crate organization to be made that don't block this landing.

Comment thread tx-parser/src/lib.rs

// TODO: abstract the "match Vector, parse internal stream" pattern to remove this boilerplate.
fn add_(input: I) -> ParseResult<Entity, I> {
return satisfy_map(|x: Value| -> Option<Entity> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I may, I believe you can change this parser into this to preserve errors from parsing the inner value.

satisfy_map(|x: Value| -> Option<Entity> {
 if let Value::Vector(y) = x { Some(y) } else { None }
}).flat_map(|y| {
    let  mut p = ...;
    p.parse(&y[..]).map(|t| t.0)
})

https://docs.rs/combine/2.1.1/combine/trait.Parser.html#method.flat_map

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I try this, I get errors like:

error[E0271]: type mismatch resolving `<parse::combine::combinator::SatisfyMap<&[error::edn::Value], [closure@query-parser/src/parse.rs:95:20: 95:94]> as parse::combine::Parser>::Input == I`
   --> query-parser/src/parse.rs:101:9
    |
101 |        .parse_stream(input)
    |         ^^^^^^^^^^^^ expected reference, found type parameter
    |
    = note: expected type `&[error::edn::Value]`
    = note:    found type `I`
    = note: required because of the requirements on the impl of `parse::combine::Parser` for `parse::combine::combinator::FlatMap<parse::combine::combinator::SatisfyMap<&[error::edn::Value], [closure@query-parser/src/parse.rs:95:20: 95:94]>, [closure@query-parser/src/parse.rs:96:22: 100:9]>`

so this isn't a trivial change.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. In order to do this, I needed to either use SliceStream or work with the ::Range type, so that the lifetime was preserved. It can be done, but it's awkward. See also the discussion in Marwes/combine#74 (comment).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/me gives up, runs git checkout -- .

@ncalexan
Copy link
Copy Markdown
Member Author

Wonder if it's worth consting these defined keyword tokens at the top level.

I think it is, but it's not possible! You can only construct enums and structs, and the NamespacedKeyword members are String instances, which you can't construct at compile time. Frustrating!

@ncalexan ncalexan merged commit b11b9b9 into mozilla:rust Jan 13, 2017
@ncalexan
Copy link
Copy Markdown
Member Author

I think it is, but it's not possible! You can only construct enums and structs, and the NamespacedKeyword members are String instances, which you can't construct at compile time. Frustrating!

@rnewman static has the same restriction, so that's out too.

@rnewman
Copy link
Copy Markdown
Collaborator

rnewman commented Jan 13, 2017 via email

victorporof pushed a commit that referenced this pull request Feb 2, 2017
This depends on edn and uses the combine parser combinator library.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants