Skip to content

Conversation

@bkietz
Copy link
Member

@bkietz bkietz commented Mar 17, 2020

This PR necessitated adding Scalar::Parse, MakeArrayFromScalar, and ArrayFromJSON cases for dictionary type.

@github-actions
Copy link

@jorisvandenbossche
Copy link
Member

@bkietz I added a python test case as well

@bkietz
Copy link
Member Author

bkietz commented Mar 18, 2020

Thanks @jorisvandenbossche

@bkietz bkietz requested a review from fsaintjacques March 19, 2020 12:37
Copy link
Contributor

@fsaintjacques fsaintjacques left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except for the Json representation, I think we should shoot for the most obvious which in this case is parsing primitive values with dict encoding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect the ArrayFromJson to just works with primitive values, e.g. ArrayFromJson(dict(uf8(), int8()), R"["horse", "cat", null]"); should just work. We don't need to encode the physical representation, unless I'm missing something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to encode the indices as well to support comparison with sliced dictionary arrays. For example:

// indices ordered as if assembled with DictionaryBuilder
// from ["whiskey", "tango", "foxtrot"]
auto array_wtf = ArrayFromJSON(dictionary(int8(), utf8()), R"([
      [0, ["whiskey", "tango", "foxtrot"]],
      [1],
      [2]
  ])");

// decoded values are ["tango", "foxtrot"], but indices *are not*
// as if assembled with DictionaryBuilder from ["tango", "foxtrot"]
auto array_tf = array_wtf->Slice(1);

assert(ArrayFromJSON(dictionary(int8(), utf8()), R"([
      [1, ["whiskey", "tango", "foxtrot"]],
      [2]
  ])")->Equals(array_tf));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glancing the string example and I can't intuitively understand what it maps to.

auto dict_type = dictionary(int8(), utf8());
// Obvious with a glance
auto dict_array = ArrayFromJSON(dict_type, R"(["whiskey", "tango", "foxtrot"])");
// Require cognitive overhead and reading the decoder to understand what's happening.
auto dict_array = ArrayFromJSON(dict_type, R"([
      [0, ["whiskey", "tango", "foxtrot"]],
      [1],
      [2]
  ])");

When precise physical layout is needed, I recommend adding a family of methods:

DictArrayFromJSON(type, std::shared_ptr<Array> values, std::string indices_string);
DictArrayFromJSON(type, std::string values_string, std::string indices_string);

I'm suggesting this because we want tests to be trivial and succinct to read. The end result is that 80% of tests which don't need exact layout will be easy to read, while the edge case will have the special method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll extract DictArrayFromJSON

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the ArrayFromJSON behavior you describe is highly non trivial to implement, so I'll leave that for a follow up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants