

@bkietz (Member) commented Mar 5, 2020

FieldRef is a new utility class which represents a reference to a field. It is intended to replace parameters like int field_index and const std::string& name; it can be implicitly constructed from either a field index or a name.
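
As a rough illustration of how one implicitly constructible type can stand in for both int field_index and const std::string& name parameters, here is a minimal C++17 sketch that assumes a std::variant-based representation. ToyFieldRef and Describe are hypothetical names made up for this example; this is not Arrow's FieldRef implementation.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for a field reference: holds either a positional
// index or a field name.
class ToyFieldRef {
 public:
  // Constructors are implicit by design, so call sites can pass 3 or "alpha".
  ToyFieldRef(int index) : impl_(index) {}
  ToyFieldRef(std::string name) : impl_(std::move(name)) {}
  ToyFieldRef(const char* name) : impl_(std::string(name)) {}

  bool IsIndex() const { return std::holds_alternative<int>(impl_); }
  bool IsName() const { return std::holds_alternative<std::string>(impl_); }
  int index() const { return std::get<int>(impl_); }
  const std::string& name() const { return std::get<std::string>(impl_); }

 private:
  std::variant<int, std::string> impl_;
};

// One signature replaces a pair of overloads taking (int field_index) and
// (const std::string& name).
std::string Describe(const ToyFieldRef& ref) {
  return ref.IsIndex() ? "index " + std::to_string(ref.index())
                       : "name " + ref.name();
}
```

Callers can then write Describe(2) or Describe("alpha") without choosing an overload. Note that the variant type must be visible wherever such a class is defined, which is the header-weight trade-off discussed in the review.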

Nested fields can be referenced as well:

// the following all indicate schema->GetFieldByName("alpha")->type()->child(0)
FieldRef ref1({FieldRef("alpha"), FieldRef(0)});
FieldRef ref2("alpha", 0);
ARROW_ASSIGN_OR_RAISE(FieldRef ref3,
                      FieldRef::FromDotPath(".alpha[0]"));
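
The dot-path form can be read as a sequence of steps, where ".name" selects a child by name and "[N]" selects a child by index. The sketch below parses that assumed grammar into name/index steps. ParseDotPath and Step are hypothetical; the real FieldRef::FromDotPath may accept more than this (for example escaped special characters), which the sketch does not handle.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// One step of a nested reference: a child name or a child index.
using Step = std::variant<std::string, int>;

// Hypothetical, simplified parser for paths such as ".alpha[0]".
// Assumed grammar: ".<name>" selects by name, "[<digits>]" selects by index.
// Returns std::nullopt on malformed input. No escaping is supported.
std::optional<std::vector<Step>> ParseDotPath(const std::string& path) {
  if (path.empty()) return std::nullopt;
  std::vector<Step> steps;
  std::size_t i = 0;
  while (i < path.size()) {
    if (path[i] == '.') {
      // A name runs until the next '.' or '[' (or the end of the string).
      std::size_t end = path.find_first_of(".[", i + 1);
      if (end == std::string::npos) end = path.size();
      if (end == i + 1) return std::nullopt;  // empty name
      steps.emplace_back(path.substr(i + 1, end - i - 1));
      i = end;
    } else if (path[i] == '[') {
      std::size_t close = path.find(']', i + 1);
      if (close == std::string::npos || close == i + 1) return std::nullopt;
      if (close - i - 1 > 9) return std::nullopt;  // keep the index within int
      int index = 0;
      for (std::size_t j = i + 1; j < close; ++j) {
        if (!std::isdigit(static_cast<unsigned char>(path[j]))) {
          return std::nullopt;
        }
        index = index * 10 + (path[j] - '0');
      }
      steps.emplace_back(index);
      i = close + 1;
    } else {
      return std::nullopt;  // every step must start with '.' or '['
    }
  }
  return steps;
}
```

Under these assumptions ".alpha[0]" yields the two steps "alpha" then 0, matching the FieldRef("alpha", 0) form above.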

FieldRefs provide a number of accessors for drilling down to potentially nested children. They are overloaded for convenience to support Schema (returns a field), DataType (returns a child field), Field (returns a child field of this field's type), Array (returns a child array), RecordBatch (returns a column), ChunkedArray (returns a ChunkedArray in which each chunk is a child array of the corresponding original chunk) and Table (returns a column).
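
For the Schema and Field cases, drilling down amounts to following one child index per nesting level. A minimal sketch, assuming a toy ToyField tree in place of Arrow's Schema, DataType and Field classes (ToyField and GetByIndices are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model of a nested type: a field with named children.
struct ToyField {
  std::string name;
  std::vector<ToyField> children;
};

// Follows a sequence of child indices starting from a list of top-level
// fields (the toy analogue of a Schema). Returns nullptr if the path is
// empty or any index is out of range, so the caller can report "not found".
const ToyField* GetByIndices(const std::vector<ToyField>& top_level,
                             const std::vector<int>& indices) {
  const std::vector<ToyField>* level = &top_level;
  const ToyField* current = nullptr;
  for (int i : indices) {
    if (i < 0 || static_cast<std::size_t>(i) >= level->size()) return nullptr;
    current = &(*level)[i];
    level = &current->children;  // descend into this field's children
  }
  return current;
}
```

The Array, RecordBatch, ChunkedArray and Table overloads described above would apply the same walk to data rather than to type metadata; the sketch covers only the metadata case.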

// Field names can match multiple fields in a Schema
Schema a_is_ambiguous({field("a", null()), field("a", null())});
auto matches = FieldRef("a").FindAll(a_is_ambiguous);
assert(matches.size() == 2);
assert_ok_and_eq(FieldRef::Get(matches[0], a_is_ambiguous), a_is_ambiguous.field(0));

// Convenience accessor raises a helpful error if the field is not found or ambiguous
ARROW_ASSIGN_OR_RAISE(auto column, FieldRef("struct", "field_i32").GetOne(some_table));
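
The difference between FindAll and GetOne can be sketched for the top level of a schema, modeling fields only by name. FindAllByName, GetOneByName and the Lookup struct are hypothetical stand-ins (a string error in place of an Arrow Status), not the FieldRef API:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: indices of every top-level field whose name equals `name`.
// Field names need not be unique, so this may return zero, one or many.
std::vector<int> FindAllByName(const std::vector<std::string>& field_names,
                               const std::string& name) {
  std::vector<int> matches;
  for (std::size_t i = 0; i < field_names.size(); ++i) {
    if (field_names[i] == name) matches.push_back(static_cast<int>(i));
  }
  return matches;
}

// Result of a single-field lookup: `error` is empty on success.
struct Lookup {
  int index = -1;
  std::string error;
};

// Hypothetical GetOne-style lookup: succeeds only on exactly one match and
// otherwise explains whether the name was missing or ambiguous.
Lookup GetOneByName(const std::vector<std::string>& field_names,
                    const std::string& name) {
  std::vector<int> matches = FindAllByName(field_names, name);
  if (matches.empty()) return {-1, "no field named '" + name + "'"};
  if (matches.size() > 1) {
    return {-1, "field name '" + name + "' is ambiguous (" +
                    std::to_string(matches.size()) + " matches)"};
  }
  return {matches[0], ""};
}
```

FindAll leaves ambiguity for the caller to handle, while a GetOne-style accessor turns both "not found" and "ambiguous" into an error the caller can propagate.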

@bkietz requested a review from pitrou March 5, 2020 15:14
github-actions bot commented Mar 5, 2020

@bkietz force-pushed the 7412-Dataset-Ensure-that-datas branch from 19ac3c1 to 43d3c51 March 5, 2020 15:40
@pitrou (Member) left a comment

I guess I'm mostly concerned about injecting variant into widely used headers, and about the lack of tests for small_vector.
I haven't looked closely at the FieldRef implementation.

#include "arrow/type_fwd.h" // IWYU pragma: export
#include "arrow/util/checked_cast.h"
#include "arrow/util/macros.h"
#include "arrow/util/small_vector.h"
#include "arrow/util/variant.h"
Member

Hmm, it's a pity to start pulling the variant header into such a widely used header...

Member Author

@bkietz force-pushed the 7412-Dataset-Ensure-that-datas branch 2 times, most recently from 2b1fd40 to cb5aa1b March 7, 2020 00:16
@nealrichardson (Member)

@fsaintjacques, do you need this for your logical plan? Can you please review/merge?

@pitrou (Member) left a comment

Thank you for the update. Two remaining questions below.

    column_indices_[i] = kNoMatch;
  } else {
    RETURN_NOT_OK(ref.CheckNonMultiple(matches, *from_));
    int matching_index = matches[0][0];
Member

Are we ignoring other potential matches here? FindAll could return multiple matches if there are columns with the same name, no?

Member Author

On line 103, I assert that there is not more than one match.

Member

Uh, looks like I forgot my glasses somewhere... where?

Member Author

CheckNonMultiple returns an error status if there are multiple matches, so execution only reaches matches[0][0] when the match is unique.
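
The pattern in the reviewed snippet (no match is recorded as kNoMatch, more than one match is an error, otherwise the single match is used) can be sketched on plain lists of column names. MapColumns is a hypothetical helper; it reports ambiguity with a bool and a message rather than a Status, and it compares only top-level names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr int kNoMatch = -1;

// Hypothetical sketch: for each column name wanted by a target schema, find
// its index among the source columns. Missing columns stay kNoMatch; a name
// matching more than one source column is an error, mirroring the
// CheckNonMultiple guard. Returns false and fills `error` on ambiguity.
bool MapColumns(const std::vector<std::string>& source_names,
                const std::vector<std::string>& target_names,
                std::vector<int>* column_indices, std::string* error) {
  column_indices->assign(target_names.size(), kNoMatch);
  for (std::size_t i = 0; i < target_names.size(); ++i) {
    std::vector<int> matches;
    for (std::size_t j = 0; j < source_names.size(); ++j) {
      if (source_names[j] == target_names[i]) {
        matches.push_back(static_cast<int>(j));
      }
    }
    if (matches.empty()) continue;  // leave kNoMatch in place
    if (matches.size() > 1) {
      *error = "multiple matches for '" + target_names[i] + "'";
      return false;
    }
    (*column_indices)[i] = matches[0];  // safe: exactly one match
  }
  return true;
}
```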

@fsaintjacques (Contributor)

After an offline conversation with Ben: the indices will be extracted into their own class, which represents an absolute path to a field.
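
The distinction is that a name-based reference may match several fields, while an absolute path of child indices identifies at most one. A minimal sketch under that assumption, resolving a nested name reference such as ("struct", "field_i32") to every matching index path. ToyField, IndexPath and FindAllPaths are hypothetical names, not the class that was actually extracted:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy nested field.
struct ToyField {
  std::string name;
  std::vector<ToyField> children;
};

// An absolute path: one child index per nesting level.
using IndexPath = std::vector<int>;

// Depth-first helper: extends `prefix` with every child whose name matches
// names[depth] and records complete paths in `out`.
void FindAllPaths(const std::vector<ToyField>& level,
                  const std::vector<std::string>& names, std::size_t depth,
                  IndexPath* prefix, std::vector<IndexPath>* out) {
  if (depth == names.size()) {
    out->push_back(*prefix);
    return;
  }
  for (std::size_t i = 0; i < level.size(); ++i) {
    if (level[i].name != names[depth]) continue;
    prefix->push_back(static_cast<int>(i));
    FindAllPaths(level[i].children, names, depth + 1, prefix, out);
    prefix->pop_back();
  }
}

// Resolves a sequence of names to all matching absolute paths. Duplicate
// names at any level multiply the number of results.
std::vector<IndexPath> FindAllPaths(const std::vector<ToyField>& top_level,
                                    const std::vector<std::string>& names) {
  std::vector<IndexPath> out;
  IndexPath prefix;
  if (!names.empty()) FindAllPaths(top_level, names, 0, &prefix, &out);
  return out;
}
```

Each resulting IndexPath can then be followed without any further ambiguity, which is what makes a separate path type useful once a name-based reference has been resolved.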

@bkietz force-pushed the 7412-Dataset-Ensure-that-datas branch from cb5aa1b to 08013e8 March 13, 2020 15:32
@fsaintjacques (Contributor) left a comment

The comment documentation is A+ for an important class like this.

4 participants