-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-7412: [C++][Dataset] Provide FieldRef to disambiguate field references #6545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
19ac3c1 to
43d3c51
Compare
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm mostly concerned about the variant injection into widely-used headers, and the lack of tests for small_vector.
I haven't looked into the FieldRef implementation closely.
cpp/src/arrow/type.h
Outdated
| #include "arrow/type_fwd.h" // IWYU pragma: export | ||
| #include "arrow/util/checked_cast.h" | ||
| #include "arrow/util/macros.h" | ||
| #include "arrow/util/small_vector.h" | ||
| #include "arrow/util/variant.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it's a pity to start pulling the variant header in such a widely-used header...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2b1fd40 to
cb5aa1b
Compare
|
@fsaintjacques you need this for your logical plan? Can you please review/merge? |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the update. Two remaining questions below.
cpp/src/arrow/dataset/projector.cc
Outdated
| column_indices_[i] = kNoMatch; | ||
| } else { | ||
| RETURN_NOT_OK(ref.CheckNonMultiple(matches, *from_)); | ||
| int matching_index = matches[0][0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we ignoring other potential matches here? FindAll could return multiple matches if there are columns with the same name, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On line 103, I assert that there are not multiple matches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, looks like I forgot my glasses somewhere... where?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CheckNonMultiple returns an error status if there are multiple matches
|
After offline conversation with Ben, |
cb5aa1b to
08013e8
Compare
fsaintjacques
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment documentation is A+ for an important class like this.
FieldRefis a new utility class which represents a reference to a field. It is intended to replace parameters likeint field_indexandconst std::string& name; it can be implicitly constructed from either a field index or a name.Nested fields can be referenced as well:
FieldRefs provide a number of accessors for drilling down to potentially nested children. They are overloaded for convenience to support Schema (returns a field), DataType (returns a child field), Field (returns a child field of this field's type) Array (returns a child array), RecordBatch (returns a column), ChunkedArray (returns a ChunkedArray where each chunk is a child array of the corresponding original chunk) and Table (returns a column).