-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13742: [C++][Go] Expose Dataset API to Go via CGO #10995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Some high-level comments:
In the end, the C API might look like this: #include "arrow/c/abi.h"
struct ArrowDatasetFactory;
enum ArrowDatasetFormat {
ARROW_DATASET_PARQUET, ARROW_DATASET_CSV, ARROW_DATASET_IPC
};
int ArrowDatasetFactoryFromUri(const char* uri,
struct ArrowDatasetFormat format,
struct ArrowDatasetFactory** out);
int ArrowDatasetFactoryInspect(struct ArrowDatasetFactory* factory,
int num_fragments_to_inspect,
struct ArrowSchema* out);
int ArrowDatasetFactoryCreateDataset(struct ArrowDatasetFactory* factory,
struct ArrowSchema* optional_schema,
struct ArrowArrayStream* out);
const char* ArrowDatasetFactoryGetLastError();
void ArrowDatasetFactoryDestroy(struct ArrowDatasetFactory*);cc @bkietz for advice |
| #define DISABLE_INSPECT_FRAGMENTS 0 | ||
|
|
||
| // Analagous to arrow::dataset::Scanner | ||
| struct Scanner { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @pitrou - the binding should be as minimal as possible since the classes of the dataset and compute APIs are experimental and changed frequently. In particular, we've discussed removing Scanner altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair assessment. In particular since doing this, i've decided to try a different approach and see how it goes, see #11037 where I've implemented consumer functions for the C Data interface for go. I'm looking into trying to just do a minimal binding to the Compute API directly instead and the building on top of that. If that ends up being viable, I'll close out this PR in favor of that approach.
|
Closing this in favor of #11037 and continuing that approach which ensures I'll be able to keep the bindings as minimal as possible and only bind directly to the compute api as minimally as I can rather than the entire dataset api. |
This supercedes some portions of #10995 as I think it's better to centralize the cdata interface interactions in a specific module that is part of the arrow module rather than it being a part of the dataset module. So this should get reviewed and merged first. CC @pitrou @emkornfield Closes #11037 from zeroshade/go-cdata Lead-authored-by: Matthew Topol <mtopol@factset.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Co-authored-by: Matt Topol <mtopol@factset.com> Signed-off-by: Matthew Topol <mtopol@factset.com>
This supercedes some portions of apache#10995 as I think it's better to centralize the cdata interface interactions in a specific module that is part of the arrow module rather than it being a part of the dataset module. So this should get reviewed and merged first. CC @pitrou @emkornfield Closes apache#11037 from zeroshade/go-cdata Lead-authored-by: Matthew Topol <mtopol@factset.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Co-authored-by: Matt Topol <mtopol@factset.com> Signed-off-by: Matthew Topol <mtopol@factset.com>
This adds a new workflow to build the Dataset Go Module with CGO which first builds the C++ library before building the Go Dataset library and running the tests. This only exposes a basic ability to create a FilesystemDataset and scan it to get record batches. It can then be built upon more in order to add more functionality. I drew inspiration from the existing implementation of the C Data interface and the JNI Dataset library in order to build this but I'm looking for any feedback at all.
@pitrou This is what I was talking about on the dev mailing list the other day.