-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13185: [MATLAB] Create a single MEX gateway function which delegates to specific C++ functions #12004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
featherreadmex or featherwritemex
added to the build folder - not the source folder.
Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
feather_functions.cc and feather_functions.h
to featherread in feather_functions.h
Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Store the Configuration Type value for multi-configuration generators in a txt file for use during testing Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
…AB test folder. Add directory that contains libmx.dll and libmex.dll to the %PATH% for running CheckNumArgsTestTarget. Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
…shared as a target_link_option for arrow_matlab, as the rpath is not automatically added when linking imported libraries. Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
…release version and add the link libraries to arrow_ep on Windows as BUILD_BYPRODUCTS. Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
|
I apologize for the size of this pull request. We found it tricky to partition our changes into separate pull requests as they involved modifying the build system and refactoring the structure of the source files. However, in the future, our goal is to submit smaller, more incremental pull requests to reflect more isolated changes. |
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add GitHub Actions jobs on macOS and Windows to .github/workflows/matlab.yml?
Do you want to work on ARROW-13564 to apply "Co-authored-by" in pull request commits before we merge this?
matlab/CMakeLists.txt
Outdated
| TARGET GTest::gtest_main | ||
| PROPERTY IMPORTED_LOCATION) | ||
|
|
||
| add_custom_target(copy_gtest_to_tests_dir ALL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't try this but is this target always run even when GTest files aren't rebuilt?
Can we use BYPRODUCTS and DEPENDS to prevent needless execution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This piece of code is not necessary anymore after adding the path of the shared libraries to the test environment path.
matlab/CMakeLists.txt
Outdated
| set(MATLAB_DLL_DEPENDENCIES_DIR "${Matlab_ROOT_DIR}/bin/win64") | ||
| set_tests_properties(CheckNumArgsTestTarget | ||
| PROPERTIES ENVIRONMENT | ||
| "PATH=${MATLAB_DLL_DEPENDENCIES_DIR}\;$ENV{PATH}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using %PATH% for GTest too instead of copying GTest files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this suggestion, it'll be more efficient to skip copying the GTest dlls. I will make this change.
matlab/CMakeLists.txt
Outdated
| POST_BUILD | ||
| COMMAND "${CMAKE_COMMAND}" -E echo $<CONFIG> > | ||
| ${CMAKE_BINARY_DIR}/is_multi_config.txt) | ||
| endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we deffer this to a follow-up pull request to reduce changes in this pull request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me! I have removed the code to create is_multi_config.txt. And I will add it in a future pull request for ARROW-15182.
…nclude in a separate pull request
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
Could you update the description of this pull request for the latest code?
We use the description on merge.
matlab/CMakeLists.txt
Outdated
| COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ARROW_SHARED_LIB} | ||
| ${MATLAB_BUILD_OUTPUT_DIR} | ||
| COMMENT "Start copying arrow.dll from ${ARROW_SHARED_LIB} to ${MATLAB_BUILD_OUTPUT_DIR}" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use %PATH% for this too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kou, I've updated the description to reflect the changes that I made in response to your feedback.
arrow.dll is not only used by the tests. On Windows, it needs to be copied to the location of arrow_matlab.dll and the MEX function, mexfcn.mexw64, so that they can access the Arrow APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In normal use (not test use), users run cmake --build ... --target install before they use this module, right?
If it's correct, arrow.dll should be located in the same directory as arrow_matlab.dll and mexfcn.mexw64 by cmake --build ... --target install.
If your explanation is correct (we need this for not test use too), we should use install() https://cmake.org/cmake/help/latest/command/install.html instead of add_custom_target().
I think that we can use one of them:
install(IMPORTED_RUNTIME_ARTIFACTS): https://cmake.org/cmake/help/latest/command/install.html#imported-runtime-artifacts- This requires CMake 3.21.
- install(FILES)`: https://cmake.org/cmake/help/latest/command/install.html#files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this feedback, @kou! I have thought more about how we are structuring the source files and libraries for deployment.
As you mentioned, it is advantageous to use install() to move arrow.dll to the installed location for normal use.
With the previous behavior of copying to the build tree, the MATLAB tests were able to find mexfcn and its dependencies by adding the path to the build tree to the MATLAB Search Path. However, enabling the install step allows a user to install the interface files (including source files and mex functions) to any location that the user specifies. Therefore, we enabled options to add the install directory, specified by CMAKE_INSTALL_PREFIX, to the MATLAB Search Path, as part of the install step. I have updated the pull request description with more details to reflect the changes we have made.
For running the C++ tests on Windows and macOS, the paths to arrow.dll/libarrow.dylib are added to the ctest environment to eliminate the need to copy at all.
|
Simply out of curiousity, |
|
@edponce, Depending on the needs of a particular C++ function, developers can pass options in the way that they deem appropriate, in order to get that information to the C++ layer. |
…he directory to the MATLAB Search Path. Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
the MATLAB Search Path. In Github Actions, the installation directory will be made available to the MATLAB tests by setting the MATLABPATH environment variable. Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
matlab/CMakeLists.txt
Outdated
| set(ARROW_LIBRARY_DIR "${ARROW_PREFIX}/lib") | ||
| set(ARROW_SHARED_LIB | ||
| "${ARROW_LIBRARY_DIR}/${CMAKE_SHARED_LIBRARY_PREFIX}arrow${ARROW_LIBRARY_SUFFIX}") | ||
| set(ARROW_LINK_LIB_FILENAME "${CMAKE_IMPORT_LIBRARY_PREFIX}arrow${ARROW_LINK_SUFFIX}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that we can always use CMAKE_IMPORT_LIBRARY_SUFFIX here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Than you for this suggestion! I will make this change for arrow_shared and the GTest targets.
matlab/CMakeLists.txt
Outdated
| set_target_properties(${ARROW_LIBRARY_TARGET} PROPERTIES IMPORTED_IMPLIB | ||
| ${ARROW_LINK_LIB}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we always set IMPORTED_IMPLIB here?
I expect it's only used on Windows. (It's ignored on non Windows environments such as Linux and macOS.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's true that the IMPORTED_IMPLIB variable is only used on Windows. Because imported libraries are not used on macOS and Linux, CMAKE_IMPORT_LIBRARY_SUFFIX and CMAKE_IMPORT_LIBRARY_PREFIX are not set on these platforms. They are used to create the file path contained by ARROW_LINK_LIB. Therefore, the value of ARROW_LINK_LIB is not set for macOS and Linux.
Because of this, I would prefer to only set IMPORTED_IMPLIB on Windows.
matlab/CMakeLists.txt
Outdated
| set(ARROW_GTEST_LIBRARY_DIR "${ARROW_GTEST_PREFIX}/lib") | ||
| set(ARROW_GTEST_LINK_LIB_DIR "${ARROW_GTEST_PREFIX}/lib") | ||
| set(ARROW_GTEST_LINK_LIB | ||
| "${ARROW_GTEST_LINK_LIB_DIR}/${CMAKE_IMPORT_LIBRARY_PREFIX}gtest${ARROW_GTEST_LINK_SUFFIX}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that we can always use CMAKE_IMPORT_LIBRARY_SUFFIX here.
…ace source files. Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
…, fix regex for installing on macOS
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
Benchmark runs are scheduled for baseline = dc2f678 and contender = 9c5e4b3. 9c5e4b3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
|
Thank you for code reviewing this change and accepting the pull request, @kou! I am able to see the commit that merged this change into the main branch, however, I see a message on this conversation thread named "Closed with unmerged commits". Do you know what causes this message and are there any action items for me to resolve it? |
|
@lafiona Sorry for confusing you by our merge method. You don't need extra action for this pull request. We use our merge script https://github.com/apache/arrow/blob/master/dev/merge_arrow_pr.py to merge a pull request. It squashes commits in the pull request, pushes the squashed commit to master and updates the correspondent JIRA issue. We don't use GitHub's "merge button". So "Closed with unmerged commits" is showed on GitHub UI. But you don't need extra action because we have changes in this pull request on master. |
|
@kou, no worries, thank you for explaining why this occurs! Good to know that it is expected behavior. Thanks again for your time reviewing this change! |
…o reflect latest CMake build system changes # Overview This pull request: 1. Updates the high level documentation for the MATLAB interface in `arrow/matlab/README.md` to reflect the latest CMake build system changes that were merged upstream as part of [#12004](#12004). # Implementation 1. Restructured the `README.md` to focus on the most common build and test workflows for the MATLAB interface. # Testing N/A # Future Directions 1. To be consistent and in-model with the other language bindings (e.g. Python, R, C++, etc.) - we plan to author in-depth technical documentation over time under [`arrow/docs/source/matlab`](https://github.com/apache/arrow/tree/master/docs/source) using the [reStructedText (.rst) format](https://docutils.sourceforge.io/rst.html). The thought behind this `README.md` documentation is to provide a convenient "landing page" for the MATLAB interface which allows end users to quickly get a sense of where the project is at and to get started immediately with building and running the interface. However, every user and developer will have different use cases and needs for working with the MATLAB interface, so we will also want include in-depth documentation for major topics like building, installing, testing, contributing, usage, etc. As the MATLAB interface continues to develop, we will do our best to keep the documentation in sync with the source code. # Notes 1. Thank you to @lafiona for her help writing this updated documentation! Closes #12389 from kevingurney/ARROW-13204 Lead-authored-by: Kevin Gurney <kgurney@mathworks.com> Co-authored-by: Kevin Gurney <5904145+kevingurney@users.noreply.github.com> Co-authored-by: Fiona La <fionala@mathworks.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
We use local "git merge" to merge a pull request in dev/merge_arrow_pr.py. If we use "git merge" to merge a pull request, GitHub's Web UI shows "Closed" mark not "Merged" mark in a pull request page. This sometimes confuses new contributors. "Why was my pull request closed without merging?" See apache#12004 (comment) for example. If we use GitHub API https://docs.github.com/en/rest/pulls/pulls#merge-a-pull-request to merge a pull request, GitHub's Web UI shows "Merged" mark not "Closed" mark. See apache#13180 for example. I used GitHub API to merge the pull request. And we don't need to create a local branch on local repository to merge a pull request. But we must specify ARROW_GITHUB_API_TOKEN to run dev/merge_arrow_pr.py.
We use local "git merge" to merge a pull request in dev/merge_arrow_pr.py. If we use "git merge" to merge a pull request, GitHub's Web UI shows "Closed" mark not "Merged" mark in a pull request page. This sometimes confuses new contributors. "Why was my pull request closed without merging?" See #12004 (comment) for example. If we use GitHub API https://docs.github.com/en/rest/pulls/pulls#merge-a-pull-request to merge a pull request, GitHub's Web UI shows "Merged" mark not "Closed" mark. See #13180 for example. I used GitHub API to merge the pull request. And we don't need to create a local branch on local repository to merge a pull request. But we must specify ARROW_GITHUB_API_TOKEN to run dev/merge_arrow_pr.py. Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Overview
This pull request implements a more scalable and sustainable approach to organizing the C++ functionality that needs to be exposed to MATLAB. It adds a new singular MEX gateway function,
mexfcn('<cpp-function-name>', <cpp-function-argument-1>, ..., <cpp-function-argument-N>), which delegates to specific C++ functions. To make use ofmexfcn, the directory containingmexfcn.<mex-extension>must be added to the MATLAB path.Advantages to this approach:
Implementation
mexfcn, defined inmatlab/src/cpp/arrow/matlab/mex/mex_util.h, dispatches to individual C++ implementation files.For example, to invoke featherread functionality from MATLAB, that is implemented in C++:
matlab/src/cpp/arrow/matlab/mex/mex_functions.h.3.1. Split source code to matlab/src/matlab and matlab/src/cpp
3.2 Make packages, namespaces, and directories consistent in terms of naming and hierarchy for simplifying navigation and header inclusion.
3.3 Renamed the MATLAB package name
mlarrowtoarrowas themlis superfluous.matlab/CMakeLists.txt:4.1. Build shared library,
arrow_matlab, that contains C++ functionality for the interface.4.2. macOS: explicitly add the path of
arrowto therpathofarrow_matlab, as paths of libraries output by imported targets are not automatically included.4.3. Windows:
arrow.dllto the directory where the MEX function lives.ENVIRONMENTwhen building tests.installtarget:5.1. Utilize
CMAKE_INSTALL_PREFIXto allow users to customize the install location. The platform-specific default install locations will be used if the user does not specify a custom value.5.2. Once installed, the interface's source files and libraries are relocatable, on all platforms.
5.3. As part of the install step, add the path to the install directory to the MATLAB Search Path by:
addpathandsavepathto modify thepathdef.mfile that MATLAB uses on startup. This option is on by default. However, it can only be used if CMake has the appropriate permissions to modifypathdef.m. This option is toggled on and off by the flag:MATLAB_ADD_INSTALL_DIR_TO_SEARCH_PATH.addpathcommand to thestartup.mfile located at theuserpath. This option can be used if a user does not have the permissions to modify thepathdef.mfile. This option is off, by default, and is toggled on and off by the flag:MATLAB_ADD_INSTALL_DIR_TO_STARTUP_FILE.MATLABPATHenvironment variable. The paths listed inMATLABPATHare added to the MATLAB Search Path on start up.userpathset, nor do we have edit permissions forpathdef.m. Therefore, we use the third option and set theMATLABPATHenvironment variable inmatlab.yml.Testing
Qualified
CMakeLists.txtchanges by building and running all tests:ARROW_HOME, use providedGTEST_ROOT, use bothARROW_HOMEandGTEST_ROOT.Future Directions
ucrtbase.dllversusucrtbased.dll) when building with Ninja on Windows.mexfcn. Currently, only character vectors are supported.mexfcnto use MATLAB Data Arrays (MDAs) and C++ Mex.matlab_add_unit_test.Notes