Closed
60 changes: 42 additions & 18 deletions matlab/CMakeLists.txt
@@ -15,7 +15,8 @@
# specific language governing permissions and limitations
# under the License.

cmake_minimum_required(VERSION 3.2)
cmake_minimum_required(VERSION 3.20)
Member

This seems a bit high, is it necessary?

Member Author

We wanted to use 3.20 because earlier versions of the FindMatlab.cmake module had a few bugs that we ran into when building our MEX functions. I'm open to other approaches, but the hope was to build on the existing work of the CMake community rather than reinvent the wheel. We also want to share improvements upstream where appropriate.


set(CMAKE_CXX_STANDARD 11)

set(MLARROW_VERSION "5.0.0-SNAPSHOT")
@@ -29,22 +30,45 @@ if(EXISTS "${CPP_CMAKE_MODULES}")
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} ${CPP_CMAKE_MODULES})
endif()

## Arrow is Required
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} ${CMAKE_SOURCE_DIR}/cmake_modules)

# Arrow is Required
find_package(Arrow REQUIRED)

## MATLAB is required to be installed to build MEX interfaces
set(MATLAB_ADDITIONAL_VERSIONS "R2018a=9.4")
find_package(Matlab REQUIRED MX_LIBRARY)

# Build featherread mex file based on the arrow shared library
matlab_add_mex(NAME featherreadmex
SRC src/featherreadmex.cc src/feather_reader.cc src/util/handle_status.cc
src/util/unicode_conversion.cc
LINK_TO ${ARROW_SHARED_LIB})
target_include_directories(featherreadmex PRIVATE ${ARROW_INCLUDE_DIR})

# Build featherwrite mex file based on the arrow shared library
matlab_add_mex(NAME featherwritemex
SRC src/featherwritemex.cc src/feather_writer.cc src/util/handle_status.cc
LINK_TO ${ARROW_SHARED_LIB})
target_include_directories(featherwritemex PRIVATE ${ARROW_INCLUDE_DIR})
# MATLAB is Required
find_package(Matlab REQUIRED)

# Construct the absolute path to featherread's source files
set(featherread_sources featherreadmex.cc feather_reader.cc util/handle_status.cc
util/unicode_conversion.cc)
list(TRANSFORM featherread_sources PREPEND ${CMAKE_SOURCE_DIR}/src/)

# Build featherreadmex MEX binary
matlab_add_mex(R2018a
NAME featherreadmex
SRC ${featherread_sources}
LINK_TO arrow_shared)

# Construct the absolute path to featherwrite's source files
set(featherwrite_sources featherwritemex.cc feather_writer.cc util/handle_status.cc
util/unicode_conversion.cc)
list(TRANSFORM featherwrite_sources PREPEND ${CMAKE_SOURCE_DIR}/src/)

# Build featherwritemex MEX binary
matlab_add_mex(R2018a
NAME featherwritemex
SRC ${featherwrite_sources}
LINK_TO arrow_shared)

# Ensure the MEX binaries are placed in the src directory on all platforms
Member

I was just wondering. Why do we need to place the binaries in the source directory?

Member Author

To execute a MEX function, it has to be discoverable on the MATLAB search path. Our MATLAB code (featherread.m and featherwrite.m) also has to be on the search path. Since we need to add the source directory to the path anyway, I thought it made sense to put the MEX files there as well.

Member

Thanks.
I understand. ( https://github.com/apache/arrow/blob/master/matlab/test/tfeathermex.m#L26 is the code that adds the source directory to the MATLAB search path.)

Generally, an out-of-source build should not change any files in the source directory. Can we add the build directory to the MATLAB search path as well? Is that difficult?

Member Author

We could add the build folder to the MATLAB search path. However, this would make it harder to run our unit tests automatically, because the user would have to tell us explicitly where their build files are located every time.

Additionally, as the MATLAB interface grows, we will have many MEX files. To keep things organized, we see two approaches:

  1. Encode the relationship of MEX files to MATLAB classes in the MEX file name itself. For example, arrow_array_new.mexw64.

  2. Encode the relationship between the two via the MATLAB packaging mechanism. For example, the folder structure +matlab/+array/+mex exposes a package called matlab.array.mex to MATLAB. With this option, we can reuse common names, such as "new", and keep the names short.

Option 2 seems more scalable, but if your experience tells us this will lead to issues, we can revisit adding the build folder to the path.

Best,
Sarah
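
For illustration, the package-folder approach described above might look roughly like this on the CMake side. This is only a sketch under assumptions: the target name `arrow_array_new_mex`, the source file `src/array/new.cc`, and the variable `mex_package_dir` are hypothetical and do not exist in this pull request. It relies on the `OUTPUT_NAME` option of `matlab_add_mex` so that the CMake target name stays unique while the MEX file itself gets the short name.

```cmake
# Hypothetical package folder that MATLAB would expose as matlab.array.mex.
set(mex_package_dir ${CMAKE_SOURCE_DIR}/src/+matlab/+array/+mex)

# Target name must be unique across the build; OUTPUT_NAME controls the
# name of the produced MEX file, so a short, reusable name such as "new"
# is possible. The source path below is an assumption for illustration.
matlab_add_mex(R2018a
               NAME arrow_array_new_mex
               OUTPUT_NAME new
               SRC ${CMAKE_SOURCE_DIR}/src/array/new.cc
               LINK_TO arrow_shared)

# Place the binary inside the package folder on all platforms. The $<1:...>
# generator expression avoids per-configuration subfolders such as Debug/.
set_target_properties(arrow_array_new_mex PROPERTIES
                      RUNTIME_OUTPUT_DIRECTORY $<1:${mex_package_dir}>
                      LIBRARY_OUTPUT_DIRECTORY $<1:${mex_package_dir}>)
```

With a layout like this, and with the `src` folder on the MATLAB search path, the MEX function would be reachable from MATLAB under the package-qualified name `matlab.array.mex.new`.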

Member

Thanks for describing this.
Could you try one of them (or both)? If we can find a better approach, we can choose it.
It can be done in this pull request or a follow up task.

Member Author

Yeah, we can definitely investigate both approaches and see which one is preferable. I created a Jira task to look into this in a future pull request.
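
As a rough sketch of the alternative raised in this thread (keeping the MEX binaries in the build tree and adding that folder to the MATLAB search path), something like the following could be tried. The variable `MEX_OUTPUT_DIR` and the generated script name `add_arrow_paths.m` are hypothetical, and this is not what the pull request does. It also assumes the paths contain no single-quote characters.

```cmake
# Hypothetical: keep MEX binaries out of the source tree.
set(MEX_OUTPUT_DIR ${CMAKE_BINARY_DIR}/mex)

foreach(mex_target featherreadmex featherwritemex)
  # Setting both properties covers Windows (RUNTIME) and other
  # platforms (LIBRARY) without an if(WIN32) branch.
  set_target_properties(${mex_target} PROPERTIES
                        RUNTIME_OUTPUT_DIRECTORY $<1:${MEX_OUTPUT_DIR}>
                        LIBRARY_OUTPUT_DIRECTORY $<1:${MEX_OUTPUT_DIR}>)
endforeach()

# Generate a small MATLAB script in the build tree that adds both the
# MATLAB sources and the MEX binaries to the search path.
file(GENERATE
     OUTPUT ${CMAKE_BINARY_DIR}/add_arrow_paths.m
     CONTENT "addpath('${CMAKE_SOURCE_DIR}/src');\naddpath('${MEX_OUTPUT_DIR}');\n")
```

This does not remove the concern raised above: whoever runs the tests still has to locate the build directory once in order to run the generated script. It only bundles both `addpath` calls in one place.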

if(WIN32)
set_target_properties(featherreadmex PROPERTIES RUNTIME_OUTPUT_DIRECTORY
$<1:${CMAKE_SOURCE_DIR}/src>)
set_target_properties(featherwritemex PROPERTIES RUNTIME_OUTPUT_DIRECTORY
$<1:${CMAKE_SOURCE_DIR}/src>)
else()
set_target_properties(featherreadmex PROPERTIES LIBRARY_OUTPUT_DIRECTORY
$<1:${CMAKE_SOURCE_DIR}/src>)
set_target_properties(featherwritemex PROPERTIES LIBRARY_OUTPUT_DIRECTORY
$<1:${CMAKE_SOURCE_DIR}/src>)
endif()
5 changes: 2 additions & 3 deletions matlab/src/+mlarrow/+util/createMetadataStruct.m
@@ -1,4 +1,4 @@
function metadata = createMetadataStruct(description, numRows, numVariables)
function metadata = createMetadataStruct(numRows, numVariables)
% CREATEMETADATASTRUCT Helper function for creating Feather MEX metadata
% struct.

@@ -17,8 +17,7 @@
% implied. See the License for the specific language governing
% permissions and limitations under the License.

metadata = struct('Description', description, ...
'NumRows', numRows, ...
metadata = struct('NumRows', numRows, ...
'NumVariables', numVariables);
end

3 changes: 1 addition & 2 deletions matlab/src/+mlarrow/+util/table2mlarrow.m
@@ -23,7 +23,6 @@
%
% Field Name Class Description
% ------------ ------- ----------------------------------------------
% Description char Table description (T.Properties.Description)
% NumRows double Number of table rows (height(T))
% NumVariables double Number of table variables (width(T))
%
@@ -51,7 +50,7 @@
variables = repmat(createVariableStruct('', [], [], ''), 1, width(t));

% Struct representing table-level metadata.
metadata = createMetadataStruct(t.Properties.Description, height(t), width(t));
metadata = createMetadataStruct(height(t), width(t));

% Iterate over each variable in the given table,
% extracting the underlying array data.
88 changes: 49 additions & 39 deletions matlab/src/feather_reader.cc
@@ -18,16 +18,21 @@
#include <algorithm>
#include <cmath>

#include "feather_reader.h"

#include <arrow/array/array_base.h>
#include <arrow/array/builder_base.h>
#include <arrow/array/builder_primitive.h>
#include <arrow/io/file.h>
#include <arrow/ipc/feather.h>
#include <arrow/result.h>
#include <arrow/status.h>
#include <arrow/table.h>
#include <arrow/type.h>
#include <arrow/util/bit-util.h>

#include <arrow/type_traits.h>
#include <arrow/util/bitmap_visit.h>
#include <mex.h>

#include "feather_reader.h"
#include "matlab_traits.h"
#include "util/handle_status.h"
#include "util/unicode_conversion.h"
@@ -52,11 +57,11 @@ mxArray* ReadNumericVariableData(const std::shared_ptr<Array>& column) {
mxArray* variable_data =
mxCreateNumericMatrix(column->length(), 1, matlab_class_id, mxREAL);

std::shared_ptr<ArrowArrayType> integer_array =
auto arrow_numeric_array =
std::static_pointer_cast<ArrowArrayType>(column);

// Get a raw pointer to the Arrow array data.
const MatlabType* source = integer_array->raw_values();
const MatlabType* source = arrow_numeric_array->raw_values();

// Get a mutable pointer to the MATLAB array data and std::copy the
// Arrow array data into it.
@@ -121,8 +126,7 @@ void BitUnpackBuffer(const std::shared_ptr<Buffer>& source, int64_t length,
// writes to a zero-initialized destination buffer.
// Implements a fast path for the fully-valid and fully-invalid cases.
// Returns true if the destination buffer was successfully populated.
bool TryBitUnpackFastPath(const std::shared_ptr<Array>& array,
mxLogical* destination) {
bool TryBitUnpackFastPath(const std::shared_ptr<Array>& array, mxLogical* destination) {
const int64_t null_count = array->null_count();
const int64_t length = array->length();

@@ -177,32 +181,24 @@ Status FeatherReader::Open(const std::string& filename,
*feather_reader = std::shared_ptr<FeatherReader>(new FeatherReader());

// Open file with given filename as a ReadableFile.
std::shared_ptr<io::ReadableFile> readable_file(nullptr);

RETURN_NOT_OK(io::ReadableFile::Open(filename, &readable_file));

// TableReader expects a RandomAccessFile.
std::shared_ptr<io::RandomAccessFile> random_access_file(readable_file);

ARROW_ASSIGN_OR_RAISE(auto readable_file, io::ReadableFile::Open(filename));

// Open the Feather file for reading with a TableReader.
RETURN_NOT_OK(ipc::feather::TableReader::Open(random_access_file,
&(*feather_reader)->table_reader_));

// Read the table metadata from the Feather file.
(*feather_reader)->num_rows_ = (*feather_reader)->table_reader_->num_rows();
(*feather_reader)->num_variables_ = (*feather_reader)->table_reader_->num_columns();
(*feather_reader)->description_ =
(*feather_reader)->table_reader_->HasDescription()
? (*feather_reader)->table_reader_->GetDescription()
: "";

if ((*feather_reader)->num_rows_ > internal::MAX_MATLAB_SIZE ||
(*feather_reader)->num_variables_ > internal::MAX_MATLAB_SIZE) {
mexErrMsgIdAndTxt("MATLAB:arrow:SizeTooLarge",
"The table size exceeds MATLAB limits: %u x %u",
(*feather_reader)->num_rows_, (*feather_reader)->num_variables_);
ARROW_ASSIGN_OR_RAISE(auto reader, ipc::feather::Reader::Open(readable_file));

// Set the internal reader_ object.
(*feather_reader)->reader_ = reader;

// Check the feather file version
auto version = reader->version();
if (version == ipc::feather::kFeatherV2Version) {
return Status::NotImplemented("Support for Feather V2 has not been implemented.");
} else if (version != ipc::feather::kFeatherV1Version) {
return Status::Invalid("Unknown Feather format version.");
}

// read the table metadata from the Feather file
(*feather_reader)->num_variables_ = reader->schema()->num_fields();
return Status::OK();
}

@@ -225,15 +221,11 @@ mxArray* FeatherReader::ReadMetadata() const {
mxSetField(metadata, 0, "NumVariables",
mxCreateDoubleScalar(static_cast<double>(num_variables_)));

// Set the description.
mxSetField(metadata, 0, "Description",
util::ConvertUTF8StringToUTF16CharMatrix(description_));

return metadata;
}

// Read the table variables from the Feather file as a mxArray*.
mxArray* FeatherReader::ReadVariables() const {
mxArray* FeatherReader::ReadVariables() {
const int32_t num_variable_fields = 4;
const char* fieldnames[] = {"Name", "Type", "Data", "Valid"};

@@ -242,16 +234,34 @@ mxArray* FeatherReader::ReadVariables() const {
mxArray* variables =
mxCreateStructMatrix(1, num_variables_, num_variable_fields, fieldnames);

// Read all the table variables in the Feather file into memory.
std::shared_ptr<arrow::Table> table;
auto status = reader_->Read(&table);
if (!status.ok()) {
mexErrMsgIdAndTxt("MATLAB:arrow:FeatherReader::FailedToReadTable",
"Failed to read arrow::Table from Feather file. Reason: %s",
status.message().c_str());
}

// Set the number of rows
num_rows_ = table->num_rows();

if (num_rows_ > internal::MAX_MATLAB_SIZE ||
num_variables_ > internal::MAX_MATLAB_SIZE) {
mexErrMsgIdAndTxt("MATLAB:arrow:SizeTooLarge",
"The table size exceeds MATLAB limits: %u x %u", num_rows_,
num_variables_);
}

auto column_names = table->ColumnNames();

for (int64_t i = 0; i < num_variables_; ++i) {
std::shared_ptr<ChunkedArray> column;
util::HandleStatus(table_reader_->GetColumn(i, &column));
auto column = table->column(i);
if (column->num_chunks() != 1) {
mexErrMsgIdAndTxt("MATLAB:arrow:FeatherReader::ReadVariables",
"Chunked columns not yet supported");
}
std::shared_ptr<Array> chunk = column->chunk(0);
const std::string column_name = table_reader_->GetColumnName(i);
const std::string column_name = column_names[i];

// set the struct fields data
mxSetField(variables, i, "Name", internal::ReadVariableName(column_name));
6 changes: 2 additions & 4 deletions matlab/src/feather_reader.h
@@ -23,7 +23,6 @@
#include <arrow/ipc/feather.h>
#include <arrow/status.h>
#include <arrow/type.h>

#include <matrix.h>

namespace arrow {
@@ -56,7 +55,7 @@ class FeatherReader {
/// Clients are responsible for freeing the returned mxArray memory
/// when it is no longer needed, or passing it to MATLAB to be managed.
/// \return variables mxArray* struct array containing table variable data
mxArray* ReadVariables() const;
mxArray* ReadVariables();

/// \brief Initialize a FeatherReader object from a given Feather file.
/// \param[in] filename path to a Feather file
@@ -66,12 +65,11 @@ class FeatherReader {

private:
FeatherReader() = default;
std::unique_ptr<ipc::feather::TableReader> table_reader_;
std::shared_ptr<ipc::feather::Reader> reader_;
int64_t num_rows_;
int64_t num_variables_;
std::string description_;
};

} // namespace matlab
} // namespace arrow
