Skip to content

Conversation

@mrkn
Copy link
Member

@mrkn mrkn commented Dec 2, 2019

I'd like to add CSC sparse matrix support in format and C++ API.

This pull-request changes both format and C++ API.

The changes in format are below:

  • The new field compressedAxis is added to SparseMatrixIndexCSR table; this field used to distinguish CSR and CSC
  • SparseMatrixIndexCSR is renamed to SparseMatrixIndexCSX

The changes in C++ API are below:

  • SparseCSCIndex class is added as a subclass of internal::SparseCSXIndex
  • internal::SparseCSXIndex class is added as a superclass of SparseCSRIndex and SparseCSCIndex classes
  • SparseCSCMatrix class is added as an alias to SparseTensorImpl<SparseCSCIndex> class
  • ReadSparseTensor supports reading SparseCSCMatrix class

@github-actions
Copy link

github-actions bot commented Dec 2, 2019

@mrkn mrkn force-pushed the ARROW-4225 branch 10 times, most recently from f113a35 to a148df6 Compare December 7, 2019 15:50
@mrkn mrkn marked this pull request as ready for review December 8, 2019 14:40
@mrkn mrkn requested review from pitrou and wesm December 9, 2019 01:54
Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The high level / public API details look OK to me at a glance. I don't have time to scrutinize the implementation -- not sure if someone else wants to take a closer look

Copy link
Member

@rok rok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I like the CSR and CSC index type unification.
Some minor comments. Someone with more experience should also take a look.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mrkn. Here are some comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, make this kTypeName.

Copy link
Member Author

@mrkn mrkn Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Should I also rename format_id to kFormatId?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Our current convention would be to keep format_id. I'm not sure about kTypeName vs. type_name, TBH. But it definitely shouldn't be all-uppercase TYPE_NAME.
@wesm thoughts?

@mrkn mrkn force-pushed the ARROW-4225 branch 3 times, most recently from 2397c1a to 3dc96fb Compare December 13, 2019 07:51
@mrkn
Copy link
Member Author

mrkn commented Dec 13, 2019

The failure on CI isn't due to this PR.

++num_csr;
break;
case SparseTensorFormat::CSC:
// TODO(mrkn): support csc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you open a JIRA for this, if not already done?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I forgot to do it.
Filed https://issues.apache.org/jira/browse/ARROW-7419.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "TODO for ndim <= 1" comment needs a decision, otherwise LGTM.

@pitrou pitrou closed this in 4c63bef Dec 18, 2019
@pitrou
Copy link
Member

pitrou commented Dec 18, 2019

Thank you @mrkn!

@rok
Copy link
Member

rok commented Dec 18, 2019

Thanks @mrkn!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants