Improvements to SplitMatrix by MarcAntoineSchmidtQC · Pull Request #91 · Quantco/tabmat

Marc-Antoine Schmidt (MarcAntoineSchmidtQC) · 2021-07-22T00:31:52Z

Allow SplitMatrix to be constructed from another SplitMatrix.
Allow inputs of SplitMatrix to be 1-d
Implement __getitem__ for column subset
Also had to implement column subsetting for CategoricalMatrix
__repr__ uses the __repr__ method of components instead of str()

ToDo:

FIX BUG WITH _split_col_subsets (first confirm that it's a bug)
Add testing for new features

Checklist

Added a CHANGELOG.rst entry

Luca Bittarello (lbittarello)

🎉

Luca Bittarello (lbittarello) · 2021-07-22T21:02:56Z

+                colmap[idx] = [i, j]
+        return colmap
+
+    def _split_col_subsets_unordered(self, cols):


This function seems occasionally to return empty lists when one indexes columns with a list, which causes is_sorted to throw an error later.

MRE:

import pandas as pd
import quantcore.matrix as mx

df = pd.DataFrame({"u": ["a", "b"], "v": ["a", "b"]})
X = mx.from_pandas(df, cat_threshold=False, object_as_cat=True)
X[:, [0, 1]]
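
The failure mode described here can be illustrated without the library. The following is a minimal NumPy sketch, not the code in this PR: it assumes a SplitMatrix-like layout in which `indices[i]` holds the global column numbers owned by component `i`, and the names `build_colmap` and `split_cols_unordered` are hypothetical. It maps each requested column to a (component, local position) pair, keeps the requested order, and skips components that receive no columns instead of returning an empty list for them.

```python
import numpy as np


def build_colmap(indices):
    """Map each global column to (component number, position in component)."""
    n_cols = sum(len(comp_cols) for comp_cols in indices)
    colmap = np.empty((n_cols, 2), dtype=np.intp)
    for i, comp_cols in enumerate(indices):
        for j, idx in enumerate(comp_cols):
            colmap[idx] = [i, j]
    return colmap


def split_cols_unordered(indices, cols):
    """Split requested global columns by component, keeping request order.

    Returns a list of (component, local_cols, output_positions) tuples.
    Components that match none of the requested columns are left out,
    so no caller ever receives an empty index list.
    """
    colmap = build_colmap(indices)
    cols = np.asarray(cols, dtype=np.intp)
    parts = []
    for i in range(len(indices)):
        # Positions in the request that belong to component i.
        hit = np.nonzero(colmap[cols, 0] == i)[0]
        if hit.size == 0:
            continue
        parts.append((i, colmap[cols[hit], 1], hit))
    return parts


# Assumed layout: component 0 owns columns 0 and 2, component 1 owns 1 and 3.
indices = [np.array([0, 2]), np.array([1, 3])]
parts = split_cols_unordered(indices, [3, 0, 2])
```

Each tuple says which local columns to take from a component and where they land in the result, so a non-monotonic request such as `[3, 0, 2]` needs no sorting step.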

I can't replicate this on macOS. I assume you tried this on Windows. Is that correct? Can you give me some info on your setup?

Also, you said "occasionally". Does it always throw an error when you try your MRE or not?

> Does it always throw an error when you try your MRE or not?

Yes. But it doesn't throw an error if we reduce the number of rows from two to one, for example.

> I assume you tried this on Windows. Is that correct?

Yes. I'm not embarrassed.

> Can you give me some info on your setup?

python : 3.8.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD
byteorder : little
pandas : 1.3.0
numpy : 1.20.3

Is this helpful? Do you need something else?

Should we perhaps add unit tests for indexing? The Windows CI could come in handy.

Yes, unit testing will be very valuable. I'll temporarily add the Windows CI to this PR for all the pushes. And yes, this is helpful.

👍 on indexing unit tests.

Ben Thompson (tbenthompson)

This is huge!! Thanks Marc for making a really big improvement to SplitMatrix. Let me know if there are any pieces that you would like to hand off. It feels like you started what seemed like a small project and it's crept outwards a couple of times now. So, don't feel like you're committed and stuck finishing this if you have other important stuff going on.

Ben Thompson (tbenthompson) · 2021-07-26T04:36:31Z

+                return CategoricalMatrix(self.cat[row])
+            else:
+                # return a SparseMatrix if we subset columns
+                return SparseMatrix(self.tocsr()[row, col], dtype=self.dtype)


This is quite inefficient because we construct the full sparse matrix with self.tocsr() before subsetting it. I'm fine leaving it like this for now, but I think it'd be good to at least leave a TODO comment or add an issue mentioning this performance bug. Fixing this for the single element case is quite easy. For the [:, cols] case, I guess we need to construct a sparse matrix element by element and I'm guessing that it'll be easiest to do that in Cython.
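
As a rough illustration of the `[:, cols]` case, here is a minimal NumPy-only sketch that builds the column subset directly from the category codes instead of materializing the full sparse matrix first. It is not tabmat's implementation: the helper name `onehot_csr_column_subset` is hypothetical, and it assumes the categorical data is available as one integer code per row and that `cols` contains no repeated columns.

```python
import numpy as np


def onehot_csr_column_subset(codes, n_categories, cols):
    """Return CSR arrays (data, indices, indptr) for one-hot columns `cols`.

    Hypothetical helper: `codes[r]` is the category of row r, and `cols`
    lists distinct category columns in the desired output order.
    """
    codes = np.asarray(codes, dtype=np.intp)
    cols = np.asarray(cols, dtype=np.intp)

    # position[c] is the output column of category c, or -1 if c is dropped.
    position = np.full(n_categories, -1, dtype=np.intp)
    position[cols] = np.arange(len(cols))

    new_col = position[codes]
    keep = new_col >= 0

    # A one-hot row holds at most one nonzero, so indptr is a running count
    # of the rows that survive the column selection.
    indptr = np.concatenate(([0], np.cumsum(keep)))
    indices = new_col[keep]
    data = np.ones(indices.shape[0])
    return data, indices, indptr


codes = [0, 2, 1, 2]
data, indices, indptr = onehot_csr_column_subset(codes, n_categories=3, cols=[2, 0])
```

The three arrays can be handed to `scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(codes), len(cols)))`. The work is proportional to the number of rows plus the number of categories, and the unselected columns are never stored.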

Ben Thompson (tbenthompson) · 2021-07-26T04:41:06Z

+                colmap[idx] = [i, j]
+        return colmap
+
+    def _split_col_subsets_unordered(self, cols):


👍 on indexing unit tests.

This is a big commit with many changes:
- partial support of integer indexing
- removed advanced indexing for densematrix and sparsematrix
- ensure that indexing of splitmatrix generates basematrix type
- partial fix of standardizedmatrix indexing
- added indexing tests (currently fails)

Waclaw Kusnierczyk (waclawkusnierczykqc) · 2021-08-26T12:55:21Z

Quick thought: it looks like several loosely related updates within one PR.
Perhaps worth considering pushing them in a few more focused PRs?

Marc-Antoine Schmidt (MarcAntoineSchmidtQC) · 2021-08-26T20:14:20Z

> Quick thought: it looks like several loosely related updates within one PR.
> Perhaps worth considering pushing them in a few more focused PRs?

Great idea. There are finished features that I would like to merge soon so I'll create more focused PRs with them.

Waclaw Kusnierczyk (waclawkusnierczykqc) · 2021-08-27T07:54:16Z

> > Quick thought: it looks like several loosely related updates within one PR.
> > Perhaps worth considering pushing them in a few more focused PRs?
>
> Great idea. There are finished features that I would like to merge soon so I'll create more focused PRs with them.

Yes, being able to push some of the changes independently of others is one advantage.
Others include a clearer focus, easier reviewing, and easier reverting if need be.

Marc-Antoine Schmidt (MarcAntoineSchmidtQC) · 2021-08-28T18:01:03Z

This PR has been separated into chunks. See PR #109, PR #110, and PR #111.

What remains is the big mess that is column indexing with splitmatrix.

Marc-Antoine Schmidt (MarcAntoineSchmidtQC) · 2021-10-04T17:31:17Z

Closing. Most changes were implemented in other PRs, and we will clearly take another approach to dealing with this.

Marc-Antoine Schmidt (MarcAntoineSchmidtQC) added 2 commits July 21, 2021 20:28

removed test checking not 1d

2c75a4f

Marc-Antoine Schmidt (MarcAntoineSchmidtQC) requested a review from Ben Thompson (tbenthompson) as a code owner July 22, 2021 00:43

Marc-Antoine Schmidt (MarcAntoineSchmidtQC) mentioned this pull request Jul 22, 2021

_split_col_subsets ignores columns when non-monotonic #92

Open

column mapping and unordered split_col_subsets

9f44f58

Luca Bittarello (lbittarello) reviewed Jul 22, 2021

testing split matrix creation

831ef6d

Ben Thompson (tbenthompson) reviewed Jul 26, 2021

Marc-Antoine Schmidt (MarcAntoineSchmidtQC) and others added 7 commits July 26, 2021 12:09

Merge remote-tracking branch 'origin/master' into SplitMatrix-improved-usability

85f6dc6

don't modify in place + windows CI

dc7b2a9

add Luca's test (temporary)

f45a69c

filter out empty matrices

8ddb627

Update src/quantcore/matrix/split_matrix.py

42b78eb

added docstring Co-authored-by: Ben Thompson <t.ben.thompson@gmail.com>

docstring formatting

cc7afec

Marc-Antoine Schmidt (MarcAntoineSchmidtQC) closed this Oct 4, 2021

4 participants
