Use narwhals to support Polars, cuDF, Modin, etc. #388
Marc-Antoine Schmidt (MarcAntoineSchmidtQC) merged 40 commits into main
- conda: https://conda.anaconda.org/conda-forge/linux-64/binutils_linux-64-2.40-hb3c18ed_1.conda
- conda: https://conda.anaconda.org/conda-forge/linux-64/bzip2-1.0.8-h4bc722e_7.conda
- conda: https://conda.anaconda.org/conda-forge/linux-64/c-ares-1.33.1-heb4867d_0.conda
- conda: https://conda.anaconda.org/conda-forge/linux-64/c-ares-1.32.3-h4bc722e_0.conda
not sure why it's asking to downgrade here
|
I think this will raise an exception if neither pandas nor polars is installed (which is now a possibility): tabmat/src/tabmat/categorical_matrix.py, lines 249 to 250 in e288b47. We could replace it with a numpy-based solution, which is probably not very efficient, but that's probably fine for now. E.g.,
    else:
        categories, indices = np.unique(cat_vec.to_numpy(), return_inverse=True)
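For illustration, here is a minimal standalone sketch of the numpy-based fallback suggested above. It assumes a 1-D input with no missing values, and `encode_categories` is a hypothetical helper name, not part of tabmat. `np.unique` sorts the categories, so the resulting order may differ from what a pandas or polars categorical would give.

```python
import numpy as np


def encode_categories(values):
    """Hypothetical helper: encode a 1-D array-like as (categories, codes).

    Uses only numpy. Assumes there are no missing values; None/NaN
    handling would need extra care and is not covered here.
    """
    arr = np.asarray(values)
    # np.unique returns the sorted unique values; return_inverse=True also
    # returns, for each element, its index into that sorted array.
    categories, codes = np.unique(arr, return_inverse=True)
    return categories, codes


categories, codes = encode_categories(["b", "a", "c", "a"])
# Indexing the categories with the codes reconstructs the original values.
reconstructed = categories[codes]
```

The sort inside `np.unique` makes this O(n log n), which matches the "probably not very efficient" caveat in the comment.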
|
Same here, but the solution might be less straightforward for the non-pandas-or-polars case: tabmat/src/tabmat/categorical_matrix.py, line 391 in e288b47. Maybe we could say that the property is deprecated and is only there for backwards compatibility, so we are not implementing it for non-pandas (or polars) input?
|
I don't think we need this anymore: tabmat/src/tabmat/constructor.py, lines 21 to 24 in e288b47.
Looks great! All that remains is fixing these couple of issues in categorical_matrix.py.
It might also be nice to have a test for from_df with a dataframe that is neither polars nor pandas. Pyarrow is already a test dependency because of polars; would it be okay if I added a test for pyarrow dataframes?
|
I made the following updates to fix the issues above: Deconstructing the
The
|
|
Marc-Antoine Schmidt (@MarcAntoineSchmidtQC), if we don't want to add new functionality to the deprecated |
|
Perhaps one final question: all three matrix types have an unpack method that returns the container storing the data. For |
Marco Edward Gorelli (MarcoGorelli) left a comment
wow, amazing, seeing how far you got with Narwhals independently here really warms my heart 🤗
If you're happy with this, may I suggest using import narwhals.stable.v1 as nw instead of import narwhals as nw? This will future-proof you against breaking changes (if we need to make them); see https://narwhals-dev.github.io/narwhals/backcompat/ for an explanation.
Well done here, and please always feel free to ping us from Narwhals if you have any questions / requests / comments 🙏
|
Thanks so much for the review and the fantastic library Marco Edward Gorelli (@MarcoGorelli)! 🙏 Using the v1 API is a great idea. The only thing I ran into is that there does not seem to be a |
|
thanks! yup,
In [10]: import narwhals.stable.v1 as nw
In [11]: nw.from_native(pl.Series([1,2,3]), series_only=True).dtype.is_numeric()
Out[11]: True
In [12]: nw.from_native(pl.Series(['foo', 'bar']), series_only=True).dtype.is_numeric()
Out[12]: False
Checklist
CHANGELOG.rst entry