-
-
Notifications
You must be signed in to change notification settings - Fork 748
Check for dtype support in p2p #7425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
fjetter
merged 7 commits into
dask:main
from
hendrikmakait:fail-p2p-on-unsupported-tyes
Feb 22, 2023
Merged
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
819b74b
Check fo supported dtypes and raise
hendrikmakait 3e0d012
Remove commented-out code
hendrikmakait 0fc6aaa
Merge branch 'main' into fail-p2p-on-unsupported-tyes
hendrikmakait 00e174b
Simplify
hendrikmakait 23e04dd
Move check
hendrikmakait a68e59e
Fix test
hendrikmakait 50238de
Merge branch 'main' into fail-p2p-on-unsupported-tyes
hendrikmakait File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of me likes the explicitness of these checks, but I wonder if they've got full coverage. For example, what happens if someone if using their own custom extension array dtype? My guess is this check wouldn't raise an error and shuffling would fail (though I've not confirmed this).
Maybe a there's a simple
try/exceptwe can do here instead to check that shuffling doesn't work? Would not being able to roundtrip through the shuffle-specific serialization be a sufficient check?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You bring up a good point. Let me check what I need to do to trigger reliably trigger an error that we could utilize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if it's not straightforward to come up with such a check, we can totally include what you have here and follow-up as needed. The current set of changes are already an improvement over the current
mainbranchThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we just try to roundtrip
meta? That would not be such a nice message but it would fail earlyThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried this before going out on holidays and it detected some issues but not all. IIRC, the main problem was
objectrequiring actual data. We could combine explicit checks with the roundtrip though. That would give us nice error messages for some stuff and a fallback for anything we haven't explicitly checked.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also
df._meta_nonemptyfor cases like this. Might not catch super funky stuff but that's ok from my POV. After all, the shuffle will fail either way it's just about when it fails, isn't it?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look at
df._meta_nonempty. It's both about failing early and providing useful feedback to the user.