-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
TYP: hashing #39949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TYP: hashing #39949
Conversation
jbrockmendel
commented
Feb 21, 2021
- closes #xxxx
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
| return _hash_ndarray(vals, encoding, hash_key, categorize) | ||
|
|
||
|
|
||
| def _hash_ndarray( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had to separate this out, as i couldn't convince mypy to recognize vals = cast(np.ndarray, vals) or assert isinstance(vals, np.ndarray)
WillAyd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @simonjayhawkins
|
lgtm. @simonjayhawkins if you have a chance. |
|
if theres no interest here we can close it to clear the queue |
its fine, this is worth merging. |
| FormattersType, | ||
| FrameOrSeriesUnion, | ||
| Frequency, | ||
| IndexKeyFunc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change axis? DataFrame._get_axis accepts Axis.
In general, a function input parameter should be annotated with the widest possible type supported by the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its only called from count, and count calls _get_axis_number just before doing so
pandas/core/util/hashing.py
Outdated
|
|
||
| def hash_tuples(vals, encoding="utf8", hash_key: str = _default_hash_key): | ||
| def hash_tuples( | ||
| vals: Union[MultiIndex, List[Tuple]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List -> Iterable
As a specific application of the “use the widest type possible” rule, libraries should generally use immutable forms of container types instead of mutable forms (unless the function needs to modify the container). Use Sequence rather than List, Mapping rather than Dict, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats the rule for Sequence vs Iterable?
In this case, we only get here with MultiIndex and List[Tuple]. I get the wider-is-better for public APIs, but for purely internal code strictness seems beneficial. is that distinction not relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK mypy works by checking each function is isolation. i.e. it checks that all the code within the function is compatible with the function annotation. i.e. checks the types in function calls within the body with the type parameters of the called functions.
so doing the reverse of this, i.e. annotating with the actual types passed seems like extra work imo. (and could be simply achieved using MonkeyType)
Iterable because c_is_list_like accepts an iterable as list-like
Using List is bad practice. function code could modify the contents of the List. If the parameter is typed as an immutable container, mypy won't allow it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, updated
pandas/core/util/hashing.py
Outdated
|
|
||
| def hash_tuples(vals, encoding="utf8", hash_key: str = _default_hash_key): | ||
| def hash_tuples( | ||
| vals: Union[MultiIndex, List[Tuple]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type parameters for Tuple.
pandas/core/util/hashing.py
Outdated
| Parameters | ||
| ---------- | ||
| vals : MultiIndex, list-of-tuples, or single tuple | ||
| vals : MultiIndex or list-of-tuples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list-like of tuples
pandas/core/util/hashing.py
Outdated
| ) -> np.ndarray: | ||
| """ | ||
| Hash an MultiIndex / list-of-tuples efficiently | ||
| Hash an MultiIndex / list-of-tuples efficiently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list-like-of-tuples
|
|
||
| def hash_array( | ||
| vals, | ||
| vals: ArrayLike, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vals in docstring also needs updating.
pandas/core/util/hashing.py
Outdated
| return _hash_categorical(vals, encoding, hash_key) | ||
| elif is_extension_array_dtype(dtype): | ||
| vals, _ = vals._values_for_factorize() | ||
| dtype = vals.dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtype = vals.dtype not now needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, will update
simonjayhawkins
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jbrockmendel merge on green
* TYP: hashing * platform compat, mypy fixup * setops fix * port more annotations * update annotations, docstring * List-> Iterable