-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
DOC: Improve the docstring of DataFrame.equals() #22539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Improve the docstring of DataFrame.equals() #22539
Conversation
datapythonista
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great docstring, added some comments.
pandas/core/generic.py
Outdated
| Test whether two DataFrame objects contain the same elements. | ||
| This function allows two DataFrame objects to be compared against | ||
| each other to see if they same shape and elements. NaNs in the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they have the same shape...
pandas/core/generic.py
Outdated
| See Also | ||
| -------- | ||
| pandas.Series.eq : Compare two Series objects of the same length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the pandas. prefix, just Series.eq is enough.
I think we need eq and equals of both Series and DataFrame, not just Series.eq.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be redundant to mention equals since this is the docstring for that method?
| and return a Series where each element is True if the element | ||
| in each Series is equal, False otherwise. | ||
| numpy.array_equal : Return True if two arrays have the same shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for the blank line before this
pandas/core/generic.py
Outdated
| Examples | ||
| -------- | ||
| >>> a = pd.DataFrame({1:[0], 0:[1]}) | ||
| >>> b = pd.DataFrame({1.0:[0], 0.0:[1]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be consistent with other docstrings, it'd be good to use df for the main DataFrame the ones in which we'll call the method df.equals(), and something more descriptive for the other (e.g. exactly_equal, different_data_type, different_index_type...)
After creating the data, it'd be nice to display the content (i.e. >>> df)
Not sure in this case if it'd be better or worse, but in general we try to have somehow real-world (meaningful) data for the examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used the same data in the updated PR - felt with text in order to use meaningful data it would be harder/more confusing to demonstrate. Can change that if needed.
pandas/core/generic.py
Outdated
| >>> b = pd.DataFrame({1.0:[0], 0.0:[1]}) | ||
| DataFrames a and b have the same element types and values, but have | ||
| different types for the indices, which will still return True. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"indices" are the row labels, which are the same, in this case the different type is in the column labels.
pandas/core/generic.py
Outdated
| """ | ||
| Determines if two NDFrame objects contain the same elements. NaNs in | ||
| the same location are considered equal. | ||
| Test whether two DataFrame objects contain the same elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This docstring is used for both Series and DataFrame, we should be generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the DataFrame references in the docstring to NDFrame as originally written, however the validation script returned an error (Private classes (['NDFrame']) should not be mentioned in public docstring.). The updated PR still has NDFrame but can change it to something else (Series/DataFrame?) if that's better.
Codecov Report
@@ Coverage Diff @@
## master #22539 +/- ##
=========================================
Coverage ? 92.04%
=========================================
Files ? 169
Lines ? 50787
Branches ? 0
=========================================
Hits ? 46745
Misses ? 4042
Partials ? 0
Continue to review full report at Codecov.
|
datapythonista
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring looks really good. Just some comments about technicalities.
pandas/core/generic.py
Outdated
| """ | ||
| Determines if two NDFrame objects contain the same elements. NaNs in | ||
| the same location are considered equal. | ||
| Test whether two NDFrame objects contain the same elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can leave Test whether two objects contain the same elements. for now.
The problem is that NDFrame is not a class we expect users to know about (it's private). So, it shouldn't be used in the documentation. Series or DataFrame or object are probably what we want to use in most cases.
| return a DataFrame where each element is True if the respective | ||
| element in each DataFrame is equal, False otherwise. | ||
| numpy.array_equal : Return True if two arrays have the same shape | ||
| and elements, False otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We add the method being documented to the See Also, only in cases where the docstring is being reused by many methods, and we want those methods to reference each other. But this is not the case here. May be you can still add the pandas.utils.testing.assert_frame_equal and assert_series_equal here, but I don't think it adds value to reference .equals itself.
pandas/core/generic.py
Outdated
| Notes | ||
| ----- | ||
| This function requires that the elements have the same dtype as their | ||
| respective elements in the other DataFrame. However, the column labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the other Series or DataFrame
pandas/core/generic.py
Outdated
| Examples | ||
| -------- | ||
| >>> df = pd.DataFrame({1:[0], 0:[1]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing spaces after colons. I think using 0 -> 1 and 1 -> 0 is a bit confusing, may be 1 -> 10 and 2 -> 20 makes the example easier to read?
pandas/core/generic.py
Outdated
| types and values, but have different types for the column labels, | ||
| which will still return True. | ||
| >>> different_column_label = pd.DataFrame({1.0:[0], 0.0:[1]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
different_column_type seems a clearer name to me
…ster' into docstring_DataFrame_equals
|
Hello @seantchan! Thanks for updating the PR.
|
datapythonista
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, really good docstring, thanks for the work on this @seantchan
git diff upstream/master -u -- "*.py" | flake8 --diff