Related: #2094
Related: #6847 (fixes `kind` and some argument ordering)
Related: #7121 (make `sortlevel` a part of `sort_index` by adding a `level` arg)
The sorting API is currently inconsistent and confusing. Here is what exists:
Series:
- `sort`: calls `Series.order`; in-place; defaults to `quicksort`
- `order`: sorts on the values and returns a new object; defaults to `mergesort`
- `sort_index`: sorts by the labels and returns a new object

Frame:
- `sort`: calls `sort_index`
- `sort_index`: sorts by the index with no args; otherwise does a nested sort on the passed columns
The semantics differ between Series and DataFrame. In Series, `sort` means in-place while `order` returns a new object; both sort on the values, while `sort_index` sorts on the index. For a DataFrame, `sort` and `sort_index` are the same and sort on a column or list of columns; `inplace` is a keyword.
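The Series split parallels Python's built-in convention, where `list.sort()` sorts in place and returns `None` while `sorted()` returns a new list. A short plain-Python illustration of why mixing the two styles under similar names is error-prone (the comments map each call onto the Series method it resembles; no pandas is involved):

```python
values = [3, 1, 2]

# Like Series.order: returns a new sorted object, original untouched.
new_list = sorted(values)
print(new_list, values)  # [1, 2, 3] [3, 1, 2]

# Like Series.sort: sorts in place and returns None,
# so writing `x = s.sort()` silently binds None instead of the data.
result = values.sort()
print(result, values)    # None [1, 2, 3]
```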
Proposed signature of the combined methods. We need to break the Series API here, because `Series.sort` is an in-place method, which is quite inconsistent with everything else.
```python
def sort(self, by=None, axis=0, level=None, ascending=True, inplace=False,
         kind='quicksort', na_last=True):
```
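A minimal sketch of how the combined signature could dispatch, written as a hypothetical free function `sort(obj, ...)` on top of the `sort_values`/`sort_index` methods of current pandas (an assumption: those methods are not part of the API described above). `na_last` is mapped onto pandas' `na_position`, and `'index'`/`'columns'` as `by` are treated as label sorts, following the proposal below.

```python
import pandas as pd


def sort(obj, by=None, axis=0, level=None, ascending=True, inplace=False,
         kind="quicksort", na_last=True):
    """Hypothetical combined sort for a Series or DataFrame (sketch only)."""
    na_position = "last" if na_last else "first"
    if by is None:
        if isinstance(obj, pd.Series):
            # Series default: sort on the values.
            return obj.sort_values(ascending=ascending, inplace=inplace,
                                   kind=kind, na_position=na_position)
        # DataFrame default: sort on the index labels.
        by = "index"
    if isinstance(by, str) and by in ("index", "columns"):
        # Label sort: 'index' uses the requested axis, 'columns' forces axis=1.
        label_axis = 1 if by == "columns" else axis
        return obj.sort_index(axis=label_axis, level=level,
                              ascending=ascending, inplace=inplace,
                              kind=kind, na_position=na_position)
    # Otherwise `by` names one column or a list of columns (DataFrame only):
    # a nested value sort. Returns None when inplace=True, like pandas itself.
    return obj.sort_values(by=by, axis=axis, ascending=ascending,
                           inplace=inplace, kind=kind,
                           na_position=na_position)
```

Because `inplace` defaults to `False`, `sort(s)` and `sort(df)` both return new objects, which is the behaviour change the proposal asks of `Series.sort`.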
This is what I think we should do:
- make `Series.sort`/`order` be the same.
- `by` can take a column/list of columns (as it can now), or an index name / `index` to provide index sorting (which means sort by the specified axis)
- default is `inplace=False` (which is the same as now, except for `Series.sort`).
- `Series.sort_index` does `s.sort('index')`
- `DataFrame.sort_index` does `df.sort('index')`
- eventually deprecate `Series.order`
- add `DataFrame.sort_columns` to perform `axis=1` sorting
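Two of these bullets can be sketched as thin wrappers over current pandas (an assumption; `sort_columns` and `order` below are hypothetical free functions, not pandas API). `sort_columns` only covers the no-argument case (sort on the column labels), and `order` shows one common deprecation pattern: warn with `FutureWarning`, then forward to the value sort while keeping the old `mergesort` default so ties keep their original order.

```python
import warnings

import pandas as pd


def sort_columns(df, ascending=True):
    """Hypothetical DataFrame.sort_columns: sort on the column labels (axis=1)."""
    return df.sort_index(axis=1, ascending=ascending)


def order(s, ascending=True, kind="mergesort"):
    """Hypothetical deprecated Series.order: warn, then forward to a value sort."""
    warnings.warn("Series.order is deprecated; use sort instead",
                  FutureWarning, stacklevel=2)
    # Keep the old default kind (stable mergesort) so tie order does not change.
    return s.sort_values(ascending=ascending, kind=kind)
```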
The proposal does switch the argument order of the current `sort_index` (e.g. `axis` is currently first), but I think it then allows a more natural syntax:
- `df.sort()` or `df.sort_index()` or `df.sort_index('index')`: sort on the index labels
- `df.sort(['A','B'], axis=1)`: sort on these columns (allow `'index'` here as well to sort on the index too)
- `df.sort_columns()` or `df.sort('columns')`: sort on the column labels
- `df.sort_columns()` defaults to `axis=1`, so `df.sort_columns(['A','B'])` is equivalent to `df.sort(['A','B'], axis=1)`
- `s.sort()`: sort on the values
- `s.sort('index')` or `s.sort_index()`: sort on the series index
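For comparison, a sketch of the same operations with the `sort_values`/`sort_index` methods of current pandas (an assumption about the reader's pandas version; the proposed `sort`/`sort_columns` calls do not exist there). One difference to keep in mind: current pandas orders rows by the values in columns A and B with the default `axis=0`, not `axis=1`.

```python
import pandas as pd

df = pd.DataFrame({"B": [2, 1, 2], "A": [3, 1, 2]}, index=["x", "z", "y"])
s = pd.Series([3, 1, 2], index=["c", "a", "b"])

# Each call returns a new object (inplace=False is the default), as proposed.
rows_by_labels = df.sort_index()             # proposed: df.sort() / df.sort_index()
rows_by_values = df.sort_values(["A", "B"])  # rows ordered on column A, then B
cols_by_labels = df.sort_index(axis=1)       # proposed: df.sort_columns() / df.sort('columns')
s_by_values = s.sort_values()                # proposed: s.sort()
s_by_labels = s.sort_index()                 # proposed: s.sort('index') / s.sort_index()

print(rows_by_labels.index.tolist())  # ['x', 'y', 'z']
print(rows_by_values.index.tolist())  # ['z', 'y', 'x']
```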