Is your feature request related to a problem?
The natural sort order is a common use case when working with real-world data. For example, consider the following DataFrame of clinical data where the body temperature of patients was measured:
import pandas as pd

data = {'Patient_ID': {0: 'ID-1', 1: 'ID-11', 2: 'ID-2'},
        'temperature': {0: 37.2, 1: 37.5, 2: 37.2}}
df = pd.DataFrame(data).sort_values(by=['Patient_ID'])
df.head(5)
will yield:
| | Patient_ID | temperature |
|---|---|---|
| 0 | ID-1 | 37.2 |
| 1 | ID-11 | 37.5 |
| 2 | ID-2 | 37.2 |
whereas we would want
| | Patient_ID | temperature |
|---|---|---|
| 0 | ID-1 | 37.2 |
| 2 | ID-2 | 37.2 |
| 1 | ID-11 | 37.5 |
Describe the solution you'd like
- `sort_values` could get a new parameter `sort_order` that is alphabetical (lexicographic) by default and could be switched to natural.
- The implementation could be similar to the `natsort` package, without any of the extra options `natsort` brings: transform all values into keys that sort naturally, pass those keys to `np.argsort()`, then use the resulting order to arrange the original values.
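
The transform-then-argsort idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how pandas or `natsort` implement it: `natural_key` and `natural_argsort` are hypothetical helper names, and the keys are reduced to integer ranks so that `np.argsort` can order them.

```python
import re

import numpy as np
import pandas as pd


def natural_key(value):
    """Hypothetical helper: 'ID-11' -> ('ID-', 11, '')."""
    # re.split with a capture group alternates text and digit chunks,
    # so odd positions always hold digits and even positions hold text.
    parts = re.split(r'(\d+)', str(value))
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


def natural_argsort(values):
    """Hypothetical helper: positions that put `values` in natural order."""
    keys = [natural_key(v) for v in values]
    # Rank each distinct key using Python's tuple ordering, then let
    # np.argsort order the integer ranks (stable, so ties keep input order).
    rank_of = {key: rank for rank, key in enumerate(sorted(set(keys)))}
    ranks = np.array([rank_of[key] for key in keys])
    return np.argsort(ranks, kind='stable')


df = pd.DataFrame({'Patient_ID': ['ID-1', 'ID-11', 'ID-2'],
                   'temperature': [37.2, 37.5, 37.2]})
order = natural_argsort(df['Patient_ID'])
sorted_df = df.iloc[order]
print(sorted_df)
```

The original values are never modified here; only the temporary keys are, which sidesteps the "transform them back" step at the cost of building the keys in pure Python.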
API breaking implications
Since we are only adding a parameter whose default keeps the current alphabetical behavior, this would not break any existing API.
Describe alternatives you've considered
Currently, one could use the natsort package. However, this seems cumbersome for such a common operation, and it makes it necessary to reindex the DataFrame (see the Stack Overflow example).