Closed
Labels
Constructors (Series/DataFrame/Index/pd.array constructors), Needs Tests (unit test(s) needed to prevent regressions), good first issue
Description
xref #13421 with a MultiIndex as the columns
Hi,
I encountered an edge case in DataFrame initialization with something like the following:
In [1]: from pandas import *
In [2]: s = Series(1, name='foo')
In [3]: df = DataFrame(s, columns=['bar'])
In [4]: df
Empty DataFrame
Columns: [bar]
Index: []
This happens in both 0.14.1 and 0.13.1, but this isn't really a bug as the docs exclude Series as a valid type for data=. That being said, this casting appears to work whenever .name is None or when .name equals what's passed to columns=, so failure in this particular case is rather surprising.
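To make the three cases concrete, here is a minimal sketch that builds a frame from an unnamed Series, from a Series whose name matches columns=, and from one whose name differs. The variable names and the three-element sample data are only illustrative, and the result of the mismatched case may vary between pandas versions, so the snippet prints the shapes rather than assuming them.

```python
import pandas as pd

values = [1, 2, 3]  # illustrative sample data, not from the original report

# Case 1: unnamed Series; columns= simply labels the single column.
unnamed = pd.DataFrame(pd.Series(values), columns=["bar"])

# Case 2: the Series name equals the requested column name.
matching = pd.DataFrame(pd.Series(values, name="bar"), columns=["bar"])

# Case 3: the Series name differs from columns= (the edge case reported here).
mismatched = pd.DataFrame(pd.Series(values, name="foo"), columns=["bar"])

# Compare how many rows survive in each case.
for label, frame in [
    ("unnamed", unnamed),
    ("matching", matching),
    ("mismatched", mismatched),
]:
    print(label, frame.shape)
```

If the mismatched case reports fewer non-null values than the other two, the data was dropped by the name/columns= mismatch described above.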
The mechanism appears to be:
- DataFrame.__init__ upgrades .name to the column name, if it is not None
- Then, the data columns are sliced with the list passed to columns=, resulting in an empty data set when the two differ.
- This seems to only occur when a Series is directly passed as data=. I can't get this to occur with [Series, ...] or a dict of Series.
The options I see are (in order of my personal preference):
- do the implicit rename (only occurs with single Series, so no ambiguity)
- just don't allow a Series being passed as data=.
- throw an exception due to the ambiguity
I don't see just documenting this behavior as being viable, as this edge case effectively leads to data loss.
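Until the constructor behaves differently, the rename from the first option can be done explicitly on the caller's side. The sketch below is one way to do that, not a proposed pandas change: series_to_frame is a hypothetical helper name, and it relies only on Series.to_frame(name=...), which sets the column label directly and so never goes through columns= selection.

```python
import pandas as pd


def series_to_frame(series, column):
    """Hypothetical helper: build a one-column frame, renaming explicitly.

    Series.to_frame(name=...) labels the column with `column` regardless of
    the Series' own .name, so no values are dropped by a name mismatch.
    """
    return series.to_frame(name=column)


s = pd.Series([1], name="foo")
df = series_to_frame(s, "bar")
print(df)
```

The original Series keeps its own name; only the resulting frame's column is labelled with the requested one.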