Description
Code Sample, a copy-pastable example if possible
Our primary data, with two columns sharing the same name:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3).T)
df.columns = list('AABC')
print(df)
"""
   A  A  B   C
0  0  3  6   9
1  1  4  7  10
2  2  5  8  11
"""

Issue 1a: Series.replace throws ValueError when assigning:
print(df['B'].replace(6, np.nan)) # will work as expected with int as well
"""
0    NaN
1    7.0
2    8.0
Name: B, dtype: float64
"""
# ValueError: Buffer has wrong number of dimensions (expected 1, got 0):
df['B'] = df['B'].replace(6, np.nan) # inplace=True does not raise an error, but makes no change
df['B'] = df['B'].replace(6, 5)

Issue 1b: Same ValueError as above thrown when assigning np.nan with loc:
# ValueError: Buffer has wrong number of dimensions (expected 1, got 0):
df.loc[df['B'] == 6, 'B'] = np.nan
# Assigning int with loc will however work:
df.loc[df['B'] == 6, 'B'] = 5

Issue 2a: assigning np.nan with iloc on a column with a duplicate name applies to both columns:
df.iloc[0, 0] = np.nan
print(df)
"""
     A    A  B   C
0  NaN  NaN  5   9
1  1.0  4.0  7  10
2  2.0  5.0  8  11
"""

Issue 2b: assigning an int with iloc works in v0.22.0 but not in v0.23.4:
df.iloc[0, 0] = 10
print(df)
"""
0.22.0:
    A  A  B   C
0  10  3  5   9
1   1  4  7  10
2   2  5  8  11
0.23.4:
      A     A  B   C
0  10.0  10.0  5   9
1   1.0   4.0  7  10
2   2.0   5.0  8  11
"""
# Assigning with iloc will not break if BOTH columns contain a nan:
x = pd.DataFrame({'a': np.array([np.nan, 1, 2])})
y = pd.DataFrame({'a': np.array([0, np.nan, 2])})
df = pd.concat([x, y], axis=1)
df.iloc[0, 0] = 10
print(df)
"""
      a    a
0  10.0  0.0
1   1.0  NaN
2   2.0  2.0
"""

Problem description
The main topic for this issue is assigning different values to a DataFrame that contains duplicate column names. List of issues reported:
- Issue 1a: Series.replace throws ValueError when assigning (BUG: iloc.__setitem__ with duplicate columns #32477)
- Issue 1b: Same ValueError as Issue 1a thrown when assigning np.nan with loc (TST: GH24798 df.replace() with duplicate columns #34302)
- Issue 2a: assigning np.nan with iloc on a column with a duplicate will apply on both columns (Replacing a column with iloc replaces another column with same name #22036, BUG: .iloc indexing with duplicates #15686 closed by BUG: iloc.__setitem__ with duplicate columns #32477)
- Issue 2b: assigning int with iloc will work in v0.22.0 but not v0.23.4 (Replacing a column with iloc replaces another column with same name #22036, BUG: .iloc indexing with duplicates #15686 closed by BUG: iloc.__setitem__ with duplicate columns #32477)
The issue with iloc and np.nan (called Issue 2a above) was reported and closed as fixed in #13423 as of 0.18.0, but I am able to recreate it with v0.23.4.
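Until this is resolved, one possible workaround is to give the columns temporary unique labels, assign by position, and then restore the original names. The sketch below is only an illustration: `assign_by_position` is a hypothetical helper, not part of pandas, and it assumes that casting an integer column to float before writing a float value is acceptable.

```python
import numpy as np
import pandas as pd


def assign_by_position(df, row, col, value):
    """Hypothetical helper: set one cell by position in a frame
    whose column names may contain duplicates."""
    original_columns = df.columns
    out = df.copy()
    # Temporary unique labels, so no two columns share a name.
    out.columns = range(out.shape[1])
    # Assumption: upcast an integer column explicitly when writing a float
    # (such as np.nan) instead of relying on implicit upcasting.
    if isinstance(value, float) and pd.api.types.is_integer_dtype(out[col]):
        out[col] = out[col].astype(float)
    out.iloc[row, col] = value
    # Restore the original (possibly duplicated) names.
    out.columns = original_columns
    return out


df = pd.DataFrame(np.arange(12).reshape(4, 3).T)
df.columns = list('AABC')
result = assign_by_position(df, 0, 0, np.nan)
```

With this approach only the first 'A' column should change; the second 'A' column keeps its integer values.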
Expected Output
Either the same output we would expect if all column names were unique, or a DuplicateColumnWarning/DuplicateColumnException when the DataFrame contains duplicate column names.
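To make the second option concrete, here is a minimal user-side sketch of what such a warning could look like. `DuplicateColumnWarning` is the name proposed in this report and does not exist in pandas; both the class and the `warn_on_duplicate_columns` helper are hypothetical and defined here only for illustration.

```python
import warnings

import pandas as pd


class DuplicateColumnWarning(UserWarning):
    """Hypothetical warning class; pandas does not define this."""


def warn_on_duplicate_columns(df):
    """Hypothetical helper: warn if any column name occurs more than once.

    Returns the list of duplicated names (empty if all names are unique).
    """
    duplicated = df.columns[df.columns.duplicated()].unique().tolist()
    if duplicated:
        warnings.warn(
            f"DataFrame contains duplicate column names: {duplicated}",
            DuplicateColumnWarning,
            stacklevel=2,
        )
    return duplicated


df = pd.DataFrame([[0, 3, 6, 9]], columns=list('AABC'))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    duplicated_names = warn_on_duplicate_columns(df)
```

Calling the helper before an assignment would at least flag frames where the behaviour described above can occur.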