Description
According to the docs (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html)
"In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects..."
It is indeed in the docs, but it took me several hours to trace my bug down to this "feature".
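A minimal sketch of how the behavior can be observed. The function name `record_and_sum` and the sample data are made up for illustration, and whether the first column is actually seen twice depends on the installed pandas version:

```python
import pandas as pd

calls = []  # side-effect target: records every column label func receives


def record_and_sum(col):
    # Side effect: remember which column was passed in.
    calls.append(col.name)
    return col.sum()


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
result = df.apply(record_and_sum)

# If the double-call behavior described in the docs applies, the first
# column's label shows up more than once in `calls`; otherwise each
# label appears exactly once.
print(calls)
print(result)
```

Comparing `len(calls)` with the number of columns is a quick way to check whether a given installation evaluates the first column more than once.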
So I think it would be cleaner either to fully support side effects in apply (e.g. by calling func on a copy of the first column/row in the testing phase) or to ban them completely, if technically possible.
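Until something like that exists, a user-side workaround is to keep func free of side effects on its input by copying inside func, so that an extra call is harmless. This is only a sketch of that idea, not what pandas does internally; `shift_to_zero` and the data are hypothetical:

```python
import pandas as pd


def shift_to_zero(col):
    # Work on a private copy so the caller's data is never mutated,
    # no matter how many times apply invokes this function.
    col = col.copy()
    col -= col.min()
    return col


df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
out = df.apply(shift_to_zero)

print(out)  # shifted values
print(df)   # original frame, unchanged
```

Because the function returns the same result for the same input and leaves `df` untouched, calling it twice on the first column only costs time.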
I know there are plans to ban modification when using groupby.apply (#12653).
I don't see any issues with mutation inside a (non groupby) apply per se, but I may be wrong.
I should also point out that the note quoted above from the docs is not entirely correct: if result_type is specified, the first row/column is not necessarily processed twice.