Skip to content

Conversation

@drammock
Copy link
Member

@drammock drammock commented Jan 14, 2020

closes #7179

  • minor refactor of _check_pandas_index_arguments
  • fix to_data_frame() docstring re: tuples vs lists
  • make index=None work as described
  • fix logic when setting title of columns
  • provide different options for time conversion: keep as float, or convert to integer milliseconds, pd.Timedelta, or (raw only) pd.Timestamp
  • make the datetime/timedelta change work with long_format=True
  • improve tests

cc @jona-sassenhagen @dengemann @choldgraf @larsoner @agramfort

@larsoner

This comment has been minimized.

@drammock

This comment has been minimized.

@larsoner

This comment has been minimized.

@drammock

This comment has been minimized.

@codecov
Copy link

codecov bot commented Jan 18, 2020

Codecov Report

Merging #7206 into master will increase coverage by 0.42%.
The diff coverage is 98.95%.

@@            Coverage Diff             @@
##           master    #7206      +/-   ##
==========================================
+ Coverage   89.74%   90.17%   +0.42%     
==========================================
  Files         447      450       +3     
  Lines       80716    82211    +1495     
  Branches    12876    13696     +820     
==========================================
+ Hits        72442    74137    +1695     
+ Misses       5441     5276     -165     
+ Partials     2833     2798      -35

@drammock drammock marked this pull request as ready for review February 4, 2020 00:45
@drammock
Copy link
Member Author

drammock commented Feb 4, 2020

@larsoner I think this is ready for review. TLDR:

  • the PR is backward compatible in terms of default behavior, except for whether columns end up in the index or not (which was previously a bug IMO; see below).
  • The PR is not backward compatible in terms of method parameters (scale_time is gone, time_format is added).
  • when creating long-format DataFrames, the measurement column name has been changed from "observation" to "value", following the standard practice for wide-to-long transformations in pandas.melt and R's tidyr::pivot_longer.
  • to_data_frame is no longer a Mixin with complicated logic to triage the different instance types; it has separate definitions for Raw, Epochs, Evoked, and SourceEstimate classes, and some shared utils functions to make it DRY.
  • all of the instance methods now give better control over how the time variable is transformed (if at all)

More details:

  • <instance>.to_data_frame(index=None) now works as advertised: the DataFrame will have a sequential integer index (the Pandas default) and the indicator variables will all be in their own columns.
  • There's a new option time_format that can be None (keep time as float), 'ms' (convert to integer milliseconds), timedelta (convert to pd.Timedelta), and datetime (convert to pd.Timestamp; only works for Raw objs, and accounts for meas_date and raw.first_time). This last option is probably not widely useful, but was not hard to add and is in there for completeness.
  • The old scaling_time=1e3 API element is gone. Default behavior is unchanged, and if users ever passed anything other than 1e3, they can still get that result by passing time_format=None and then scaling the resulting dataframe column after the fact. Currently I haven't done any deprecation code for the scaling_time parameter; LMK if you think a deprecation cycle is needed there.
  • The above approach separates the question of time format conversion from whether time is in the DataFrame index or not. It also makes it possible to do a deprecation cycle on the default value of time_format... I still think None is a more sensible default than 'ms', but we can do the deprecation in a separate PR easily enough, and I'm personally content just to have the option of passing time_format=None.

@larsoner
Copy link
Member

larsoner commented Feb 4, 2020

Your described current approach sounds good to me, will look in depth tomorrow

@drammock

This comment has been minimized.

@dengemann
Copy link
Member

dengemann commented Feb 4, 2020 via email

@agramfort

This comment has been minimized.

@larsoner larsoner assigned larsoner and unassigned larsoner Feb 4, 2020
@drammock drammock changed the title FIX, ENH: to_data_frame method MRG, FIX, ENH: overhaul to_data_frame method Feb 10, 2020
@drammock
Copy link
Member Author

Rebased to fix a doc building issue that was fixed elsewhere. @dengemann do you still have time to review this? There's a tutorial PR #7180 that is waiting for it.

@jona-sassenhagen @choldgraf any interest in weighing in?

@dengemann
Copy link
Member

dengemann commented Feb 10, 2020 via email

@drammock
Copy link
Member Author

Summarizing an offline conversation with @dengemann: this PR doesn't break MNE-R, but the rewritten to_data_frame methods don't work cleanly, and some options are causing segfaults. We'll investigate for a day or two to see how hard it will be to fix it.

@dengemann
Copy link
Member

dengemann commented Feb 11, 2020 via email

@dengemann
Copy link
Member

@drammock it looks very clean to me. Thank you for this very nice work and the testing against mne-r ! I'd say let's move on.

@dengemann dengemann merged commit d2a5660 into mne-tools:master Feb 13, 2020
@drammock drammock deleted the ep-to-df-docs branch February 13, 2020 22:43
@larsoner
Copy link
Member

@drammock this appears to have broken three examples:

https://circleci.com/gh/mne-tools/mne-python/17989

Can you look into it?

@drammock
Copy link
Member Author

Will do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DOC: epochs.to_data_frame doc error

4 participants