MRG, FIX, ENH: overhaul to_data_frame method #7206

drammock · 2020-01-14T01:16:33Z

minor refactor of _check_pandas_index_arguments
fix to_data_frame() docstring re: tuples vs lists
make index=None work as described
fix logic when setting title of columns
provide different options for time conversion: keep as float, or convert to integer milliseconds, pd.Timedelta, or (raw only) pd.Timestamp
make the datetime/timedelta change work with long_format=True
improve tests

cc @jona-sassenhagen @dengemann @choldgraf @larsoner @agramfort

codecov · 2020-01-18T00:36:19Z

Codecov Report

Merging #7206 into master will increase coverage by 0.42%.
The diff coverage is 98.95%.

@@            Coverage Diff             @@
##           master    #7206      +/-   ##
==========================================
+ Coverage   89.74%   90.17%   +0.42%     
==========================================
  Files         447      450       +3     
  Lines       80716    82211    +1495     
  Branches    12876    13696     +820     
==========================================
+ Hits        72442    74137    +1695     
+ Misses       5441     5276     -165     
+ Partials     2833     2798      -35

drammock · 2020-02-04T01:18:32Z

@larsoner I think this is ready for review. TLDR:

The PR is backward compatible in terms of default behavior, except for whether columns end up in the index or not (which was previously a bug IMO; see below).
The PR is not backward compatible in terms of method parameters (scaling_time is gone, time_format is added).
When creating long-format DataFrames, the measurement column name has been changed from "observation" to "value", following the standard practice for wide-to-long transformations in pandas.melt and R's tidyr::pivot_longer.
to_data_frame is no longer a Mixin with complicated logic to triage the different instance types; it has separate definitions for the Raw, Epochs, Evoked, and SourceEstimate classes, and some shared utility functions to keep it DRY.
All of the instance methods now give better control over how the time variable is transformed (if at all).
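For readers unfamiliar with the naming convention mentioned above, here is a minimal sketch of a wide-to-long transformation in plain pandas. It does not call to_data_frame; the frame contents and channel names are made up for illustration. It only shows that pandas.melt names the measurement column "value" by default.

```python
import pandas as pd

# Hypothetical wide-format frame: one row per time point, one column per channel.
wide_df = pd.DataFrame({
    "time": [0.0, 0.1],
    "EEG 001": [1.5, 2.5],
    "EEG 002": [3.5, 4.5],
})

# melt() stacks the channel columns into rows. The measurement column
# is called "value" unless value_name is given.
long_df = wide_df.melt(id_vars="time", var_name="channel")

print(list(long_df.columns))
print(long_df)
```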

More details:

<instance>.to_data_frame(index=None) now works as advertised: the DataFrame will have a sequential integer index (the Pandas default) and the indicator variables will all be in their own columns.
There's a new option time_format that can be None (keep time as float), 'ms' (convert to integer milliseconds), 'timedelta' (convert to pd.Timedelta), or 'datetime' (convert to pd.Timestamp; only works for Raw objects, and accounts for meas_date and raw.first_time). This last option is probably not widely useful, but was not hard to add and is in there for completeness.
The old scaling_time=1e3 API element is gone. Default behavior is unchanged, and if users ever passed anything other than 1e3, they can still get that result by passing time_format=None and then scaling the resulting dataframe column after the fact. Currently I haven't done any deprecation code for the scaling_time parameter; LMK if you think a deprecation cycle is needed there.
The above approach separates the question of time format conversion from whether time is in the DataFrame index or not. It also makes it possible to do a deprecation cycle on the default value of time_format... I still think None is a more sensible default than 'ms', but we can do the deprecation in a separate PR easily enough, and I'm personally content just to have the option of passing time_format=None.
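The time conversions described above can be approximated with plain pandas operations. The following is a minimal sketch, not the PR's implementation: the column names, sample times, and the meas_date value are made up for illustration, and raw.first_time is not modeled.

```python
import pandas as pd

# Hypothetical sample times in seconds, kept as floats (the time_format=None case).
# The default index is a RangeIndex and time is an ordinary column.
df = pd.DataFrame({"time": [0.0, 0.004, 0.008], "EEG 001": [1.0, 2.0, 3.0]})

# 'ms'-style conversion: integer milliseconds.
# Round before casting so floating-point error cannot truncate a value downward.
time_ms = (df["time"] * 1e3).round().astype("int64")

# 'timedelta'-style conversion: offsets from the start of the recording.
time_td = pd.to_timedelta(df["time"], unit="s")

# 'datetime'-style conversion: offsets added to an assumed measurement start.
# This Timestamp is a made-up stand-in for a recording's meas_date.
meas_date = pd.Timestamp("2020-01-01 00:00:00", tz="UTC")
time_dt = time_td + meas_date

# Any other scale factor can be applied after the fact, e.g. microseconds.
time_us = df["time"] * 1e6

print(time_ms.tolist())
print(time_td.iloc[1], time_dt.iloc[1])
```

Keeping the conversion separate from the question of whether time sits in the index means any of these columns can later be moved into the index with DataFrame.set_index.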

larsoner · 2020-02-04T01:35:36Z

Your described current approach sounds good to me, will look in depth tomorrow

dengemann · 2020-02-04T06:58:18Z

Sounds good, I'll take a look asap and check out how this interacts with R.

…

On 4 Feb 2020, at 02:55, Daniel McCloy wrote:

The Azure failures are puzzling me; these all pass locally, and Travis is happy:

pytest mne/tests/test_epochs.py -k test_to_data_frame
pytest mne/tests/test_evoked.py -k test_to_data_frame
pytest mne/tests/test_source_estimate.py -k test_to_data_frame
pytest mne/io/fiff/tests/test_raw_fiff.py -k test_to_data_frame
pytest mne/io/edf/tests/test_edf.py -k test_to_data_frame

Looks like it's an error with isinstance testing against np.int64. Locally it was failing when I tested against plain int, which is why I switched the test to int64... is there a better way to do this kind of type checking?
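On the type-checking question quoted above, one common pattern is to test against NumPy's abstract integer type rather than a fixed width. The sketch below assumes (this thread does not confirm it) that the platform differences came from integer width: a value can be np.int32 on one machine and np.int64 on another, so a check against a single concrete type passes in one place and fails in the other. The helper name is_integer is hypothetical.

```python
import numpy as np


def is_integer(value):
    """Return True for Python ints and NumPy integer scalars of any width."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        return False
    # np.integer is the abstract parent of np.int32, np.int64, np.uint8, etc.
    return isinstance(value, (int, np.integer))


# A check against one concrete width rejects other widths:
print(isinstance(np.int32(5), np.int64))
# The width-agnostic check accepts both, plus plain Python ints:
print(is_integer(np.int32(5)), is_integer(np.int64(5)), is_integer(5))
```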

mne/utils/dataframe.py

drammock · 2020-02-10T23:20:37Z

Rebased to fix a doc building issue that was fixed elsewhere. @dengemann do you still have time to review this? There's a tutorial PR #7180 that is waiting for it.

@jona-sassenhagen @choldgraf any interest in weighing in?

dengemann · 2020-02-10T23:53:41Z

Tests pass on MNE-R and examples build. But as we just discovered, the time_format options can cause trouble and hit untested territory. Let’s do an issue in mne-r and in case of doubt break things to move on here. We’ll then patch it up over there. Denis

drammock · 2020-02-10T23:59:12Z

Summarizing an offline conversation with @dengemann: this PR doesn't break MNE-R, but the rewritten to_data_frame methods don't work cleanly, and some options are causing segfaults. We'll investigate for a day or two to see how hard it will be to fix it.

dengemann · 2020-02-11T00:05:37Z

Too fast, I had a path issue and did not actually run your code … On MNE master MNE-R is fine; on this branch it produces 1 failing test. Can you reproduce?

dengemann · 2020-02-13T22:04:56Z

@drammock it looks very clean to me. Thank you for this very nice work and the testing against mne-r! I'd say let's move on.

larsoner · 2020-02-14T13:05:21Z

@drammock this appears to have broken three examples:

https://circleci.com/gh/mne-tools/mne-python/17989

Can you look into it?

drammock · 2020-02-14T15:50:52Z

Will do.

drammock marked this pull request as ready for review February 4, 2020 00:45

larsoner reviewed Feb 4, 2020

mne/utils/dataframe.py (outdated, resolved)

larsoner assigned larsoner and unassigned larsoner Feb 4, 2020

drammock mentioned this pull request Feb 5, 2020

MRG, DOC: update tutorial: epochs to pandas DataFrame #7180

Merged

drammock changed the title ~~FIX, ENH: to_data_frame method~~ MRG, FIX, ENH: overhaul to_data_frame method Feb 10, 2020

drammock added 14 commits February 10, 2020 15:05

cleanups to check function

56d20dd

modify docstring to reflect desired behavior

fa07c38

use RangeIndex if index=None

7b8e2f0

fix bad logic

381d056

if "time" in index, convert it to datetime/timedelta

d8aebc8

document datetime/timedelta conversion

754c3fa

refactor ToDataframeMixin

705e0be

remove Mixin in favor of separate method defs

be94f5e

fix docdict

27b6f1b

fixup tests for epochs.to_data_frame

7e71589

remove cruft; fix docstring

bd58290

fix evokeds tests

c8360d6

more tests

f2e87dd

test all time formats for Raw

7b97468

drammock and others added 6 commits February 10, 2020 15:05

improve Epochs test

a66d0e7

update what's new

f9014df

fix codespell

986e9ee

FIX: int64

1061b50

better coverage

c2dc395

fix docstring

de7b528

drammock force-pushed the ep-to-df-docs branch from 430a066 to de7b528 Compare February 10, 2020 23:19

drammock mentioned this pull request Feb 10, 2020

ENH: better support for pandas time datatypes mne-tools/mne-r#12

Open

drammock mentioned this pull request Feb 13, 2020

WIP fixes for to_data_frame mne-tools/mne-r#14

Open

dengemann merged commit d2a5660 into mne-tools:master Feb 13, 2020

drammock deleted the ep-to-df-docs branch February 13, 2020 22:43

drammock mentioned this pull request Feb 14, 2020

API: properly deprecate changes in to_data_frame #7326

Merged

drammock mentioned this pull request Sep 23, 2020

MRG, API: prep for to_data_frame default argument change #8298

Merged

4 participants
