Description
In the "spacetop" dataset (which is yet to be made public), some .tsv files contain n/a, which is how BIDS mandates coding "missing values". See https://bids-specification.readthedocs.io/en/stable/common-principles.html#tabular-files, which states:
String values containing tabs MUST be escaped using double quotes. Missing and non-applicable values MUST be coded as n/a. Numerical values MUST employ the dot (.) as decimal separator and MAY be specified in scientific notation, using e or E to separate the significand from the exponent. TSV files MUST be in UTF-8 encoding.
I did file bids-standard/bids-specification#1938 ([BUG] clarify/re-consider use of "n/a" in numeric columns) about this, though.
ATM, for my sample dataset (not yet public on openneuro; attn @jungheejung, remind me if we do have it somewhere public meanwhile), a version of hed-validator (#1025) blows up with:
(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ ~/bin/hed-validator .
Using HEDTOOLS version: {'date': '2024-06-14T17:02:33-0500', 'dirty': False, 'error': None, 'full-revisionid': '940e75ddcedd5a14910098b60277413edc3c024e', 'version': '0.5.0'}
Traceback (most recent call last):
File "/home/yoh/bin/hed-validator", line 71, in <module>
main()
File "/home/yoh/bin/hed-validator", line 37, in main
issue_list = bids.validate(check_for_warnings=args.check_for_warnings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/hed/tools/bids/bids_dataset.py", line 84, in validate
issues += files.validate_datafiles(self.schema, check_for_warnings=check_for_warnings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/hed/tools/bids/bids_file_group.py", line 157, in validate_datafiles
issues += data_obj.contents.validate(hed_schema, extra_def_dicts=extra_def_dicts, name=name,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/hed/models/base_input.py", line 359, in validate
validation_issues = tab_validator.validate(self, self._mapper.get_def_dict(hed_schema, extra_def_dicts), name,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/hed/validator/spreadsheet_validator.py", line 63, in validate
data_new._dataframe = df_util.sort_dataframe_by_onsets(data.dataframe)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/hed/models/df_util.py", line 118, in sort_dataframe_by_onsets
df_copy['_temp_onset_sort'] = df_copy['onset'].astype(float)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/pandas/core/generic.py", line 6643, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 430, in astype
return self.apply(
^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 363, in apply
applied = getattr(b, f)(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/pandas/core/internals/blocks.py", line 758, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/pandas/core/dtypes/astype.py", line 237, in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/pandas/core/dtypes/astype.py", line 182, in astype_array
values = _astype_nansafe(values, dtype, copy=copy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/pandas/core/dtypes/astype.py", line 133, in _astype_nansafe
return arr.astype(dtype, copy=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not convert string to float: 'n/a'
Note that it is also unclear which file it blows up on, so it might be worth providing some feedback on that (log at ERROR level, or just catch and enhance the exception while working on a specific file).
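To illustrate the "catch/enhance exception" idea, here is a minimal sketch. It is not the hed-python code: `validate_each` and the `validate_one` callback are hypothetical names standing in for whatever loop iterates over the data files. The only point is that the file path gets attached to the error while the original traceback is preserved.

```python
from typing import Callable, Iterable


def validate_each(paths: Iterable[str], validate_one: Callable[[str], list]) -> list:
    """Run validate_one on every path; name the offending file if it raises.

    Hypothetical helper: validate_one stands in for the per-file validation call.
    """
    issues = []
    for path in paths:
        try:
            issues += validate_one(path)
        except Exception as exc:
            # Re-raise with the file name in the message; "from exc" keeps
            # the original exception chained for debugging.
            raise RuntimeError(f"validation crashed on {path}: {exc}") from exc
    return issues
```

With something like this, the final line of the traceback would name the _events.tsv that triggered the ValueError instead of only the value that failed to convert.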
I think it is likely an n/a in the onset field of some _events.tsv, e.g.
(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ git grep '^n/a' | grep _events.tsv | head
sub-0001/ses-02/func/sub-0001_ses-02_task-faces_acq-mb8_run-01_events.tsv:n/a n/a rating_mouse_trajectory n/a intensity disgust male African old n/a
sub-0001/ses-02/func/sub-0001_ses-02_task-faces_acq-mb8_run-01_events.tsv:n/a n/a rating_mouse_trajectory n/a intensity happy male EA young n/a
sub-0001/ses-02/func/sub-0001_ses-02_task-faces_acq-mb8_run-02_events.tsv:n/a n/a rating_mouse_trajectory n/a sex happy male WC old n/a
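A small standalone reproduction, plus one possible n/a-tolerant sort, as a sketch only. It assumes the events table is loaded with all columns as strings and "n/a" kept literal, which is consistent with the traceback but not confirmed from the hed code. The sample rows and `sort_by_onset_tolerant` are made up for illustration. Whether rows with an n/a onset should be sorted last, dropped, or reported as a validation issue is a separate decision.

```python
import io

import pandas as pd

# Made-up events table with one row whose onset is "n/a".
TSV = (
    "onset\tduration\ttrial_type\n"
    "2.5\t1.0\tface\n"
    "n/a\tn/a\trating_mouse_trajectory\n"
    "0.5\t1.0\tface\n"
)

# Assumption: columns are read as strings and "n/a" is not converted to NaN.
df = pd.read_csv(io.StringIO(TSV), sep="\t", dtype=str, keep_default_na=False)

# The call from the traceback fails on such a column.
try:
    df["onset"].astype(float)
except ValueError as exc:
    print("astype(float) failed:", exc)


def sort_by_onset_tolerant(frame: pd.DataFrame) -> pd.DataFrame:
    """Sort rows by numeric onset, placing non-numeric onsets (e.g. n/a) last."""
    # errors="coerce" turns anything unparseable into NaN instead of raising.
    key = pd.to_numeric(frame["onset"], errors="coerce")
    order = key.sort_values(kind="stable", na_position="last").index
    return frame.loc[order].reset_index(drop=True)


sorted_df = sort_by_onset_tolerant(df)
print(sorted_df)
```

The original onset strings are left untouched; only the temporary sort key is numeric.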
FWIW, we did not see a similar crash while running the deno bids-validator (I guess it uses the JS version?).