validator would fail on "legit" "n/a" value #1026

@yarikoptic

On the "spacetop" dataset (which is yet to be made public) there are .tsv files containing n/a values, which is how BIDS mandates coding "missing values". See https://bids-specification.readthedocs.io/en/stable/common-principles.html#tabular-files which states:

String values containing tabs MUST be escaped using double quotes. Missing and non-applicable values MUST be coded as n/a. Numerical values MUST employ the dot (.) as decimal separator and MAY be specified in scientific notation, using e or E to separate the significand from the exponent. TSV files MUST be in UTF-8 encoding.
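For illustration, here is a minimal sketch of reading such a table with pandas so that n/a is treated as a missing value. The table contents are made up for the example and are not taken from the spacetop dataset.

```python
import io

import pandas as pd

# Hypothetical events table; "n/a" marks missing values as BIDS requires.
tsv_text = "onset\tduration\ttrial_type\n1.5\t0.5\tface\nn/a\tn/a\trating\n"

df = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    na_values=["n/a"],      # treat the BIDS marker as missing
    keep_default_na=False,  # do not also treat "NA", "null", etc. as missing
)

# The numeric columns are parsed as floats, with NaN where the file had n/a.
print(df)
```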

I filed

ATM, for my sample dataset (not yet public on openneuro; attn @jungheejung - remind me if we do have it somewhere public meanwhile), a version of hed-validator (#1025) blows up with

(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ ~/bin/hed-validator .
Using HEDTOOLS version: {'date': '2024-06-14T17:02:33-0500', 'dirty': False, 'error': None, 'full-revisionid': '940e75ddcedd5a14910098b60277413edc3c024e', 'version': '0.5.0'}
Traceback (most recent call last):
  File "/home/yoh/bin/hed-validator", line 71, in <module>
    main()
  File "/home/yoh/bin/hed-validator", line 37, in main
    issue_list = bids.validate(check_for_warnings=args.check_for_warnings)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/hed/tools/bids/bids_dataset.py", line 84, in validate
    issues += files.validate_datafiles(self.schema, check_for_warnings=check_for_warnings)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/hed/tools/bids/bids_file_group.py", line 157, in validate_datafiles
    issues += data_obj.contents.validate(hed_schema, extra_def_dicts=extra_def_dicts, name=name,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/hed/models/base_input.py", line 359, in validate
    validation_issues = tab_validator.validate(self, self._mapper.get_def_dict(hed_schema, extra_def_dicts), name,
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/hed/validator/spreadsheet_validator.py", line 63, in validate
    data_new._dataframe = df_util.sort_dataframe_by_onsets(data.dataframe)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/hed/models/df_util.py", line 118, in sort_dataframe_by_onsets
    df_copy['_temp_onset_sort'] = df_copy['onset'].astype(float)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/pandas/core/generic.py", line 6643, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 430, in astype
    return self.apply(
           ^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 363, in apply
    applied = getattr(b, f)(**kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/pandas/core/internals/blocks.py", line 758, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/pandas/core/dtypes/astype.py", line 237, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/pandas/core/dtypes/astype.py", line 182, in astype_array
    values = _astype_nansafe(values, dtype, copy=copy)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/deno/lib/python3.12/site-packages/pandas/core/dtypes/astype.py", line 133, in _astype_nansafe
    return arr.astype(dtype, copy=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not convert string to float: 'n/a'

Note that it is also unclear on which file it blows up, so it might be worth reporting which file was being processed (logging at ERROR level, or catching and enhancing the exception while working on a specific file).
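A rough sketch of the "catch/enhance exception" idea follows. The names `validate_each` and `validate_one` are hypothetical and are not part of hedtools; the point is only that re-raising with the path attached keeps the original traceback while naming the offending file.

```python
from typing import Callable, Iterable, List


def validate_each(
    paths: Iterable[str],
    validate_one: Callable[[str], List[dict]],
) -> List[dict]:
    """Run a per-file validator and name the file if it crashes.

    Both this function and ``validate_one`` are hypothetical stand-ins
    for whatever per-file loop the real validator uses.
    """
    issues: List[dict] = []
    for path in paths:
        try:
            issues.extend(validate_one(path))
        except Exception as exc:
            # Chain the original error so its traceback is preserved.
            raise RuntimeError(
                f"validation failed while processing {path}"
            ) from exc
    return issues
```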

I think it is likely on n/a in onset field in some _events.tsv, e.g.

(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ git grep '^n/a' | grep _events.tsv | head
sub-0001/ses-02/func/sub-0001_ses-02_task-faces_acq-mb8_run-01_events.tsv:n/a   n/a     rating_mouse_trajectory n/a     intensity       disgust male    African old     n/a
sub-0001/ses-02/func/sub-0001_ses-02_task-faces_acq-mb8_run-01_events.tsv:n/a   n/a     rating_mouse_trajectory n/a     intensity       happy   male    EA      young   n/a
sub-0001/ses-02/func/sub-0001_ses-02_task-faces_acq-mb8_run-02_events.tsv:n/a   n/a     rating_mouse_trajectory n/a     sex     happy   male    WC      old     n/a
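If n/a in the onset column is indeed the trigger, one possible way to sort by onset without crashing is sketched below. This is only an assumption about what a tolerant sort could look like, not the fix adopted in hedtools. The function name is hypothetical, and the onset column is assumed to hold strings, as the `astype(float)` call in the traceback suggests.

```python
import pandas as pd


def sort_by_onset_tolerating_na(df: pd.DataFrame) -> pd.DataFrame:
    """Sort rows by numeric onset, placing n/a onsets last.

    Hypothetical helper: only the literal "n/a" is treated as missing,
    so any other non-numeric onset still raises an error.
    """
    # Blank out the BIDS missing-value marker, keep everything else.
    onsets = df["onset"].where(df["onset"] != "n/a")
    key = pd.to_numeric(onsets)  # raises on genuinely malformed values
    order = key.sort_values(na_position="last", kind="stable").index
    return df.loc[order]


# Made-up rows, with onsets stored as strings.
events = pd.DataFrame(
    {
        "onset": ["2.0", "n/a", "0.5"],
        "trial_type": ["b", "rating", "a"],
    }
)
sorted_events = sort_by_onset_tolerating_na(events)
print(sorted_events)
```

Treating only the exact string n/a as missing, rather than coercing every unparsable value, keeps typos in the onset column visible as errors.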

FWIW -- we did not see a similar crash while running the deno bids-validator (I guess it uses the JS version?).
