Reading table blocks fails when there is a comment to the right of the column name row #72

@jfcorbett

The reader functions fail when a (perfectly legal) comment is located to the right of the column name row in a non-transposed table. For example, when reading this CSV data:

        **places;
        all
        place;distance;ETA;is_hot;;;---> parser chokes on this perfectly legal comment <---; 
        text;km;datetime;onoff
        home;0.0;2020-08-04 08:00:00;1
        work;1.0;2020-08-04 09:00:00;0
        beach;2.0;2020-08-04 17:00:00;1

This is due to a misconceived "leniency" in pdtable.io.parsers.blocks.preprocess_column_names():

def preprocess_column_names(col_names_raw: Sequence[str], fixer: ParseFixer):
    """
       handle known issues in column_names
    """
    n_names_col = len(col_names_raw)
    for el in reversed(col_names_raw):
        if el is not None and len(el) > 0:
            break
        n_names_col -= 1

    ...

Thus everything on the column name line is counted as a column name up to the last non-blank cell, including any comments and all the empty cells between the actual column names and comments.
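To make the effect concrete, here is a minimal sketch that isolates only the trailing-blank counting loop quoted above and runs it on a row shaped like the one in the example. The helper name `count_name_cells` and the shortened comment text are made up for illustration; the rest of `preprocess_column_names()` is not reproduced.

```python
from typing import Optional, Sequence


def count_name_cells(col_names_raw: Sequence[Optional[str]]) -> int:
    """Hypothetical helper: same loop as quoted above, returning the count."""
    n_names_col = len(col_names_raw)
    # Walk backwards and drop only the blank cells at the very end of the row.
    for el in reversed(col_names_raw):
        if el is not None and len(el) > 0:
            break
        n_names_col -= 1
    return n_names_col


# Four real column names, two blank cells, a comment, and one trailing blank.
cells = "place;distance;ETA;is_hot;;;a comment;".split(";")
n = count_name_cells(cells)
counted_as_names = cells[:n]
print(n, counted_as_names)
```

Only the final blank cell is trimmed, so the two blank cells and the comment are all still counted as column names.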

This is later passed to a ParseFixer via fixer.fix_missing_column_name(input_columns=column_names). The fixer then assumes that the empty cells are simply column names that the user forgot to write in, and replaces them with placeholder column names 'missing_fixed_000', 'missing_fixed_001', ....
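As a rough illustration of the outcome, the sketch below uses a stand-in function, `fill_missing_names`, which is hypothetical and is not the actual ParseFixer code. It only assumes the placeholder naming pattern mentioned above and shows what the column list would look like if blank cells between the real names and the comment are filled in.

```python
from typing import List, Sequence


def fill_missing_names(input_columns: Sequence[str]) -> List[str]:
    """Hypothetical stand-in: replace blank names with numbered placeholders."""
    fixed = []
    n_missing = 0
    for name in input_columns:
        if len(name) == 0:
            # Placeholder pattern taken from the issue text: missing_fixed_000, ...
            fixed.append(f"missing_fixed_{n_missing:03d}")
            n_missing += 1
        else:
            fixed.append(name)
    return fixed


# The cells that end up counted as column names for the example row
# (comment text shortened for illustration).
column_names = ["place", "distance", "ETA", "is_hot", "", "", "a comment"]
fixed_names = fill_missing_names(column_names)
print(fixed_names)
```

Under these assumptions the table would end up with seven columns, three of which were never intended by the user.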

This breaks support for comments. All of this should be ripped out.
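One possible replacement, sketched here under the assumption that the first blank cell marks the end of the column names, would be to stop reading names at that cell and ignore everything to its right. The function name `column_names_until_first_blank` is hypothetical, and this is not a proposed patch to pdtable, only an illustration of the idea.

```python
from typing import List, Optional, Sequence


def column_names_until_first_blank(cells: Sequence[Optional[str]]) -> List[str]:
    """Hypothetical sketch: keep cells up to, but not including, the first blank one."""
    names: List[str] = []
    for cell in cells:
        # Assumption: a None, empty, or whitespace-only cell ends the name row.
        if cell is None or len(cell.strip()) == 0:
            break
        names.append(cell)
    return names


# Example row with a shortened comment to the right of the column names.
cells = "place;distance;ETA;is_hot;;;a comment;".split(";")
names = column_names_until_first_blank(cells)
print(names)
```

With this rule, comments to the right of the column names are never seen by the column-name logic, so no placeholder names are needed for this case. The trade-off is that a genuinely forgotten column name in the middle of the row would truncate the table instead of being filled in.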

Labels: bug
