Description
The reader functions fail when a (perfectly legal) comment is located to the right of the column names, on the column name row of a non-transposed table. For example, when reading this CSV data:
**places;
all
place;distance;ETA;is_hot;;;---> parser chokes on this perfectly legal comment <---;
text;km;datetime;onoff
home;0.0;2020-08-04 08:00:00;1
work;1.0;2020-08-04 09:00:00;0
beach;2.0;2020-08-04 17:00:00;1
This is due to a misconceived "leniency" in pdtable.io.parsers.blocks.preprocess_column_names():
def preprocess_column_names(col_names_raw: Sequence[str], fixer: ParseFixer):
    """
    handle known issues in column_names
    """
    n_names_col = len(col_names_raw)
    for el in reversed(col_names_raw):
        if el is not None and len(el) > 0:
            break
        n_names_col -= 1
    ...
Thus everything on the column name line is counted as a column name up to the last non-blank cell, including any comments and all the empty cells between the actual column names and comments.
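To make the effect concrete, here is a minimal standalone sketch that re-creates only the trimming loop quoted above and applies it to the header row from the example. The function name `count_names_like_current_logic` is made up for illustration and is not part of pdtable; splitting the row on `;` is a simplification of what the real CSV reader does.

```python
from typing import Optional, Sequence


def count_names_like_current_logic(col_names_raw: Sequence[Optional[str]]) -> int:
    """Illustrative copy of the trailing-blank trimming loop (not pdtable API)."""
    n_names_col = len(col_names_raw)
    # Walk from the right and drop cells until the first non-blank one is found.
    for el in reversed(col_names_raw):
        if el is not None and len(el) > 0:
            break
        n_names_col -= 1
    return n_names_col


row = "place;distance;ETA;is_hot;;;---> parser chokes on this perfectly legal comment <---;"
cells = row.split(";")  # simplified stand-in for the real CSV cell splitting
n = count_names_like_current_logic(cells)
print(n)          # 7, although the table has only 4 columns
print(cells[:n])  # includes two empty cells and the comment
```

Only the single trailing empty cell is trimmed, so the two empty cells and the comment are all treated as column names.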
The resulting list is later passed to a ParseFixer via fixer.fix_missing_column_name(input_columns=column_names). The fixer then assumes that the empty cells are simply column names that the user forgot to write in, and replaces them with placeholder column names 'missing_fixed_000', 'missing_fixed_001', ....
This breaks support for comments. All of this should be ripped out.
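One possible replacement, sketched here under the assumption that the first blank cell on the column name row marks the end of the column names, with anything to its right being a comment. The helper `column_names_up_to_first_blank` is hypothetical and not an existing pdtable function. Treating whitespace-only cells as blank is also an assumption, since the current code only checks `len(el) > 0`.

```python
from typing import List, Optional, Sequence


def column_names_up_to_first_blank(col_names_raw: Sequence[Optional[str]]) -> List[str]:
    """Hypothetical helper: keep cells up to, but not including, the first blank one."""
    names: List[str] = []
    for el in col_names_raw:
        # Assumption: None, empty, or whitespace-only cells all count as blank.
        if el is None or len(el.strip()) == 0:
            break
        names.append(el)
    return names


header = "place;distance;ETA;is_hot;;;---> parser chokes on this perfectly legal comment <---;"
names = column_names_up_to_first_blank(header.split(";"))
print(names)  # ['place', 'distance', 'ETA', 'is_hot']
```

With this rule a genuinely forgotten column name would truncate the table instead of being patched with a placeholder, so it trades the "leniency" described above for comment support.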