-
Notifications
You must be signed in to change notification settings - Fork 77
Index dataframes tables #716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
gonzaponte
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. I would like just one more test to be added.
invisible_cities/io/dst_io.py
Outdated
| from typing import Optional | ||
| from typing import Sequence | ||
|
|
||
| def _decode_df_to_str(df): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about decode_str_columns?
invisible_cities/io/dst_io_test.py
Outdated
| with pytest.warns(UserWarning, match='does not exist'): | ||
| load_dsts([good_file, wrong_file], group, node) | ||
|
|
||
| def test_load_dst_converts_from_bytest(ICDATADIR, fixed_dataframe): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def test_load_dst_converts_from_bytest(ICDATADIR, fixed_dataframe): | |
| def test_load_dst_converts_from_bytes(ICDATADIR, fixed_dataframe): |
small typo
cecbcc0 to
eccc348
Compare
b42e5be to
81e02d6
Compare
gonzaponte
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An improved version of the dataframe writer with table indexation. The writer is now renamed to df_writer, which is much more comfortable. This PR also includes the indexation of the event column in the output of Beersheva and Esmeralda.
This change suggests that we should ensure each city indexes its output properly, but that is left for a different PR.
The new feature is properly and thoughtfully tested. Nice job!
81e02d6 to
2eb3d05
Compare
The tests check: all elements in function argument are in tables.attrs.columns_to_index; KeyError with column name is raised if input list is not subset of dataframe columns; the argument is not given nothing is flagged.
The writer is now labeling columns_to_be_index. The table is not indexed since it should be done after the file is fully written.
The test where dataframe is read from Kr83_full_nexus_v5_03_01_ACTIVE_7bar_1evt.sim.h5 file that contains strings is added, and the test_store_pandas_as_tables_exact does not decode explicitly.
Since the writer has different structure than others the auxiliary _df_writer had to be defined
2eb3d05 to
794c684
Compare
Remove _decode_str_columns which is now executed in load_dst. Change name of function used in mc_writer to new name approved in PR next-exp#716
Remove _decode_str_columns which is now executed in load_dst. Change name of function used in mc_writer to new name approved in PR next-exp#716
Remove _decode_str_columns which is now executed in load_dst. Change name of function used in mc_writer to new name approved in PR next-exp#716
The PR adds option to flag tables columns to be indexed.