Index dataframes tables #716

mmkekic · 2020-04-08T14:28:09Z

This PR adds an option to flag table columns to be indexed.

gonzaponte

Cool. I would like just one more test to be added.

invisible_cities/io/dst_io.py

invisible_cities/io/dst_io_test.py

invisible_cities/io/dst_io.py

gonzaponte · 2020-04-08T16:47:18Z

invisible_cities/io/dst_io.py

 from    typing             import Optional
+from    typing             import Sequence
+
+def _decode_df_to_str(df):


how about decode_str_columns?

gonzaponte · 2020-04-08T16:49:01Z

invisible_cities/io/dst_io_test.py

    with pytest.warns(UserWarning, match='does not exist'):
        load_dsts([good_file, wrong_file], group, node)

+def test_load_dst_converts_from_bytest(ICDATADIR, fixed_dataframe):


Suggested change

-def test_load_dst_converts_from_bytest(ICDATADIR, fixed_dataframe):
+def test_load_dst_converts_from_bytes(ICDATADIR, fixed_dataframe):

Small typo.

invisible_cities/io/dst_io_test.py

gonzaponte

An improved version of the dataframe writer with table indexation. The writer is now renamed to df_writer, which is much more comfortable. This PR also includes the indexation of the event column in the output of Beersheba and Esmeralda.

This change suggests that we should ensure each city indexes its output properly, but that is left for a different PR.

The new feature is properly and thoughtfully tested. Nice job!

gonzaponte requested changes Apr 8, 2020

gonzaponte reviewed Apr 9, 2020

mmkekic force-pushed the fix_pandas_to_pytables branch from cecbcc0 to eccc348 April 10, 2020 14:30

mmkekic changed the title ~~Fix pandas to pytables~~ Index dataframes table Apr 10, 2020

mmkekic changed the title ~~Index dataframes table~~ Index dataframes tables Apr 10, 2020

mmkekic force-pushed the fix_pandas_to_pytables branch 2 times, most recently from b42e5be to 81e02d6 April 10, 2020 16:27

gonzaponte approved these changes Apr 10, 2020

mmkekic force-pushed the fix_pandas_to_pytables branch from 81e02d6 to 2eb3d05 April 10, 2020 16:33

mmkekic added 10 commits April 10, 2020 19:13

Add tests for columns_to_index kwarg in store_pandas_to_tables

7b0bd7c

The tests check that: all elements of the function argument are in tables.attrs.columns_to_index; a KeyError with the column name is raised if the input list is not a subset of the dataframe columns; if the argument is not given, nothing is flagged.

Add columns_to_index in store_pandas_as_tables

de9e42d

The writer now labels the columns to be indexed. The table itself is not indexed at this point, since indexing should be done after the file is fully written.
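The flagging behaviour described in the two commits above can be sketched in plain Python, without PyTables. This is only an illustration under stated assumptions: the helper name columns_to_flag and its signature are hypothetical and do not come from the PR; only the three behaviours (flag the requested columns, raise KeyError with the column name, flag nothing by default) are taken from the commit messages.

```python
from typing import Iterable, Optional, Sequence, Tuple


def columns_to_flag(df_columns: Iterable[str],
                    columns_to_index: Optional[Sequence[str]] = None
                    ) -> Tuple[str, ...]:
    """Hypothetical helper (not part of the PR): validate the requested
    columns and return the names that a writer would flag for indexing."""
    if columns_to_index is None:
        # Argument not given: nothing is flagged.
        return ()

    available = set(df_columns)
    for name in columns_to_index:
        if name not in available:
            # The offending column name is carried by the exception.
            raise KeyError(name)

    return tuple(columns_to_index)


# Example: flag the "event" column of a dataframe with columns event, x, y.
flagged = columns_to_flag(["event", "x", "y"], ["event"])
```

In the real writer the returned names would presumably be stored in the table attribute columns_to_index, with the index itself created only once the file is fully written.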

Add docstrings for store_pandas_as_tables

3ddae97

Add test to check load_dst converts from bytes

6ec4bf1

A test is added in which the dataframe is read from the Kr83_full_nexus_v5_03_01_ACTIVE_7bar_1evt.sim.h5 file, which contains strings, and test_store_pandas_as_tables_exact no longer decodes explicitly.

Change load_dst to decode byte type to strings

f4a2b65
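As a rough illustration of what decoding byte columns on load can look like, here is a minimal pandas sketch. It is an assumption-laden example, not the code from this PR: the function name is borrowed from the review suggestion above, while the body, the UTF-8 encoding and the rule "decode only columns whose values are all bytes" are assumptions made for illustration.

```python
import pandas as pd


def decode_str_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df in which columns holding only bytes values
    are decoded to str. Sketch only; the encoding is assumed to be UTF-8."""
    out = df.copy()
    for name in out.columns:
        column = out[name]
        is_bytes_column = (column.dtype == object and
                           len(column) > 0 and
                           all(isinstance(value, bytes) for value in column))
        if is_bytes_column:
            # Series.str.decode converts each bytes value to str.
            out[name] = column.str.decode("utf-8")
    return out


# Example: a frame as it might come back from an HDF5 table.
raw = pd.DataFrame({"event": [0, 1], "particle": [b"e-", b"gamma"]})
decoded = decode_str_columns(raw)
```

Numeric columns pass through untouched, and the input frame is not modified.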

Cosmetics of alignment in load_dst_io_test

8d75cee

Add store_pandas_to_tables writer in indexation_test

edeefb0

Since this writer has a different structure from the others, the auxiliary _df_writer had to be defined.

Rename store_pandas_as_tables to df_writer

f990e01

Index columns in all esmeralda writers

7465ceb

Index event column in beersheba deconv_writer

794c684

bpalmeiro force-pushed the fix_pandas_to_pytables branch from 2eb3d05 to 794c684 April 10, 2020 17:14

bpalmeiro merged commit 15c87f3 into next-exp:master Apr 10, 2020

andLaing added a commit to andLaing/IC that referenced this pull request Apr 10, 2020

Adapt to changes made in PR next-exp#716

e53d3e1

Remove _decode_str_columns which is now executed in load_dst. Change name of function used in mc_writer to new name approved in PR next-exp#716

andLaing added a commit to andLaing/IC that referenced this pull request May 4, 2020

Adapt to changes made in PR next-exp#716

077f631

Remove _decode_str_columns which is now executed in load_dst. Change name of function used in mc_writer to new name approved in PR next-exp#716

andLaing added a commit to andLaing/IC that referenced this pull request May 4, 2020

Adapt to changes made in PR next-exp#716

9cb5134

Remove _decode_str_columns which is now executed in load_dst. Change name of function used in mc_writer to new name approved in PR next-exp#716

mmkekic deleted the fix_pandas_to_pytables branch February 22, 2021 19:34
