make pseudodata_table correctly deal with multiple replicas #2034
  replicas_nnseed_fitting_data_dict = collect("replica_nnseed_fitting_data_dict", ("replicas",))
  groups_replicas_indexed_make_replica = collect(
-     "indexed_make_replica", ("group_dataset_inputs_by_experiment", "replicas")
+     "indexed_make_replica", ("replicas", "group_dataset_inputs_by_experiment")
Why is the change of order necessary?
I think it makes what we do in pseudodata_table more readable. There we group the entries of groups_replicas_indexed_make_replica that correspond to a given replica. I think that's easier to understand the way it's done now than if we had to take, e.g., index 0 and then skip a number of indices equal to the number of groups to get the second input corresponding to the same replica.
df = [
    pd.concat(groups_replicas_indexed_make_replica[i : i + groups_per_replica])
    for i in range(0, len(groups_replicas_indexed_make_replica), groups_per_replica)
]
Why can't you achieve this with a reshape or permutation of groups_replicas_indexed_make_replica?
Because groups_replicas_indexed_make_replica is a flat list of indexed_make_replica outputs, containing number_of_replicas x number_of_datagroups dataframes. It's not a really clean input.
Here I group the list items (all the different data groups) that correspond to the same replica into a single dataframe per replica.
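For readers following the thread, the grouping described above can be sketched with toy data. This is only an illustration under stated assumptions: `make_frame`, the group names, and the index layout are hypothetical stand-ins and not the real `indexed_make_replica` output, and the flat list is assumed to be ordered with all groups of one replica stored next to each other.

```python
import pandas as pd


def make_frame(replica, group, n_points=2):
    """Toy stand-in for one indexed_make_replica output (hypothetical layout)."""
    index = pd.MultiIndex.from_tuples(
        [(group, f"{group}_dataset", i) for i in range(n_points)],
        names=["group", "dataset", "id"],
    )
    # Fill the column with the replica number so the grouping is easy to check.
    return pd.DataFrame({"data": [float(replica)] * n_points}, index=index)


replicas = [1, 2, 3]
groups = ["DIS", "DY"]  # hypothetical group names

# Assumed ordering: replica is the outer loop, so the groups belonging to
# one replica are contiguous in the flat list.
flat = [make_frame(rep, grp) for rep in replicas for grp in groups]

groups_per_replica = len(groups)

# Same chunking as in the diff: concatenate each contiguous run of
# groups_per_replica frames into one dataframe per replica.
per_replica = [
    pd.concat(flat[i : i + groups_per_replica])
    for i in range(0, len(flat), groups_per_replica)
]
```

Each entry of `per_replica` then holds every group for one replica, with the index labels of the individual frames preserved by `pd.concat`.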
Just complicating your life here, but would it be possible to do something along the lines of
np.array(groups_replicas_indexed_make_replica).reshape(replicas, groups_per_replica) ?
(or the other way around)
No, because it's a list of dataframes and I need to retain the information on the labels.
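One possible reading of this reply, shown as a small sketch: passing a list of equally shaped dataframes to `np.array` yields a plain numeric array, so the index and column labels are no longer attached. The frames and label names below are made up for illustration and are not taken from the actual code.

```python
import numpy as np
import pandas as pd

# Two toy frames with the same shape and labelled rows (hypothetical labels).
index = pd.Index(["point_0", "point_1"], name="id")
frames = [
    pd.DataFrame({"data": [1.0, 2.0]}, index=index),
    pd.DataFrame({"data": [3.0, 4.0]}, index=index),
]

# np.array converts each frame to its underlying values and stacks them.
stacked = np.array(frames)

# pd.concat, by contrast, keeps the row and column labels.
concatenated = pd.concat(frames)
```

Here `stacked` is an ordinary `ndarray`, whereas `concatenated` still carries the `id` index and the `data` column name.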
Co-authored-by: Juan M. Cruz-Martinez <juacrumar@lairen.eu>
44f0068 to ddf7702
For the thcovmat alphas stuff it makes a big difference whether I use the central data or the average over data replicas. While looking into it I want to use this function with multiple replicas.