Conversation
sandervh14
left a comment
Hey Sam,
I left some important review comments, but I can't review again later this afternoon, so I hereby approve. I trust that if you think some of my comments require a change, you'll apply them; if you think no changes are required, you can merge.
  ax.xaxis.set_tick_params(labelsize=14)
  plt.setp(ax.get_xticklabels(),
-          rotation=45, ha="right", rotation_mode="anchor")
+          rotation=90, ha="right", rotation_mode="anchor")
Doesn't that make it less readable? Is it because the 45° angle didn't display nicely for a dataset you were evaluating - one with even longer bin labels than I had, for example? Also, with rotation=90, ha="center" might be better.
I reconsidered and put it back to rotation=45. 😀 I did indeed have one plot with a very large number of labels, where vertical rotation made it more or less readable. But I agree to keep it like this, as it makes the typical cases optimally readable.
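For reference, a minimal sketch of the rotation option that was kept (the figure, labels, and data here are made up for illustration; the actual cobra plotting code differs):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["[0, 10)", "[10, 20)", "[20, 30)"], [5, 3, 7])
ax.xaxis.set_tick_params(labelsize=14)

# rotation=45 with ha="right" anchors each label's right end at its tick,
# which keeps long bin labels readable for typical numbers of bins;
# with rotation=90, ha="center" would center labels under their ticks instead
plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")
```

With very many bins, rotation=90 trades some readability of individual labels for less horizontal overlap, which is the trade-off discussed above.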
  # Set labels & ticks
  ax2.set_xlabel('{} bins'.format(variable), fontsize=16)
- ax2.xaxis.set_tick_params(rotation=45, labelsize=14)
+ ax2.xaxis.set_tick_params(rotation=90, labelsize=14)
If you modify the X-axis label settings per my comment above, this will also be impacted.
- def _find_next_best_model(self, train_data: pd.DataFrame,
+ def _find_next_best_model(self,
+                           train_data: pd.DataFrame,
Just calling it "basetable" or "basetable_parts" might be better; "train_data" could suggest to people - when they're not scrolling down - that only train data is used for feature selection.
Agreed - I also thought about changing it, but instead I emphasized in the documentation of ForwardFeatureSelection.fit() what the train_data argument should be, and added an assertion to make sure it includes both a train & selection split.
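The assertion described above might look roughly like this (the column name "split", the function name, and the toy data are assumptions for illustration; the actual cobra code may differ):

```python
import pandas as pd

def check_basetable_splits(basetable: pd.DataFrame) -> None:
    """Fail fast if the basetable lacks a train or selection split."""
    splits = set(basetable["split"].unique())
    missing = {"train", "selection"} - splits
    assert not missing, (
        "train_data must contain both a 'train' and a 'selection' split; "
        "missing: {}".format(missing)
    )

# A basetable containing both splits passes the check silently
df = pd.DataFrame({"split": ["train", "selection", "train"],
                   "age": [25, 40, 33]})
check_basetable_splits(df)
```

Failing fast here is what prevents the silent misuse the comment above warns about: a caller passing only train rows gets an immediate AssertionError instead of a quietly wrong feature selection.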
-                   split="train"))
+         .evaluate(sel_data[current_predictors + [pred]],
+                   sel_data[target_column_name],
+                   split="selection"))
It seems that, since sel_data is now passed instead of train_data, split="selection" is redundant: the default argument of evaluate is apparently split=None, and evaluation would then run fine on the selection split alone, since that is the only data split available in sel_data. Though, if you want to leave the split="selection" argument there explicitly for absolute clarity, I can follow that reasoning.
Indeed, I put it there for clarity; that way it also populates the _eval_metrics_by_split model attribute. But your reasoning is correct.
  category_size_threshold: int = 5,
  p_value_threshold: float = 0.001,
  scale_contingency_table: bool = True,
  forced_categories: dict = {}):
I'm also used to this style; is it bad for linters or something?
To be honest, I don't know what the recommended style is. I started doing it at some point, then saw that PyCharm prefers it with spaces, but kept on going. At some point we'll enforce consistent code formatting as put forward in #54.
I'd use what PyCharm suggests; it normally checks PEP 8 rules, and linters do as well. We'll see when we tackle #54.
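For what it's worth, PEP 8 does address this case: when an argument combines an annotation with a default value, spaces around the `=` are recommended (`category_size_threshold: int = 5`), while unannotated defaults take no spaces (`threshold=5`), which matches what PyCharm suggests. A separate caveat in the snippet above is the mutable default `{}`; a common pattern is to default to None instead. A small illustrative sketch (hypothetical function, not the actual cobra code):

```python
def preprocess(category_size_threshold: int = 5,
               forced_categories: dict = None) -> dict:
    # A literal {} default is created once at function definition time and
    # shared across all calls; defaulting to None and creating the dict
    # inside the body avoids state leaking between calls.
    if forced_categories is None:
        forced_categories = {}
    forced_categories.setdefault("other", [])
    return forced_categories

# Each call gets a fresh dict, so no state leaks between calls:
first = preprocess()
second = preprocess()
assert first is not second
```

With `forced_categories: dict = {}`, `first` and `second` would be the very same dict object, and mutations from one call would be visible in the next.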
Set of small (documentation) fixes and (!) a reconfiguration of the forward feature selection such that it selects the model based on the "selection" data split, after training the model on the "train" data split. Could the reviewer (@sandervh14) thoroughly check the commits which mention 'ffs(election)'?
This also now correctly retrieves the model coefficients for linear regression (before, it only took the first element - a confusion arising because for logistic regression the coefficients are an array of arrays).
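The shape difference described above is indeed how scikit-learn behaves: LinearRegression.coef_ is 1-D with shape (n_features,) for a single target, while LogisticRegression.coef_ is always 2-D with shape (n_classes or 1, n_features), so its per-feature coefficients live in coef_[0]. A small sketch on toy data (not the cobra models themselves):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])

lin = LinearRegression().fit(X, np.array([0.0, 1.0, 2.0, 3.0]))
log = LogisticRegression().fit(X, np.array([0, 0, 1, 1]))

# Linear regression: 1-D coefficient array, index it directly
assert lin.coef_.shape == (1,)

# Logistic regression: 2-D array ("an array of arrays"),
# so taking element [0] yields the per-feature coefficients
assert log.coef_.shape == (1, 1)
```

This is why blindly taking `coef_[0]` works for logistic regression but, for linear regression, returns only the first coefficient rather than the whole vector.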