Skip to content

General fixes bis#109

Merged
sborms merged 11 commits intodevelopfrom
general-fixes-bis
Oct 1, 2021
Merged

General fixes bis#109
sborms merged 11 commits intodevelopfrom
general-fixes-bis

Conversation

@sborms
Copy link
Copy Markdown
Contributor

@sborms sborms commented Oct 1, 2021

Set of small (documentation) fixes and (!) reconfiguration of the forward feature selection such that it selects model based on the "selection" data split, after training the model on the "train" data split. Can the reviewer (@sandervh14) thoroughly check the commits which have 'ffs(election)' mentioned?

Also now correctly retrieves the model coefficients for linear regression (before it only took the first element - confusion because for logistic regression the coefficients are an array of an array).

@sborms sborms requested a review from sandervh14 October 1, 2021 09:42
Copy link
Copy Markdown
Contributor

@sandervh14 sandervh14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Sam,
I left some important review comments, but I can't review again later this afternoon, so I hereby approve and trust that if you think indeed some of my comments indeed require a change, that you'll apply them, and if you think no changes are required, you can merge.

Comment thread cobra/evaluation/pigs_tables.py Outdated
ax.xaxis.set_tick_params(labelsize=14)
plt.setp(ax.get_xticklabels(),
rotation=45, ha="right", rotation_mode="anchor")
rotation=90, ha="right", rotation_mode="anchor")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't that make it less readable? Is it because for a dataset you were evaluating, 45° angle didn't display nice for some reason - even longer bin labels than I had, for example? + when rotation=90, ha="center" might be better.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reconsidered, and put it back to rotation=45. 😀 I had indeed one plot with a very large number of labels, and vertical rotation made it more or less readable. But agree to keep it like this, as it makes the typical cases optimally readable.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okidoki!

Comment thread cobra/evaluation/pigs_tables.py Outdated
# Set labels & ticks
ax2.set_xlabel('{} bins' ''.format(variable), fontsize=16)
ax2.xaxis.set_tick_params(rotation=45, labelsize=14)
ax2.xaxis.set_tick_params(rotation=90, labelsize=14)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If modifying X axis label stuff due to my above comment, this will also be impacted.


def _find_next_best_model(self, train_data: pd.DataFrame,
def _find_next_best_model(self,
train_data: pd.DataFrame,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just calling it "basetable" or "basetable_parts" might be better, "train_data" could suggest to people - when they're not scrolling down - that only train data is used for feature selection.

Copy link
Copy Markdown
Contributor Author

@sborms sborms Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, and also thought about changing it, but instead I emphasized in the documentation of ForwardFeatureSelection.fit() for train_data what the argument should be, and added an assertion to make sure it includes both a train & selection split.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

split="train"))
.evaluate(sel_data[current_predictors + [pred]],
sel_data[target_column_name],
split="selection"))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like, since sel_data is now passed instead of train_data, split="selection" is redundant, default argument of evaluate apparently is split=None and then evaluation will just run fine on only selection split, since that is the only data split available in sel_data; though, if you want to explicitly leave split="selection" argument there for absolute clarity, I can follow that reasoning.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I put it there for clarity and that way it also populates the _eval_metrics_by_split model attribute. But your reasoning is correct.

category_size_threshold: int = 5,
p_value_threshold: float = 0.001,
scale_contingency_table: bool = True,
forced_categories: dict = {}):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also used to this style, is that bad towards linters or something?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't know what the recommended style is. I started doing it at some point, then saw that PyCharm prefers it with spaces, but kept on going. At some point we'll enforce consistent code formatting as put forward in #54.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use what PyCharm suggests, normally it checks for PEP rules, and linters as well. We'll see when we tackle #54.

@sborms sborms merged commit c59da30 into develop Oct 1, 2021
@sborms sborms deleted the general-fixes-bis branch October 1, 2021 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants