Merged
86 commits
5a05f40
Issue #75: PIGs for regression.
Aug 11, 2021
f1e6774
univariate preselection based on RMSE, added new unit testing
Aug 12, 2021
027d1c7
Revised unit tests for issue #75: PIGs for regression.
Aug 12, 2021
59d967a
Merge branch 'issue-#75-pigs-for-linear-regression' into develop
Aug 12, 2021
8022ec1
Merge branch 'master' into develop
Aug 12, 2021
16a62eb
Bugfix for #66: preprocessor creation via from_params(model_type="reg…
Aug 12, 2021
2dda022
Merge pull request #90 from PythonPredictions/issue-#66-preprocessor
Aug 12, 2021
4db0cb0
Merge pull request #89 from PythonPredictions/issue-#75-pigs-for-line…
Aug 12, 2021
49b5629
Bugfix for #66: from_params() and from_pipeline() make a wrong call t…
Aug 12, 2021
7aded52
Merge pull request #91 from PythonPredictions/issue-#66-preprocessor
Aug 12, 2021
4ac4159
first changes to add linear regression to model class structure
Aug 12, 2021
c5e5e8f
first changes to add linear regression to model class structure
Aug 12, 2021
b41788e
first changes to add linear regression to model class structure
Aug 12, 2021
dec293d
unit tests forward selection & finetuning models
Aug 13, 2021
2c7b7e7
Merge pull request #88 from PythonPredictions/issue-#68-univariate_se…
sandervh14 Aug 13, 2021
24a9cf3
Merge pull request #92 from PythonPredictions/issue-#69-#70-models-fo…
sandervh14 Aug 13, 2021
0deddb5
Started on including a class of regression evaluation metrics, both n…
hendrikdewinter8 Aug 17, 2021
aceafa3
Merge branch 'master' into issue#71-evaluator_regression
hendrikdewinter8 Sep 2, 2021
480f671
Merge pull request #93 from PythonPredictions/issue#71-evaluator_regr…
hendrikdewinter8 Sep 2, 2021
ac522ea
Fix: model scoring did not work in the regression case (indexing erro…
Sep 3, 2021
122b36c
Merge pull request #94 from PythonPredictions/issue-#69-#70-models-fo…
Sep 3, 2021
2d3ecaa
split Evaluator into ClassificationEvaluator and RegressionEvaluator …
Sep 3, 2021
d31dc5b
Linear regression requires "lower is better" instead of "higher is be…
Sep 3, 2021
548643b
plotting_utils for linear regression & various small changes
Sep 10, 2021
c9577cd
Merge branch 'issue-#70-fixing-best-predictor-selection' into issue-#…
Sep 10, 2021
beaf97b
add self.model_type
Sep 10, 2021
234eddc
some clean-up
Sep 16, 2021
2c39caa
bug fix: make absolutely sure bin edges are ordered
Sep 16, 2021
29ef255
doc clarifications around category_size_threshold argument
Sep 16, 2021
678c396
typo
Sep 16, 2021
657ce5f
Merge pull request #99 from PythonPredictions/issue-#71-evaluator-reg…
sandervh14 Sep 21, 2021
a8aa750
small fix for y axis on performance curve plot (regression version)
hendrikdewinter8 Sep 22, 2021
2bffe82
Latest fix for y axis on performance curve plot (regression version)
hendrikdewinter8 Sep 22, 2021
0302c62
Merge pull request #101 from PythonPredictions/issue#76-plottingUtils
Sep 24, 2021
09e51c6
fix qq plot
Sep 24, 2021
ae66e34
expand unit testing for evaluation & plotting
Sep 24, 2021
2f7f7c8
New tutorials for both logistic and linear regression (move from html…
hendrikdewinter8 Sep 27, 2021
529dbc8
new version of tutorials (latest updates)
hendrikdewinter8 Sep 27, 2021
97ff7f8
Merge pull request #102 from PythonPredictions/issue-#76-plotting_utils
Sep 28, 2021
14ed388
new version of tutorials (latest updates)
hendrikdewinter8 Sep 28, 2021
a66c0ce
fixed the legend warning in qqplot by adding labels to the plot elements
hendrikdewinter8 Sep 28, 2021
7b5a7b1
Fixed the showing of a warning with the formatting of ticks (which ap…
hendrikdewinter8 Sep 28, 2021
137c00a
Fixed the showing of a warning with the formatting of ticks (which ap…
hendrikdewinter8 Sep 28, 2021
df89b92
again an updated version of the tutorials, after some feedback
hendrikdewinter8 Sep 28, 2021
6dbd346
Other legend labels + equal ticks on both axes.
Sep 29, 2021
29b928d
Merge pull request #106 from PythonPredictions/issue#105_warnings
sandervh14 Sep 29, 2021
13facf1
Merge pull request #104 from PythonPredictions/issue#77_documentation
Sep 29, 2021
585c8b5
qq plot modifs
Sep 29, 2021
3c84785
separate tutorials folder
Sep 29, 2021
2be37c6
reset y-axis performance plots regression
Sep 29, 2021
c073418
fix README
Sep 29, 2021
9e12942
add cleaned unrun tutorials under new folder
Sep 29, 2021
819cab8
a round of cleaning
Sep 29, 2021
9d591a4
bunch of documentation cleaning
Sep 29, 2021
de4f901
some more documentation cleaning
Sep 29, 2021
7d28335
further docs cleaning, incl. regrouping explanation
Sep 29, 2021
183535c
additional explanations in docs
Sep 29, 2021
834c869
fix raise ValueError optimal step
Sep 29, 2021
3e310d7
various docs changes and small fixes
Sep 30, 2021
d4dcd83
reorder readme
Sep 30, 2021
51f2285
add __version__ string
Sep 30, 2021
35129a3
change link logo
Sep 30, 2021
a2bb0c1
change link logo
Sep 30, 2021
eb3a400
Merge pull request #107 from PythonPredictions/general-fixes
Sep 30, 2021
aefb9e9
ffselection not on train data
Sep 30, 2021
51e0498
PIG plot fix: correctly aligning long textual bin labels with the bin…
Sep 30, 2021
ffd852e
fix train-selection split ffs
Sep 30, 2021
53a9858
PIG plot fix: avoiding the
Sep 30, 2021
f97835e
some cleaning
Sep 30, 2021
3b71e4b
Merge pull request #108 from PythonPredictions/issue-#32-improve-PIGs
Oct 1, 2021
b98f300
cleaning
Oct 1, 2021
84f3cb1
Merge branch 'develop' into general-fixes-bis
Oct 1, 2021
919458b
final PIG plots tweaks
Oct 1, 2021
c0fdab0
finetune ffs & tutorials
Oct 1, 2021
14e5bf2
add comma
Oct 1, 2021
5964f5b
rotation=45
Oct 1, 2021
3f06750
integrating comments Sander & finetuning
Oct 1, 2021
66672ba
Compute and plot forward selection with another metric of the develop…
Oct 1, 2021
fca06da
unit testing ffs assertions
Oct 1, 2021
c59da30
Merge pull request #109 from PythonPredictions/general-fixes-bis
Oct 1, 2021
390fc1c
Merge pull request #111 from PythonPredictions/selectable_evaluation_…
Oct 1, 2021
eccb344
improve usage metric arg
Oct 1, 2021
438ef86
metric_name arg in plot_performance_curves
Oct 1, 2021
d3f64fb
update tutorials
Oct 1, 2021
af6000f
update docs params & HTML generation
Oct 4, 2021
df7882e
Merge pull request #112 from PythonPredictions/sphinx-docs
Oct 4, 2021
6 changes: 3 additions & 3 deletions .gitignore
@@ -53,15 +53,15 @@ junit/
 *.mo
 *.pot

-# Django stuff:
+# Django stuff
 *.log
 local_settings.py

-# Flask stuff:
+# Flask stuff
 instance/
 .webassets-cache

-# Scrapy stuff:
+# Scrapy stuff
 .scrapy

 # Sphinx documentation
65 changes: 26 additions & 39 deletions README.rst
@@ -1,4 +1,6 @@

.. image:: https://github.com/PythonPredictions/cobra/raw/master/material/logo.png
:width: 700

.. image:: https://img.shields.io/pypi/v/pythonpredictions-cobra.svg
:target: https://pypi.org/project/pythonpredictions-cobra/
@@ -9,26 +11,20 @@

------------------------------------------------------------------------------------------------------------------------------------

=====
cobra
=====
.. image:: material\logo.png
:width: 300
**Cobra** is a Python package to build predictive models using linear or logistic regression with a focus on performance and interpretation. It consists of several modules for data preprocessing, feature selection and model evaluation. The underlying methodology was developed at `Python Predictions <https://www.pythonpredictions.com>`_ in the course of hundreds of business-related prediction challenges. It has been tweaked, tested and optimized over the years based on feedback from clients, our team, and academic researchers.

**cobra** is a Python package to build predictive models using logistic regression with a focus on performance and interpretation. It consists of several modules for data preprocessing, feature selection and model evaluation. The underlying methodology was developed at Python Predictions in the course of hundreds of business-related prediction challenges. It has been tweaked, tested and optimized over the years based on feedback from clients, our team, and academic researchers.

Main Features
Main features
=============

- Prepare a given pandas DataFrame for predictive modelling:

- partition into train/selection/validation sets
- create bins from continuous variables
- regroup categorical variables based on statistical significance
- replace missing values and
- add columns with incidence rate per category/bin
- replace missing values
- add columns where categories/bins are replaced with average of target values (linear regression) or with incidence rate (logistic regression)

- Perform univariate feature selection based on AUC
- Perform univariate feature selection based on RMSE (linear regression) or AUC (logistic regression)
- Compute correlation matrix of predictors
- Find the suitable variables using forward feature selection
- Evaluate model performance and visualize the results
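The updated feature list describes replacing each category or bin with the average of the target values (linear regression) or with the incidence rate (logistic regression). The idea can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: `target_encode` is a hypothetical helper, not part of Cobra's API, the data is made up, and Cobra's own implementation may differ.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each category/bin label with the mean target observed for it.

    Hypothetical helper for illustration only; not part of Cobra's API.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    means = {category: sums[category] / counts[category] for category in sums}
    return [means[category] for category in categories], means


# Made-up example data: five rows that were already binned on age.
age_bins = ["0-18", "0-18", "19-65", "19-65", "65+"]

# Binary target: the per-bin mean is the incidence rate (logistic regression).
churned = [1, 0, 0, 0, 1]
encoded_rates, incidence = target_encode(age_bins, churned)

# Continuous target: the per-bin mean is the average target (linear regression).
spend = [10.0, 30.0, 50.0, 70.0, 90.0]
encoded_means, averages = target_encode(age_bins, spend)
```

The same function serves both model types because the incidence rate of a 0/1 target is simply its mean.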
@@ -41,49 +37,40 @@ These instructions will get you a copy of the project up and running on your loc
Requirements
------------

This package requires the usual Python packages for data science:

- numpy (>=1.19.4)
- pandas (>=1.1.5)
- scipy (>=1.5.4)
- scikit-learn (>=0.23.1)
- matplotlib (>=3.3.3)
- seaborn (>=0.11.0)


These packages, along with their versions are listed in ``requirements.txt`` and can be installed using ``pip``: ::

This package requires only the usual Python libraries for data science, being numpy, pandas, scipy, scikit-learn, matplotlib, seaborn, and tqdm. These packages, along with their versions are listed in ``requirements.txt`` and can be installed using ``pip``: ::

pip install -r requirements.txt


**Note**: if you want to install cobra with e.g. pip, you don't have to install all of these requirements as these are automatically installed with cobra itself.
**Note**: if you want to install Cobra with e.g. pip, you don't have to install all of these requirements as these are automatically installed with Cobra itself.

Installation
------------

The easiest way to install cobra is using ``pip``: ::
The easiest way to install Cobra is using ``pip``: ::

pip install -U pythonpredictions-cobra

Contributing to cobra
=====================

We'd love you to contribute to the development of cobra! There are many ways in which you can contribute, the most common of which is to contribute to the source code or documentation of the project. However, there are many other ways you can contribute (report issues, improve code coverage by adding unit tests, ...).
We use GitHub issue to track all bugs and feature requests. Feel free to open an issue in case you found a bug or in case you wish to see a new feature added.
Documentation and extra material
================================

- A `blog post <https://www.pythonpredictions.com/news/the-little-trick-we-apply-to-obtain-explainability-by-design/>`_ on the overall methodology.

- A `research article <https://doi.org/10.1016/j.dss.2016.11.007>`_ by Geert Verstraeten (co-founder Python Predictions) discussing the preprocessing approach we use in Cobra.

For more details, check our `wiki <https://github.com/PythonPredictions/cobra/wiki/Contributing-guidelines-&-workflows>`_.
- HTML documentation of the `individual modules <https://pythonpredictions.github.io/cobra.io/docstring/modules.html>`_.

Help and Support
================
- A step-by-step `tutorial <https://pythonpredictions.github.io/cobra/tutorials/tutorial_Cobra_logistic_regression.ipynb>`_ for **logistic regression**.

Documentation
-------------
- A step-by-step `tutorial <https://pythonpredictions.github.io/cobra/tutorials/tutorial_Cobra_linear_regression.ipynb>`__ for **linear regression**.

- HTML documentation of the `individual modules <https://pythonpredictions.github.io/cobra.io/docstring/modules.html>`_
- A step-by-step `tutorial <https://pythonpredictions.github.io/cobra.io/tutorial.html>`_
- Check out the Data Science Leuven Meetup `talk <https://www.youtube.com/watch?v=w7ceZZqMEaA&feature=youtu.be>`_ by one of the core developers (second presentation). His `slides <https://github.com/PythonPredictions/Cobra-DS-meetup-Leuven/blob/main/DS_Leuven_meetup_20210209_cobra.pdf>`_ and `related material <https://github.com/PythonPredictions/Cobra-DS-meetup-Leuven>`_ are also available.

Contributing to Cobra
=====================

Outreach
-------------
We'd love you to contribute to the development of Cobra! There are many ways in which you can contribute, the most common of which is to contribute to the source code or documentation of the project. However, there are many other ways you can contribute (report issues, improve code coverage by adding unit tests, ...).
We use GitHub issues to track all bugs and feature requests. Feel free to open an issue in case you found a bug or in case you wish to see a new feature added.

- Check out the Data Science Leuven Meetup `talk <https://www.youtube.com/watch?v=w7ceZZqMEaA&feature=youtu.be>`_ by one of the core developers (second presentation)
For more details, check out our `wiki <https://github.com/PythonPredictions/cobra/wiki/Contributing-guidelines-&-workflows>`_.
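
The revised README states that univariate feature selection uses RMSE for linear regression and AUC for logistic regression, and one commit in this pull request notes that linear regression needs "lower is better" ordering. The sketch below shows why the sort direction has to depend on the metric. The function names and the `lower_is_better` flag are hypothetical, and the AUC values are illustrative numbers rather than results from Cobra.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rank_predictors(scores, lower_is_better):
    """Order predictor names from best to worst score.

    `lower_is_better` is a hypothetical flag: True for an error metric such
    as RMSE, False for a metric such as AUC.
    """
    return sorted(scores, key=scores.get, reverse=not lower_is_better)


# Linear regression: univariate RMSE per predictor, smallest first.
rmse_scores = {
    "age": rmse([3.0, 5.0], [2.0, 6.0]),
    "income": rmse([3.0, 5.0], [3.0, 5.0]),
}
best_for_regression = rank_predictors(rmse_scores, lower_is_better=True)

# Logistic regression: univariate AUC per predictor, largest first.
# These AUC values are illustrative, not measured.
auc_scores = {"age": 0.71, "income": 0.64}
best_for_classification = rank_predictors(auc_scores, lower_is_better=False)
```

Sorting both metrics in the same direction would put the worst regression predictor first, which is the kind of mistake the "lower is better" commit message points at.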
1 change: 1 addition & 0 deletions cobra/__init__.py
@@ -0,0 +1 @@
+from .version import __version__
7 changes: 4 additions & 3 deletions cobra/evaluation/__init__.py
@@ -8,8 +8,8 @@
 from .plotting_utils import plot_univariate_predictor_quality
 from .plotting_utils import plot_correlation_matrix

-from .evaluator import Evaluator
-
+# from .evaluator import Evaluator
+from .evaluator import ClassificationEvaluator, RegressionEvaluator

 __all__ = ["generate_pig_tables",
 "compute_pig_table",
@@ -18,4 +18,5 @@
 "plot_variable_importance",
 "plot_univariate_predictor_quality",
 "plot_correlation_matrix",
-"Evaluator"]
+"ClassificationEvaluator",
+"RegressionEvaluator"]
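
The change to `cobra/evaluation/__init__.py` appears to drop `Evaluator` from `__all__` and export `ClassificationEvaluator` and `RegressionEvaluator` instead, in line with the "split Evaluator" commit above. Downstream code that imports `Evaluator` would therefore need updating. The sketch below shows one way to check which names a module exports. `missing_exports` is a hypothetical helper, and it is demonstrated on the standard-library `json` module so that it runs without Cobra installed.

```python
import importlib


def missing_exports(module_name, required_names):
    """Return the required names that a module does not list in `__all__`."""
    module = importlib.import_module(module_name)
    exported = set(getattr(module, "__all__", ()))
    return sorted(set(required_names) - exported)


# Demonstrated on the standard library so the snippet runs anywhere:
# `json` exports `dumps` but has no `Evaluator`.
print(missing_exports("json", ["dumps", "Evaluator"]))
```

With a Cobra release that includes this change installed, calling `missing_exports("cobra.evaluation", ["ClassificationEvaluator", "RegressionEvaluator"])` would be expected to return an empty list, while asking for `"Evaluator"` would report it as missing.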