Marketing Mix Model – Prescription Drug Sales

MMAI 831 | Group Assignment

Project Objective

Develop and compare parametric and machine learning models to explain and predict prescription drug sales volume. Estimate marketing response (elasticity, marginal effects) for Detailing and DTCA spending, and produce actionable managerial recommendations.

Dataset Description

Column	Type	Description
Class	Categorical	Therapeutic drug class
Agent	Categorical	Individual drug / brand
Year	Integer	Observation year
Detailing	Numeric	Sales force activity (visits or spend index)
DTCA	Numeric	Direct-to-Consumer Advertising spend
Actual Sales	Numeric	Target variable – prescription drug sales volume

File: data/MMM_Drug Data.xlsx

The dataset has a panel structure: each row is one Agent × Year combination. Multiple agents belong to the same therapeutic class.

Methodology

Train/Test Split

Temporal split (leakage-free): earlier years → training; most recent year(s) → test.
No random shuffling – this is time-series / panel data.
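
The split rule above can be sketched as follows. This is a minimal illustration on a made-up toy panel, not the project's actual `preprocessing.py`; the function name `temporal_split` and the parameter `n_test_years` are assumptions introduced here.

```python
import pandas as pd

# Toy panel using the README's column names; the values are invented.
df = pd.DataFrame({
    "Agent": ["A", "A", "A", "B", "B", "B"],
    "Year": [2001, 2002, 2003, 2001, 2002, 2003],
    "Actual Sales": [10.0, 12.0, 15.0, 5.0, 6.0, 8.0],
})


def temporal_split(panel, n_test_years=1, year_col="Year"):
    """Hold out the most recent n_test_years as the test set (hypothetical helper)."""
    years = sorted(panel[year_col].unique())
    if not 1 <= n_test_years < len(years):
        raise ValueError("need at least one test year and one training year")
    test_years = years[-n_test_years:]
    is_test = panel[year_col].isin(test_years)
    # No shuffling: rows are assigned purely by calendar year.
    return panel[~is_test].copy(), panel[is_test].copy()


train, test = temporal_split(df, n_test_years=1)
print(sorted(train["Year"].unique()), sorted(test["Year"].unique()))
```

Every training year precedes every test year, so no future information reaches the model during fitting.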

Feature Engineering

Log(1 + x) transforms for Sales, Detailing, DTCA (handles right skew and zeros).
Year trend variable and year-fixed-effect dummies.
Interaction terms: Detailing × DTCA, Class × Detailing, Class × DTCA.
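1-year lagged sales (per agent). The lag is computed on the full panel before the split so that the first test year keeps its lag from the last training year; because a lag uses only past values, this does not leak test-period information.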
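
A rough sketch of these transforms with pandas, assuming the column names from the dataset table. The derived column names (`log_sales`, `det_x_dtca`, `lag_log_sales`) are invented for illustration, and building the interaction on the log scale is an assumption, since the README does not say which scale is used.

```python
import numpy as np
import pandas as pd

# Toy panel; the values are invented.
df = pd.DataFrame({
    "Class": ["X", "X", "X", "X"],
    "Agent": ["A", "A", "B", "B"],
    "Year": [2001, 2002, 2001, 2002],
    "Detailing": [100.0, 120.0, 0.0, 30.0],
    "DTCA": [0.0, 50.0, 10.0, 0.0],
    "Actual Sales": [1000.0, 1300.0, 200.0, 260.0],
})

# Sort so that shift(1) within an agent means "previous observation".
df = df.sort_values(["Agent", "Year"]).reset_index(drop=True)

# log(1 + x) keeps zero-spend rows finite.
df["log_sales"] = np.log1p(df["Actual Sales"])
df["log_detailing"] = np.log1p(df["Detailing"])
df["log_dtca"] = np.log1p(df["DTCA"])

# Interaction on the log scale (assumption).
df["det_x_dtca"] = df["log_detailing"] * df["log_dtca"]

# Per-agent lag; this equals a 1-year lag only if each agent has no missing years.
df["lag_log_sales"] = df.groupby("Agent")["log_sales"].shift(1)

print(df[["Agent", "Year", "log_detailing", "lag_log_sales"]])
```

The first observation of each agent has no lag and is typically dropped or imputed before fitting.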

Parametric Models (P1–P7)

Model	Key feature
P1 OLS Baseline	Raw-scale benchmark
P2 Log-Log	Direct elasticity interpretation
P3 Log-Log + Class FE	Controls for class heterogeneity
P4 Log-Log + Two-Way FE	Preferred spec – controls class & year shocks
P5 + Interactions	Class-specific marketing slopes
P6 Ridge	Regularised for robustness
P7 Lasso	Variable selection
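
To make the two-way fixed-effects specification (P4) concrete, here is a hedged sketch using the statsmodels formula API on synthetic data. The data-generating values (elasticities of 0.4 and 0.1) and the column names `drug_class`, `agent`, and `year` are assumptions for this example only; the project's own `parametric_models.py` may be structured differently.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Synthetic panel: 3 classes x 4 agents x 8 years.
rows = []
for c_idx, drug_class in enumerate(["C1", "C2", "C3"]):
    for a in range(4):
        for year in range(2000, 2008):
            rows.append({
                "drug_class": drug_class,
                "agent": f"{drug_class}_A{a}",
                "year": year,
                "class_effect": 0.5 * c_idx,
                "year_effect": 0.03 * (year - 2000),
            })
df = pd.DataFrame(rows)

df["log_detailing"] = rng.normal(3.0, 1.0, len(df))
df["log_dtca"] = rng.normal(2.0, 1.0, len(df))
noise = rng.normal(0.0, 0.05, len(df))

# True elasticities (assumed for the simulation): 0.4 detailing, 0.1 DTCA.
df["log_sales"] = (
    1.0 + df["class_effect"] + df["year_effect"]
    + 0.4 * df["log_detailing"] + 0.1 * df["log_dtca"] + noise
)

# C(...) expands a column into fixed-effect dummies.
model = smf.ols(
    "log_sales ~ log_detailing + log_dtca + C(drug_class) + C(year)",
    data=df,
).fit()

# In a log-log model the slope coefficients are read directly as elasticities.
print(model.params[["log_detailing", "log_dtca"]])
print("AIC:", model.aic, "BIC:", model.bic)
```

Because the simulated spend is independent of the noise, the estimates land close to the true values; with real data the endogeneity caveat under Key Assumptions still applies.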

Machine Learning Models (M1–M5)

Random Forest, Gradient Boosting, XGBoost, LightGBM, Elastic Net. All trained with time-series cross-validation to prevent temporal leakage.
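
One way to build time-series folds for a panel is to split on calendar years rather than on rows, so that all agents from a given year fall on the same side of each fold. The sketch below is an assumption about how such folds could be formed; `expanding_year_folds` and `min_train_years` are hypothetical names, and the README does not state which splitter the pipeline uses.

```python
import pandas as pd


def expanding_year_folds(years, min_train_years=2):
    """Yield (train_years, val_year) pairs with an expanding training window."""
    unique_years = sorted(int(y) for y in set(years))
    for i in range(min_train_years, len(unique_years)):
        yield unique_years[:i], unique_years[i]


# Toy panel: two agents observed over five years.
df = pd.DataFrame({
    "Agent": ["A"] * 5 + ["B"] * 5,
    "Year": list(range(2001, 2006)) * 2,
})

folds = list(expanding_year_folds(df["Year"]))
for train_years, val_year in folds:
    train_idx = df.index[df["Year"].isin(train_years)]
    val_idx = df.index[df["Year"] == val_year]
    # A model would be fitted on train_idx and scored on val_idx here.
    print(train_years, "->", val_year, len(train_idx), len(val_idx))
```

Each validation year is strictly later than every year in its training window, which is the property that rules out temporal leakage during tuning.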

Evaluation Metrics

R², RMSE, MAE, MAPE. AIC/BIC for OLS models.
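
The four predictive metrics can be computed with plain NumPy, as in this sketch. The helper name `regression_metrics` is hypothetical, and scoring on the original sales scale after an `expm1` back-transform is an assumption; the pipeline may instead score on the log scale.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and MAPE (in percent) for two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    nonzero = y_true != 0  # MAPE is undefined where the actual value is zero
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": np.sqrt(np.mean(resid ** 2)),
        "MAE": np.mean(np.abs(resid)),
        "MAPE": np.mean(np.abs(resid[nonzero] / y_true[nonzero])) * 100.0,
    }


# Predictions from a log1p model are mapped back with expm1 before scoring.
# Note: a plain expm1 back-transform ignores retransformation bias.
log_pred = np.log1p([110.0, 190.0, 330.0])
metrics = regression_metrics([100.0, 200.0, 300.0], np.expm1(log_pred))
print(metrics)
```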

Business Interpretation

Elasticities from log-log OLS (direct coefficient read-off).
Marginal effects at sample means.
Response curves (diminishing returns check).
Class-level heterogeneity in marketing response.
Approximate ROI / marginal response proxy.
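
As an illustration of the marginal-effect and response-curve items, the sketch below assumes a model of the form log(1 + Sales) = α + β·log(1 + Spend) with other regressors held fixed, which gives dSales/dSpend = β·(1 + Sales)/(1 + Spend). The coefficient 0.4 and the sample means are invented, and the function names are hypothetical.

```python
import numpy as np


def marginal_effect(beta, sales, spend):
    """Extra sales per extra unit of spend, evaluated at a given point."""
    return beta * (1.0 + sales) / (1.0 + spend)


def response_curve(beta, base_sales, base_spend, spend_grid):
    """Predicted sales along a spend grid, anchored at (base_spend, base_sales)."""
    spend_grid = np.asarray(spend_grid, dtype=float)
    ratio = (1.0 + spend_grid) / (1.0 + base_spend)
    return (1.0 + base_sales) * ratio ** beta - 1.0


beta_detailing = 0.4                      # hypothetical elasticity
mean_sales, mean_detailing = 999.0, 99.0  # hypothetical sample means

me = marginal_effect(beta_detailing, mean_sales, mean_detailing)
print("marginal effect at means:", me)

grid = np.linspace(0.0, 400.0, 9)
curve = response_curve(beta_detailing, mean_sales, mean_detailing, grid)
# With 0 < beta < 1 the curve rises, but each extra unit of spend adds less.
print(np.round(curve, 1))
```

Dividing the marginal effect by the unit cost of the channel gives a rough marginal-response proxy; it is not a full ROI because margins and costs are not in the dataset.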

How to Run

1. Place the data file

data/MMM_Drug Data.xlsx

2. Create a virtual environment (recommended)

python -m venv venv
# Windows
venv\Scripts\activate
# macOS / Linux
source venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Run the full pipeline

python main.py

All outputs are saved to outputs/ (intermediate) and report_assets/ (report-ready), both relative to the repository root.

5. Optional: explore in Jupyter

jupyter lab notebooks/

Folder Structure

831/
├── data/
│   └── MMM_Drug Data.xlsx        ← place your data file here
├── notebooks/
│   └── exploration.ipynb         ← interactive EDA (optional)
├── src/
│   ├── __init__.py
│   ├── utils.py                  ← paths, constants, shared helpers
│   ├── data_loader.py            ← load Excel, validate, inspect
│   ├── preprocessing.py          ← train/test split, categorical encoding
│   ├── eda.py                    ← EDA charts and summary stats
│   ├── feature_engineering.py    ← log transforms, interactions, lags
│   ├── parametric_models.py      ← OLS, log-log, FE, regularised
│   ├── ml_models.py              ← RF, GBM, XGBoost, LightGBM, ElasticNet
│   ├── evaluation.py             ← metrics, comparison table, residual plots
│   ├── interpretation.py         ← elasticities, marginal effects, PDPs
│   └── reporting.py              ← polished outputs, memo, markdown draft
├── outputs/                      ← diagnostic charts and intermediate CSVs
├── report_assets/                ← report-ready charts, tables, and text
├── main.py                       ← end-to-end pipeline runner
├── requirements.txt
└── README.md

Key Assumptions

Log-log functional form is appropriate (justified by EDA – right-skewed distributions and roughly linear log-log scatter plots). The log1p transform is used to handle zero spend values safely.
A temporal train/test split is used because the panel has a time dimension: random k-fold would let the model train on later years and be tested on earlier ones. Random k-fold is therefore explicitly avoided.
Class fixed effects are included to absorb baseline sales differences across therapeutic categories. This is equivalent to a within-class comparison.
Year fixed effects absorb macro-level demand shocks common to all agents in a given year (e.g., generic entry, regulatory changes, market expansion).
Agent identity is not one-hot encoded by default because the number of agents may be large relative to the number of observations, risking overfitting. Agent-level information is captured via optional target encoding (agent_fe).
Endogeneity: firms may increase detailing and DTCA spending in markets or for drugs where they expect higher demand (reverse causality). The OLS estimates should be interpreted as associations, not causal effects, unless a valid instrument is available.
Elasticity interpretation from OLS is valid under the assumption that the log-log specification correctly describes the true data-generating process.
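
The target-encoding idea mentioned above can be sketched as follows. This assumes the encoding is the per-agent mean of log sales learned on the training rows only, with unseen agents falling back to the global training mean; the README names the feature `agent_fe` but does not describe how it is computed, so the details here are assumptions.

```python
import pandas as pd

# Toy training and test frames; the values are invented.
train = pd.DataFrame({
    "Agent": ["A", "A", "B", "B"],
    "log_sales": [2.0, 4.0, 6.0, 8.0],
})
test = pd.DataFrame({"Agent": ["A", "C"]})  # agent C never appears in training

# Learn the encoding from training rows only, so no test-period sales leak in.
agent_means = train.groupby("Agent")["log_sales"].mean()
global_mean = train["log_sales"].mean()

train["agent_fe"] = train["Agent"].map(agent_means)
# Unseen agents fall back to the global training mean.
test["agent_fe"] = test["Agent"].map(agent_means).fillna(global_mean)

print(test)
```

Note that each training row's own target still contributes to its encoded value, so out-of-fold encoding is a common refinement when the number of observations per agent is small.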

Interpretation Notes

Positive elasticity on log(Detailing): a 1% increase in detailing is associated with an approximately β% increase in sales, holding class, year, and DTCA constant.
Diminishing returns: in a log-log model, an elasticity between 0 and 1 already implies a concave response in levels. If a squared term log(Detailing)² is included and its coefficient is negative, the elasticity itself falls as detailing rises.
Interaction Detailing × DTCA: a positive coefficient means the two channels are complementary (each enhances the effect of the other). A negative coefficient means they are substitutes (one channel is more effective when the other is low).
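AIC/BIC: prefer the model with the lower AIC/BIC when test-set R² is similar. Compare AIC/BIC only across models fitted to the same dependent variable on the same observations; the raw-scale P1 is not comparable with the log-scale models on this criterion.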
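
The interaction note can be made concrete with a small calculation. Assuming the interaction enters on the log scale, the detailing elasticity at a given DTCA level is β_det + β_int·log(1 + DTCA). The coefficient values below are invented, and `detailing_elasticity` is a hypothetical helper.

```python
import numpy as np


def detailing_elasticity(beta_det, beta_int, dtca):
    """Detailing elasticity conditional on DTCA spend (log-scale interaction)."""
    return beta_det + beta_int * np.log1p(np.asarray(dtca, dtype=float))


dtca_levels = [0.0, 99.0]

# Positive interaction: detailing is more effective when DTCA is high.
complements = detailing_elasticity(0.30, 0.02, dtca_levels)

# Negative interaction: detailing is more effective when DTCA is low.
substitutes = detailing_elasticity(0.30, -0.02, dtca_levels)

print("complements:", complements)
print("substitutes:", substitutes)
```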

Manual Review Checklist (⚠ Before Submission)

Verify EDA charts in outputs/eda_*.png match the dataset.
Check that train/test year split printed at runtime is sensible.
Review report_assets/key_findings.txt and fill in [placeholders].
Edit report_assets/report_draft.md with actual results and context.
Confirm elasticity signs and magnitudes are economically plausible (positive, typically < 1).
If any marketing variable is insignificant, discuss in the Limitations section.
Check residual plots for patterns (heteroskedasticity, outliers).
Align the chosen "best model" recommendation with your group's written argument.

Dependencies

See requirements.txt. Core: pandas, numpy, statsmodels, scikit-learn, matplotlib, seaborn, xgboost, lightgbm, shap.

If xgboost, lightgbm, or shap are not installed, the pipeline gracefully skips those steps with a warning rather than crashing.
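
A minimal sketch of how such a graceful skip could work, using only the standard library to check whether each optional package is installed. The helper name `available_optional` is hypothetical, and the pipeline's actual mechanism may differ (for example, a try/except around the import).

```python
import importlib.util
import warnings

OPTIONAL_PACKAGES = ("xgboost", "lightgbm", "shap")


def available_optional(packages=OPTIONAL_PACKAGES):
    """Map each package name to True/False, warning once per missing package."""
    found = {}
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        found[name] = importlib.util.find_spec(name) is not None
        if not found[name]:
            warnings.warn(f"{name} is not installed; steps that need it will be skipped.")
    return found


status = available_optional()
if status["xgboost"]:
    print("XGBoost step would run")
else:
    print("XGBoost step skipped")
```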
