Using environmental data collected by various US Federal Government agencies to predict the spread of Dengue fever in two locations: San Juan, Puerto Rico, and Iquitos, Peru.
For a detailed analysis of the methodology, results, and conclusions, please refer to the Project Report included in this repository.
This project aims to predict the total number of Dengue fever cases for each city for each week in the test set. It uses data provided by the DrivenData "DengAI: Predicting Disease Spread" competition.
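Each city gets its own model, so a natural first step is to join the training features with their labels and group the rows by city. The sketch below is minimal and assumes the DrivenData file layout: separate feature and label CSVs keyed by `city`, `year` and `weekofyear`, with `sj` for San Juan and `iq` for Iquitos. The helpers `load_by_city` and `to_case_count` are hypothetical and do not come from the notebook. Because weekly case counts are whole numbers, raw regression outputs are also clipped and rounded to non-negative integers.

```python
import csv
import io
from collections import defaultdict


def load_by_city(features_file, labels_file):
    """Attach each week's total_cases label to its feature row and group rows by city.

    Assumes both CSVs share the (city, year, weekofyear) key columns.
    """
    labels = {
        (row["city"], row["year"], row["weekofyear"]): int(row["total_cases"])
        for row in csv.DictReader(labels_file)
    }
    by_city = defaultdict(list)
    for row in csv.DictReader(features_file):
        key = (row["city"], row["year"], row["weekofyear"])
        row["total_cases"] = labels[key]  # raises KeyError if a week has no label
        by_city[row["city"]].append(row)
    return dict(by_city)


def to_case_count(prediction):
    """Turn a raw regression output into a non-negative whole number of cases."""
    return max(0, round(prediction))


# Made-up toy rows in the assumed layout (not real competition data).
features = io.StringIO(
    "city,year,weekofyear,station_avg_temp_c\n"
    "sj,1990,18,25.0\n"
    "iq,2000,26,27.0\n"
)
labels = io.StringIO(
    "city,year,weekofyear,total_cases\n"
    "sj,1990,18,4\n"
    "iq,2000,26,0\n"
)
data = load_by_city(features, labels)
print(sorted(data))                              # ['iq', 'sj']
print(to_case_count(-1.7), to_case_count(12.6))  # 0 13
```

With the rows grouped like this, each city's list can be passed to its own model independently.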
According to the project report, the best-performing models identified via cross-validation (CV) were as follows, with errors reported as mean absolute error (MAE) in cases per week:
| City | Model | Dataset Version | CV MAE | Test MAE |
|---|---|---|---|---|
| San Juan | AdaBoost | Transformed | 30.67 | 19.64 |
| Iquitos | Ridge Regression | Original | 6.94 | 8.31 |
Note: In San Juan, the test MAE (19.64) is well below the CV MAE (30.67). This suggests a distributional shift between the two periods: the test period appears to have been easier to predict than the training period.
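The CV MAE column averages the mean absolute error, mean(|actual - predicted|), over several validation folds. For weekly time series, a common scheme is rolling-origin (expanding-window) validation, where each fold trains only on weeks that precede its test window. The sketch below assumes that scheme, which this README does not confirm. It also swaps in three hypothetical baseline forecasters (`train_mean`, `train_median`, `last_week`) for AdaBoost and Ridge Regression, which would instead be fitted on the environmental features. For each city, the candidate with the lowest CV MAE is kept.

```python
from statistics import mean, median


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def rolling_origin_cv_mae(series, forecaster, n_folds=3, horizon=4):
    """Average MAE over expanding-window folds.

    Each fold trains on every week before its test window and scores the next
    `horizon` weeks, so the forecaster never sees future weeks during training.
    """
    first_test_start = len(series) - n_folds * horizon
    if first_test_start < 1:
        raise ValueError("series is too short for the requested folds")
    scores = []
    for k in range(n_folds):
        start = first_test_start + k * horizon
        train, test = series[:start], series[start:start + horizon]
        prediction = forecaster(train)  # one constant forecast per fold
        scores.append(mae(test, [prediction] * len(test)))
    return mean(scores)


# Hypothetical baseline forecasters standing in for the real models.
forecasters = {
    "train_mean": lambda train: mean(train),
    "train_median": lambda train: median(train),
    "last_week": lambda train: train[-1],
}

weekly_cases = [1, 2, 3, 4, 5, 6, 7, 8]  # toy series, not real case counts
cv_scores = {
    name: rolling_origin_cv_mae(weekly_cases, f, n_folds=2, horizon=2)
    for name, f in forecasters.items()
}
best = min(cv_scores, key=cv_scores.get)
print(cv_scores)  # {'train_mean': 3.5, 'train_median': 3.5, 'last_week': 1.5}
print(best)       # last_week
```

Each fold scores a different stretch of weeks, so fold-level errors can vary widely. This is one reason a single held-out test period can score better or worse than the CV average.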
Note: The results in this table are taken from the final Project Report. Re-running the notebook may yield slightly different results due to the stochastic nature of some models or minor differences between environments, though the overall trends should remain consistent.
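Part of this variation can come from randomness inside the estimators. If the notebook uses scikit-learn (an assumption; check requirements.txt), its `AdaBoostRegressor` resamples the training rows according to their sample weights at each boosting round, and its `random_state` parameter controls that draw. The minimal stdlib sketch below illustrates the idea with a hypothetical `weighted_resample` helper. It does not indicate whether the notebook already fixes its seeds.

```python
import random


def weighted_resample(weights, seed=None):
    """Draw len(weights) indices with replacement, with probability proportional to weights.

    This mimics the weighted bootstrap a boosting round can perform; without a
    fixed seed, every call draws a different sample.
    """
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return rng.choices(range(len(weights)), weights=weights, k=len(weights))


weights = [0.1, 0.2, 0.3, 0.4]  # hypothetical per-sample boosting weights
first = weighted_resample(weights, seed=0)
second = weighted_resample(weights, seed=0)
print(first == second)  # True: same seed, same resample
```

In scikit-learn terms, passing a fixed integer, as in `AdaBoostRegressor(random_state=0)`, pins this draw across runs. Differences between library versions can still change the results.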
- Clone the repository and change into its directory:
git clone https://github.com/aasim-m/DengAI.git
cd DengAI
- Install the required dependencies:
pip install -r requirements.txt
The main analysis and model training are contained in the Jupyter Notebook DengAI.ipynb.
To run the notebook:
jupyter notebook DengAI.ipynb
The repository contains:
- DengAI.ipynb: The main notebook containing EDA, preprocessing, and model implementation.
- DengAI_Report.pdf: Comprehensive project report.
- requirements.txt: List of Python dependencies.
- data/: Directory containing the dataset (ensure this is present before running the code).
This project is licensed under the MIT License - see the LICENSE file for details.