Machine Learning Tasks using Python Programming
This project aims to predict cancer mortality rates in various US counties using machine learning techniques. The dataset includes socio-economic characteristics and other related information for specific counties.
The goal is to develop a regression model that can predict cancer mortality rates in "unseen" US counties. The provided dataset consists of:
Training_data.csv: Training data with various features/predictors.Training_data_targets.csv: Corresponding target variables for the training set.Test_data_example.csv: Example test data.Test_data_example_targets.csv: Example test data targets.
The dataset includes the following predictors/features:
- avgAnnCount: Mean number of reported cases of cancer diagnosed annually.
- avgDeathsPerYear: Mean number of deaths per year.
- incidenceRate: Mean per capita (100,000) cancer diagoses.
- medIncome: Median income.
- popEst2015: Population estimates for 2015.
- povertyPercent: Percentage of people living in poverty.
- studyPerCap: Per capita spending on studies.
- binnedInc: Income binned by 10k.
- MedianAge: Median age of residents.
- MedianAgeMale: Median age of male residents.
- MedianAgeFemale: Median age of female residents.
- GeoFips: County identifier.
- Data Preparation: Clean and preprocess the data.
- Model Training: Train regression models using the provided training data.
- Model Evaluation: Evaluate the models using the example test data.
- Inference: Use the trained model to predict cancer mortality rates on unseen test data.
Task.ipynb: Jupyter notebook containing the project code, analysis, and model development.Training_data.csv: Training data file.Training_data_targets.csv: Training targets file.Test_data_example.csv: Example test data file.Test_data_example_targets.csv: Example test targets file.
This project is licensed under the MIT License - see the LICENSE file for details.