An end-to-end machine learning application for breast cancer diagnosis that predicts whether a breast mass is benign or malignant based on cytology lab measurements. The project includes both model training and an interactive web interface.
- Data preprocessing and cleaning from the Wisconsin Breast Cancer Dataset
- Feature scaling using StandardScaler
- Logistic Regression classification model
- Model evaluation with accuracy metrics and classification reports
- Serialized model and scaler for production use
- Real-time interactive sliders for all 30 cell nuclei measurements
- Dynamic radar chart visualization comparing:
  - Mean values
  - Standard error values
  - Worst (largest) values
- Instant prediction results with probability scores
- Responsive two-column layout design
- Data Cleaning: Automatic handling of missing values and column mapping
- Feature Scaling: Min-max scaling of inputs for the radar chart; the saved StandardScaler transforms inputs before prediction
- Model Prediction: Real-time inference with probability outputs
- Visual Analytics: Plotly-based radar charts for multi-dimensional data visualization
- User-Friendly Interface: Intuitive sidebar controls and clear result displays
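The radar chart can only overlay measurements with very different magnitudes (area in the hundreds, smoothness around 0.1) once each value is rescaled to a common 0 to 1 range. A minimal sketch of that step, assuming each slider value is scaled by the minimum and maximum of its column in the dataset and that columns follow a `<feature>_mean` / `<feature>_se` / `<feature>_worst` naming scheme. `min_max_scale`, `radar_series`, and `BASE_FEATURES` are hypothetical names, not taken from main.py.

```python
from typing import Dict, List, Tuple

# Ten base measurements; the exact column spelling is an assumption about cdata.csv.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry", "fractal_dimension",
]


def min_max_scale(
    values: Dict[str, float], bounds: Dict[str, Tuple[float, float]]
) -> Dict[str, float]:
    """Rescale each value against its column's (min, max); in-range inputs land in [0, 1]."""
    scaled = {}
    for name, value in values.items():
        lo, hi = bounds[name]
        # A constant column has no spread; map it to 0 instead of dividing by zero.
        scaled[name] = 0.0 if hi == lo else (value - lo) / (hi - lo)
    return scaled


def radar_series(scaled: Dict[str, float], stat: str) -> List[float]:
    """Collect the ten scaled values for one statistic: 'mean', 'se' or 'worst'."""
    return [scaled[f"{base}_{stat}"] for base in BASE_FEATURES]
```

In the app, `bounds` could be built from `df[col].min()` and `df[col].max()` for each column, and each list returned by `radar_series` could be passed as the `r` values of a `plotly.graph_objects.Scatterpolar` trace, with the base feature names as `theta`.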
├── main.py # Streamlit web application
├── model_training.py # ML model training script
├── model.pkl # Trained logistic regression model
├── scaler.pkl # Fitted StandardScaler object
├── dataset/
│ └── cdata.csv # Breast cancer dataset
├── requirements.txt # Python dependencies
└── README.md # This file
- Python 3.8+
- pip package manager
- Clone the repository:
git clone https://github.com/yourusername/breast-cancer-prediction.git
cd breast-cancer-prediction
- Install dependencies:
pip install -r requirements.txt
- Run the web application:
streamlit run main.py
Pinned dependencies (requirements.txt):
streamlit==1.28.0
pandas==2.0.3
numpy==1.24.3
scikit-learn==1.3.0
plotly==5.17.0
To retrain the model:
python model_training.py
This will:
- Load and clean the dataset
- Split data into training and testing sets
- Train a logistic regression model
- Evaluate model performance
- Save the model and scaler as .pkl files
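The steps above might look roughly like the following sketch. It assumes dataset/cdata.csv has a `diagnosis` column with `M`/`B` labels plus `id` and empty `Unnamed: 32` columns (as in the common Kaggle export of this dataset), and it picks an 80/20 stratified split and `max_iter=1000`. These are illustrative choices, not necessarily what model_training.py does; `clean_data`, `train`, and `save` are hypothetical names.

```python
import pickle

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Drop non-feature columns, encode the target, and remove incomplete rows."""
    # Assumed column names: an ID column and an empty trailing export column.
    df = df.drop(columns=["id", "Unnamed: 32"], errors="ignore")
    # Map text labels to integers: Malignant -> 1, Benign -> 0.
    df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})
    return df.dropna()


def train(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Split, scale, fit a logistic regression, and print test metrics."""
    X = df.drop(columns=["diagnosis"])
    y = df["diagnosis"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    # Fit the scaler on the training split only so test statistics don't leak in.
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train_scaled, y_train)

    y_pred = model.predict(X_test_scaled)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))
    return model, scaler


def save(model, scaler) -> None:
    """Serialize both objects so the web app can reuse them."""
    with open("model.pkl", "wb") as f:
        pickle.dump(model, f)
    with open("scaler.pkl", "wb") as f:
        pickle.dump(scaler, f)
```

Run end to end with `model, scaler = train(clean_data(pd.read_csv("dataset/cdata.csv")))` followed by `save(model, scaler)`. The scaler has to be saved alongside the model because the app must apply exactly the same standardization before predicting.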
- Adjust Measurements: Use the sidebar sliders to input cell nuclei measurements
- View Visualization: Observe the radar chart showing three measurement categories
- Get Predictions: See the prediction (Benign/Malignant) with probability scores
- Medical Disclaimer: Always consult healthcare professionals for actual diagnoses
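Behind the prediction panel, the app has to apply the saved scaler before calling the model. A sketch under these assumptions: model.pkl and scaler.pkl were written with `pickle`, the target encoding is Malignant = 1 and Benign = 0, and `feature_order` lists the columns in the order used during training. `load_artifacts` and `predict_diagnosis` are hypothetical helpers, not functions from main.py.

```python
import pickle
from typing import Dict, Sequence, Tuple

import pandas as pd


def load_artifacts(model_path: str = "model.pkl", scaler_path: str = "scaler.pkl"):
    """Load the serialized model and scaler produced by the training script."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(scaler_path, "rb") as f:
        scaler = pickle.load(f)
    return model, scaler


def predict_diagnosis(
    inputs: Dict[str, float], feature_order: Sequence[str], model, scaler
) -> Tuple[str, float, float]:
    """Return (label, probability benign, probability malignant) for one sample."""
    # One-row frame whose columns match the training column order.
    row = pd.DataFrame(
        [[inputs[name] for name in feature_order]], columns=list(feature_order)
    )
    # Apply the same standardization that was fitted at training time.
    row_scaled = scaler.transform(row)
    proba = model.predict_proba(row_scaled)[0]
    # Locate each class's probability via model.classes_ (assumed: 1 = malignant).
    classes = list(model.classes_)
    p_benign = float(proba[classes.index(0)])
    p_malignant = float(proba[classes.index(1)])
    label = "Malignant" if model.predict(row_scaled)[0] == 1 else "Benign"
    return label, p_benign, p_malignant
```

Reading probabilities through `model.classes_` rather than assuming column positions keeps the output correct even if the label encoding changes. Since `pickle.load` can execute arbitrary code, only load .pkl files from a trusted source.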
- The application uses the Wisconsin Breast Cancer Dataset, containing:
  - 569 instances with 30 features each
  - Features include the mean, standard error, and worst values of:
    - Radius, Texture, Perimeter, Area
    - Smoothness, Compactness, Concavity
    - Concave Points, Symmetry, Fractal Dimension
  - Binary target variable: Malignant (M) or Benign (B)
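According to the dataset's UCI documentation, each of the ten base measurements is taken for every nucleus in an image and then summarized three ways: the mean, the standard error, and the "worst" value, defined as the mean of the three largest values. That is how 10 measurements become 30 features. The sketch below reproduces the summary for one feature; computing the standard error as the sample standard deviation divided by the square root of the count is an assumption, and `summarize_feature` is a hypothetical helper.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize_feature(name: str, values: Sequence[float]) -> Dict[str, float]:
    """Collapse per-nucleus values into <name>_mean, <name>_se and <name>_worst."""
    if len(values) < 3:
        raise ValueError("need at least three nuclei to compute the 'worst' value")
    mean = statistics.fmean(values)
    # Standard error of the mean, using the sample standard deviation (assumption).
    se = statistics.stdev(values) / math.sqrt(len(values))
    # "Worst" = mean of the three largest values.
    worst = statistics.fmean(sorted(values, reverse=True)[:3])
    return {f"{name}_mean": mean, f"{name}_se": se, f"{name}_worst": worst}
```

For example, `summarize_feature("radius", [1, 2, 3, 4, 5])` gives `radius_mean` 3.0, `radius_worst` 4.0 (the mean of 5, 4 and 3), and `radius_se` of about 0.7071.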
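- The logistic regression model reports:
  - Accuracy on the held-out test set (printed when model_training.py runs)
  - A classification report with per-class precision, recall, and F1-score
  - Class probabilities alongside each prediction, to show how confident the model is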
This application is designed to assist medical professionals and should NOT be used as a substitute for professional medical diagnosis, advice, or treatment. Always consult qualified healthcare providers for medical decisions.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- University of Wisconsin for the Breast Cancer Dataset
- Streamlit for the amazing web app framework
- Scikit-learn for machine learning tools
- Plotly for visualization capabilities
For questions or feedback, please open an issue in the GitHub repository.
Files to include in the repository:
- main.py: Streamlit web application
- model_training.py: Model training script
- model.pkl: Trained model
- scaler.pkl: Scaler object
- dataset/cdata.csv: Dataset file
- requirements.txt: Dependencies
- README.md: Documentation
- .gitignore: To exclude unnecessary files
# Create requirements.txt
pip freeze > requirements.txt
# Initialize git repo
git init
git add .
git commit -m "Initial commit: Breast Cancer Diagnosis ML App"
git branch -M main
git remote add origin https://github.com/yourusername/repo-name.git
git push -u origin main
