A comprehensive machine learning framework for automated model selection and hyperparameter optimization using both genetic algorithms and exhaustive search methods. This interactive dashboard allows users to analyze datasets, train multiple ML models, and visualize results through an intuitive Streamlit interface.
- 📊 Automated Data Processing: Load and preprocess CSV datasets with automatic handling of data types
- 🧬 Dual Optimization Approaches: Compare genetic algorithm search vs. exhaustive grid search for model selection
- 🤖 Multiple ML Algorithms: Supports various regression models including Linear Regression, Decision Trees, Random Forests, Lasso, Ridge, KNN, and XGBoost
- 📈 Interactive Dashboard: Visualize model performance, dataset correlations, and detailed results
- 📋 Exportable Results: Save analysis results to Excel for further examination
- 💾 SQL Integration: Optional database connectivity for larger datasets
- 🖨️ Enhanced Console Output: Standardized, colorful, and informative print utilities for better user experience
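
The trade-off between the two optimization approaches can be sketched with the standard library alone. This is a toy illustration, not the framework's implementation: the search space, the `score` function (a stand-in for a cross-validated model score), and the names `exhaustive_search` and `genetic_search` are all assumptions made for the example. Exhaustive search scores every combination, while the genetic search scores only a fixed budget of candidates and may settle on a near-optimal result.

```python
import itertools
import random

# Hypothetical discrete search space (names and values are illustrative only).
SPACE = {
    "max_depth": [2, 4, 6, 8, 10],
    "n_estimators": [50, 100, 200, 400],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
}


def score(params):
    """Toy stand-in for a cross-validated score (higher is better)."""
    return -(
        (params["max_depth"] - 6) ** 2
        + ((params["n_estimators"] - 200) / 100) ** 2
        + ((params["learning_rate"] - 0.1) * 10) ** 2
    )


def exhaustive_search(space):
    """Score every combination in the grid and return the best one."""
    keys = list(space)
    candidates = [
        dict(zip(keys, values))
        for values in itertools.product(*(space[k] for k in keys))
    ]
    return max(candidates, key=score), len(candidates)


def genetic_search(space, population_size=8, generations=5,
                   mutation_rate=0.2, seed=0):
    """Evolve a small population with selection, crossover, and mutation."""
    rng = random.Random(seed)
    keys = list(space)
    population = [
        {k: rng.choice(space[k]) for k in keys} for _ in range(population_size)
    ]
    best, evaluations = None, 0
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        evaluations += len(population)
        if best is None or score(ranked[0]) > score(best):
            best = ranked[0]
        # Selection: keep the top half as parents.
        parents = ranked[: population_size // 2]
        children = []
        while len(children) < population_size - len(parents):
            mother, father = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one parent at random.
            child = {k: rng.choice([mother[k], father[k]]) for k in keys}
            # Mutation: occasionally resample a gene from the search space.
            for k in keys:
                if rng.random() < mutation_rate:
                    child[k] = rng.choice(space[k])
            children.append(child)
        population = parents + children
    return best, evaluations


grid_best, grid_evals = exhaustive_search(SPACE)
ga_best, ga_evals = genetic_search(SPACE)
print(f"exhaustive: {grid_evals} evaluations, best={grid_best}")
print(f"genetic:    {ga_evals} evaluations, best={ga_best}")
```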
The framework follows a modular component-based architecture:
| Component | Description |
|---|---|
| CGenerator | Handles data loading, preprocessing, and train/test splitting |
| CEvaluator | Implements genetic and exhaustive search algorithms for model optimization |
| CPredictor | Provides prediction capabilities using optimized models |
| CVisualizer | Creates the interactive dashboard with multiple visualization options |
| print_utils | Provides standardized and visually appealing console output functions |
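
As a rough picture of the loading and splitting work attributed to CGenerator, the sketch below parses CSV text, converts numeric-looking fields, and makes a seeded train/test split. It uses only the standard library so that it runs anywhere; the framework itself presumably relies on pandas and scikit-learn for this, and the function names and sample data here are hypothetical.

```python
import csv
import io
import random


def load_rows(csv_text):
    """Parse CSV text into dicts, converting numeric-looking fields to float."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for key, value in raw.items():
            try:
                row[key] = float(value)
            except (TypeError, ValueError):
                row[key] = value  # keep non-numeric values unchanged
        rows.append(row)
    return rows


def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows with a fixed seed and split off a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up sample data, for illustration only.
SAMPLE = """area,rooms,city,price
50,2,A,100
60,2,B,120
70,3,A,150
80,3,C,170
90,3,B,200
100,4,A,230
110,4,C,250
120,4,B,280
130,5,A,310
140,5,C,340
"""

rows = load_rows(SAMPLE)
train_rows, test_rows = train_test_split_rows(rows)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```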

- Python 3.8+
- Dependencies listed in requirements.txt:
  - streamlit
  - pandas
  - numpy
  - scikit-learn
  - scikit-learn-genetic
  - xgboost
  - matplotlib
  - seaborn
  - plotly
  - sqlalchemy
  - pymssql (for SQL Server connectivity)

To set up the framework:

- Clone the repository:

      git clone https://github.com/AndresACV/OptimML-Framework.git
      cd OptimML-Framework

- Create and activate a virtual environment (recommended):

      python -m venv venv
      # On Windows
      venv\Scripts\activate
      # On macOS/Linux
      source venv/bin/activate

- Install the required dependencies:

      pip install -r requirements.txt

- Place your CSV datasets in the datasets folder.
- Run the application using Streamlit:

      streamlit run app.py

  Alternatively, on Windows, you can use the provided batch file:

      run_app.bat

The dashboard will open in your default web browser, allowing you to:
- Explore dataset statistics and correlations
- View model performance comparisons
- Analyze the best hyperparameters for each algorithm
- Export results to Excel

The repository is organized as follows:

    OptimML-Framework/
    ├── app.py                  # Streamlit application entry point
    ├── components/             # Framework components
    │   ├── __init__.py         # Package initialization
    │   ├── CMain.py            # Main application logic
    │   ├── CGenerator.py       # Data loading and preprocessing component
    │   ├── CEvaluator.py       # Model evaluation and optimization component
    │   ├── CPredictor.py       # Prediction component
    │   ├── CVisualizer.py      # Dashboard and visualization component
    │   └── utils/              # Utility modules
    │       ├── __init__.py     # Utils package initialization
    │       └── print_utils.py  # Print formatting utilities
    ├── datasets/               # Directory for input CSV datasets
    ├── results/                # Directory for output results
    ├── assets/                 # Images and other static assets
    ├── requirements.txt        # Python dependencies
    └── run_app.bat             # Windows batch file for easy startup


Extend the models dictionary in components/CEvaluator.py:

    self.models = {
        'LinearRegression': LinearRegression(),
        'DecisionTreeRegressor': DecisionTreeRegressor(),
        # Add your custom model here:
        'YourCustomModel': YourCustomModel()
    }

Modify the create_dashboard method in components/CVisualizer.py:

    def create_dashboard(self):
        st.title('OptimML Framework Dashboard')
        # Add your custom dashboard components here

Modify the parameter search spaces in the _get_param_grids_genetic and _get_param_grids_exhaustive methods in components/CEvaluator.py.

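
When widening a search space for the exhaustive method, it helps to check how many combinations the grid implies, since every added value multiplies the number of model fits. The sketch below assumes a grid shaped as a dictionary of value lists per model, the format scikit-learn's GridSearchCV accepts; the actual structure returned by `_get_param_grids_exhaustive` may differ, and the grid contents and helper names here are hypothetical.

```python
import itertools
import math

# Hypothetical grids keyed by model name; the hyperparameter names are real
# scikit-learn parameters, but the values are illustrative only.
exhaustive_grids = {
    "Ridge": {
        "alpha": [0.1, 1.0, 10.0],
        "fit_intercept": [True, False],
    },
    "KNeighborsRegressor": {
        "n_neighbors": [3, 5, 7, 9],
        "weights": ["uniform", "distance"],
    },
}


def count_combinations(grid):
    """Number of parameter settings an exhaustive search would evaluate."""
    return math.prod(len(values) for values in grid.values())


def iter_combinations(grid):
    """Yield each parameter setting in the grid as a dict."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


for model_name, grid in exhaustive_grids.items():
    print(f"{model_name}: {count_combinations(grid)} combinations")
```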
The framework includes a comprehensive print utilities module (components/utils/print_utils.py) that provides standardized and visually appealing console output. The utilities include:
- Section Headers: Clear visual separation between major sections
- Subsection Headers: Visual separation for subsections
- Info Messages: Standard formatted informational messages
- Success Messages: Green-highlighted success notifications
- Warning Messages: Yellow-highlighted warning notifications
- Error Messages: Red-highlighted error notifications
- Progress Indicators: Visual progress bars for long-running operations
- Timestamp Messages: Messages with timestamps for chronological reference
- Model Training Notifications: Specialized formatting for model training events
- Data Loading Indicators: Specialized formatting for data loading operations
- Summary Statistics: Tabular formatting for statistical summaries
- Table Formatting: Consistent tabular data presentation
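
A few of these utilities could look like the following sketch, which uses ANSI escape codes for color. The function names, message layout, and color choices are assumptions made for illustration; they are not taken from print_utils.py. Each helper returns a string so that the formatting can be checked separately from printing.

```python
from datetime import datetime

# ANSI escape codes for bright green, yellow, and red text.
RESET = "\033[0m"
COLORS = {
    "success": "\033[92m",
    "warning": "\033[93m",
    "error": "\033[91m",
}


def format_message(level, text):
    """Prefix a message with its level, colored when the level has a color."""
    color = COLORS.get(level, "")
    suffix = RESET if color else ""
    return f"{color}[{level.upper()}] {text}{suffix}"


def format_header(title, width=60):
    """Center a section title between two separator lines."""
    line = "=" * width
    return f"{line}\n{title.center(width)}\n{line}"


def format_progress(current, total, width=30):
    """Render a text progress bar such as [#####-----] 5/10."""
    filled = int(width * current / total)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {current}/{total}"


def format_timestamped(text, now=None):
    """Prefix a message with a timestamp (defaults to the current time)."""
    now = now or datetime.now()
    return f"[{now:%Y-%m-%d %H:%M:%S}] {text}"


print(format_header("Model Training"))
print(format_message("success", "Training finished"))
print(format_progress(5, 10, width=10))
```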
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Andrés Calvo - GitHub Profile
