A comprehensive collection of data science assignments covering topics from exploratory data analysis to advanced machine learning techniques.
This repository contains four assignments completed as part of a Data Science Laboratory course. Each assignment focuses on a different part of the data science workflow, from data preprocessing and visualization to the implementation of machine learning algorithms.

Assignment 1: Titanic EDA and Data Preprocessing

- Focus: Exploratory data analysis and data preprocessing
- Dataset: Titanic passenger data
- Techniques:
  - Data cleaning and handling of missing values
  - Feature engineering
  - Visualization with Seaborn and Matplotlib
  - Data preprocessing (encoding and scaling)
  - Logistic regression model implementation
- Key outcomes: A predictive model for Titanic passenger survival, built on a complete set of data preparation steps
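
The Titanic workflow described above (imputation, encoding, scaling, logistic regression) can be sketched as a single scikit-learn pipeline. This is a minimal illustration, not the notebook's actual code: the eight rows are made up, only the column names follow the common Kaggle Titanic schema (`Pclass`, `Sex`, `Age`, `SibSp`, `Parch`, `Fare`, `Embarked`, `Survived`), and the `FamilySize` feature and the median/most-frequent imputation strategies are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy rows with Titanic-style column names (NOT the real dataset).
df = pd.DataFrame({
    "Pclass":   [1, 3, 2, 3, 1, 3, 2, 3],
    "Sex":      ["female", "male", "female", "male", "male", "female", "male", np.nan],
    "Age":      [29.0, np.nan, 34.0, 22.0, 45.0, np.nan, 30.0, 19.0],
    "SibSp":    [0, 1, 0, 0, 1, 1, 0, 2],
    "Parch":    [0, 0, 1, 0, 0, 2, 0, 1],
    "Fare":     [211.3, 7.25, 26.0, 7.9, 52.0, 15.2, 13.0, 8.05],
    "Embarked": ["S", "S", "C", np.nan, "S", "Q", "S", "S"],
    "Survived": [1, 0, 1, 0, 0, 1, 0, 0],
})

# Feature engineering (assumed example): number of family members aboard, incl. self.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

numeric = ["Age", "Fare", "FamilySize"]
categorical = ["Pclass", "Sex", "Embarked"]

preprocess = ColumnTransformer([
    # Numeric columns: fill missing values with the median, then standardize.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    # Categorical columns: fill with the most frequent value, then one-hot encode.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X = df[numeric + categorical]
y = df["Survived"]
model.fit(X, y)

# Predicted probability of survival for each row.
survival_prob = model.predict_proba(X)[:, 1]
print(survival_prob.round(2))
```

Keeping every preprocessing step inside the `Pipeline` means the imputation medians, scaling statistics, and one-hot categories are learned only from whatever data `fit` receives, which avoids leaking test-set information when the model is later evaluated on a held-out split.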

Assignment 2

- Content: Advanced data analysis techniques
- Visualizations: Multiple figures showing relationships in the data and model performance

Assignment 3: Ensemble Learning

- Focus: Implementation of ensemble learning methods
- Datasets: Iris dataset (classification) and synthetic data (regression)
- Techniques:
  - Bagging (Bagging Classifier and Regressor)
  - Boosting (AdaBoost, Gradient Boosting, XGBoost)
  - Stacking (Stacking Classifier and Regressor)
- Key outcomes: Comparative analysis of the ensemble methods on both classification and regression tasks
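
A compact way to compare the three ensemble families listed above is to cross-validate scikit-learn's implementations side by side. The sketch below is not the assignment's code: the base estimators, `n_estimators` values, the `make_regression` settings, and the 5-fold evaluation are all assumptions. XGBoost is added only if the `xgboost` package is installed, through its scikit-learn-compatible `XGBClassifier`/`XGBRegressor` wrappers.

```python
from sklearn.datasets import load_iris, make_regression
from sklearn.ensemble import (
    AdaBoostClassifier, AdaBoostRegressor,
    BaggingClassifier, BaggingRegressor,
    GradientBoostingClassifier, GradientBoostingRegressor,
    StackingClassifier, StackingRegressor,
)
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: Iris ships with scikit-learn, so no download is needed.
X_clf, y_clf = load_iris(return_X_y=True)
classifiers = {
    # Bagging: many trees, each fit on a bootstrap sample; predictions are aggregated.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    # Boosting: estimators fit sequentially, each focusing on earlier mistakes.
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    # Stacking: a final model learns how to combine the base models' predictions.
    "stacking": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                    ("logreg", LogisticRegression(max_iter=1000))],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

# Regression: synthetic data (assumed settings: 200 samples, 5 features, noise=10).
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
regressors = {
    "bagging": BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=0),
    "adaboost": AdaBoostRegressor(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "stacking": StackingRegressor(
        estimators=[("tree", DecisionTreeRegressor(random_state=0)), ("ridge", Ridge())],
        final_estimator=Ridge(),
    ),
}

# XGBoost is optional in this sketch: include it only if the package is available.
try:
    from xgboost import XGBClassifier, XGBRegressor
    classifiers["xgboost"] = XGBClassifier(n_estimators=100)
    regressors["xgboost"] = XGBRegressor(n_estimators=100)
except ImportError:
    pass

# Mean 5-fold cross-validated accuracy (classification) and R^2 (regression).
clf_scores = {name: cross_val_score(m, X_clf, y_clf, cv=5).mean()
              for name, m in classifiers.items()}
reg_scores = {name: cross_val_score(m, X_reg, y_reg, cv=5, scoring="r2").mean()
              for name, m in regressors.items()}

for name, score in clf_scores.items():
    print(f"classifier {name:18s} accuracy={score:.3f}")
for name, score in reg_scores.items():
    print(f"regressor  {name:18s} R^2={score:.3f}")
```

Stacking trains its final estimator on out-of-fold predictions of the base models (scikit-learn uses 5-fold cross-validation internally by default), so the meta-model learns how the base models behave on unseen data rather than on their own training fit.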

Assignment 4

- Content: Further advanced data science techniques
- Focus: Advanced modeling and evaluation

Project structure:

Data-Science-Lab/
├── Assignment 1/
│ ├── Titanic_EDA_and_Data_Preprocessing.ipynb
│ ├── Assignment 1.docx
│ ├── Code Images/
│ └── Output Images/
├── Assignment 2/
│ ├── data_science_lab_2.ipynb
│ ├── Assignment 2.docx
│ └── [Various visualization images]
├── Assignment 3/
│ ├── data_science_lab_3.ipynb
│ ├── Assignment 3.docx
│ └── [Various visualization images]
├── Assignment 4/
│ ├── data_science_lab_4.ipynb
│ ├── Assignment 4.docx
│ └── [Various visualization images]
├── Assignment 1.pdf
├── Assignment 2.pdf
├── Assignment 3.pdf
├── Assignment 4.pdf
└── LICENSE

Technologies used:

- Python libraries:
  - pandas for data manipulation
  - NumPy for numerical operations
  - Matplotlib and Seaborn for data visualization
  - scikit-learn for machine learning implementations
  - XGBoost for gradient boosting

Getting started:

- Clone this repository.
- Install Python along with the libraries listed above, plus Jupyter to open the notebooks.
- Navigate to the folder of the assignment you want to view.
- Open the Jupyter notebook (.ipynb) in that folder to view the code and analysis.
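
Before opening the notebooks, it can help to confirm that the environment is ready. The helper below is a hypothetical sketch (`missing_libraries` is not part of this repository) that checks whether each library can be located without actually importing it. Note that import names can differ from pip package names (scikit-learn is imported as `sklearn`), and `notebook` is assumed here as the Jupyter package; adjust the list if you use JupyterLab instead.

```python
import importlib.util

# Import names for the libraries listed above, plus Jupyter (assumed: `notebook`).
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn", "xgboost", "notebook"]


def missing_libraries(names=REQUIRED):
    """Return the subset of `names` that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```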

License:

This project is licensed under the terms of the LICENSE file included in the repository.