By: Abderrahim Benmoussa, Ph.D.
This repository showcases a broad set of data analytics and data science skills, with a focus on biology and health. It is organized as a single project that progresses from foundational data visualization to advanced machine learning techniques.
- Foundations of Data Analytics
- Intermediate Data Analysis Techniques
- Advanced Data Analysis Techniques
- Special Topics in Biology and Health
- Advanced Machine Learning Techniques
- Final Projects and Capstone
- Objective: Understand the basics of data importing, cleaning, and preprocessing.
- Tasks: Import datasets, conduct descriptive statistics, handle missing values, and normalize data.
- Dataset: Breast Cancer Wisconsin (Diagnostic) Data Set
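As a minimal sketch of the missing-value handling and normalization tasks above, the following uses only the Python standard library. The `radius` column and its values are hypothetical and are not taken from the dataset; mean imputation and z-score scaling are one common choice among several, not necessarily the approach used in this project.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def zscore(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical feature column with one missing measurement
radius = [14.0, None, 20.0, 11.0]
filled = impute_mean(radius)   # the gap is filled with the observed mean
scaled = zscore(filled)        # rescaled to mean 0, standard deviation 1
```

In practice the same steps are usually done with pandas or scikit-learn; the plain-Python version is only meant to make the arithmetic explicit.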
- Objective: Apply basic statistical models to understand relationships in the data.
- Tasks: Explore the data, select statistical tests with the help of an LLM, and compare groups through hypothesis testing.
- Dataset: Breast Cancer Wisconsin (Diagnostic) Data Set
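Group comparison through hypothesis testing can be illustrated with a two-sided permutation test on the difference in means. This is a sketch under assumptions: the `benign` and `malignant` measurements are invented for illustration, and the permutation test is one possible choice, not necessarily the test selected in this project.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels by shuffling the pooled values
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical measurements for two diagnostic groups
benign = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
malignant = [17.2, 18.1, 16.9, 17.5, 18.0, 17.7]
p_value = permutation_test(benign, malignant)
```

A small `p_value` suggests the observed difference is unlikely under random relabeling of the groups.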
- Objective: Visualize data distributions and relationships.
- Tasks: Use Matplotlib, Seaborn, and Plotly for histograms, scatter plots, and heatmaps.
- Dataset: Human Resources Analytics
- Objective: Understand the basics of genetics data and its structure.
- Tasks: Introduction to genetic markers, SNPs, and genotypes.
- Dataset: Genetic Variation Dataset
- Objective: Analyze time-dependent data.
- Tasks: Decomposition, ARIMA modeling, and forecasting.
- Dataset: Malaria in Colombia
- Objective: Group data based on similarities.
- Tasks: K-means clustering, hierarchical clustering, and DBSCAN.
- Dataset: Metabolomics Data
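To make the K-means task concrete, here is a small sketch of Lloyd's algorithm in plain Python. The `samples` points and the choice of initial centroids are hypothetical; a real analysis would more likely rely on a library implementation with a proper initialization strategy.

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm with explicitly supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical two-feature samples forming two well-separated groups
samples = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centers = kmeans(samples, centroids=[samples[0], samples[3]])
```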
- Objective: Predict categorical outcomes.
- Tasks: Logistic regression, decision trees, and support vector machines.
- Dataset: Pima Indians Diabetes Database
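The logistic regression task can be sketched as batch gradient descent on the log loss. The feature matrix `X` and labels `y` below are hypothetical standardized values, not rows from the dataset, and the learning rate and epoch count are arbitrary illustrative settings.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, n_epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(n_epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample drives the gradient
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b, threshold=0.5):
    """Classify each row as 1 when the predicted probability reaches the threshold."""
    return [
        int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
        for xi in X
    ]

# Hypothetical standardized features and binary outcomes
X = [[-1.2, -0.8], [-0.9, -1.1], [-0.5, -0.3], [0.4, 0.6], [1.0, 0.9], [1.3, 1.2]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic(X, y)
predictions = predict(X, weights, bias)
```

Evaluating on the training rows, as done here, only checks that the fit ran; a real workflow would hold out a test set.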
- Objective: Reduce the dimensionality of data.
- Tasks: PCA and t-SNE.
- Dataset: Genetic Variation Dataset
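As an illustration of what PCA computes, the sketch below finds the first principal component by power iteration on the sample covariance matrix. The `rows` data are hypothetical, and power iteration is swapped in here for clarity; library PCA implementations typically use a singular value decomposition instead.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return the leading direction, its variance, and the projected scores."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (divides by n - 1)
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration; assumes the start vector is not orthogonal to the answer
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance explained along v
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, eigenvalue, scores

# Hypothetical data lying exactly on the line y = 2x
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, variance, scores = first_principal_component(rows)
```

Because the example points are perfectly collinear, all variance falls along a single direction and the projection onto it loses no information.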
- Objective: Understand the basics of neural networks.
- Tasks: Introduction to neural networks using TensorFlow/Keras.
- Dataset: Skin Cancer MNIST
- Objective: Analyze image data.
- Tasks: Introduction and implementation of CNNs.
- Dataset: Skin Cancer MNIST
- Objective: Analyze text data.
- Tasks: Text preprocessing, sentiment analysis, and topic modeling.
- Dataset: PubMed 200k RCT
- Objective: Understand the basics of reinforcement learning.
- Tasks: Introduction to Q-learning.
- Dataset: Custom environment using OpenAI Gym.
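The Q-learning update can be sketched on a toy problem without any external dependency. Note the assumptions: the five-state corridor, the `step` function, and all hyperparameters below are hypothetical stand-ins for illustration, and this sketch does not use the OpenAI Gym API that the project's custom environment is built on.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = (-1, +1)    # step left or step right

def step(state, action):
    """Hypothetical environment: reward 1.0 on reaching the goal, else 0.0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: bootstrap from the best next action
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = q_learning()
# Greedy action for each non-terminal state
greedy_policy = [
    ACTIONS[0] if left > right else ACTIONS[1] for left, right in q_table[:-1]
]
```

After training, the greedy policy should move right in every non-terminal state, since that is the shortest path to the only reward.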
- Objective: Analyze genomic sequences.
- Tasks: Sequence alignment, gene prediction, and phylogenetics.
- Dataset: NCBI GenBank
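For the sequence alignment task, a minimal sketch of global alignment with the Needleman-Wunsch algorithm is shown below. The scoring values (match +1, mismatch -1, gap -1) and the example sequences are assumptions chosen for illustration; real analyses generally use substitution matrices and dedicated tools.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score and one optimal alignment."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming table
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

# Hypothetical short sequences
result = needleman_wunsch("ACGT", "AGT")
```

When several alignments share the optimal score, the traceback order above (diagonal, then gap in `b`, then gap in `a`) determines which one is returned.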
- Objective: Understand protein structures and functions.
- Tasks: Protein sequence and structure analysis.
- Dataset: Protein Data Bank
- Objective: Analyze metabolic pathways.
- Tasks: Metabolic pathway analysis and biomarker discovery.
- Dataset: MetaboLights
- Objective: Understand interactions within biological systems.
- Tasks: Network and pathway enrichment analysis.
- Dataset: BioGRID
- Objective: Improve model performance using ensemble techniques.
- Tasks: Bagging, boosting, and stacking.
- Dataset: Heart Disease UCI
- Objective: Use pre-trained models for new tasks.
- Tasks: Introduction and fine-tuning of pre-trained models.
- Dataset: Malaria Cell Images Dataset
- Objective: Discover patterns without labeled data.
- Tasks: Autoencoders and GANs.
- Dataset: MNIST
- Objective: Understand model decisions.
- Tasks: Feature importance, SHAP values, and LIME.
- Dataset: Heart Disease UCI
- Objective: Integrate data from multiple omics levels.
- Tasks: Data integration and joint analysis.
- Dataset: TCGA Pan-Cancer (PANCAN)
- Objective: Predict treatment outcomes based on individual data.
- Tasks: Predict drug responses and generate personalized treatment recommendations.
- Dataset: GDSC
- Objective: Predict disease risk and outcomes.
- Tasks: Risk factor analysis and predictive modeling.
- Dataset: Framingham Heart Study dataset
- Objective: Demonstrate all learned skills in a comprehensive project.
- Tasks: End-to-end data analysis.
- Dataset: Choose based on personal interest or combine multiple datasets.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License