Hi, this is a repository of projects I work on in my spare time to improve my skills.
All data was taken from public sources, so I have included the CSV files in each project folder.
The projects reflect my current skill level. When I learn something new that applies to data from an earlier project, I will update that project.

| Name | Description | Stack |
|---|---|---|
| Proxy ETL | This project parses proxies from https://free-proxy-list.net/ and loads the resulting data into a BigQuery table in Google Cloud. It uses Apache Airflow to schedule and run the tasks as a Directed Acyclic Graph (DAG). | python, pandas, requests, BeautifulSoup, Apache Airflow, ping3, BigQuery, Docker |
| Supermarket sales | This study analyzes the sales of three supermarkets. I analyzed each product line individually, as well as sales by month, by week and by day of the week. Much of the work involved dates and times: I broke dates down into months, weeks, days and days of the week, and times into hours. I also built many charts (pie, barplot, lineplot, histplot, relplot) and calculated the share of each payment method. The Jupyter notebook explains what each part of the code does. | python, pandas, matplotlib, seaborn, Tableau |
| Bookings | The entire analysis is stored in a Jupyter notebook and uses only pandas. I practiced writing code to transform columns and calculate additional metrics. In some places an alternative solution to the problem is presented. | python, pandas |
| Exam Scores | The Students Performance in Exams dataset is taken from Kaggle. It consists of the marks obtained by students in various subjects. Purpose: to understand how students' academic performance is affected by their parents, test preparation, etc. | SQL, python, pandas, matplotlib |
| Fake News | The purpose of this study is to find the percentage of fake and true news in a sample of news items, and to find out the names of the most famous liars and the most honest people. Only SQL is used here. | SQL |
| Dashboard Germany | This dashboard is made in Excel with the help of pivot tables. The data is taken from the internet and covers 8 cities in Germany. The dashboard shows revenue by month, shares of regions, categories, top 10 products, and customers. There are also slicers by year, region and city. Each chart is built on its own pivot table on a separate sheet, which keeps the dashboard convenient to maintain and quick to change. | Excel |
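
The date and time breakdown described for the Supermarket sales project can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it uses only the standard library `datetime` module instead of pandas, and the function name `split_timestamp`, the input formats (`MM/DD/YYYY` and `HH:MM`) and the sample values are assumptions made for the example.

```python
from datetime import datetime

# Weekday names indexed by datetime.weekday() (Monday is 0).
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def split_timestamp(date_text: str, time_text: str) -> dict:
    """Break one sale's date and time into the parts used for grouping."""
    # Assumed input formats: "MM/DD/YYYY" for the date, "HH:MM" for the time.
    moment = datetime.strptime(f"{date_text} {time_text}", "%m/%d/%Y %H:%M")
    return {
        "month": moment.month,
        "week": moment.isocalendar()[1],  # ISO week number
        "day": moment.day,
        "weekday": WEEKDAYS[moment.weekday()],
        "hour": moment.hour,
    }


# Hypothetical sample row, not taken from the dataset.
parts = split_timestamp("01/05/2019", "13:08")
print(parts)
```

Each resulting field can then serve as a grouping key, for example to total sales per weekday or per hour.
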

Basic development tools:
- Python3 programming language and its libraries;
- programming environment Jupyter Notebook;
- query language (SQL);
- Docker.
