
waiyanps/TidyTuesday-Python


TidyTuesday in Python

Hello everyone!

It's really nice to see like-minded people here. I'm a big fan of David Robinson's TidyTuesday screencasts, and I was looking for TidyTuesday work done in Python. Unfortunately, there is not much of it, and some of what exists is not updated regularly.

I found Michael Chow's TidyTuesday in Python screencasts (recommended by David) and found his work more R-ish than Python-ish. Michael Chow is the creator of siuba. After doing some research, I decided to replicate most of David's analyses using only pandas and plotly. When I first tried to replicate one of his analyses, it took me a whole week to get about 40% of his results (R was not my thing). I followed every step David took in his analysis and still couldn't reproduce the same results. Failed!

That failure motivated me to read the pandas documentation and pushed me to google a lot of questions like the following:

  • What is the equivalent of fct_lump in Python pandas?
  • What is the equivalent of fct_floor in Python pandas?
  • What is the equivalent of gather() in Python pandas?
  • What is the equivalent of unnest()-tidytext in Python nltk?
  • And many more "What is..." questions

I could find answers to some of them, but in some situations I had to implement the functions myself in pure Python.
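As a rough illustration of the kind of answers those searches lead to, here is a minimal sketch of one possible pandas stand-in for fct_lump and for gather(). The helper name `lump_top_n` and the sample data are made up for this example and are not taken from the project's notebooks. The real fct_lump also treats ties and the "Other" level more carefully than this does.

```python
import pandas as pd


def lump_top_n(series: pd.Series, n: int, other: str = "Other") -> pd.Series:
    """Keep the n most frequent values and replace everything else with `other`.

    Hypothetical helper: a simplified stand-in for forcats' fct_lump.
    """
    top = series.value_counts().nlargest(n).index
    # where() keeps values that satisfy the condition and swaps in `other` elsewhere
    return series.where(series.isin(top), other)


# Made-up sample data
majors = pd.Series(["Biology", "Biology", "Biology", "Art", "Art", "Music", "History"])
lumped = lump_top_n(majors, n=2)

# gather() in tidyr roughly corresponds to DataFrame.melt: wide -> long
wide = pd.DataFrame({"major": ["Biology", "Art"], "men": [10, 4], "women": [12, 9]})
long = wide.melt(id_vars="major", var_name="gender", value_name="count")

print(lumped.tolist())
print(long)
```

The `where` call keeps the original values wherever the condition holds, which avoids writing an explicit loop or `apply`.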

After two or three screencasts, I started to feel comfortable with his work and with R (tidyverse) syntax. I am more of a Python guy and I don't know much R programming, but now I even know how to do data analysis using the tidyverse. Most importantly, I used to be someone who didn't know what to do with a dataset in pandas: how to stack datasets, how to melt them, or how to do a conditional count after a groupby.
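For readers in the same position, stacking datasets and doing a conditional count after a groupby might look roughly like this in pandas. This is only a sketch: the column names, the 40000 threshold, and the data are invented for illustration and do not come from any TidyTuesday dataset.

```python
import pandas as pd

# Made-up sample data for two years
df_2019 = pd.DataFrame({"category": ["A", "B"], "earnings": [30000, 52000]})
df_2020 = pd.DataFrame({"category": ["A", "B", "B"], "earnings": [41000, 38000, 60000]})

# Stack datasets row-wise (similar in spirit to dplyr's bind_rows)
stacked = pd.concat(
    [df_2019.assign(year=2019), df_2020.assign(year=2020)],
    ignore_index=True,
)

# Conditional count after groupby, roughly like
# summarise(n = n(), n_high = sum(earnings > 40000)) in dplyr
summary = (
    stacked.assign(high=stacked["earnings"] > 40000)
    .groupby("category")
    .agg(n=("earnings", "size"), n_high=("high", "sum"))
    .reset_index()
)

print(summary)
```

Creating the boolean column first and then summing it is one common pattern; summing booleans counts the rows where the condition is true.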

Now I know that I can improve my pandas and data wrangling skills in Python by doing this project.

Plots created with Plotly

(plot image)

(plot image)

How does gender breakdown relate to typical earnings?

(plot image)

This project helps me:

  • Know what kind of questions to ask
  • Know how to think like a data scientist
  • Know how to approach different types of datasets
  • Know a little R (tidyverse)
  • Know how to do advanced data analysis using pandas and numpy
  • Know how to make plots using plotly

Future work

  • Replicate all of David's TidyTuesday screencasts using Python
  • Build dashboards using plotly + dash
  • Port some useful R packages to Python on top of pandas, much as keras sits on top of TensorFlow and fastai on top of PyTorch (I'm not smart enough to do this yet! :) )
  • Write blog posts about my work (How to TidyTuesday using Python, etc.)

Suggestions

  • Don't worry! Everything can be figured out and done using Python and pandas.
  • Join the Plotly community and ask a lot of questions.
  • Install RStudio and reproduce David's code line by line to better understand what he did in his screencasts. (David does data analysis like he is speedrunning :) )
  • Watch the screencasts bit by bit and open his code in a new tab (the code is linked in the video descriptions).

Websites you may like

  • askpython
  • realpython
  • datafish

The datasets come from the TidyTuesday project.
David's screencasts are on YouTube.
