BuildCircle/superheroes-data

 
 


Superheroes player stats

We have just launched our brand new superheroes game and have been collecting user stats over the last few days. This repo contains some of the initial raw data. We need to run some basic processing on the data in preparation for analysis. You are welcome to use any technology; most people approach this with Jupyter notebooks and pandas.

Data processing

Using the small CSV datafile, write a simple Python program that processes the data and writes the output to another CSV file. Be sure to exclude people with nonsensical data. Processing includes:

  • Anonymise any personal data for each person
  • Add a unique ID for each person
  • Calculate the age of each person
  • Some people's ages may be incorrect or impossible; filter these out
  • Test the program against the larger CSV file
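
The steps above could be sketched roughly as follows, using only the Python standard library so the example stays self-contained (pandas would work just as well). The input header names (`first_name`, `last_name`, `address`, `date_of_birth`), the date format, the 120-year age limit, and the sample rows are all assumptions made for illustration; check them against the real datafiles before relying on any of this.

```python
import csv
import io
import uuid
from datetime import date, datetime

# Assumed input header names; the real CSV files may use different ones.
PERSONAL_FIELDS = ("first_name", "last_name", "address")
DOB_FIELD = "date_of_birth"
DOB_FORMAT = "%Y-%m-%d"  # assumed date format
MAX_AGE = 120            # assumed upper bound for a plausible age

# Output header taken from the example output in this README.
OUTPUT_HEADER = ["UID", "First name", "Last name", "Address", "Age"]


def mask(value):
    """Replace every non-space character with 'x', keeping word lengths."""
    return "".join(" " if ch.isspace() else "x" for ch in value)


def age_on(dob, today):
    """Age in whole years on the given day."""
    before_birthday = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - int(before_birthday)


def process(reader, writer, today):
    """Write anonymised rows for valid people; return how many were kept."""
    writer.writerow(OUTPUT_HEADER)
    kept = 0
    for row in reader:
        try:
            dob = datetime.strptime(row[DOB_FIELD].strip(), DOB_FORMAT).date()
        except (KeyError, ValueError, AttributeError):
            continue  # missing or unparseable date of birth
        age = age_on(dob, today)
        if not 0 <= age <= MAX_AGE:
            continue  # born in the future or implausibly old
        masked = [mask(row.get(field) or "") for field in PERSONAL_FIELDS]
        writer.writerow([uuid.uuid4().hex, *masked, age])
        kept += 1
    return kept


# Invented sample rows, not taken from the real datafile.
SAMPLE = """first_name,last_name,address,date_of_birth
Clark,Kent,1 Main St,1990-06-15
Diana,Prince,2 Side Rd,1800-01-01
Bruce,Wayne,3 Hill Ave,not-a-date
"""

output = io.StringIO()
kept = process(csv.DictReader(io.StringIO(SAMPLE)), csv.writer(output),
               today=date(2024, 6, 14))
print(output.getvalue())
```

Passing `today` in explicitly keeps the age calculation reproducible. To run against a real file, open the source and destination with `open(path, newline="")` and hand `csv.DictReader` and `csv.writer` objects to `process` in the same way. A random `uuid4` is only one option for the ID; a sequential counter would also satisfy the uniqueness requirement.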

The output should look something like:

UID   First name  Last name  Address  Age
1234  xxxx        xxxx       xxx xxx  31

Following on from this, discuss how you would implement this in a cloud environment or data lake and how you would productionize the code.

Small datafile

Large datafile

Finally, calculate the median and 95th percentile age of our playerbase from the large processed dataset.
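
One possible way to compute these two figures with the standard library is sketched below. It assumes the processed file has an `Age` column, as in the example output above, and it uses the "inclusive" interpolation method, which is a choice rather than a requirement: other percentile definitions can give slightly different values on small datasets. The sample data is invented.

```python
import csv
import io
import statistics

AGE_FIELD = "Age"  # assumed to match the processed file's header


def read_ages(reader):
    """Collect the age column from a csv.DictReader as integers."""
    return [int(row[AGE_FIELD]) for row in reader]


def age_summary(ages):
    """Return the median and 95th percentile of a list of ages."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    # It needs at least two data points on older Python versions.
    cut_points = statistics.quantiles(ages, n=100, method="inclusive")
    return {"median": statistics.median(ages), "p95": cut_points[94]}


# Invented processed data: one player for each age from 0 to 100.
sample = "UID,Age\n" + "".join(f"{n},{n}\n" for n in range(101))
summary = age_summary(read_ages(csv.DictReader(io.StringIO(sample))))
print(summary)
```

With pandas, `Series.median()` and `Series.quantile(0.95)` would give the equivalent result, since the default linear interpolation there corresponds to the inclusive method used here.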
