exversion/junky

junky

Bad Data == Bad Analysis

Junky is a work in progress, started at PyGotham 2014, that is intended to become a tool for quickly identifying potential problems in the distribution and normalization of datasets.

Dependencies

  • pandas, NumPy, etc.

Tasks

Dataset Size

  • How many records? How much space do we have to examine subgroups?
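A minimal sketch of what a dataset-size check could look like, assuming the data is already loaded into a pandas DataFrame. The function name `dataset_size`, the `group_col` parameter, and the sample data are illustrative assumptions, not part of Junky itself.

```python
import pandas as pd


def dataset_size(df, group_col=None):
    """Report row count, column count, memory footprint, and optional subgroup sizes.

    Hypothetical helper for illustration; not Junky's actual API.
    """
    report = {
        "records": len(df),
        "columns": df.shape[1],
        # deep=True counts the bytes actually held by string/object columns
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }
    if group_col is not None:
        # Small subgroups leave little room for further slicing
        report["subgroup_sizes"] = df[group_col].value_counts().to_dict()
    return report


# Made-up sample data
df = pd.DataFrame({"city": ["NYC", "NYC", "LA", "SF"], "sales": [10, 12, 7, 3]})
report = dataset_size(df, group_col="city")
print(report)
```

Subgroups with only a handful of records are a hint that any per-group analysis will rest on very little data.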

Normalization of Variables

  • In columns with string types, does the data need to be cleaned to normalize categories?
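One way to approach this question is to look for raw values that collapse into the same category once whitespace and letter case are ignored. The sketch below assumes that trimming and case-folding is a reasonable notion of "the same category"; the function name `category_collisions` and the sample values are hypothetical.

```python
import pandas as pd


def category_collisions(series):
    """Group raw string values that become identical after trimming and case-folding.

    Hypothetical helper for illustration; not Junky's actual API.
    """
    values = series.dropna().astype(str)
    normalized = values.str.strip().str.casefold()
    collisions = {}
    # Group the raw values by their normalized form
    for key, raw in values.groupby(normalized):
        variants = sorted(raw.unique())
        if len(variants) > 1:
            # More than one spelling maps to the same category
            collisions[key] = variants
    return collisions


# Made-up sample data
cities = pd.Series(["New York", "new york ", "NEW YORK", "Boston", None, "boston"])
collisions = category_collisions(cities)
print(collisions)
```

An empty result suggests the column is already consistent under this rule; it does not catch misspellings or abbreviations, which would need fuzzier matching.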

Likelihood of Missing Data

  • Percentage of Null values for each column
  • Rows with missing columns
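Both checks above can be sketched with pandas' `isna()`. This is an illustrative example under the assumption that missing values are represented as `NaN`/`None`; the function name `missing_data_report` and the sample data are hypothetical.

```python
import pandas as pd


def missing_data_report(df):
    """Return the percentage of nulls per column and the rows with any null.

    Hypothetical helper for illustration; not Junky's actual API.
    """
    # The mean of a boolean mask is the fraction of True values
    pct_null = (df.isna().mean() * 100).round(1)
    # Keep only the rows where at least one column is missing
    rows_with_missing = df[df.isna().any(axis=1)]
    return pct_null, rows_with_missing


# Made-up sample data
df = pd.DataFrame(
    {"age": [34, None, 29, None], "name": ["Ann", "Bob", None, "Dee"]}
)
pct_null, incomplete = missing_data_report(df)
print(pct_null)
print(incomplete)
```

Placeholder values such as empty strings or `"N/A"` are not counted here and would need to be converted to nulls first.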

Normal Distributions

  • Z-scores, t-tests, F-tests, heteroskedasticity, and the Box-Jenkins test
  • Max, Min, Median, Mode, Mean
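The summary statistics and z-scores listed above can be sketched with pandas alone. The t-tests, F-tests, heteroskedasticity checks, and Box-Jenkins analysis are left out because they would need an additional statistics library. The function names, the `threshold` parameter, and the sample data are assumptions made for illustration.

```python
import pandas as pd


def summarize(series):
    """Max, min, median, mode, and mean of a numeric column (nulls ignored)."""
    s = series.dropna()
    return {
        "max": s.max(),
        "min": s.min(),
        "median": s.median(),
        "mode": s.mode().tolist(),  # a column can have several modes
        "mean": s.mean(),
    }


def z_scores(series):
    """Distance of each value from the mean, in sample standard deviations."""
    # Series.std() uses the sample standard deviation (ddof=1) by default
    return (series - series.mean()) / series.std()


def flag_outliers(series, threshold=3.0):
    """Values whose absolute z-score exceeds the (assumed) threshold."""
    return series[z_scores(series).abs() > threshold]


# Made-up sample data with one suspicious value
values = pd.Series([10, 12, 11, 13, 12, 11, 12, 95])
stats = summarize(values)
outliers = flag_outliers(values, threshold=2.0)
print(stats)
print(outliers)
```

The lower threshold in the usage example is deliberate: with only eight values, no z-score based on the sample standard deviation can reach 3, so small datasets need a smaller cutoff or a different outlier rule.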

About

Junky is a series of simple tests to detect potential problems with a dataset before providing analysis.
