junky

Bad Data == Bad Analysis

Junky is a work in progress, started at PyGotham 2014, that is intended to become a tool for quickly identifying potential problems in the distribution and normalization of datasets.

Dependencies

pandas, NumPy, etc.

Tasks

Dataset Size

How many records? How much space do we have to examine subgroups?
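
As a rough illustration of the two questions above, the sketch below counts records and reports which subgroups fall under a minimum size. It is not Junky's implementation: the function name `size_report`, the `min_group_size` threshold, and the sample columns are all hypothetical, and it reads "space to examine subgroups" as "enough records per subgroup".

```python
import pandas as pd


def size_report(df, group_col=None, min_group_size=30):
    """Summarize record count, memory use, and (optionally) subgroup sizes.

    Hypothetical helper: the name and the min_group_size default are
    illustrative assumptions, not part of Junky.
    """
    report = {
        "n_rows": len(df),
        "n_cols": df.shape[1],
        # deep=True also counts the memory held by string objects.
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }
    if group_col is not None:
        # Number of records in each subgroup, largest first.
        counts = df[group_col].value_counts()
        report["group_sizes"] = counts.to_dict()
        # Subgroups with too few records to analyze on their own.
        report["small_groups"] = counts[counts < min_group_size].index.tolist()
    return report


# Made-up sample data for demonstration only.
df = pd.DataFrame({"city": ["NYC", "NYC", "NYC", "LA"], "sales": [10, 12, 9, 30]})
report = size_report(df, group_col="city", min_group_size=2)
```

Here `report["small_groups"]` lists the subgroups that may be too thin to examine separately; what counts as "too thin" depends on the analysis, so the threshold is left as a parameter.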

Normalization of Variables

In columns with string types, does the data need to be cleaned to normalize categories?
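
One possible way to answer this question is to look for raw values that become identical after light cleaning. The sketch below assumes that "cleaning" means trimming whitespace and lowercasing, which is only one of many possible normalization rules; `find_category_variants` is a hypothetical name, not a Junky function.

```python
import pandas as pd


def find_category_variants(series):
    """Group raw string values that collapse to the same normalized key.

    Hypothetical helper: normalization here is strip + lowercase only,
    an assumption chosen for illustration.
    """
    values = series.dropna().astype(str)
    keys = values.str.strip().str.lower()
    variants = {}
    for raw, key in zip(values, keys):
        variants.setdefault(key, set()).add(raw)
    # Keep only keys that appear under more than one raw spelling.
    return {key: sorted(raws) for key, raws in variants.items() if len(raws) > 1}


# Made-up sample data for demonstration only.
states = pd.Series(["NY", "ny", " NY ", "CA", "ca", "TX", None])
variants = find_category_variants(states)
```

A non-empty result suggests the column needs cleaning before categories are counted or grouped; an empty result only shows that this particular rule found nothing.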

Likelihood of Missing Data

Percentage of Null values for each column
Rows with missing columns
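
Both checks above can be sketched with standard pandas calls. This is a minimal illustration rather than Junky's own code: `missing_report` is a hypothetical name, and "rows with missing columns" is interpreted here as rows containing at least one null value.

```python
import pandas as pd


def missing_report(df):
    """Return per-column null percentages and the rows with any null value.

    Hypothetical helper, shown only to illustrate the two checks.
    """
    # isna() gives a boolean frame; the column mean is the null fraction.
    null_pct = df.isna().mean() * 100
    # Rows where at least one column is null.
    incomplete_rows = df[df.isna().any(axis=1)]
    return null_pct, incomplete_rows


# Made-up sample data for demonstration only.
df = pd.DataFrame(
    {
        "a": [1.0, None, 3.0, 4.0],
        "b": ["x", "y", None, "z"],
        "c": [1, 2, 3, 4],
    }
)
null_pct, incomplete_rows = missing_report(df)
```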

Normal Distributions

Z-scores, t-tests, F-tests, heteroskedasticity, and Box-Jenkins tests
Max, Min, Median, Mode, Mean
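
The summary statistics and z-scores listed above could be computed along the following lines. This sketch covers only those two items; the t-tests, F-tests, heteroskedasticity checks, and Box-Jenkins analysis are not shown. The function names are hypothetical, and the z-score uses the population standard deviation (`ddof=0`), which is an assumption rather than a documented choice.

```python
import pandas as pd


def summarize(series):
    """Return max, min, median, mode, and mean of a numeric series.

    Hypothetical helper. If several values tie for the mode,
    the smallest one is reported.
    """
    s = series.dropna()
    modes = s.mode()
    return {
        "max": s.max(),
        "min": s.min(),
        "median": s.median(),
        "mode": modes.iloc[0] if not modes.empty else None,
        "mean": s.mean(),
    }


def z_scores(series):
    """Standardize values using the population standard deviation.

    Note: a constant column has zero standard deviation, so the
    result is NaN for every value in that case.
    """
    return (series - series.mean()) / series.std(ddof=0)


# Made-up sample data for demonstration only.
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])
stats = summarize(values)
z = z_scores(values)
```

Values with a large absolute z-score are candidates for a closer look, though z-scores are only meaningful as outlier flags when the column is roughly normally distributed.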

