Fake data generation of credit card transactions using Python

dianecloud/data_generation-2015

 
 

Data generation using pure Python, HAWQ (with PL/Python), or MapReduce (streaming via Python)

Instructions are included for each of the three. The MapReduce version is in its early stages and is not currently recommended.

TODO:

  • the path to locations_partitions.csv is hardcoded (fixed?)

  • come up with a realistic template so the generated numbers stay plausible

  • script to calculate expected outputs based on profiles

  • for transactions, accept either a folder of profiles to iterate through or a single JSON file, detecting automatically which one was given

  • user input to generate config files

  • test output against profiles

  • add shell scripts to install python packages

  • add shell scripts to fix hard coding for HAWQ and MR

  • clean up HAWQ and MR code

  • add more/better data

  • improve performance of MapReduce

  • Spark streaming?

  • create_pickles doesn't run if the number of years doesn't match the profile inputs

  • work on making datasets repeatable via random seed

  • script to replace the hashbang with the interpreter path reported by "which python"

  • script to replace hard links
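
One way to approach the random-seed item above is to route every random draw through a single seeded generator. The sketch below is only an illustration: generate_transactions, its fields, and the value ranges are hypothetical and are not taken from this repository's code.

```python
import random


def generate_transactions(n, seed=None):
    """Return n fake (card_id, amount) rows.

    Hypothetical helper for illustration; not part of this repository.
    """
    # A private Random instance keeps the global random state untouched,
    # so other code calling random.* cannot change the generated rows.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        card_id = rng.randint(1000, 9999)           # made-up ID range
        amount = round(rng.uniform(1.0, 500.0), 2)  # made-up amount range
        rows.append((card_id, amount))
    return rows


if __name__ == "__main__":
    first = generate_transactions(5, seed=42)
    second = generate_transactions(5, seed=42)
    # Two runs with the same seed produce identical rows.
    print(first == second)
```

Passing the seed explicitly (rather than calling random.seed once at import time) would also let each profile or partition use its own seed, which may matter if the HAWQ or MapReduce versions generate data in parallel.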
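
The "folder of profiles or a single JSON" item above could be handled by checking the path type before loading. This is a minimal sketch under assumptions: load_profiles is a hypothetical name, and it assumes profiles are plain JSON files with a .json extension.

```python
import json
from pathlib import Path


def load_profiles(path):
    """Return {filename: parsed JSON} for one profile file or a folder of them.

    Hypothetical helper for illustration; not part of this repository.
    """
    p = Path(path)
    if p.is_dir():
        # Folder given: pick up every .json file, in a stable order.
        files = sorted(p.glob("*.json"))
    elif p.is_file():
        # Single file given: load just that profile.
        files = [p]
    else:
        raise FileNotFoundError(f"no such profile file or folder: {p}")

    profiles = {}
    for f in files:
        with f.open() as fh:
            profiles[f.name] = json.load(fh)
    return profiles
```

The caller then iterates over the returned dictionary in the same way whether one profile or many were supplied.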
