An algorithm for crowd sourcing platforms that ensures high quality work through smart gold dispersion.

cjnayak/SmartTrainingAlgorithm

dataminingProject: Samasource Gold Gating Algorithm

Our gating algorithm requires a Samasource-generated JSON file.

To run it, enter:

python runner.py json_data_file.json

##Data

Our data files are:

data/Getty_Training1.json
data/Getty_Training2.json
data/Getty_Validation.json

These files were generated by running Training_Validation_Generator.py on a master download from Samasource's data warehouse. The training datasets can be accessed in this repository, while the validation dataset can be downloaded here.

##Parameters

An optional batch number may be passed on the command line by following the keyword "batch" with the number:

python runner.py data/Getty_Training1.json batch 892

If there is no batch with that number, or no batch is specified, the script defaults to the batch with the most tasks in the branch.
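The fallback rule above can be sketched in a few lines. This is only an illustration, not the code in runner.py: the helper names `parse_batch` and `choose_batch` are hypothetical, and the sketch assumes each task record in the JSON file carries a `"batch"` field, which the actual Samasource export may name differently.

```python
from collections import Counter


def parse_batch(argv):
    """Return the value following the keyword "batch" in argv, or None."""
    extra = argv[2:]  # argv[0] is the script, argv[1] the JSON file
    if "batch" in extra:
        idx = extra.index("batch")
        if idx + 1 < len(extra):
            return extra[idx + 1]
    return None


def choose_batch(tasks, requested=None):
    """Pick the requested batch if it exists, else the batch with the most tasks.

    Assumes every task is a dict with a "batch" key (hypothetical field name).
    """
    counts = Counter(str(task["batch"]) for task in tasks)
    if requested is not None and str(requested) in counts:
        return str(requested)
    # Fall back to the most common batch number.
    return counts.most_common(1)[0][0]
```

With this sketch, `choose_batch(tasks, parse_batch(sys.argv))` would return the requested batch when it is present in the data and the largest batch otherwise.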

You can also run the centroid selection in a loop of 30 iterations to estimate the profit gains over the course of a full project:

python runner.py data/Getty_Training1.json run-loop

When this parameter is set, the centroid selection runs 30 times. The 30 iterations represent the number of gold batches all users must complete in order to finish an entire project of 300,000 tasks. Repeating the run 30 times also reduces the variability introduced by the k-means function, which generates random centroids. The loop estimates the amount of gold saved, the cost savings, and the increased profit Samasource would receive if it implemented the smart gating algorithm.
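To illustrate why repeating a randomly initialised k-means run helps, here is a minimal standalone sketch. It is not the project's implementation: it uses a tiny pure-Python 1-D k-means instead of the scikit-learn one, and it reports inertia (the sum of squared distances to the nearest centroid) as a stand-in for the gold and profit estimates. The names `kmeans_1d` and `run_loop` are hypothetical.

```python
import random
import statistics


def kmeans_1d(points, k, rng, iters=20):
    """Tiny 1-D k-means with random initial centroids; returns the inertia."""
    centroids = rng.sample(points, k)  # random start, the source of variability
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return sum(min((p - c) ** 2 for c in centroids) for p in points)


def run_loop(points, k=2, n_runs=30, seed=0):
    """Repeat the randomised run n_runs times; return mean and spread."""
    rng = random.Random(seed)
    results = [kmeans_1d(points, k, rng) for _ in range(n_runs)]
    return statistics.mean(results), statistics.stdev(results)


mean_inertia, spread = run_loop([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
```

Averaging over the 30 runs gives one summary figure that depends less on any single random starting point, and the spread shows how much individual runs disagree.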

The complete loop runs in under 2 minutes for the training sets.

Plottings contains a number of functions that can be called to plot graphs of the existing data.

##Dependencies

Our runner script has the following dependencies:

  • NumPy
  • SciPy
  • scikit-learn
  • Matplotlib

##Old Code: the project's graveyard

A Monte Carlo simulation was developed for the threshold parameters. Although it is not currently used, it can be run to see what it outputs. Other unused code can be found in the `old` folder.
