Process

Javier Otegui edited this page Dec 28, 2016 · 1 revision

The API's workflow is divided into two parts: service initialization and file parsing.

Service initialization

Main wiki page: https://github.com/VertNet/dedupe/wiki/1.-Service-initialization

This first part is handled by the dedupe.DedupeAPI module. It takes the file and associated parameters from the user's request, parses them, stores the file in Google Cloud Storage, and launches the workers that will do the actual job.
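
The initialization steps above can be sketched roughly as follows. This is only an illustration, not the actual dedupe.DedupeAPI code: the function name `initialize_service`, the parameter names `email` and `action`, the in-memory dictionary standing in for Google Cloud Storage, and the `Queue` standing in for the worker task queue are all assumptions made for the example.

```python
import uuid
from queue import Queue

# Hypothetical stand-ins: a dict instead of Google Cloud Storage,
# and an in-process Queue instead of a real task queue.
FAKE_STORAGE = {}
TASK_QUEUE = Queue()


def initialize_service(file_bytes, params):
    """Parse the request, store the file, and enqueue a parsing task."""
    # 1. Parse and normalise the request parameters
    #    (the parameter names are illustrative only).
    parsed = {
        "email": params.get("email", "").strip(),
        "action": params.get("action", "flag").lower(),
    }

    # 2. Store the uploaded file under a unique key.
    file_key = "uploads/{}.csv".format(uuid.uuid4().hex)
    FAKE_STORAGE[file_key] = file_bytes

    # 3. Launch the worker by queuing a task that points at the stored file.
    TASK_QUEUE.put({"file_key": file_key, "params": parsed})
    return file_key
```

The request handler returns as soon as the task is queued, so the caller does not wait for the file to be processed.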

File parsing

Main wiki page: https://github.com/VertNet/dedupe/wiki/2.-File-parsing

This second part runs asynchronously. It parses the file in search of duplicates and prepares the requested output.
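
As a rough sketch of what asynchronous parsing could look like, the example below scans a CSV file for repeated rows in a background thread. It is not the service's actual algorithm: treating two rows as duplicates only when every field matches exactly is an assumption, as are the names `find_duplicates` and `parse_async` and the use of a thread pool in place of the real workers.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical background worker pool standing in for the real workers.
EXECUTOR = ThreadPoolExecutor(max_workers=1)


def find_duplicates(csv_text):
    """Return 1-based data-row numbers of rows that repeat an earlier row."""
    seen = set()
    duplicates = []
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    for row_number, row in enumerate(reader, start=1):
        key = tuple(row)  # assumption: exact match on all fields
        if key in seen:
            duplicates.append(row_number)
        else:
            seen.add(key)
    return duplicates


def parse_async(csv_text):
    """Start duplicate detection in the background and return a Future."""
    return EXECUTOR.submit(find_duplicates, csv_text)
```

Calling `parse_async(text).result()` blocks until the scan finishes. In the real service the result would instead be prepared as the requested output once the worker completes.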