Create a sustainable and scalable method to track dealer conversion performance through meaningful KPIs and scores calculated from predictive models built to learn dealerships' expected sales and service figures.
Whereas traditional models define expected performance through geographical comparison and industry standards, this model aims to understand fluctuations in volume based on seasonality and similarly sized dealerships. Metrics pertaining to lead suppliers and breakdowns by lead type are future considerations.
The two main areas of focus are listed below with their respective KPIs, which were shown to correlate strongly with the target metric, in this case actual sales. Service has not yet been implemented and is therefore left out; it follows a similar structure to the sales methodology.
Generating Sales: VDP Views | Unique Visitors | Lead Volume | Visitor Conversions | Actual Sales
Managing Sales: Close Rate | SDSV Close Rate | Sales Loyalty | % of Leads Responded to Within 30 Minutes
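As a rough illustration of how a KPI's relationship to the target metric could be checked, the sketch below computes a Pearson correlation between one KPI and actual sales. The monthly figures are made up for the example and the helper name `pearson` is hypothetical; the source does not state which correlation measure was actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Illustrative monthly figures for one hypothetical dealership (not real data).
lead_volume = [120, 135, 150, 110, 160, 175]
actual_sales = [18, 21, 24, 16, 25, 28]

r = pearson(lead_volume, actual_sales)
print(f"lead volume vs. actual sales: r = {r:.3f}")
```

A value of `r` close to 1 would suggest the KPI moves with actual sales; the same check could be repeated for each KPI in the lists above.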
The implementation below was developed using AWS, Python, PySpark, and Tableau. Because of PPI and data rights, the data set is a mock representation of data seen in the field and is intended only to illustrate the process.
The following diagram shows the general architecture used to create the models. A more detailed overview follows:
- The mock data was created using a Jupyter Notebook running on EC2
- Mock data is sent to S3 using Boto3 SDK
- An event trigger starts Glue's PySpark ETL job
- Data is reloaded into S3 in two forms: the full data set and a newly created minimized data set carrying a unique ID for the ML process
- Loading the minimized data set triggers the machine learning batch process
- Batch process returns data to S3
- Predictive results are combined with original file using PySpark data transformation
- Results are stored in S3 and downloaded
- Data imported into Tableau for visualization and further calculations
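The unique-ID steps in the list above can be sketched in plain Python: tag each record with an ID, derive a minimized data set for batch prediction, then join the predictions back onto the full records. This is only a stand-in for the PySpark transformations, not the project's actual Glue job; the column names, the `predicted_sales` field, and all helper names are assumptions made for illustration.

```python
import uuid

# Hypothetical feature columns; the real feature set is not given here.
FEATURE_COLUMNS = ["vdp_views", "unique_visitors", "lead_volume"]


def add_record_ids(rows):
    """Return copies of the rows, each tagged with a unique record_id."""
    return [{"record_id": uuid.uuid4().hex, **row} for row in rows]


def minimize(rows_with_ids):
    """Keep only the ID and the model features for the batch-prediction input."""
    return [
        {"record_id": row["record_id"], **{c: row[c] for c in FEATURE_COLUMNS}}
        for row in rows_with_ids
    ]


def merge_predictions(full_rows, predictions):
    """Left-join predictions onto the full rows by record_id."""
    predicted = {p["record_id"]: p["predicted_sales"] for p in predictions}
    return [
        {**row, "predicted_sales": predicted.get(row["record_id"])}
        for row in full_rows
    ]


# Usage with made-up rows and a stand-in for the batch prediction output.
raw = [
    {"dealer": "A", "vdp_views": 900, "unique_visitors": 400,
     "lead_volume": 60, "actual_sales": 12},
    {"dealer": "B", "vdp_views": 1500, "unique_visitors": 700,
     "lead_volume": 95, "actual_sales": 20},
]
full = add_record_ids(raw)
minimal = minimize(full)
fake_predictions = [
    {"record_id": r["record_id"], "predicted_sales": r["lead_volume"] * 0.2}
    for r in minimal
]
combined = merge_predictions(full, fake_predictions)
```

Carrying the ID through both data sets is what lets the predictions be matched to the original records after the batch process returns, regardless of row order.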
The mock data for training the model was created using the two scripts below:
The batch prediction set that makes up the report data is shifted slightly, both up and down, so that the metrics show enough variance to simulate normal changes at dealerships. An example is slightly modifying the mu and sigma of the normal distribution, as in the sample file below:
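A minimal sketch of this idea, assuming a single metric drawn from a normal distribution: the training set uses baseline parameters, and the batch sets nudge mu up or down and widen sigma slightly. The baseline values, the 5% and 10% adjustments, and the function name `sample_metric` are hypothetical and are not taken from the project's scripts.

```python
import random


def sample_metric(mu, sigma, n, seed=None):
    """Draw n non-negative integer values from a normal distribution."""
    rng = random.Random(seed)
    return [max(0, round(rng.gauss(mu, sigma))) for _ in range(n)]


# Hypothetical baseline parameters for a metric such as lead volume.
TRAIN_MU, TRAIN_SIGMA = 100.0, 15.0

training = sample_metric(TRAIN_MU, TRAIN_SIGMA, n=1000, seed=1)

# Batch sets: shift mu up or down and widen sigma a little to simulate drift.
batch_up = sample_metric(TRAIN_MU * 1.05, TRAIN_SIGMA * 1.10, n=1000, seed=2)
batch_down = sample_metric(TRAIN_MU * 0.95, TRAIN_SIGMA * 1.10, n=1000, seed=3)
```

Seeding each draw keeps the mock data reproducible between runs, which makes it easier to compare model results after changing the parameters.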
The log results of the ML training can be found here.
To visualize the results of the scoring system, a Tableau dashboard was created to show how the process works with mock data. A sample image is shown below, and the full hosted report can be found on Tableau Public here.

