Just wanted to mention that the job finished in 2 minutes rather than the 30 minutes stated in the instructions. That could be because I ran it on Spark 2.4 with Python 3.