Last updated: November 24th, 2021
So, you started learning Python. Now what?
The Python for Journalists MOOC I got to create for Datajournalism.com has found its way to journalists eager to learn Python, which of course is great. But when you're done with that course, what's next?
In this living repository I'll collect Jupyter Notebooks that contain code that I actually wrote and used in the newsroom. To make sure it will help you learn Python, I'll add comments, links and tips to my code. If I do this right, my code should become a workbook in which you can practice. :)
Some Notebooks will be in Dutch, others will be in English. An overview of all of them can be found here:
| Notebook | Content | Language |
|---|---|---|
| 201006 two step scraper notebookcheck.ipynb | My friend A. wanted to scrape notebookcheck.com to get some specs on smartphones. Figured you might want to hop along on this quirky scrape-journey. Shows how to scrape non-tabular data and how to use regular expressions along the way. | EN |
| 200923 Webscraping with Python - Dataharvest 2020.ipynb | This Notebook contains a scraper to collect the full program of the 2020 DataHarvest+ European Investigative Journalism Conference. It was created for a DataHarvest+ session on Python, taught together with the wonderful Adriana Homolova. FYI: There is also an empty version of this notebook. (Oh, and a final note: you'll find all material used in the Python sessions of DataHarvest+ 2020 taught by Adriana and yours truly here.) | EN |
| 200923 Scraper info Dutch municipalities.ipynb | This Notebook contains a scraper that collects three things for every municipality (gemeente) in the Netherlands from almanak.overheid.nl: all contact info; the list of political parties and the number of seats they have; and the names, parties and functions of every single city councillor. Also written for the Dataharvest+ EIJC 2020. :) | EN |
| 190722 Almanak Gemeenten Scraper + Toelichting.ipynb | This Notebook contains a scraper to collect all contact info for every municipality (gemeente) in the Netherlands from almanak.overheid.nl. Builds upon the 4th module of the Python for Journalists MOOC. | NL |
| 190723 Almanak Provincies Scraper + Toelichting.ipynb | This Notebook contains a scraper to collect all contact info for every province (provincie) in the Netherlands from almanak.overheid.nl. Builds upon both the 4th module of the Python for Journalists MOOC and the municipality scraper 190722 Almanak Gemeenten Scraper + Toelichting.ipynb. | NL |
| Search Script Scrape Notebook.ipynb | This Notebook contains answers to some of the 101 web scraping and research tasks that were set as an exercise by the Stanford Computational Journalism Lab. | EN |
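
To give you a taste of what "scraping non-tabular data with regular expressions" can look like, here is a minimal sketch. It is not the code from the notebookcheck Notebook: the HTML snippet, the pattern and the `extract_specs` function are all made up for illustration, and a real page will have different markup. It runs without an internet connection, so you can paste it straight into a Notebook cell and play with it.

```python
import re

# Hypothetical snippet of "label: value" specs that are not in a table.
# The real markup on any given site will look different.
SAMPLE_HTML = """
<div class="specs">
  <b>Display:</b> 6.10 inch, 2340 x 1080 pixel<br>
  <b>Weight:</b> 164 g<br>
  <b>Battery:</b> 3110 mAh
</div>
"""

# A bold label ending in a colon, followed by its value.
# The value runs until the next tag or the end of the line.
SPEC_PATTERN = re.compile(r"<b>([^<:]+):</b>\s*([^<\n]+)")


def extract_specs(html):
    """Return a dict mapping each spec label to its raw text value."""
    return {
        label.strip(): value.strip()
        for label, value in SPEC_PATTERN.findall(html)
    }


specs = extract_specs(SAMPLE_HTML)

# A second, smaller pattern pulls the number out of a raw value.
weight_grams = int(re.search(r"(\d+)\s*g", specs["Weight"]).group(1))

print(specs)
print(weight_grams)
```

Regular expressions are handy for small, predictable fragments like these. For anything with deeper nesting, an HTML parser is usually the safer first step, with a regex applied afterwards to the extracted text.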