IODA Mozilla

Prerequisites
Scraper script arguments
Script overview

Prerequisites

To obtain Mozilla data from BigQuery using the scraper script, you will first need to generate the Application Default Credentials (ADC) for authentication. For more information about ADCs: https://cloud.google.com/docs/authentication/application-default-credentials

Install the Google Cloud CLI on the machine where the scraper will run:
- https://cloud.google.com/sdk/docs/install
Create local authentication credentials for the principal user account by entering the following in the terminal:
```
gcloud auth application-default login
```
A Google sign-in screen should open in your web browser. Select the relevant user account. Once you land on the page indicating successful authentication, the Application Default Credentials (ADC) for the account have been created. You can find the generated application_default_credentials.json file at the following paths:

Linux/MacOS:

$HOME/.config/gcloud/application_default_credentials.json

Windows:

%APPDATA%\gcloud\application_default_credentials.json
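
To confirm the file landed where expected, a quick check along these lines may help. This is a minimal sketch using only the standard library; the helper name `adc_path` and the constant `ADC_FILENAME` are hypothetical and not part of the scraper.

```python
import os
import sys
from pathlib import Path

ADC_FILENAME = "application_default_credentials.json"


def adc_path(platform: str = sys.platform, env=os.environ) -> Path:
    """Return the expected ADC location for a platform (hypothetical helper)."""
    if platform.startswith("win"):
        # Windows: %APPDATA%\gcloud\application_default_credentials.json
        return Path(env.get("APPDATA", "")) / "gcloud" / ADC_FILENAME
    # Linux/macOS: $HOME/.config/gcloud/application_default_credentials.json
    home = env.get("HOME") or str(Path.home())
    return Path(home) / ".config" / "gcloud" / ADC_FILENAME


if __name__ == "__main__":
    path = adc_path()
    print(f"{path}: {'found' if path.is_file() else 'missing'}")
```

If the file is reported as missing, re-running `gcloud auth application-default login` is the first thing to try.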

In the shell script that sets up the Docker container, include the following line to mount the local ADC file at the corresponding path inside the container:

Linux/MacOS:

-v "$HOME/.config/gcloud/application_default_credentials.json:/root/.config/gcloud/application_default_credentials.json"

Windows:

-v "C:\Users\d\AppData\Roaming\gcloud\application_default_credentials.json:/root/.config/gcloud/application_default_credentials.json"

Running locally

https://cloud.google.com/docs/authentication/set-up-adc-local-dev-environment

Running in a containerized environment

https://cloud.google.com/docs/authentication/set-up-adc-containerized-environment

Scraper script arguments

In the Docker container script, some key arguments can be specified:

--projectid, your own GCP Project ID. This argument is required.
--endtime, a Unix timestamp corresponding to the end time of the data to be fetched. If not provided, defaults to the current time.
--starttime, a Unix timestamp corresponding to the starting time of the data to be fetched. If not provided, defaults to the end time minus the lookback period (in days).

Note:

The default lookback period is 2 days.

If you would like to change this, please update DEFAULT_LOOKBACK_PERIOD in constants.py.
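
The default rules for --starttime and --endtime can be sketched as follows. This is an illustration under stated assumptions, not the scraper's actual code: the function `resolve_window`, the `now` parameter, and the ordering check are hypothetical, and only the 2-day `DEFAULT_LOOKBACK_PERIOD` comes from the description above.

```python
import time

SECONDS_PER_DAY = 86_400
DEFAULT_LOOKBACK_PERIOD = 2  # days, mirroring the documented default


def resolve_window(starttime=None, endtime=None,
                   lookback_days=DEFAULT_LOOKBACK_PERIOD, now=None):
    """Return (start, end) Unix timestamps with the documented defaults applied."""
    if endtime is None:
        # No end time given: fall back to the current time.
        endtime = int(now if now is not None else time.time())
    if starttime is None:
        # No start time given: end time minus the lookback period in days.
        starttime = endtime - lookback_days * SECONDS_PER_DAY
    if starttime > endtime:
        # Assumption: the real script may handle this case differently.
        raise ValueError("starttime must not be later than endtime")
    return starttime, endtime
```

For example, passing only `endtime=1_700_000_000` yields a start time 172,800 seconds (2 days) earlier.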

Other utility arguments you could also specify are:

--debug. Specify y to enable debug mode, which prints the fetched and processed data to the command line. In debug mode, data is not pushed to Kafka.
--savedata. Specify y to save the fetched, unprocessed Mozilla data (with all metrics) as a .csv file in the /data directory.
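
For orientation, the five arguments could be declared with `argparse` roughly as below. This is a sketch, not the parser in mozillaScraper.py: the `int` types, the `"n"` defaults, the example project ID, and the `push_to_kafka` variable are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser for the documented arguments."""
    parser = argparse.ArgumentParser(description="Mozilla scraper (sketch)")
    parser.add_argument("--projectid", required=True,
                        help="GCP Project ID (required)")
    parser.add_argument("--endtime", type=int, default=None,
                        help="Unix timestamp; defaults to the current time")
    parser.add_argument("--starttime", type=int, default=None,
                        help="Unix timestamp; defaults to endtime minus lookback")
    # Assumption: any value other than "y" leaves the mode disabled.
    parser.add_argument("--debug", default="n",
                        help="y to print data and skip pushing to Kafka")
    parser.add_argument("--savedata", default="n",
                        help="y to save raw fetched data as .csv in /data")
    return parser


# Example invocation with a placeholder project ID.
args = build_parser().parse_args(["--projectid", "my-gcp-project", "--debug", "y"])
push_to_kafka = args.debug != "y"  # debug mode disables the Kafka push
```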

Script overview

The script does the following:

Fetches data for all countries over the specified (or default) time range.

Note:

The default configuration fetches data from all countries.

If you would like to fetch data from a specific country, please change the country_code parameter for the fetchData function in main (line 283).

Aggregates data into two separate DataFrames, one country-aggregated and one region-aggregated.
Calculates relevant metrics for each country and region.
Combines the processed country-aggregated and region-aggregated data into a single dictionary, with country-aggregated data first, followed by region-aggregated data.
Sends the combined data to Kafka.
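
The aggregate-then-combine steps above can be illustrated with a dependency-free sketch. Plain dictionaries stand in for the DataFrames here, and the row fields (`country`, `region`, `value`), the key format, and the `aggregate` helper are all hypothetical; the real script computes more metrics than a single sum.

```python
from collections import defaultdict

# Hypothetical rows; the real query returns more metrics per record.
rows = [
    {"country": "US", "region": "California", "value": 10},
    {"country": "US", "region": "Texas", "value": 5},
    {"country": "DE", "region": "Berlin", "value": 7},
]


def aggregate(rows, key_fields):
    """Sum `value` over rows grouped by the given fields."""
    totals = defaultdict(int)
    for row in rows:
        key = ".".join(row[field] for field in key_fields)
        totals[key] += row["value"]
    return dict(totals)


by_country = aggregate(rows, ["country"])
by_region = aggregate(rows, ["country", "region"])

# Dicts preserve insertion order, so country entries precede region entries.
combined = {**by_country, **by_region}
```

Because the region keys include the country prefix in this sketch, they cannot collide with the country keys when the two dictionaries are merged.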
