The cm-data-ingestion package provides easy data ingestion from various geodata sources. It is built on the dlt framework (https://dlthub.com/), so every custom source can be pipelined into any of the standard destinations supported by dlt (https://dlthub.com/docs/dlt-ecosystem/destinations/). The main idea behind the package is to break down geodata silos and make the data easily accessible in standard data environments for further visualization or analytics. The package is also built with the analytics-as-code approach in mind.
Currently supported Sources:
- OvertureMaps https://overturemaps.org/
- OpenStreetMap https://www.openstreetmap.org/
- WorldPop https://www.worldpop.org/
- GTFS MobilityDatabase https://mobilitydatabase.org/
- GeoBoundaries https://www.geoboundaries.org/
With these Sources, you can ingest geodata from heterogeneous systems through a single, standardized ingestion logic that works the same way for every source.
Prepared dbt staging models are also available in the dbt folder. These models perform basic normalization of the raw data ingested from the Sources and can be optionally used as part of your downstream transformation pipeline.
Run the following command to install the cm-data-ingestion package on your system:
pip install git+https://github.com/clevermaps/cm-data-ingestion.git
Set up credentials in .dlt/secrets.toml and then call:
from cm_data_ingestion.pipelines.pipeline import ingest_ovm
import dlt

# Which Overture Maps items to ingest and how to filter them
config = {
    "items": [
        {"theme": "divisions", "type": "division_area"}
    ],
    "options": {
        "release": "2025-10-22.0",
        "bbox": [12.084961, 48.458352, 19.028320, 51.179343]
    }
}

# The same config can be loaded into different dlt destinations
ingest_ovm('duckdb', config)
ingest_ovm('postgres', config)
ingest_ovm('filesystem', config)
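Because the config is a plain dictionary, it can be built and checked in code before ingestion. The following is a minimal sketch, not part of cm-data-ingestion: the helper name `build_ovm_config` is hypothetical, and the bounding-box order `[min_lon, min_lat, max_lon, max_lat]` is an assumption inferred from the example values above.

```python
from typing import Any


def build_ovm_config(theme: str, type_: str, release: str,
                     bbox: list[float]) -> dict[str, Any]:
    """Hypothetical helper: build a config dict in the shape shown above,
    after a basic sanity check of the bounding box."""
    if len(bbox) != 4:
        raise ValueError("bbox must contain exactly four numbers")
    # Assumption: bbox order is [min_lon, min_lat, max_lon, max_lat]
    min_lon, min_lat, max_lon, max_lat = bbox
    if not (-180 <= min_lon < max_lon <= 180):
        raise ValueError("expected -180 <= min_lon < max_lon <= 180")
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError("expected -90 <= min_lat < max_lat <= 90")
    return {
        "items": [{"theme": theme, "type": type_}],
        "options": {"release": release, "bbox": list(bbox)},
    }


config = build_ovm_config(
    "divisions", "division_area", "2025-10-22.0",
    [12.084961, 48.458352, 19.028320, 51.179343],
)
```

The resulting `config` has the same structure as the hand-written one, so it could be passed to `ingest_ovm` unchanged. A swapped or malformed bounding box raises a `ValueError` before any data is downloaded.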
More detailed documentation can be found in the docs folder.
Example configurations and usage can be found in examples, demonstrating how to set up ingestion for various sources.
Technologies used:
- Python 3 for core logic and scripting.
- DuckDB for embedded analytical database capabilities.
- PyArrow and related libraries for efficient data handling.
- Requests and other HTTP libraries for API communication.
- Pytest for unit testing.
The modular design allows easy addition of new data sources or pipelines by adhering to established interfaces and patterns.
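To give a rough idea of what a new source could look like, here is a minimal sketch based on dlt's generic resource mechanism. It does not reproduce the package's own interfaces, which are not described here. The names `city_points`, `ingest_city_points`, and `SAMPLE_FEATURES`, as well as the sample data, pipeline name, and dataset name, are all hypothetical.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical sample data standing in for a real geodata API response
SAMPLE_FEATURES: List[Dict[str, Any]] = [
    {"id": "a1", "name": "Brno", "lon": 16.6068, "lat": 49.1951},
    {"id": "b2", "name": "Praha", "lon": 14.4378, "lat": 50.0755},
]


def city_points(bbox: List[float]) -> Iterator[Dict[str, Any]]:
    """Yield sample features inside bbox.

    Assumption: bbox order is [min_lon, min_lat, max_lon, max_lat].
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    for feature in SAMPLE_FEATURES:
        in_lon = min_lon <= feature["lon"] <= max_lon
        in_lat = min_lat <= feature["lat"] <= max_lat
        if in_lon and in_lat:
            yield feature


def ingest_city_points(destination: str, bbox: List[float]) -> None:
    """Wrap the generator as a dlt resource and load it into a destination."""
    import dlt  # imported lazily so the generator is usable without dlt

    resource = dlt.resource(
        city_points(bbox), name="city_points", write_disposition="replace"
    )
    pipeline = dlt.pipeline(
        pipeline_name="city_points_demo",  # hypothetical name
        destination=destination,
        dataset_name="raw_geodata",        # hypothetical name
    )
    print(pipeline.run(resource))


# The generator can be inspected on its own, without running a pipeline
rows = list(city_points([16.0, 48.9, 17.0, 49.5]))
```

Calling `ingest_city_points('duckdb', [16.0, 48.9, 17.0, 49.5])` would then load the filtered rows through dlt, provided dlt and the chosen destination are installed and configured. Keeping the data-fetching generator separate from the pipeline wiring makes the source easy to test in isolation.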
0.0.1 Initial version