Keep Norwegian legal documents in sync. Automatically.
Lovdata publishes Norwegian laws and regulations as downloadable datasets. These datasets update frequently—laws change, new regulations appear, old ones are amended. Manually tracking what changed is tedious and error-prone.
This tool solves that:
- Download datasets from Lovdata's API
- Extract compressed archives automatically
- Track exactly which files were added, modified, or removed
- Sync incrementally—only process what changed since last run
Built for developers building legal tech, researchers analyzing legal data, or anyone who needs reliable, up-to-date Norwegian legal documents.
Lovdata provides bulk dataset downloads—not individual file checksums or change feeds. This means:
- Filenames are stable identifiers - `LOV-1999-07-02-63.xml` never changes, even when the law's content is amended
- No file-level change detection - The API only provides dataset-level `lastModified` timestamps
- All changes are detected - Both regulatory amendments and Lovdata's editorial updates trigger change notifications
Why content hashing? Since the API doesn't provide checksums or change indicators for individual files, this tool computes xxHash for each file to detect modifications. This is the only reliable way to identify which specific documents changed.
Result: You get precise change detection, but cannot programmatically distinguish between legal amendments and editorial updates. Manual review recommended for critical changes.
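The hash-and-compare step can be sketched roughly like this. This is an illustration, not the tool's actual internals: the real tool uses xxHash (via the third-party `xxhash` package), while this stdlib-only sketch substitutes `hashlib` so it runs anywhere.

```python
import hashlib
from pathlib import Path

def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (the real tool uses xxHash; sha256 here keeps the sketch stdlib-only)."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def detect_changes(directory: Path, previous: dict[str, str]) -> dict[str, list[str]]:
    """Compare current file hashes against a previous snapshot of path -> hash."""
    current = {
        p.relative_to(directory).as_posix(): hash_file(p)
        for p in directory.rglob("*.xml")
    }
    return {
        "added": sorted(p for p in current if p not in previous),
        "modified": sorted(p for p in current if p in previous and current[p] != previous[p]),
        "removed": sorted(p for p in previous if p not in current),
    }
```

Because the API exposes no per-file checksums, hashing the extracted files and diffing against the last snapshot is what makes per-document change detection possible at all.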
Official API docs: https://api.lovdata.no/publicData
# Install dependencies
uv sync
# Sync everything (first run downloads all datasets)
uv run lov update
# Second run? Only processes changes
uv run lov update

That's it. Your datasets are in data/extracted/, and you know exactly what changed.
- Download - Fetches dataset archives from Lovdata's API
- Extract - Uncompresses tar.bz2 files into organized directories
- Track - Computes xxHash for each file and compares against previous hashes
- Report - Shows exactly what changed since last run
After running lov update, you get:
data/
├── raw/
│ ├── gjeldende-lover.tar.bz2 # Downloaded archives
│ └── gjeldende-sentrale-forskrifter.tar.bz2
├── extracted/
│ ├── gjeldende-lover/
│ │ └── nl/
│ │ ├── nl-18840614-003.xml # Extracted legal documents
│ │ ├── nl-18880623-003.xml
│ │ └── ...
│ └── gjeldende-sentrale-forskrifter/
│ └── sf/
│ └── ...
└── state.json # Tracks file changes
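The filenames in the tree above follow a pattern like `nl-18840614-003.xml`. Assuming that `prefix-YYYYMMDD-NNN` convention (an observation from the data, not a documented Lovdata guarantee), they can be split into useful parts:

```python
import re
from datetime import date

# Pattern inferred from filenames like "nl-18840614-003.xml";
# this naming convention is an observation, not a documented API contract.
FILENAME_RE = re.compile(r"^(?P<prefix>[a-z]+)-(?P<date>\d{8})-(?P<number>\d+)\.xml$")

def parse_document_filename(name: str) -> tuple[str, date, int]:
    """Split a document filename into collection prefix, date, and serial number."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized filename: {name}")
    d = m.group("date")
    return (
        m.group("prefix"),
        date(int(d[:4]), int(d[4:6]), int(d[6:8])),
        int(m.group("number")),
    )
```

Since filenames are stable identifiers (see above), the parsed tuple makes a reliable key for joining against your own database.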
# List all added files
uv run lov files list --status added
# See statistics
uv run lov files stats

Output example:
Dataset Statistics
┌────────────────────┬───────┬──────────┬─────────┐
│ Dataset │ Added │ Modified │ Removed │
├────────────────────┼───────┼──────────┼─────────┤
│ gjeldende-lover │ 245 │ 12 │ 3 │
└────────────────────┴───────┴──────────┴─────────┘
When Lovdata removes documents from their dataset, they linger in data/extracted/. Clean them up:
# Preview what would be deleted
uv run lov files prune --dry-run
# Actually remove them
uv run lov files prune

Only want "gjeldende" (current) laws?
# Set in .env
LOVDATA_DATASET_FILTER=gjeldende

Already have datasets but want fresh copies?
uv run lov update --force

from lovlig import sync_datasets
# Downloads, extracts, tracks changes - all in one call
sync_datasets()

What happens:
- Downloads all datasets from Lovdata API
- Extracts archives to `data/extracted/`
- Updates `data/state.json` with file hashes
- Shows progress bars and a summary
from lovlig import sync_datasets, Settings
from pathlib import Path
config = Settings(
    dataset_filter="gjeldende",              # Only current laws
    raw_data_dir=Path("my_data/archives"),
    extracted_data_dir=Path("my_data/docs"),
    max_download_concurrency=8               # Download 8 files at once
)

sync_datasets(config=config)

Perfect for automation—run this after syncing to see what's new:
from lovlig import Settings, StateManager, FileStatus
from lovlig.domain.services import FileQueryService
config = Settings()
query = FileQueryService()
with StateManager(config.state_file) as state:
    # Get files added in the last sync
    new_files = query.get_files_by_filter(
        state=state.data,
        status="added",
        limit=100
    )
    for file_meta in new_files:
        print(f"New: {file_meta.path}")

    # Get statistics per dataset
    stats = query.get_dataset_statistics(state.data)
    for dataset_name, counts in stats.items():
        print(f"{dataset_name}: {counts['added']} added, {counts['modified']} modified")

from lovlig import DatasetSync, Settings
from pathlib import Path
# 1. Configure
config = Settings(dataset_filter="gjeldende")
orchestrator = DatasetSync(config)
# 2. Sync datasets
orchestrator.sync_datasets(force_download=False)
# 3. Process new files
from lovlig import StateManager
from lovlig.domain.services import FileQueryService
query = FileQueryService()
with StateManager(config.state_file) as state:
    added_files = query.get_files_by_filter(state=state.data, status="added")
    for file_meta in added_files:
        xml_path = config.extracted_data_dir / file_meta.path
        # Your processing logic here
        process_legal_document(xml_path)

Real-world use cases:
- Legal search engine: Index new/modified documents in Elasticsearch
- Change notifications: Email alerts when specific laws are updated
- Data warehouse: ETL pipeline to load legal data into database
- Analysis: Track trends in legal changes over time
See examples/sdk_usage.py for complete examples.
Create a .env file for persistent settings:
# API settings
LOVDATA_API_TIMEOUT=30
# Dataset filtering
LOVDATA_DATASET_FILTER=gjeldende # Only "gjeldende" datasets
# LOVDATA_DATASET_FILTER=null # All datasets
# Performance
LOVDATA_MAX_DOWNLOAD_CONCURRENCY=4 # Parallel downloads

from lovlig import Settings
# Override environment variables
settings = Settings(
    api_timeout=60,
    max_download_concurrency=8
)

| Variable | Default | What It Does |
|---|---|---|
| `LOVDATA_API_TIMEOUT` | `30` | Seconds before API requests time out |
| `LOVDATA_DATASET_FILTER` | `gjeldende` | Filter datasets by name (or `null` = all) |
| `LOVDATA_MAX_DOWNLOAD_CONCURRENCY` | `4` | How many files to download in parallel |
See "Important: How Lovdata's API Works" section above for context on API constraints.
- Not real-time - Requires periodic polling (no webhooks available)
- Storage needed - Local hashes in `state.json` (~5-15 MB) plus extracted XML files (several GB)
- Change detection only - Identifies that a file changed, not what or why
- Full dataset downloads - Entire archives must be re-downloaded when any file changes
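Since there are no webhooks, the usual workaround is a scheduled job: cron, a systemd timer, or a loop like the sketch below. The 6-hour interval is an arbitrary choice, and because state is only saved after a successful sync, a failed cycle can simply be retried.

```python
import subprocess
import time

def run_sync_once(cmd: tuple[str, ...] = ("uv", "run", "lov", "update")) -> bool:
    """Run one sync cycle as a subprocess; True on a zero exit code."""
    return subprocess.run(list(cmd)).returncode == 0

def poll_forever(interval_s: int = 6 * 3600) -> None:
    """Re-run the sync on a fixed interval (6 h here is an arbitrary default)."""
    while True:
        if not run_sync_once():
            print("sync failed; will retry next cycle")
        time.sleep(interval_s)
```

A cron entry or systemd timer invoking `uv run lov update` achieves the same thing without a long-running process, and is usually the better fit on servers.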
Q: What happens if a sync is interrupted? A: State is only saved after successful completion. Just run again.
Q: Can I filter to only track specific laws? A: Yes, use the query API after syncing to filter by status, dataset name, or file patterns. See examples in "Use as Python SDK" section above.
MIT
