feat: add postcode lookup for durham council by InertiaUK · Pull Request #2067 · robbrad/UKBinCollectionData

InertiaUK · 2026-05-12T16:22:44Z

Summary

Users can provide either UPRN or postcode + house number
UPRN takes priority when provided (backward compatible)
Uses Durham's JSON-RPC durham.Localities.PostcodeLookup endpoint for address resolution
Bin data still fetched via existing Selenium page with ?uprn= param
Falls back to first result if no match found

Testing

UPRN path (backward compat): UPRN 200003218818 ✅
Postcode + house number: DH7 6TH + paon 2 ✅
Tested via API end-to-end ✅

Summary by CodeRabbit

New Features
- Durham Council now accepts postcode and house number as input instead of requiring UPRN
- UPRN is now optional and will be automatically resolved when not provided

Users can provide either UPRN or postcode + house number. UPRN takes priority when provided (backward compatible). Uses Durham's JSON-RPC PostcodeLookup endpoint to resolve postcode to UPRN, then uses existing Selenium bin page. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-12T16:22:56Z

📝 Walkthrough

Walkthrough

Durham Council's bin collection scraper is refactored to accept postcode and house number inputs, resolving UPRN via Durham's JSON-RPC endpoint. Page fetching switches from direct HTTP requests to Selenium WebDriver with explicit waits. Bin date extraction is simplified, and test configuration is updated to document the new input requirements.

Changes

Durham Council WebDriver and postcode lookup

Layer / File(s)	Summary
UPRN resolution helper and endpoint constant `uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py`	`AJAX_URL` constant and `_resolve_uprn_from_postcode` helper function added. The helper performs JSON-RPC POST requests to Durham's endpoint to resolve UPRNs from a postcode (optionally filtered by normalized house number) and raises `ValueError` when no matches are found.
WebDriver-based parse_data with UPRN resolution `uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py`	`CouncilClass.parse_data` refactored to extract input parameters from kwargs, resolve missing UPRN via postcode lookup when available, use Selenium WebDriver for page fetching with explicit waits for page elements, and ensure proper driver cleanup in finally block.
Bin date extraction refactoring `uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py`	Bin collection date extraction simplified: regex-matched dates are parsed to datetime and directly appended to bins list with type and collectionDate fields.
Test fixture configuration update `uk_bin_collection/tests/input.json`	DurhamCouncil entry updated with `postcode` and `house_number` examples, web_driver set to `http://localhost:4444`, and `wiki_note` changed to reflect postcode/house number as primary inputs with UPRN now optional.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

robbrad/UKBinCollectionData#1660: Related migration from requests/BeautifulSoup to Selenium-driven parse_data flow with WebDriver waits and lifecycle handling.

Poem

🐰 A postcode whispers to Durham's RPC door,
UPRN resolved, no UPRN guessing store!
Selenium steers the browser through the page,
WebDriver waits for bins—collected, staged! 🗑️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately reflects the main change: adding postcode lookup functionality for Durham Council.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-12T16:24:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (a9ccbfa).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2067   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/tests/input.json`:
- Line 724: The Durham note currently describes input shape but omits the new
runtime Selenium webdriver dependency; update the "wiki_note" text so it still
instructs users to provide postcode/house number/UPRN and explicitly states that
a Selenium webdriver is required at runtime because DurhamCouncil.py now always
calls create_webdriver(...). Edit the note to add a short sentence like "A
Selenium webdriver is required to run this scraper (created via
create_webdriver(...))" so readers know about the external runtime dependency.

In `@uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py`:
- Around line 38-47: The second-pass fallback currently does a raw substring
check (paon_norm in addr) which causes false positives (e.g., "2" matching
"12"); change it to match whole tokens or word boundaries instead: for each
entry in entries, normalize entry["address"] to uppercase, then either split the
address into tokens and check if paon_norm equals any token, or use a regex
word-boundary match (e.g., r'\b{paon_norm}\b') against addr; keep the initial
strict startswith checks unchanged and still return entry["uprn"] when a
whole-token/word-boundary match is found.
- Around line 54-61: The postcode-to-UPRN lookup in DurhamCouncil.__init__ only
reads kwargs.get("paon") so supplied house_number is ignored; update the
argument handling to accept house_number too (e.g., read
kwargs.get("house_number") and prefer paon if present, otherwise use
house_number) and pass that resolved value into _resolve_uprn_from_postcode when
calling it; adjust the local variables (user_paon/user_house_number) used around
the user_postcode branch so _resolve_uprn_from_postcode receives the intended
house identifier.
- Around line 29-36: The parser currently zips uprns and addrs which can
silently truncate mismatched lists; update the logic in DurhamCouncil.py to
validate that the counts of uprns and addrs are equal before building entries
(check len(uprns) == len(addrs)) and raise a ValueError with a clear message
including the postcode if they differ, instead of proceeding to the for u, a in
zip(uprns, addrs) loop that creates entries.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bba7852b-d20d-4577-815b-fc699e266f66

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and a9ccbfa.

📒 Files selected for processing (2)

uk_bin_collection/tests/input.json
uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py

coderabbitai · 2026-05-12T16:26:57Z

+        "web_driver": "http://localhost:4444",
        "wiki_name": "County Durham",
-        "wiki_note": "Pass the UPRN. You will need to use [FindMyAddress](https://www.findmyaddress.co.uk/search)."
+        "wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required."


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document the new Selenium requirement in the Durham note.

DurhamCouncil.py now always goes through create_webdriver(...), so this note should still tell users that a Selenium webdriver is required. As written, the new instructions make the input shape clear but hide the new runtime dependency.

📝 Suggested note update

- "wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required." + "wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required. This parser requires a Selenium webdriver."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required."

"wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required. This parser requires a Selenium webdriver."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/tests/input.json` at line 724, The Durham note currently describes input shape but omits the new runtime Selenium webdriver dependency; update the "wiki_note" text so it still instructs users to provide postcode/house number/UPRN and explicitly states that a Selenium webdriver is required at runtime because DurhamCouncil.py now always calls create_webdriver(...). Edit the note to add a short sentence like "A Selenium webdriver is required to run this scraper (created via create_webdriver(...))" so readers know about the external runtime dependency.

coderabbitai · 2026-05-12T16:26:57Z

+    uprns = re.findall(r"<uprn[^>]*>([^<]+)</uprn>", result_xml)
+    addrs = re.findall(r"<formatted_address[^>]*>([^<]+)</formatted_address>", result_xml)
+    if not uprns:
+        raise ValueError(f"No addresses found for postcode: {postcode}")
+
+    entries = []
+    for u, a in zip(uprns, addrs):
+        entries.append({"address": a, "uprn": u})


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast if Durham returns mismatched address and UPRN counts.

zip(uprns, addrs) silently truncates on a partial parse, so a small upstream response change can drop candidates or mispair an address with the wrong UPRN. This parser should reject that response shape explicitly instead of continuing with corrupted data.

🛡️ Minimal guard

uprns = re.findall(r"<uprn[^>]*>([^<]+)</uprn>", result_xml) addrs = re.findall(r"<formatted_address[^>]*>([^<]+)</formatted_address>", result_xml) if not uprns: raise ValueError(f"No addresses found for postcode: {postcode}") + if len(uprns) != len(addrs): + raise ValueError( + f"Unexpected Durham postcode lookup response for {postcode}: " + f"{len(uprns)} UPRNs but {len(addrs)} formatted addresses" + ) entries = [] for u, a in zip(uprns, addrs): entries.append({"address": a, "uprn": u})

Based on learnings: In uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

uprns = re.findall(r"<uprn[^>]*>([^<]+)</uprn>", result_xml)

addrs = re.findall(r"<formatted_address[^>]*>([^<]+)</formatted_address>", result_xml)

if not uprns:

raise ValueError(f"No addresses found for postcode: {postcode}")

entries = []

for u, a in zip(uprns, addrs):

entries.append({"address": a, "uprn": u})

uprns = re.findall(r"<uprn[^>]*>([^<]+)</uprn>", result_xml)

addrs = re.findall(r"<formatted_address[^>]*>([^<]+)</formatted_address>", result_xml)

if not uprns:

raise ValueError(f"No addresses found for postcode: {postcode}")

if len(uprns) != len(addrs):

raise ValueError(

f"Unexpected Durham postcode lookup response for {postcode}: "

f"{len(uprns)} UPRNs but {len(addrs)} formatted addresses"

)

entries = []

for u, a in zip(uprns, addrs):

entries.append({"address": a, "uprn": u})

🧰 Tools

🪛 Ruff (0.15.12)

[warning] 35-35: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py` around lines 29 - 36, The parser currently zips uprns and addrs which can silently truncate mismatched lists; update the logic in DurhamCouncil.py to validate that the counts of uprns and addrs are equal before building entries (check len(uprns) == len(addrs)) and raise a ValueError with a clear message including the postcode if they differ, instead of proceeding to the for u, a in zip(uprns, addrs) loop that creates entries.

coderabbitai · 2026-05-12T16:26:57Z

+    if paon:
+        paon_norm = str(paon).strip().upper()
+        for entry in entries:
+            addr = entry["address"].upper()
+            if addr.startswith(paon_norm + " ") or addr.startswith(paon_norm + ","):
+                return entry["uprn"]
+        for entry in entries:
+            addr = entry["address"].upper()
+            if paon_norm in addr:
+                return entry["uprn"]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't use raw substring matching for paon.

The second pass makes values like 2 match 12, 20, flat numbers, etc., so a failed exact match can still resolve to the wrong property. That is riskier than the documented “exact match, otherwise first result” behavior.

🎯 Safer fallback

if paon: paon_norm = str(paon).strip().upper() for entry in entries: addr = entry["address"].upper() if addr.startswith(paon_norm + " ") or addr.startswith(paon_norm + ","): return entry["uprn"] - for entry in entries: - addr = entry["address"].upper() - if paon_norm in addr: - return entry["uprn"] return entries[0]["uprn"]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if paon:

paon_norm = str(paon).strip().upper()

for entry in entries:

addr = entry["address"].upper()

if addr.startswith(paon_norm + " ") or addr.startswith(paon_norm + ","):

return entry["uprn"]

for entry in entries:

addr = entry["address"].upper()

if paon_norm in addr:

return entry["uprn"]

if paon:

paon_norm = str(paon).strip().upper()

for entry in entries:

addr = entry["address"].upper()

if addr.startswith(paon_norm + " ") or addr.startswith(paon_norm + ","):

return entry["uprn"]

return entries[0]["uprn"]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py` around lines 38 - 47, The second-pass fallback currently does a raw substring check (paon_norm in addr) which causes false positives (e.g., "2" matching "12"); change it to match whole tokens or word boundaries instead: for each entry in entries, normalize entry["address"] to uppercase, then either split the address into tokens and check if paon_norm equals any token, or use a regex word-boundary match (e.g., r'\b{paon_norm}\b') against addr; keep the initial strict startswith checks unchanged and still return entry["uprn"] when a whole-token/word-boundary match is found.

coderabbitai · 2026-05-12T16:26:57Z

+        user_uprn = kwargs.get("uprn")
+        user_postcode = kwargs.get("postcode")
+        user_paon = kwargs.get("paon")
+        headless = kwargs.get("headless")
+        web_driver = kwargs.get("web_driver")
+
+        if not user_uprn and user_postcode:
+            user_uprn = _resolve_uprn_from_postcode(user_postcode, user_paon)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Accept house_number here as well as paon.

The new Durham fixture and note both use house_number, but this code only reads paon. In that path the lookup ignores the user-supplied house number and can fall back to the first postcode result even when a specific property was provided.

🔧 Minimal fix

- user_paon = kwargs.get("paon") + user_paon = kwargs.get("paon") or kwargs.get("house_number")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

user_uprn = kwargs.get("uprn")

user_postcode = kwargs.get("postcode")

user_paon = kwargs.get("paon")

headless = kwargs.get("headless")

web_driver = kwargs.get("web_driver")

if not user_uprn and user_postcode:

user_uprn = _resolve_uprn_from_postcode(user_postcode, user_paon)

user_uprn = kwargs.get("uprn")

user_postcode = kwargs.get("postcode")

user_paon = kwargs.get("paon") or kwargs.get("house_number")

headless = kwargs.get("headless")

web_driver = kwargs.get("web_driver")

if not user_uprn and user_postcode:

user_uprn = _resolve_uprn_from_postcode(user_postcode, user_paon)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py` around lines 54 - 61, The postcode-to-UPRN lookup in DurhamCouncil.__init__ only reads kwargs.get("paon") so supplied house_number is ignored; update the argument handling to accept house_number too (e.g., read kwargs.get("house_number") and prefer paon if present, otherwise use house_number) and pass that resolved value into _resolve_uprn_from_postcode when calling it; adjust the local variables (user_paon/user_house_number) used around the user_postcode branch so _resolve_uprn_from_postcode receives the intended house identifier.

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add postcode lookup for durham council#2067

feat: add postcode lookup for durham council#2067
InertiaUK wants to merge 1 commit into
robbrad:masterfrom
InertiaUK:feat/durham-postcode-lookup

InertiaUK commented May 12, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 12, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented May 12, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 12, 2026

Uh oh!

coderabbitai Bot May 12, 2026

Uh oh!

coderabbitai Bot May 12, 2026

Uh oh!

coderabbitai Bot May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	"wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required."
	"wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required. This parser requires a Selenium webdriver."

Conversation

InertiaUK commented May 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented May 12, 2026

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

InertiaUK commented May 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 12, 2026 •

edited

Loading