
feat: add postcode lookup for durham council #2067

Open
InertiaUK wants to merge 1 commit into robbrad:master from InertiaUK:feat/durham-postcode-lookup

Conversation

Contributor

@InertiaUK InertiaUK commented May 12, 2026

Summary

  • Users can provide either UPRN or postcode + house number
  • UPRN takes priority when provided (backward compatible)
  • Uses Durham's JSON-RPC durham.Localities.PostcodeLookup endpoint for address resolution
  • Bin data still fetched via existing Selenium page with ?uprn= param
  • Falls back to first result if no match found
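The input priority described in the summary (an explicit UPRN wins, otherwise the UPRN is resolved from the postcode) can be sketched as below. `resolve_uprn` is a hypothetical helper, not code from the PR; `lookup` stands in for the PR's `_resolve_uprn_from_postcode`, and raising when neither input is supplied is an assumption rather than confirmed PR behavior.

```python
from typing import Callable, Optional


def resolve_uprn(
    uprn: Optional[str],
    postcode: Optional[str],
    paon: Optional[str],
    lookup: Callable[[str, Optional[str]], str],
) -> str:
    """Return the UPRN to query, preferring an explicit UPRN over a postcode lookup."""
    # An explicit UPRN always wins, so existing UPRN-only configs keep working.
    if uprn:
        return uprn
    # Otherwise resolve it from the postcode, optionally narrowed by house number.
    if postcode:
        return lookup(postcode, paon)
    # Assumption: fail loudly when neither input is available.
    raise ValueError("Provide either a UPRN or a postcode (plus house number).")
```

Injecting `lookup` keeps the priority rule testable without calling Durham's endpoint.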

Testing

  • UPRN path (backward compat): UPRN 200003218818
  • Postcode + house number: DH7 6TH + paon 2
  • Tested via API end-to-end ✅

Summary by CodeRabbit

  • New Features
    • Durham Council now accepts postcode and house number as input instead of requiring UPRN
    • UPRN is now optional and will be automatically resolved when not provided


Users can provide either UPRN or postcode + house number.
UPRN takes priority when provided (backward compatible).
Uses Durham's JSON-RPC PostcodeLookup endpoint to resolve
postcode to UPRN, then uses existing Selenium bin page.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 12, 2026

📝 Walkthrough


Durham Council's bin collection scraper is refactored to accept postcode and house number inputs, resolving UPRN via Durham's JSON-RPC endpoint. Page fetching switches from direct HTTP requests to Selenium WebDriver with explicit waits. Bin date extraction is simplified, and test configuration is updated to document the new input requirements.
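The WebDriver lifecycle the walkthrough and change summary describe (open a browser, load the page, always clean up in a `finally` block) looks roughly like this minimal sketch. `fetch_bin_page`, `make_driver`, `base_url` and the `Driver` protocol are hypothetical names chosen so the pattern runs without a browser; in the real scraper the factory would be the project's `create_webdriver(...)` helper, and an explicit Selenium wait would sit where the comment indicates.

```python
from typing import Callable, Optional, Protocol


class Driver(Protocol):
    """Minimal slice of a Selenium WebDriver that this sketch relies on."""

    page_source: str

    def get(self, url: str) -> None: ...

    def quit(self) -> None: ...


def fetch_bin_page(uprn: str, make_driver: Callable[[], Driver], base_url: str) -> str:
    """Load the bin page for a UPRN and always shut the browser down afterwards."""
    driver: Optional[Driver] = None
    try:
        driver = make_driver()  # real code: the project's create_webdriver(...)
        driver.get(f"{base_url}?uprn={uprn}")
        # Real scraper: wait explicitly for the bin elements here, e.g. with
        # WebDriverWait(driver, timeout).until(...), before reading the page.
        return driver.page_source
    finally:
        # Runs on success and on any exception, so no browser session is leaked.
        if driver is not None:
            driver.quit()
```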

Changes

Durham Council WebDriver and postcode lookup

  • UPRN resolution helper and endpoint constant (uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py): AJAX_URL constant and _resolve_uprn_from_postcode helper function added. The helper performs JSON-RPC POST requests to Durham's endpoint to resolve UPRNs from a postcode (optionally filtered by normalized house number) and raises ValueError when the postcode returns no addresses.
  • WebDriver-based parse_data with UPRN resolution (uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py): CouncilClass.parse_data refactored to extract input parameters from kwargs, resolve a missing UPRN via postcode lookup when a postcode is available, fetch the page with Selenium WebDriver using explicit waits for page elements, and ensure proper driver cleanup in a finally block.
  • Bin date extraction refactoring (uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py): bin collection date extraction simplified. Regex-matched dates are parsed to datetime and appended directly to the bins list with type and collectionDate fields.
  • Test fixture configuration update (uk_bin_collection/tests/input.json): DurhamCouncil entry updated with postcode and house_number examples, web_driver set to http://localhost:4444, and wiki_note changed to present postcode/house number as the primary inputs with UPRN now optional.
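The date-extraction change summarized above can be sketched as follows. The page text format, the regex, `extract_bins`, and the `%d/%m/%Y` output format are assumptions for illustration (the last is believed to match the project's shared date format, but verify it); only the `type` and `collectionDate` keys come from the summary. Raising on a row without a date follows the repository guidance quoted later in the review, not anything confirmed in the PR.

```python
import re
from datetime import datetime

# Assumption: output format for collectionDate values.
DATE_FORMAT = "%d/%m/%Y"
# Hypothetical pattern for dates such as "15 May 2026"; the live page may differ.
DATE_RE = re.compile(r"\b(\d{1,2} [A-Z][a-z]+ \d{4})\b")


def extract_bins(rows):
    """Build the bins payload from (bin_type, text) pairs scraped from the page."""
    bins = []
    for bin_type, text in rows:
        match = DATE_RE.search(text)
        if match is None:
            # Prefer an explicit failure over silently skipping an unparseable row.
            raise ValueError(f"No collection date found for {bin_type!r}: {text!r}")
        collection = datetime.strptime(match.group(1), "%d %B %Y")
        bins.append({"type": bin_type, "collectionDate": collection.strftime(DATE_FORMAT)})
    return {"bins": bins}
```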

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • robbrad/UKBinCollectionData#1660: Related migration from requests/BeautifulSoup to Selenium-driven parse_data flow with WebDriver waits and lifecycle handling.

Poem

🐰 A postcode whispers to Durham's RPC door,
UPRN resolved, no UPRN guessing store!
Selenium steers the browser through the page,
WebDriver waits for bins—collected, staged! 🗑️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The pull request title accurately reflects the main change: adding postcode lookup functionality for Durham Council.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


@codecov

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (a9ccbfa).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2067   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/tests/input.json`:
- Line 724: The Durham note currently describes input shape but omits the new
runtime Selenium webdriver dependency; update the "wiki_note" text so it still
instructs users to provide postcode/house number/UPRN and explicitly states that
a Selenium webdriver is required at runtime because DurhamCouncil.py now always
calls create_webdriver(...). Edit the note to add a short sentence like "A
Selenium webdriver is required to run this scraper (created via
create_webdriver(...))" so readers know about the external runtime dependency.

In `@uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py`:
- Around line 38-47: The second-pass fallback currently does a raw substring
check (paon_norm in addr) which causes false positives (e.g., "2" matching
"12"); change it to match whole tokens or word boundaries instead: for each
entry in entries, normalize entry["address"] to uppercase, then either split the
address into tokens and check if paon_norm equals any token, or use a regex
word-boundary match (e.g., rf'\b{re.escape(paon_norm)}\b') against addr; keep the initial
strict startswith checks unchanged and still return entry["uprn"] when a
whole-token/word-boundary match is found.
- Around line 54-61: The postcode-to-UPRN lookup in CouncilClass.parse_data only
reads kwargs.get("paon") so supplied house_number is ignored; update the
argument handling to accept house_number too (e.g., read
kwargs.get("house_number") and prefer paon if present, otherwise use
house_number) and pass that resolved value into _resolve_uprn_from_postcode when
calling it; adjust the local variables (user_paon/user_house_number) used around
the user_postcode branch so _resolve_uprn_from_postcode receives the intended
house identifier.
- Around line 29-36: The parser currently zips uprns and addrs which can
silently truncate mismatched lists; update the logic in DurhamCouncil.py to
validate that the counts of uprns and addrs are equal before building entries
(check len(uprns) == len(addrs)) and raise a ValueError with a clear message
including the postcode if they differ, instead of proceeding to the for u, a in
zip(uprns, addrs) loop that creates entries.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bba7852b-d20d-4577-815b-fc699e266f66

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and a9ccbfa.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py

"web_driver": "http://localhost:4444",
"wiki_name": "County Durham",
"wiki_note": "Pass the UPRN. You will need to use [FindMyAddress](https://www.findmyaddress.co.uk/search)."
"wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required."

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document the new Selenium requirement in the Durham note.

DurhamCouncil.py now always goes through create_webdriver(...), so this note should still tell users that a Selenium webdriver is required. As written, the new instructions make the input shape clear but hide the new runtime dependency.

📝 Suggested note update
-        "wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required."
+        "wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required. This parser requires a Selenium webdriver."

Comment on lines +29 to +36
    uprns = re.findall(r"<uprn[^>]*>([^<]+)</uprn>", result_xml)
    addrs = re.findall(r"<formatted_address[^>]*>([^<]+)</formatted_address>", result_xml)
    if not uprns:
        raise ValueError(f"No addresses found for postcode: {postcode}")

    entries = []
    for u, a in zip(uprns, addrs):
        entries.append({"address": a, "uprn": u})

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast if Durham returns mismatched address and UPRN counts.

zip(uprns, addrs) silently truncates on a partial parse, so a small upstream response change can drop candidates or mispair an address with the wrong UPRN. This parser should reject that response shape explicitly instead of continuing with corrupted data.

🛡️ Minimal guard
     uprns = re.findall(r"<uprn[^>]*>([^<]+)</uprn>", result_xml)
     addrs = re.findall(r"<formatted_address[^>]*>([^<]+)</formatted_address>", result_xml)
     if not uprns:
         raise ValueError(f"No addresses found for postcode: {postcode}")
+    if len(uprns) != len(addrs):
+        raise ValueError(
+            f"Unexpected Durham postcode lookup response for {postcode}: "
+            f"{len(uprns)} UPRNs but {len(addrs)} formatted addresses"
+        )
 
     entries = []
     for u, a in zip(uprns, addrs):
         entries.append({"address": a, "uprn": u})

Based on learnings: In uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.

🧰 Tools
🪛 Ruff (0.15.12)

[warning] 35-35: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


Comment on lines +38 to +47
    if paon:
        paon_norm = str(paon).strip().upper()
        for entry in entries:
            addr = entry["address"].upper()
            if addr.startswith(paon_norm + " ") or addr.startswith(paon_norm + ","):
                return entry["uprn"]
        for entry in entries:
            addr = entry["address"].upper()
            if paon_norm in addr:
                return entry["uprn"]

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't use raw substring matching for paon.

The second pass makes values like 2 match 12, 20, flat numbers, etc., so a failed exact match can still resolve to the wrong property. That is riskier than the documented “exact match, otherwise first result” behavior.

🎯 Safer fallback
     if paon:
         paon_norm = str(paon).strip().upper()
         for entry in entries:
             addr = entry["address"].upper()
             if addr.startswith(paon_norm + " ") or addr.startswith(paon_norm + ","):
                 return entry["uprn"]
-        for entry in entries:
-            addr = entry["address"].upper()
-            if paon_norm in addr:
-                return entry["uprn"]
 
     return entries[0]["uprn"]
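The suggestion above drops the substring pass entirely, while the review prompt earlier proposes a word-boundary match instead. A minimal sketch of that second option follows; `match_paon` is a hypothetical helper that returns `None` so the caller can apply the documented first-result fallback. Word boundaries stop "2" from matching "12", but they still match a flat number (for example "FLAT 2, 14 HIGH STREET" for paon "2"), so this narrows false positives rather than eliminating them.

```python
import re


def match_paon(entries, paon):
    """Return the UPRN whose address matches paon, or None if nothing matches."""
    paon_norm = str(paon).strip().upper()
    # Pass 1: address starts with the house number/name, as in the PR.
    for entry in entries:
        addr = entry["address"].upper()
        if addr.startswith(paon_norm + " ") or addr.startswith(paon_norm + ","):
            return entry["uprn"]
    # Pass 2: whole-token match; re.escape guards against regex metacharacters.
    pattern = re.compile(rf"\b{re.escape(paon_norm)}\b")
    for entry in entries:
        if pattern.search(entry["address"].upper()):
            return entry["uprn"]
    return None
```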

Comment on lines +54 to +61
        user_uprn = kwargs.get("uprn")
        user_postcode = kwargs.get("postcode")
        user_paon = kwargs.get("paon")
        headless = kwargs.get("headless")
        web_driver = kwargs.get("web_driver")

        if not user_uprn and user_postcode:
            user_uprn = _resolve_uprn_from_postcode(user_postcode, user_paon)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Accept house_number here as well as paon.

The new Durham fixture and note both use house_number, but this code only reads paon. In that path the lookup ignores the user-supplied house number and can fall back to the first postcode result even when a specific property was provided.

🔧 Minimal fix
-        user_paon = kwargs.get("paon")
+        user_paon = kwargs.get("paon") or kwargs.get("house_number")
