feat: add postcode lookup for durham council#2067
Conversation
Users can provide either UPRN or postcode + house number. UPRN takes priority when provided (backward compatible). Uses Durham's JSON-RPC PostcodeLookup endpoint to resolve postcode to UPRN, then uses existing Selenium bin page. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughDurham Council's bin collection scraper is refactored to accept postcode and house number inputs, resolving UPRN via Durham's JSON-RPC endpoint. Page fetching switches from direct HTTP requests to Selenium WebDriver with explicit waits. Bin date extraction is simplified, and test configuration is updated to document the new input requirements. ChangesDurham Council WebDriver and postcode lookup
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2067 +/- ##
=======================================
Coverage 86.67% 86.67%
=======================================
Files 9 9
Lines 1141 1141
=======================================
Hits 989 989
Misses 152 152 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/tests/input.json`:
- Line 724: The Durham note currently describes input shape but omits the new
runtime Selenium webdriver dependency; update the "wiki_note" text so it still
instructs users to provide postcode/house number/UPRN and explicitly states that
a Selenium webdriver is required at runtime because DurhamCouncil.py now always
calls create_webdriver(...). Edit the note to add a short sentence like "A
Selenium webdriver is required to run this scraper (created via
create_webdriver(...))" so readers know about the external runtime dependency.
In `@uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py`:
- Around line 38-47: The second-pass fallback currently does a raw substring
check (paon_norm in addr) which causes false positives (e.g., "2" matching
"12"); change it to match whole tokens or word boundaries instead: for each
entry in entries, normalize entry["address"] to uppercase, then either split the
address into tokens and check if paon_norm equals any token, or use a regex
word-boundary match (e.g., r'\b{paon_norm}\b') against addr; keep the initial
strict startswith checks unchanged and still return entry["uprn"] when a
whole-token/word-boundary match is found.
- Around line 54-61: The postcode-to-UPRN lookup in DurhamCouncil.__init__ only
reads kwargs.get("paon") so supplied house_number is ignored; update the
argument handling to accept house_number too (e.g., read
kwargs.get("house_number") and prefer paon if present, otherwise use
house_number) and pass that resolved value into _resolve_uprn_from_postcode when
calling it; adjust the local variables (user_paon/user_house_number) used around
the user_postcode branch so _resolve_uprn_from_postcode receives the intended
house identifier.
- Around line 29-36: The parser currently zips uprns and addrs which can
silently truncate mismatched lists; update the logic in DurhamCouncil.py to
validate that the counts of uprns and addrs are equal before building entries
(check len(uprns) == len(addrs)) and raise a ValueError with a clear message
including the postcode if they differ, instead of proceeding to the for u, a in
zip(uprns, addrs) loop that creates entries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bba7852b-d20d-4577-815b-fc699e266f66
📒 Files selected for processing (2)
uk_bin_collection/tests/input.jsonuk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py
| "web_driver": "http://localhost:4444", | ||
| "wiki_name": "County Durham", | ||
| "wiki_note": "Pass the UPRN. You will need to use [FindMyAddress](https://www.findmyaddress.co.uk/search)." | ||
| "wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required." |
There was a problem hiding this comment.
Document the new Selenium requirement in the Durham note.
DurhamCouncil.py now always goes through create_webdriver(...), so this note should still tell users that a Selenium webdriver is required. As written, the new instructions make the input shape clear but hide the new runtime dependency.
📝 Suggested note update
- "wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required."
+ "wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required. This parser requires a Selenium webdriver."📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required." | |
| "wiki_note": "Provide your postcode and house number. UPRN is also accepted but no longer required. This parser requires a Selenium webdriver." |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/tests/input.json` at line 724, The Durham note currently
describes input shape but omits the new runtime Selenium webdriver dependency;
update the "wiki_note" text so it still instructs users to provide
postcode/house number/UPRN and explicitly states that a Selenium webdriver is
required at runtime because DurhamCouncil.py now always calls
create_webdriver(...). Edit the note to add a short sentence like "A Selenium
webdriver is required to run this scraper (created via create_webdriver(...))"
so readers know about the external runtime dependency.
| uprns = re.findall(r"<uprn[^>]*>([^<]+)</uprn>", result_xml) | ||
| addrs = re.findall(r"<formatted_address[^>]*>([^<]+)</formatted_address>", result_xml) | ||
| if not uprns: | ||
| raise ValueError(f"No addresses found for postcode: {postcode}") | ||
|
|
||
| entries = [] | ||
| for u, a in zip(uprns, addrs): | ||
| entries.append({"address": a, "uprn": u}) |
There was a problem hiding this comment.
Fail fast if Durham returns mismatched address and UPRN counts.
zip(uprns, addrs) silently truncates on a partial parse, so a small upstream response change can drop candidates or mispair an address with the wrong UPRN. This parser should reject that response shape explicitly instead of continuing with corrupted data.
🛡️ Minimal guard
uprns = re.findall(r"<uprn[^>]*>([^<]+)</uprn>", result_xml)
addrs = re.findall(r"<formatted_address[^>]*>([^<]+)</formatted_address>", result_xml)
if not uprns:
raise ValueError(f"No addresses found for postcode: {postcode}")
+ if len(uprns) != len(addrs):
+ raise ValueError(
+ f"Unexpected Durham postcode lookup response for {postcode}: "
+ f"{len(uprns)} UPRNs but {len(addrs)} formatted addresses"
+ )
entries = []
for u, a in zip(uprns, addrs):
entries.append({"address": a, "uprn": u})Based on learnings: In uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| uprns = re.findall(r"<uprn[^>]*>([^<]+)</uprn>", result_xml) | |
| addrs = re.findall(r"<formatted_address[^>]*>([^<]+)</formatted_address>", result_xml) | |
| if not uprns: | |
| raise ValueError(f"No addresses found for postcode: {postcode}") | |
| entries = [] | |
| for u, a in zip(uprns, addrs): | |
| entries.append({"address": a, "uprn": u}) | |
| uprns = re.findall(r"<uprn[^>]*>([^<]+)</uprn>", result_xml) | |
| addrs = re.findall(r"<formatted_address[^>]*>([^<]+)</formatted_address>", result_xml) | |
| if not uprns: | |
| raise ValueError(f"No addresses found for postcode: {postcode}") | |
| if len(uprns) != len(addrs): | |
| raise ValueError( | |
| f"Unexpected Durham postcode lookup response for {postcode}: " | |
| f"{len(uprns)} UPRNs but {len(addrs)} formatted addresses" | |
| ) | |
| entries = [] | |
| for u, a in zip(uprns, addrs): | |
| entries.append({"address": a, "uprn": u}) |
🧰 Tools
🪛 Ruff (0.15.12)
[warning] 35-35: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py` around lines
29 - 36, The parser currently zips uprns and addrs which can silently truncate
mismatched lists; update the logic in DurhamCouncil.py to validate that the
counts of uprns and addrs are equal before building entries (check len(uprns) ==
len(addrs)) and raise a ValueError with a clear message including the postcode
if they differ, instead of proceeding to the for u, a in zip(uprns, addrs) loop
that creates entries.
| if paon: | ||
| paon_norm = str(paon).strip().upper() | ||
| for entry in entries: | ||
| addr = entry["address"].upper() | ||
| if addr.startswith(paon_norm + " ") or addr.startswith(paon_norm + ","): | ||
| return entry["uprn"] | ||
| for entry in entries: | ||
| addr = entry["address"].upper() | ||
| if paon_norm in addr: | ||
| return entry["uprn"] |
There was a problem hiding this comment.
Don't use raw substring matching for paon.
The second pass makes values like 2 match 12, 20, flat numbers, etc., so a failed exact match can still resolve to the wrong property. That is riskier than the documented “exact match, otherwise first result” behavior.
🎯 Safer fallback
if paon:
paon_norm = str(paon).strip().upper()
for entry in entries:
addr = entry["address"].upper()
if addr.startswith(paon_norm + " ") or addr.startswith(paon_norm + ","):
return entry["uprn"]
- for entry in entries:
- addr = entry["address"].upper()
- if paon_norm in addr:
- return entry["uprn"]
return entries[0]["uprn"]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if paon: | |
| paon_norm = str(paon).strip().upper() | |
| for entry in entries: | |
| addr = entry["address"].upper() | |
| if addr.startswith(paon_norm + " ") or addr.startswith(paon_norm + ","): | |
| return entry["uprn"] | |
| for entry in entries: | |
| addr = entry["address"].upper() | |
| if paon_norm in addr: | |
| return entry["uprn"] | |
| if paon: | |
| paon_norm = str(paon).strip().upper() | |
| for entry in entries: | |
| addr = entry["address"].upper() | |
| if addr.startswith(paon_norm + " ") or addr.startswith(paon_norm + ","): | |
| return entry["uprn"] | |
| return entries[0]["uprn"] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py` around lines
38 - 47, The second-pass fallback currently does a raw substring check
(paon_norm in addr) which causes false positives (e.g., "2" matching "12");
change it to match whole tokens or word boundaries instead: for each entry in
entries, normalize entry["address"] to uppercase, then either split the address
into tokens and check if paon_norm equals any token, or use a regex
word-boundary match (e.g., r'\b{paon_norm}\b') against addr; keep the initial
strict startswith checks unchanged and still return entry["uprn"] when a
whole-token/word-boundary match is found.
| user_uprn = kwargs.get("uprn") | ||
| user_postcode = kwargs.get("postcode") | ||
| user_paon = kwargs.get("paon") | ||
| headless = kwargs.get("headless") | ||
| web_driver = kwargs.get("web_driver") | ||
|
|
||
| if not user_uprn and user_postcode: | ||
| user_uprn = _resolve_uprn_from_postcode(user_postcode, user_paon) |
There was a problem hiding this comment.
Accept house_number here as well as paon.
The new Durham fixture and note both use house_number, but this code only reads paon. In that path the lookup ignores the user-supplied house number and can fall back to the first postcode result even when a specific property was provided.
🔧 Minimal fix
- user_paon = kwargs.get("paon")
+ user_paon = kwargs.get("paon") or kwargs.get("house_number")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| user_uprn = kwargs.get("uprn") | |
| user_postcode = kwargs.get("postcode") | |
| user_paon = kwargs.get("paon") | |
| headless = kwargs.get("headless") | |
| web_driver = kwargs.get("web_driver") | |
| if not user_uprn and user_postcode: | |
| user_uprn = _resolve_uprn_from_postcode(user_postcode, user_paon) | |
| user_uprn = kwargs.get("uprn") | |
| user_postcode = kwargs.get("postcode") | |
| user_paon = kwargs.get("paon") or kwargs.get("house_number") | |
| headless = kwargs.get("headless") | |
| web_driver = kwargs.get("web_driver") | |
| if not user_uprn and user_postcode: | |
| user_uprn = _resolve_uprn_from_postcode(user_postcode, user_paon) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py` around lines
54 - 61, The postcode-to-UPRN lookup in DurhamCouncil.__init__ only reads
kwargs.get("paon") so supplied house_number is ignored; update the argument
handling to accept house_number too (e.g., read kwargs.get("house_number") and
prefer paon if present, otherwise use house_number) and pass that resolved value
into _resolve_uprn_from_postcode when calling it; adjust the local variables
(user_paon/user_house_number) used around the user_postcode branch so
_resolve_uprn_from_postcode receives the intended house identifier.
Summary
durham.Localities.PostcodeLookupendpoint for address resolution?uprn=paramTesting
200003218818✅DH7 6TH+ paon2✅Summary by CodeRabbit