Skip to content

fix(LondonBoroughSutton): add postcode search + fix HTML parsing#2033

Open
InertiaUK wants to merge 1 commit into
robbrad:masterfrom
InertiaUK:fix/sutton-postcode-search
Open

fix(LondonBoroughSutton): add postcode search + fix HTML parsing#2033
InertiaUK wants to merge 1 commit into
robbrad:masterfrom
InertiaUK:fix/sutton-postcode-search

Conversation

@InertiaUK
Copy link
Copy Markdown
Contributor

@InertiaUK InertiaUK commented May 12, 2026

Summary

  • Sutton uses internal property IDs, not ONSUD UPRNs — old scraper required manual UPRN lookup from URL
  • Added postcode search: POST to /waste with postcode, parse address dropdown, match by house number
  • Fixed parent div selector: waste-service-grid instead of generic div (service data is two levels up)
  • Added retry loop keyed on service count, not "Loading" text
  • Updated input.json with postcode + house_number test params

Summary by CodeRabbit

Release Notes

  • Improvements
    • Enhanced London Borough Sutton bin collection lookup to support postcode and house number queries, alongside Sutton-specific UPRN references
    • Improved network resilience with automatic retry logic for transient connection issues
    • Clearer error messages when property information cannot be resolved

Review Change Stack

Sutton uses internal property IDs, not ONSUD UPRNs. Old scraper
required manual UPRN lookup. New scraper searches by postcode,
matches address, then retries page load until bin data appears
(server-side rendering delay). Also fixed parent div selector
(waste-service-grid not generic div).

Updated input.json with postcode + house_number test params.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 12, 2026

📝 Walkthrough

Walkthrough

LondonBoroughSutton council scraper is refactored to resolve waste collection data using postcode and house number lookup instead of direct UPRN polling. A new property ID resolution helper translates postcode+paon into a property ID by scraping address select options. The parse_data() method now uses resilient HTTP retry with rate-limit handling, validates property resolution, and improves date parsing with year correction. Test configuration is updated with required input fields and documentation.

Changes

LondonBoroughSutton Postcode-Driven Property Lookup

Layer / File(s) Summary
Property ID resolution helper and BASE_URL
uk_bin_collection/uk_bin_collection/councils/LondonBoroughSutton.py
BASE_URL constant targets Sutton's waste site; _resolve_property_id() posts a postcode lookup and selects the matching address value from <select> options based on paon.
parse_data() refactored with retry policy and property ID resolution
uk_bin_collection/uk_bin_collection/councils/LondonBoroughSutton.py
requests.Session is configured with urllib3.Retry for rate-limit and transient-error handling; property ID is resolved from uprn (with "Loading your bin days…" validation) or postcode+paon; service data is fetched and parsed with improved date handling and year correction; bins are sorted by parsed collection date.
Test fixture configuration update
uk_bin_collection/tests/input.json
LondonBoroughSutton test entry adds house_number, postcode, and skip_get_url fields; wiki_note documents postcode+house number and URL-derived UPRN as accepted inputs.

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

Possibly related PRs

  • robbrad/UKBinCollectionData#1715: Both PRs refactor a council scraper to replace HTML polling with UPRN-driven/API-backed flow and add a helper to resolve property IDs from postcode+paon.
  • robbrad/UKBinCollectionData#1691: Both PRs modify LondonBoroughSutton.py's HTTP fetching and parsing logic by adding resilient requests.Session/Retry and changing polling/date parsing behavior.

Suggested reviewers

  • dp247

Poem

🐰 A postcode and paon, together they quest,
Through select-box dropdowns, we find what is best,
With retries resilient and dates that are true,
Sutton's bins shall align with each parse and review! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main changes: adding postcode search functionality and fixing HTML parsing issues in the LondonBoroughSutton council module.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (74f26f8).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2033   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/LondonBoroughSutton.py`:
- Around line 59-61: The PAON variable may be empty because inputs use
"house_number"; change the kwargs handling in LondonBoroughSutton (where
user_uprn, postcode, paon are read) to set paon = kwargs.get("paon") or
kwargs.get("house_number") so PAON has a fallback before the postcode/property
resolution later in the method (the code that uses paon around the property
lookup on lines ~77-78). Update any logic that relies on paon (property
resolution/path construction) to use this fallback variable.
- Around line 33-36: The current selector silently falls back to the first
candidate by assigning best = val when no PAON match is found; instead, remove
this fallback assignment and after iterating candidates, if no explicit PAON
match was found (i.e., best is still falsy) raise a clear exception (e.g.,
ValueError or a custom ParseError) including the input address/PAON candidates
for diagnostics; reference the variables best and val (and the function that
contains them in LondonBoroughSutton.py) to locate and implement the change so
the parser fails loudly on unexpected address formats rather than returning
another property’s data.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e9a03625-9600-40b5-8d6f-fb9efb3377c6

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 74f26f8.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/LondonBoroughSutton.py

Comment on lines +33 to +36
if not best and val:
best = val

Raises:
RuntimeError: If the council page still reports "Loading your bin days..." after polite retries.
"""
user_uprn = kwargs.get("uprn")
data = {"bins": []}
return best
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid silently defaulting to the first address option when PAON doesn’t match.

Lines 33-36 can return an arbitrary address (best) instead of failing, which risks returning another property’s collection data.

Suggested fix
-    best = None
+    best = None
     for opt in select.find_all("option"):
         val = opt.get("value", "")
         if not val or val == "missing":
             continue
         text = opt.get_text(strip=True).lower()
         if paon_lower and text.startswith(paon_lower):
             return val
         if not best and val:
             best = val
 
-    return best
+    if paon_lower:
+        return None
+    return best

Based on learnings: In uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not best and val:
best = val
Raises:
RuntimeError: If the council page still reports "Loading your bin days..." after polite retries.
"""
user_uprn = kwargs.get("uprn")
data = {"bins": []}
return best
if not best and val:
best = val
if paon_lower:
return None
return best
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/LondonBoroughSutton.py` around
lines 33 - 36, The current selector silently falls back to the first candidate
by assigning best = val when no PAON match is found; instead, remove this
fallback assignment and after iterating candidates, if no explicit PAON match
was found (i.e., best is still falsy) raise a clear exception (e.g., ValueError
or a custom ParseError) including the input address/PAON candidates for
diagnostics; reference the variables best and val (and the function that
contains them in LondonBoroughSutton.py) to locate and implement the change so
the parser fails loudly on unexpected address formats rather than returning
another property’s data.

Comment on lines +59 to +61
user_uprn = kwargs.get("uprn")
postcode = kwargs.get("postcode")
paon = kwargs.get("paon")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use house_number as a PAON fallback before postcode resolution.

Line 61 only reads paon, but Sutton inputs now provide house_number; this can leave PAON empty and mis-resolve the property on Line 78.

Suggested fix
-        paon = kwargs.get("paon")
+        paon = kwargs.get("paon") or kwargs.get("house_number")

Also applies to: 77-78

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/LondonBoroughSutton.py` around
lines 59 - 61, The PAON variable may be empty because inputs use "house_number";
change the kwargs handling in LondonBoroughSutton (where user_uprn, postcode,
paon are read) to set paon = kwargs.get("paon") or kwargs.get("house_number") so
PAON has a fallback before the postcode/property resolution later in the method
(the code that uses paon around the property lookup on lines ~77-78). Update any
logic that relies on paon (property resolution/path construction) to use this
fallback variable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant