fix(LondonBoroughSutton): add postcode search + fix HTML parsing #2033
InertiaUK wants to merge 1 commit into
Sutton uses internal property IDs, not ONSUD UPRNs. Old scraper required manual UPRN lookup. New scraper searches by postcode, matches address, then retries page load until bin data appears (server-side rendering delay). Also fixed parent div selector (waste-service-grid not generic div). Updated input.json with postcode + house_number test params.
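The "retry page load until bin data appears" step can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `fetch_until_rendered`, the injected `fetch` callable, and the attempt count and delay are all assumptions. Only the placeholder text and the `RuntimeError` come from the scraper's docstring quoted later in this review.

```python
import time
from typing import Callable

# Placeholder text the council page shows while the server is still rendering
# (quoted in the scraper's docstring).
LOADING_MARKER = "Loading your bin days"


def fetch_until_rendered(
    fetch: Callable[[], str],
    max_attempts: int = 5,        # assumed value, not taken from the PR
    delay_seconds: float = 2.0,   # assumed value, not taken from the PR
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() until the HTML no longer contains the loading placeholder."""
    for attempt in range(1, max_attempts + 1):
        html = fetch()
        if LOADING_MARKER not in html:
            return html
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts:
            sleep(delay_seconds)
    raise RuntimeError(
        f"Page still showed '{LOADING_MARKER}...' after {max_attempts} attempts"
    )
```

Passing `fetch` and `sleep` in as callables keeps the loop testable without network access or real delays; in a real scraper `fetch` would wrap the HTTP request for the property page.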
📝 Walkthrough
The LondonBoroughSutton council scraper is refactored to resolve waste collection data using postcode and house number lookup instead of direct UPRN polling. A new property ID resolution helper translates postcode+paon into a property ID by scraping address select options. The parse_data() method now uses resilient HTTP retry with rate-limit handling, validates property resolution, and improves date parsing with year correction. Test configuration is updated with required input fields and documentation.
Changes
LondonBoroughSutton Postcode-Driven Property Lookup
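The walkthrough does not show how the "year correction" is implemented. One common approach, sketched here under assumptions, is to parse a yearless date against the current year and roll it forward when the result lies well in the past. The function name `parse_collection_date`, the "Friday, 2 January" text format, and the 31-day grace window are hypothetical.

```python
from datetime import date, datetime, timedelta


def parse_collection_date(text: str, today: date, grace_days: int = 31) -> date:
    """Parse text such as 'Friday, 2 January' (no year) into a full date.

    The date is first read as falling in today's year. If that lands more
    than grace_days in the past, it is assumed to belong to next year.
    """
    # Keep only the last two tokens, e.g. "2 January".
    day_month = " ".join(text.replace(",", " ").split()[-2:])
    candidate = datetime.strptime(f"{day_month} {today.year}", "%d %B %Y").date()
    if candidate < today - timedelta(days=grace_days):
        candidate = datetime.strptime(
            f"{day_month} {today.year + 1}", "%d %B %Y"
        ).date()
    return candidate
```

Taking `today` as a parameter rather than calling `date.today()` inside the function makes the December-to-January rollover easy to test. Note that "29 February" raises `ValueError` when the candidate year is not a leap year; a production parser would need to handle that case.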
🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #2033 +/- ##
=======================================
Coverage 86.67% 86.67%
=======================================
Files 9 9
Lines 1141 1141
=======================================
Hits 989 989
Misses 152 152
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/LondonBoroughSutton.py`:
- Around line 59-61: The PAON variable may be empty because inputs use
"house_number"; change the kwargs handling in LondonBoroughSutton (where
user_uprn, postcode, paon are read) to set paon = kwargs.get("paon") or
kwargs.get("house_number") so PAON has a fallback before the postcode/property
resolution later in the method (the code that uses paon around the property
lookup on lines ~77-78). Update any logic that relies on paon (property
resolution/path construction) to use this fallback variable.
- Around line 33-36: The current selector silently falls back to the first
candidate by assigning best = val when no PAON match is found; instead, remove
this fallback assignment and after iterating candidates, if no explicit PAON
match was found (i.e., best is still falsy) raise a clear exception (e.g.,
ValueError or a custom ParseError) including the input address/PAON candidates
for diagnostics; reference the variables best and val (and the function that
contains them in LondonBoroughSutton.py) to locate and implement the change so
the parser fails loudly on unexpected address formats rather than returning
another property’s data.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e9a03625-9600-40b5-8d6f-fb9efb3377c6
📒 Files selected for processing (2)
- uk_bin_collection/tests/input.json
- uk_bin_collection/uk_bin_collection/councils/LondonBoroughSutton.py
if not best and val:
best = val
Raises:
RuntimeError: If the council page still reports "Loading your bin days..." after polite retries.
"""
user_uprn = kwargs.get("uprn")
data = {"bins": []}
return best
Avoid silently defaulting to the first address option when PAON doesn’t match.
Lines 33-36 can return an arbitrary address (best) instead of failing, which risks returning another property’s collection data.
Suggested fix
-    best = None
+    best = None
     for opt in select.find_all("option"):
         val = opt.get("value", "")
         if not val or val == "missing":
             continue
         text = opt.get_text(strip=True).lower()
         if paon_lower and text.startswith(paon_lower):
             return val
         if not best and val:
             best = val
-    return best
+    if paon_lower:
+        return None
+    return best

Based on learnings: In uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.
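The stricter alternative described in the review prompt (raise instead of falling back) might look like the sketch below. It is an illustration only: `pick_property_id` is a hypothetical name, and it takes plain `(value, text)` pairs rather than the parsed `<option>` elements the real helper works with, so that it runs without an HTML parser.

```python
from typing import Iterable, Optional, Tuple


def pick_property_id(options: Iterable[Tuple[str, str]], paon: Optional[str]) -> str:
    """Return the option value whose address text starts with the house number.

    Raises ValueError rather than silently returning the first address.
    """
    paon_lower = (paon or "").strip().lower()
    if not paon_lower:
        raise ValueError("A house number (PAON) is required to pick an address")

    # Skip empty values and the "missing" placeholder option, as the PR does.
    candidates = [(v, t) for v, t in options if v and v != "missing"]
    for value, text in candidates:
        if text.strip().lower().startswith(paon_lower):
            return value

    seen = [t for _, t in candidates]
    raise ValueError(f"No address starts with {paon!r}; candidates were {seen}")
```

Because this mirrors the prefix match in the diff above, a PAON of "1" would also match "10 High Street"; comparing the first whitespace-separated token of the address instead may be safer, depending on how the council formats its address list.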
user_uprn = kwargs.get("uprn")
postcode = kwargs.get("postcode")
paon = kwargs.get("paon")
Use house_number as a PAON fallback before postcode resolution.
Line 61 only reads paon, but Sutton inputs now provide house_number; this can leave PAON empty and mis-resolve the property on Line 78.
Suggested fix
- paon = kwargs.get("paon")
+ paon = kwargs.get("paon") or kwargs.get("house_number")

Also applies to: 77-78
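As a small standalone illustration of the suggested fallback, the helper below reads the PAON from either key. The name `resolve_paon` and the string normalisation are assumptions added for the example; the review itself only proposes the single `or` expression.

```python
from typing import Any, Dict, Optional


def resolve_paon(kwargs: Dict[str, Any]) -> Optional[str]:
    """Read the house number from "paon", falling back to "house_number"."""
    # `or` treats None and "" alike, so an empty "paon" also triggers the fallback.
    paon = kwargs.get("paon") or kwargs.get("house_number")
    return str(paon).strip() if paon else None
```

For example, `resolve_paon({"house_number": "12"})` returns `"12"`, and an empty dict returns `None`, which lets the caller fail early before attempting the postcode lookup.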
Summary
- `/waste` with postcode, parse address dropdown, match by house number
- `waste-service-grid` instead of generic `div` (service data is two levels up)
- `input.json` with `postcode` + `house_number` test params

Summary by CodeRabbit
Release Notes