fix: EastRidingCouncil - replace Selenium with direct JSON API #1989
InertiaUK wants to merge 1 commit into
Conversation
The old scraper used Selenium to interact with a dropdown, requiring the exact full address string in `paon` (e.g. "14 THE LEASES BEVERLEY HU17 8LG"). This was fragile and broke when the address format didn't match exactly. Rewritten to call the council's public JSON API directly:

- No Selenium dependency
- Accepts standard `postcode` + `house_number` parameters
- Three-tier address matching: HouseNameOrder exact → Address startswith → Address contains
- Handles single-result postcodes without `house_number`
- Returns Blue/Green/Brown bin dates from the API response
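The three-tier matching described above can be sketched as a standalone function (a minimal illustration; `find_matching_entry` and the sample entry dicts are hypothetical stand-ins for the scraper's internals, not the PR's exact code):

```python
def find_matching_entry(entries, user_paon):
    """Three-tier match: HouseNameOrder exact, then Address
    startswith, then Address contains (all case-insensitive)."""
    paon_upper = user_paon.strip().upper()

    # Tier 1: exact match on HouseNameOrder (most reliable)
    for entry in entries:
        if (entry.get("HouseNameOrder") or "").strip().upper() == paon_upper:
            return entry

    # Tier 2: Address starts with the supplied house number/name
    for entry in entries:
        if (entry.get("Address") or "").upper().startswith(paon_upper):
            return entry

    # Tier 3: Address contains it anywhere
    for entry in entries:
        if paon_upper in (entry.get("Address") or "").upper():
            return entry

    return None
```

As the review below points out, the startswith/contains fallbacks are permissive by design, which is what the reviewer later asks to tighten.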
📝 Walkthrough
The EastRidingCouncil bin collection scraper has been migrated from Selenium-driven HTML scraping to a direct HTTP API call. Test input data has been updated with a simplified house number and a corrected API endpoint URL to reflect the new implementation approach.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@uk_bin_collection/tests/input.json`:
- Around line 809-815: The fixture's guidance is stale: update the wiki_note to
reflect the new input contract that uses discrete fields like house_number
(instead of the old full-dropdown Selenium address) and remove the unused
Selenium metadata; specifically change the wiki_note for the East Riding entry
(wiki_name "East Riding of Yorkshire") to instruct contributors to supply the
discrete address fields (house_number, postcode, etc.) consistent with
skip_get_url: true and delete the web_driver property if the site no longer
requires Selenium.
In `@uk_bin_collection/uk_bin_collection/councils/EastRidingCouncil.py`:
- Around line 97-117: The loop over BIN_DATE_FIELDS currently swallows
ValueError/TypeError and silently drops malformed dates; instead, stop
swallowing errors—when datetime.strptime for a given field fails, raise a new
exception (or re-raise) with a clear message including the field name, the
offending date_str and the matched record so format drift is visible; update the
code around BIN_DATE_FIELDS, matched, collection_date and the assembly of
data["bins"] to validate/convert dates and fail loudly rather than continue, and
ensure the subsequent sort still assumes all collectionDate values are valid (so
no silent omissions occur).
- Around line 57-78: The current PAON fallback matching in EastRidingCouncil.py
(the user_paon handling that iterates entries and checks HouseNameOrder and
Address) is too permissive and can wrongly match short inputs; tighten it by
first validating user_paon (ignore empty/whitespace, require a minimum
meaningful length or digits when numeric), then change the Address fallback
checks to use bounded/whole-token matching (e.g., word-boundary or token
equality against split Address tokens) instead of plain startswith/contains, and
only accept startswith/contains if user_paon meets the minimum length check;
update the three matching blocks (HouseNameOrder exact, Address startswith,
Address contains) to apply these validation rules so you fail-safe (leave
matched None) rather than returning a likely-wrong first hit.
📒 Files selected for processing (2)
- uk_bin_collection/tests/input.json
- uk_bin_collection/uk_bin_collection/councils/EastRidingCouncil.py
```diff
     "house_number": "14",
     "postcode": "HU17 8LG",
     "skip_get_url": true,
-    "url": "https://www.eastriding.gov.uk",
+    "url": "https://wasterecyclingapi.eastriding.gov.uk",
     "web_driver": "http://selenium:4444",
     "wiki_name": "East Riding of Yorkshire",
     "wiki_note": "Put the full address as it displays on the council website dropdown when you do the check manually."
```
Update the East Riding fixture guidance to match the new input contract.
This block now uses house_number: "14", but the wiki_note still tells users to pass the full dropdown address from the old Selenium flow. That mismatch will send users down the wrong path. If East Riding no longer needs Selenium at all, this is also a good place to drop the stale web_driver metadata.
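One possible updated fixture entry, consistent with the review suggestion (the revised `wiki_note` wording below is illustrative, not taken from the PR):

```json
{
    "house_number": "14",
    "postcode": "HU17 8LG",
    "skip_get_url": true,
    "url": "https://wasterecyclingapi.eastriding.gov.uk",
    "wiki_name": "East Riding of Yorkshire",
    "wiki_note": "Provide your house number and postcode; the full dropdown address is no longer required."
}
```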
```python
if user_paon:
    paon_upper = user_paon.strip().upper()
    # Try exact match on HouseNameOrder first (most reliable)
    for entry in entries:
        house_name = (entry.get("HouseNameOrder") or "").strip().upper()
        if house_name == paon_upper:
            matched = entry
            break
    # Fall back to Address startswith
    if not matched:
        for entry in entries:
            address = (entry.get("Address") or "").upper()
            if address.startswith(paon_upper):
                matched = entry
                break
    # Fall back to Address contains
    if not matched:
        for entry in entries:
            address = (entry.get("Address") or "").upper()
            if paon_upper in address:
                matched = entry
                break
```
Tighten the fallback address match before selecting a property.
The startswith/contains fallbacks can silently pick the wrong address for short or blank-ish inputs, e.g. "1" matching "10 ..." or "11 ...". At that point returning the first hit is worse than failing, because it gives the user another household’s collection dates.
Suggested direction:

```diff
+import re
 ...
 if user_paon:
     paon_upper = user_paon.strip().upper()
+    if not paon_upper:
+        raise ValueError("House number/name cannot be blank")
     # Try exact match on HouseNameOrder first (most reliable)
     for entry in entries:
         house_name = (entry.get("HouseNameOrder") or "").strip().upper()
         if house_name == paon_upper:
             matched = entry
             break
-    # Fall back to Address startswith
+
+    def address_starts_with_paon(entry):
+        address = (entry.get("Address") or "").upper()
+        return re.match(rf"^{re.escape(paon_upper)}(?:\b|[, ])", address)
+
     if not matched:
-        for entry in entries:
-            address = (entry.get("Address") or "").upper()
-            if address.startswith(paon_upper):
-                matched = entry
-                break
-    # Fall back to Address contains
-    if not matched:
-        for entry in entries:
-            address = (entry.get("Address") or "").upper()
-            if paon_upper in address:
-                matched = entry
-                break
+        candidates = [entry for entry in entries if address_starts_with_paon(entry)]
+        if len(candidates) == 1:
+            matched = candidates[0]
+        elif len(candidates) > 1:
+            raise ValueError(
+                f"Ambiguous address match for '{user_paon}' at postcode {user_postcode}"
+            )
```
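A self-contained sketch of the bounded matching the reviewer is asking for (the function name and token rules are illustrative assumptions, not the PR's code):

```python
import re

def match_paon_strict(entries, user_paon):
    """Match a house number/name against candidate entries,
    failing safe (returning None) instead of guessing.

    Bounded matching: the PAON must be followed by a word
    boundary, comma, or space, so "1" cannot match "10 ...".
    """
    paon_upper = (user_paon or "").strip().upper()
    if not paon_upper:
        return None  # blank input: never guess

    # Tier 1: exact HouseNameOrder match
    for entry in entries:
        if (entry.get("HouseNameOrder") or "").strip().upper() == paon_upper:
            return entry

    # Tier 2: bounded prefix match on Address, only if unambiguous
    pattern = rf"^{re.escape(paon_upper)}(?:\b|[, ])"
    candidates = [
        e for e in entries
        if re.match(pattern, (e.get("Address") or "").upper())
    ]
    if len(candidates) == 1:
        return candidates[0]
    return None  # zero or multiple hits: fail safe
```

Returning None here lets the caller raise a clear "address not found" error rather than serving another household's collection dates.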
```python
# Extract bin collection dates from the matched entry
for field, bin_type in BIN_DATE_FIELDS.items():
    date_str = matched.get(field)
    if not date_str:
        continue
    try:
        collection_date = datetime.strptime(
            date_str, "%Y-%m-%dT%H:%M:%S"
        )
        data["bins"].append(
            {
                "type": bin_type,
                "collectionDate": collection_date.strftime(date_format),
            }
        )
    except (ValueError, TypeError):
        continue

data["bins"].sort(
    key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
)
```

The same hunk also deletes the old Selenium/BeautifulSoup flow (`Select(dropdown).select_by_visible_text(str(user_paon))`, `wait.until(EC.presence_of_element_located((By.CLASS_NAME, "results")))`, `soup = BeautifulSoup(driver.page_source, "html.parser")`, `bin_types = {}`).
Don’t swallow invalid collection dates here.
If East Riding changes one of these fields, the parser currently drops that bin silently and can return a partial or empty schedule. This project usually benefits from failing loudly on unexpected council payloads so format drift is caught immediately.
Suggested change:

```diff
 for field, bin_type in BIN_DATE_FIELDS.items():
     date_str = matched.get(field)
     if not date_str:
         continue
     try:
         collection_date = datetime.strptime(
             date_str, "%Y-%m-%dT%H:%M:%S"
         )
-        data["bins"].append(
-            {
-                "type": bin_type,
-                "collectionDate": collection_date.strftime(date_format),
-            }
-        )
-    except (ValueError, TypeError):
-        continue
+    except (ValueError, TypeError) as exc:
+        raise ValueError(
+            f"Unexpected {field} value {date_str!r} for postcode {user_postcode}"
+        ) from exc
+
+    data["bins"].append(
+        {
+            "type": bin_type,
+            "collectionDate": collection_date.strftime(date_format),
+        }
+    )
+
+if not data["bins"]:
+    raise ValueError(
+        f"No collection dates found for matched address at postcode {user_postcode}"
+    )
```

Based on learnings: in uk_bin_collection/**/*.py, prefer explicit failures over silent error handling when parsing council data so upstream format changes are detected early.
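A runnable sketch of the fail-loud parsing this comment recommends (the helper name and the field-to-bin mapping are illustrative assumptions; the real field names come from the council API):

```python
from datetime import datetime

# Hypothetical field-to-bin mapping standing in for the scraper's BIN_DATE_FIELDS
BIN_DATE_FIELDS = {
    "BlueDate": "Blue Bin",
    "GreenDate": "Green Bin",
    "BrownDate": "Brown Bin",
}

def extract_bins(matched, date_format="%d/%m/%Y"):
    """Convert API date fields into bin entries, raising on any
    malformed value instead of silently dropping it."""
    bins = []
    for field, bin_type in BIN_DATE_FIELDS.items():
        date_str = matched.get(field)
        if not date_str:
            continue
        try:
            collection_date = datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%S")
        except (ValueError, TypeError) as exc:
            raise ValueError(
                f"Unexpected {field} value {date_str!r} in {matched!r}"
            ) from exc
        bins.append(
            {"type": bin_type, "collectionDate": collection_date.strftime(date_format)}
        )
    if not bins:
        raise ValueError("No collection dates found for matched address")
    bins.sort(key=lambda x: datetime.strptime(x["collectionDate"], date_format))
    return bins
```

Because every malformed date raises immediately, a format change upstream surfaces as a test failure rather than a quietly shortened schedule.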
Included in May 2026 Release PR #1992. Closing.
…API (resolved conflict)
Replaces the Selenium-based dropdown scraper with a direct call to the council's public JSON API.
Problem: The old scraper required the exact full address string (e.g. `14 THE LEASES BEVERLEY HU17 8LG`) in the `paon` parameter for `select_by_visible_text()`. Any mismatch in formatting caused failures.

Fix: Calls `wasterecyclingapi.eastriding.gov.uk/api/RecyclingData/CollectionsData` directly with the postcode, then matches by house number against the `HouseNameOrder` field. Three-tier matching: exact HouseNameOrder, Address startswith, Address contains.

Changes:
- `EastRidingCouncil.py` — full rewrite, requests-only (no Selenium)
- `input.json` — updated to use `house_number` + `postcode` (was `paon` with full address)

Testing: Verified with HU17 8LG (14 The Leases) — returns 3 bins (Blue, Green, Brown).
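The request itself can be sketched as below; the query-parameter name is an assumption for illustration, since the PR text only names the endpoint path:

```python
import requests

API_URL = (
    "https://wasterecyclingapi.eastriding.gov.uk"
    "/api/RecyclingData/CollectionsData"
)

def build_request(postcode):
    """Return the URL and query parameters for a collections lookup.
    The 'postcode' parameter name is assumed, not confirmed by the PR."""
    return API_URL, {"postcode": postcode.strip().upper()}

def fetch_entries(postcode, timeout=30):
    """Fetch candidate address entries for a postcode (network call)."""
    url, params = build_request(postcode)
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()
```

Keeping URL construction separate from the network call makes the lookup easy to unit-test without hitting the council's API.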