
fix: EastRidingCouncil - replace Selenium with direct JSON API #1989

Closed
InertiaUK wants to merge 1 commit into robbrad:master from InertiaUK:fix/EastRidingCouncil-json-api


Conversation

@InertiaUK (Contributor) commented May 1, 2026

Replaces the Selenium-based dropdown scraper with a direct call to the council's public JSON API.

Problem: The old scraper required the exact full address string (e.g. 14 THE LEASES BEVERLEY HU17 8LG) in the paon parameter for select_by_visible_text(). Any mismatch in formatting caused failures.

Fix: Calls wasterecyclingapi.eastriding.gov.uk/api/RecyclingData/CollectionsData directly with the postcode, then matches by house number against the HouseNameOrder field. Three-tier matching: exact HouseNameOrder, Address startswith, Address contains.

Changes:

  • EastRidingCouncil.py — full rewrite, requests-only (no Selenium)
  • input.json — updated to use house_number + postcode (was paon with full address)

Testing: Verified with HU17 8LG (14 The Leases) — returns 3 bins (Blue, Green, Brown).
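The extraction step described above can be sketched as follows. The BlueDate/GreenDate/BrownDate field names come from the PR description; the helper name, the sample record, and the output date format are illustrative assumptions, not the actual implementation:

```python
from datetime import datetime

# Field-to-bin-type mapping named in the PR (BlueDate/GreenDate/BrownDate).
BIN_DATE_FIELDS = {"BlueDate": "Blue", "GreenDate": "Green", "BrownDate": "Brown"}

def parse_collections(entry, date_format="%d/%m/%Y"):
    """Extract and sort bin collection dates from one matched address record."""
    bins = []
    for field, bin_type in BIN_DATE_FIELDS.items():
        date_str = entry.get(field)
        if not date_str:
            continue
        # API timestamps look like "2026-05-07T00:00:00" per the reviewed code.
        collection_date = datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%S")
        bins.append(
            {"type": bin_type, "collectionDate": collection_date.strftime(date_format)}
        )
    bins.sort(key=lambda b: datetime.strptime(b["collectionDate"], date_format))
    return bins

# Illustrative record shaped like the API response described in the PR.
sample = {
    "Address": "14 THE LEASES BEVERLEY HU17 8LG",
    "BlueDate": "2026-05-07T00:00:00",
    "GreenDate": "2026-05-14T00:00:00",
    "BrownDate": "2026-05-07T00:00:00",
}
print(parse_collections(sample))
```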

Summary by CodeRabbit

  • Refactor

    • Bin collection schedule retrieval has been reimplemented to use direct API integration instead of web scraping, improving reliability, retrieval speed, and error handling.
  • Tests

    • Test configuration updated for the new API endpoint and the revised address field format.

The old scraper used Selenium to interact with a dropdown, requiring the
exact full address string in paon (e.g. "14 THE LEASES BEVERLEY HU17 8LG").
This was fragile and broke when the address format didn't match exactly.

Rewritten to call the council's public JSON API directly:
- No Selenium dependency
- Accepts standard postcode + house_number parameters
- Three-tier address matching: HouseNameOrder exact → Address startswith → Address contains
- Handles single-result postcodes without house_number
- Returns Blue/Green/Brown bin dates from API response
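The three-tier matching described in the commit message can be sketched roughly as below; the record shapes are illustrative and the real code lives in EastRidingCouncil.py:

```python
from typing import Optional

def match_address(entries: list, user_paon: str) -> Optional[dict]:
    """Three-tier match: exact HouseNameOrder, then Address startswith, then contains."""
    paon_upper = user_paon.strip().upper()
    # Tier 1: exact match on HouseNameOrder (most reliable)
    for entry in entries:
        if (entry.get("HouseNameOrder") or "").strip().upper() == paon_upper:
            return entry
    # Tier 2: Address starts with the supplied house number/name
    for entry in entries:
        if (entry.get("Address") or "").upper().startswith(paon_upper):
            return entry
    # Tier 3: Address contains it anywhere
    for entry in entries:
        if paon_upper in (entry.get("Address") or "").upper():
            return entry
    return None

entries = [
    {"HouseNameOrder": "12", "Address": "12 THE LEASES BEVERLEY HU17 8LG"},
    {"HouseNameOrder": "14", "Address": "14 THE LEASES BEVERLEY HU17 8LG"},
]
print(match_address(entries, "14")["Address"])
```

Note that the review on this PR flags the startswith/contains tiers as risky for short inputs, since "1" would match "10 ..." as well.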

coderabbitai Bot commented May 1, 2026

📝 Walkthrough

The EastRidingCouncil bin collection scraper has been migrated from Selenium-driven HTML scraping to a direct HTTP API call. Test input data has been updated with a simplified house number and corrected API endpoint URL to reflect the new implementation approach.

Changes

Test Data Updates: uk_bin_collection/tests/input.json
Simplified house_number from the full address string to "14" and updated the url endpoint from wasterecyclingapi to www.eastriding.gov.uk; removed a trailing newline.

API Migration: uk_bin_collection/uk_bin_collection/councils/EastRidingCouncil.py
Replaced Selenium/HTML parsing with a direct HTTP API integration using a fixed endpoint and credentials (APIKey, Licensee). Implements postcode normalization and address matching (HouseNameOrder exact match with Address fallback), and extracts BlueDate/GreenDate/BrownDate fields into typed bin entries with parsed collection dates; removed all browser automation and HTML parsing code.
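The postcode normalization mentioned in this summary is presumably along these lines; this is an assumption, and the actual helper in EastRidingCouncil.py may differ:

```python
def normalise_postcode(postcode: str) -> str:
    """Collapse internal whitespace and upper-case, e.g. 'hu17 8lg' -> 'HU178LG'."""
    return "".join(postcode.split()).upper()

print(normalise_postcode("hu17 8lg"))  # → HU178LG
```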

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

  • #1715: Converts council scraper from Selenium/HTML scraping to API-based implementation with similar CouncilClass.parse_data rewrite pattern.
  • #1742: Refactors another council's parse_data from browser automation to direct API calls with JSON response handling.

Suggested reviewers

  • dp247

Poem

🐰 From tangled webs of Selenium dreams,
To API calls with cleaner streams,
East Riding's bins now answer quick,
No more browser tricks to click!
Dates cascade in JSON light! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled).
  • Title check: ✅ Passed. The title accurately summarizes the main change, replacing Selenium-based scraping with a direct JSON API call for EastRidingCouncil, and reflects the core modification in both the test data and the implementation.
  • Linked Issues check: ✅ Passed (check skipped; no linked issues were found for this pull request).
  • Out of Scope Changes check: ✅ Passed (check skipped; no linked issues were found).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@uk_bin_collection/tests/input.json`:
- Around line 809-815: The fixture's guidance is stale: update the wiki_note to
reflect the new input contract that uses discrete fields like house_number
(instead of the old full-dropdown Selenium address) and remove the unused
Selenium metadata; specifically change the wiki_note for the East Riding entry
(wiki_name "East Riding of Yorkshire") to instruct contributors to supply the
discrete address fields (house_number, postcode, etc.) consistent with
skip_get_url: true and delete the web_driver property if the site no longer
requires Selenium.

In `@uk_bin_collection/uk_bin_collection/councils/EastRidingCouncil.py`:
- Around line 97-117: The loop over BIN_DATE_FIELDS currently swallows
ValueError/TypeError and silently drops malformed dates; instead, stop
swallowing errors—when datetime.strptime for a given field fails, raise a new
exception (or re-raise) with a clear message including the field name, the
offending date_str and the matched record so format drift is visible; update the
code around BIN_DATE_FIELDS, matched, collection_date and the assembly of
data["bins"] to validate/convert dates and fail loudly rather than continue, and
ensure the subsequent sort still assumes all collectionDate values are valid (so
no silent omissions occur).
- Around line 57-78: The current PAON fallback matching in EastRidingCouncil.py
(the user_paon handling that iterates entries and checks HouseNameOrder and
Address) is too permissive and can wrongly match short inputs; tighten it by
first validating user_paon (ignore empty/whitespace, require a minimum
meaningful length or digits when numeric), then change the Address fallback
checks to use bounded/whole-token matching (e.g., word-boundary or token
equality against split Address tokens) instead of plain startswith/contains, and
only accept startswith/contains if user_paon meets the minimum length check;
update the three matching blocks (HouseNameOrder exact, Address startswith,
Address contains) to apply these validation rules so you fail-safe (leave
matched None) rather than returning a likely-wrong first hit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 27539691-7fe4-4bd0-8f67-c9400dd45e6d

📥 Commits

Reviewing files that changed from the base of the PR and between 60bd3cc and bc6640c.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/EastRidingCouncil.py

Comment on lines +809 to 815
"house_number": "14",
"postcode": "HU17 8LG",
"skip_get_url": true,
-"url": "https://wasterecyclingapi.eastriding.gov.uk",
+"url": "https://www.eastriding.gov.uk",
"web_driver": "http://selenium:4444",
"wiki_name": "East Riding of Yorkshire",
"wiki_note": "Put the full address as it displays on the council website dropdown when you do the check manually."

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the East Riding fixture guidance to match the new input contract.

This block now uses house_number: "14", but the wiki_note still tells users to pass the full dropdown address from the old Selenium flow. That mismatch will send users down the wrong path. If East Riding no longer needs Selenium at all, this is also a good place to drop the stale web_driver metadata.


Comment on lines +57 to +78
        if user_paon:
            paon_upper = user_paon.strip().upper()
            # Try exact match on HouseNameOrder first (most reliable)
            for entry in entries:
                house_name = (entry.get("HouseNameOrder") or "").strip().upper()
                if house_name == paon_upper:
                    matched = entry
                    break
            # Fall back to Address startswith
            if not matched:
                for entry in entries:
                    address = (entry.get("Address") or "").upper()
                    if address.startswith(paon_upper):
                        matched = entry
                        break
            # Fall back to Address contains
            if not matched:
                for entry in entries:
                    address = (entry.get("Address") or "").upper()
                    if paon_upper in address:
                        matched = entry
                        break

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tighten the fallback address match before selecting a property.

The startswith/contains fallbacks can silently pick the wrong address for short or blank-ish inputs, e.g. "1" matching "10 ..." or "11 ...". At that point returning the first hit is worse than failing, because it gives the user another household’s collection dates.

Suggested direction
+import re
...
         if user_paon:
             paon_upper = user_paon.strip().upper()
+            if not paon_upper:
+                raise ValueError("House number/name cannot be blank")

             # Try exact match on HouseNameOrder first (most reliable)
             for entry in entries:
                 house_name = (entry.get("HouseNameOrder") or "").strip().upper()
                 if house_name == paon_upper:
                     matched = entry
                     break
-            # Fall back to Address startswith
+
+            def address_starts_with_paon(entry):
+                address = (entry.get("Address") or "").upper()
+                return re.match(rf"^{re.escape(paon_upper)}(?:\b|[, ])", address)
+
             if not matched:
-                for entry in entries:
-                    address = (entry.get("Address") or "").upper()
-                    if address.startswith(paon_upper):
-                        matched = entry
-                        break
-            # Fall back to Address contains
-            if not matched:
-                for entry in entries:
-                    address = (entry.get("Address") or "").upper()
-                    if paon_upper in address:
-                        matched = entry
-                        break
+                candidates = [entry for entry in entries if address_starts_with_paon(entry)]
+                if len(candidates) == 1:
+                    matched = candidates[0]
+                elif len(candidates) > 1:
+                    raise ValueError(
+                        f"Ambiguous address match for '{user_paon}' at postcode {user_postcode}"
+                    )

Comment on lines +97 to +117
        # Extract bin collection dates from the matched entry
        for field, bin_type in BIN_DATE_FIELDS.items():
            date_str = matched.get(field)
            if not date_str:
                continue
            try:
                collection_date = datetime.strptime(
                    date_str, "%Y-%m-%dT%H:%M:%S"
                )
                data["bins"].append(
                    {
                        "type": bin_type,
                        "collectionDate": collection_date.strftime(date_format),
                    }
                )
            except (ValueError, TypeError):
                continue

        data["bins"].sort(
            key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
        )

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t swallow invalid collection dates here.

If East Riding changes one of these fields, the parser currently drops that bin silently and can return a partial or empty schedule. This project usually benefits from failing loudly on unexpected council payloads so format drift is caught immediately.

Suggested change
         for field, bin_type in BIN_DATE_FIELDS.items():
             date_str = matched.get(field)
             if not date_str:
                 continue
             try:
                 collection_date = datetime.strptime(
                     date_str, "%Y-%m-%dT%H:%M:%S"
                 )
-                data["bins"].append(
-                    {
-                        "type": bin_type,
-                        "collectionDate": collection_date.strftime(date_format),
-                    }
-                )
-            except (ValueError, TypeError):
-                continue
+            except (ValueError, TypeError) as exc:
+                raise ValueError(
+                    f"Unexpected {field} value {date_str!r} for postcode {user_postcode}"
+                ) from exc
+
+            data["bins"].append(
+                {
+                    "type": bin_type,
+                    "collectionDate": collection_date.strftime(date_format),
+                }
+            )
+
+        if not data["bins"]:
+            raise ValueError(
+                f"No collection dates found for matched address at postcode {user_postcode}"
+            )

Based on learnings: in uk_bin_collection/**/*.py, prefer explicit failures over silent error handling when parsing council data so upstream format changes are detected early.


@robbrad robbrad mentioned this pull request May 1, 2026
@robbrad (Owner) commented May 1, 2026

Included in May 2026 Release PR #1992. Closing.

@robbrad robbrad closed this May 1, 2026
pull Bot pushed a commit to mrw298/UKBinCollectionData that referenced this pull request May 1, 2026
@robbrad robbrad mentioned this pull request May 2, 2026