
fix: Council Fix Pack - November 2025 #1679

Merged
robbrad merged 23 commits into robbrad:November_release from m26dvd:master
Nov 8, 2025

Conversation

@m26dvd
Contributor

@m26dvd m26dvd commented Oct 21, 2025

New Councils
fix: #1229 - Dumfries and Galloway Council

BREAKING FIXES
fix: #1675, #1259 - Rochdale Council - Updated scraper that relies on the UPRN only; the bin types may now be named differently
fix: #1688 - Chelmsford City Council - Bins are now split into different sensors instead of all in one
Newport City Council - Removing the need for Selenium and moving to UPRN based lookup

Fixes
fix: #1653 - Norwich City Council
fix: #1641 - Wokingham Borough Council
fix: #1621 - London Borough of Harrow
fix: #1625 - Hart District Council
fix: #1685 - Brighton & Hove - New URL
fix: #1683 - London Borough of Hounslow
fix: #1676 - Derby City Council
fix: #1690 - Boston Borough Council
fix: #1382 - Middlesbrough Council - Removed the requirement for Selenium
fix: #1698 - Southampton City Council

Fixed previously but Issues not closed off
fix: #1468 - West Oxfordshire Council - fixed by #1527
fix: #1454 - Fixed by #1495

Summary by CodeRabbit

  • New Features

    • Added Dumfries & Galloway council parser.
  • Bug Fixes / Changes

    • Updated Brighton & Hove, Hounslow and Norwich URLs; corrected Rochdale identifier.
    • Newport now uses UPRN-based lookup.
    • Multiple councils (Middlesbrough, Hounslow, Rochdale, Newport, others) moved to API-driven retrieval, reducing browser automation.
    • Parse improvements: emit separate entries for multiple bin types; improved page-load and cookie handling; request parameter adjustments for Southampton.
  • Documentation

    • Updated council lookup examples and notes.

@coderabbitai
Contributor

coderabbitai Bot commented Oct 21, 2025

Walkthrough

Multiple council parsers, tests, and docs updated: new Dumfries & Galloway parser added; several councils migrated from Selenium/HTML scraping to API/ICS/encrypted HTTP flows; multiple URLs and UPRNs corrected; cookie/Cloudflare waits, request headers, and multi-bin splitting logic added.

Changes

Cohort / File(s) Summary
Test & Docs
uk_bin_collection/tests/input.json, wiki/Councils.md
Added Dumfries & Galloway entry; corrected Brighton & Hove, Hounslow URLs; fixed Norwich typo; updated CLI examples/notes; adjusted UPRNs and removed some web_driver flags; minor formatting tweaks.
New Council (ICS)
uk_bin_collection/uk_bin_collection/councils/DumfriesandGallowayCouncil.py
New parser using ICS events: validates UPRN, fetches ICS, splits event summaries into multiple collection types, returns {"bins":[...]} entries.
Selenium reliability & cookies
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py
Added user-agent, Cloudflare title wait, guarded cookie-accept flow with try/except and sleeps to stabilise interactions.
Hardcoded URL update
uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py
Replaced dynamic URL retrieval with hardcoded enviroservices.brighton-hove.gov.uk collections URL.
ICS / multi-collection handling
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py
Added icalevents usage and ICS parsing; supports multiple collections per ICS event by splitting summaries and emitting separate entries.
Endpoint path change
uk_bin_collection/uk_bin_collection/councils/DerbyCityCouncil.py
Updated Derby endpoint path from binday/Binday?search.PremisesId= to binday/BinDays/{uprn}.
API-driven refactors (token/session)
uk_bin_collection/uk_bin_collection/councils/LondonBoroughHounslow.py, uk_bin_collection/uk_bin_collection/councils/RochdaleCouncil.py
Replaced HTML/Soup/Selenium flows with session/token-based API calls and JSON parsing; builds {"bins":[...]} results.
Request header added
uk_bin_collection/uk_bin_collection/councils/LondonBoroughHarrow.py
Added User-Agent header and 30s timeout to GET requests.
Recollect API (no Selenium)
uk_bin_collection/uk_bin_collection/councils/MiddlesbroughCouncil.py
Replaced Selenium/BeautifulSoup with Recollect address-suggest and place_calendar.json API calls; parse JSON to build bins.
Encrypted HTTP flow (Newport)
uk_bin_collection/uk_bin_collection/councils/NewportCityCouncil.py
Rewrote to use AES-CBC encode/decode helpers and encrypted HTTP POSTs (NewportInput dataclass, encode/decode helpers); removed Selenium/HTML flow; added cryptography + network logic.
Selenium → BeautifulSoup parsing
uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py
Replaced Selenium DOM traversal with BeautifulSoup parsing of page source and updated selectors to iterate nested collection/date blocks.
Row multi-bin splitting
uk_bin_collection/uk_bin_collection/councils/HartDistrictCouncil.py
Refactored to split multi-type cells (using "&") and emit separate entries per bin type with the same collection date.
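The multi-bin splitting summarised in the table above (Hart's "&"-separated cells) can be sketched as a small pure function. This is an illustrative sketch, not the repository's code: the helper name `split_multi_bin_cell` and the exact output keys are assumptions modelled on the `{"bins": [...]}` shape used elsewhere in this PR.

```python
def split_multi_bin_cell(cell_text: str, collection_date: str) -> list[dict]:
    """Split a multi-type cell such as "Refuse & Recycling" into one
    entry per bin type, all sharing the same collection date.
    Hypothetical helper; name and keys are illustrative."""
    bins = []
    for bin_type in cell_text.split("&"):
        bin_type = bin_type.strip()
        if bin_type:  # skip empty fragments left by stray separators
            bins.append({"type": bin_type, "collectionDate": collection_date})
    return bins

print(split_multi_bin_cell("Refuse & Recycling", "21/11/2025"))
# → [{'type': 'Refuse', 'collectionDate': '21/11/2025'},
#    {'type': 'Recycling', 'collectionDate': '21/11/2025'}]
```

The same split-then-emit pattern applies wherever one source row or event names several collections.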

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant SessionAPI as Session/Token API (Hounslow/Rochdale)
    participant Recollect as Recollect API (Middlesbrough)
    participant ICS as ICS Service (Chelmsford/Dumfries)
    participant Encrypted as Newport Encrypted API

    Client->>SessionAPI: GET session page (extract token/session_id)
    SessionAPI->>SessionAPI: return token
    Client->>SessionAPI: POST with token + payload
    SessionAPI-->>Client: JSON response -> parse -> build {"bins":[...]}

    Client->>Recollect: POST address-suggest (paon/postcode)
    Recollect-->>Client: place_id
    Client->>Recollect: GET place_calendar.json(place_id)
    Recollect-->>Client: JSON -> parse -> {"bins":[...]}

    Client->>ICS: GET .ics URL (uprn)
    ICS-->>Client: calendar events
    Client->>Client: split event summaries -> {"bins":[...]}

    Client->>Encrypted: build NewportInput -> AES encrypt -> POST
    Encrypted-->>Client: hex response -> decrypt -> {"bins":[...]}
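The "JSON response → parse → build {"bins":[...]}" step in the session/token lanes above can be sketched as follows. The response shape (`integration → transformed → rows_data`) and the `bartecBinType`/`bartecBinStartDate` field names are taken from the review excerpts below; this is a sketch of the parsing step only, not verified against the live council APIs.

```python
from datetime import datetime

def parse_rows_data(resp: dict, date_format: str = "%d/%m/%Y") -> dict:
    """Turn a Bartec-style token-API response into the {"bins": [...]} shape.
    Sketch based on the Rochdale/Hounslow excerpts in this review."""
    rows_data = resp["integration"]["transformed"]["rows_data"]
    if not isinstance(rows_data, dict):
        raise TypeError("rows_data must be a dict")
    bins = []
    for row in rows_data.values():
        collected = datetime.strptime(row["bartecBinStartDate"], "%Y-%m-%dT%H:%M:%S")
        bins.append({"type": row["bartecBinType"],
                     "collectionDate": collected.strftime(date_format)})
    return {"bins": bins}

sample = {"integration": {"transformed": {"rows_data": {
    "0": {"bartecBinType": "Green Bin", "bartecBinStartDate": "2025-11-21T00:00:00"}
}}}}
print(parse_rows_data(sample))
# → {'bins': [{'type': 'Green Bin', 'collectionDate': '21/11/2025'}]}
```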

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Focus items:
    • Session/token lifecycle, headers, retries, and error handling in Hounslow and Rochdale API flows.
    • Newport AES encode/decode correctness (padding, key/IV, error handling).
    • ICS parsing and multi-collection splitting (Chelmsford, Dumfries) and consistent date formatting.
    • Selenium→BeautifulSoup and added sleeps/waits (Boston, Wokingham) for selector accuracy and flakiness.
    • Ensure hardcoded URLs and removed web_driver fields align with test inputs and docs.
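The ICS multi-collection splitting called out in the focus items (Chelmsford, Dumfries & Galloway) can be sketched as below: one calendar event whose summary names several collections becomes one entry per type. The comma separator, helper name, and output keys are assumptions modelled on this PR's `{"bins": [...]}` convention; the real parsers may differ.

```python
from datetime import date

def ics_event_to_bins(summary: str, event_date: date) -> list[dict]:
    """Split a single ICS event summary (e.g. "Food waste, Garden waste")
    into one bin entry per collection type. Illustrative sketch only."""
    return [
        {"type": part.strip(), "collectionDate": event_date.strftime("%d/%m/%Y")}
        for part in summary.split(",")
        if part.strip()  # ignore empty fragments
    ]

print(ics_event_to_bins("Food waste, Garden waste", date(2025, 11, 21)))
# → [{'type': 'Food waste', 'collectionDate': '21/11/2025'},
#    {'type': 'Garden waste', 'collectionDate': '21/11/2025'}]
```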

Possibly related PRs

Suggested reviewers

  • dp247

Poem

🐇
I hopped through tokens, ICS, and queues,
swapped browsers for APIs and tidy news,
I toasted cookies, fixed a URL race,
split multi-bins and cleaned the chase,
now bin days march home — with carrot-powered grace. 🥕🗑️

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'fix: Council Fix Pack - November 2025' accurately reflects the changeset as a comprehensive council-wide fixes bundle addressing multiple council parsers and configuration updates.
Linked Issues check ✅ Passed The PR addresses all linked issues: #1229 adds Dumfries & Galloway, #1675/#1259 fix Rochdale via API rewrite, #1688 splits Chelmsford bins, #1653 fixes Norwich URL typo, #1641 updates Wokingham selectors, #1621 adds Harrow AJAX support, #1625 fixes Hart bin_type error, #1685 updates Brighton URL, #1683 updates Hounslow endpoint, #1676 fixes Derby API path, #1690 hardens Boston error handling, #1382 updates Middlesbrough to API, and #1468/#1454 preserve prior fixes.
Out of Scope Changes check ✅ Passed All changes are within scope: council configuration updates (input.json, wiki), council parser fixes/rewrites (BostonBoroughCouncil, BrightonandHoveCityCouncil, Chelmsford, Derby, Hart, Harrow, Hounslow, Middlesbrough, Newport, Rochdale, Wokingham), and a new Dumfries & Galloway parser directly address the linked issues.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@m26dvd m26dvd changed the title from "fix: Council Fix Pack - October 2025" to "fix: Council Fix Pack - November 2025" Oct 21, 2025
@codecov

codecov Bot commented Oct 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.79%. Comparing base (475e160) to head (75cbf30).
⚠️ Report is 33 commits behind head on November_release.

Additional details and impacted files
@@                Coverage Diff                @@
##           November_release    #1679   +/-   ##
=================================================
  Coverage             86.79%   86.79%           
=================================================
  Files                     9        9           
  Lines                  1136     1136           
=================================================
  Hits                    986      986           
  Misses                  150      150           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@m26dvd m26dvd marked this pull request as ready for review October 29, 2025 09:39
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
uk_bin_collection/uk_bin_collection/councils/HartDistrictCouncil.py (3)

25-26: Add error handling for the API request.

The code lacks error handling for network failures and invalid JSON responses. If the API is unreachable or returns malformed data, this will cause unhandled exceptions.

Consider adding error handling:

-        response = requests.get(URI)
-        response_table = response.json()
+        try:
+            response = requests.get(URI, timeout=10)
+            response.raise_for_status()
+            response_table = response.json()
+        except requests.RequestException as e:
+            raise ValueError(f"Failed to fetch bin collection data: {e}")
+        except json.JSONDecodeError as e:
+            raise ValueError(f"Invalid JSON response: {e}")

28-28: Validate the JSON response structure.

Accessing response_table[0]["data"] without validation can raise IndexError or KeyError if the response structure is unexpected.

Add validation:

+        if not response_table or not isinstance(response_table, list):
+            raise ValueError("Invalid response structure: expected non-empty list")
+        if "data" not in response_table[0]:
+            raise ValueError("Invalid response structure: missing 'data' key")
+        
         soup = BeautifulSoup(response_table[0]["data"], "html.parser")

54-71: Add error handling for date parsing.

datetime.strptime will raise ValueError if the date string doesn't match the expected format "%d %B". This can happen if the council changes their date format or returns unexpected data.

Add error handling:

 def format_date(self, date_str):
     # Get the current date and year
     current_date = datetime.now()
     current_year = current_date.year
 
-    # Parse the provided date string (e.g. "23 January")
-    date_obj = datetime.strptime(date_str, "%d %B")
+    try:
+        # Parse the provided date string (e.g. "23 January")
+        date_obj = datetime.strptime(date_str, "%d %B")
+    except ValueError as e:
+        raise ValueError(f"Failed to parse date '{date_str}': {e}")
 
     # Check if the provided date has already passed this year
     if date_obj.replace(year=current_year) < current_date:
         # If the date has passed this year, assume the next year
         date_obj = date_obj.replace(year=current_year + 1)
     else:
         # Otherwise, use the current year
         date_obj = date_obj.replace(year=current_year)
 
     # Format the date in "DD/MM/YYYY" format
     return date_obj.strftime("%d/%m/%Y")
🧹 Nitpick comments (9)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (1)

83-85: Consider simplifying the exception message.

The error message includes user data (address) which could be simplified. However, including the address is useful for debugging, so this is optional.

If desired, you could simplify to:

-                raise ValueError(
-                    f"Could not find collection round for address: {user_paon}"
-                )
+                raise ValueError("Could not find collection round for the specified address")
uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py (1)

32-39: Use caller-provided URL when present; remove unused vars.

Hardcoding the URL blocks overrides and complicates testing. Prefer fallback to the fixed URL. Also, uprn and headers are unused here.

Apply:

-            url = "https://enviroservices.brighton-hove.gov.uk/link/collections"
+            url = kwargs.get("url") or "https://enviroservices.brighton-hove.gov.uk/link/collections"
             uprn = kwargs.get("uprn")
@@
-            driver.get(url)
+            driver.get(url)

Outside this hunk:

  • Drop the unused headers variable at Line 31.
  • If not used elsewhere in this file, remove import requests at Line 7 and the uprn local (or rename to _uprn) to satisfy lint.

Please confirm tests don’t rely on overriding the URL for this council.

uk_bin_collection/uk_bin_collection/councils/LondonBoroughHounslow.py (1)

1-4: Make imports explicit and define a request timeout constant.

Avoid relying on star-imports for stdlib and date symbols; define a single timeout source.

Apply:

 import time
+import json
+from datetime import datetime, timedelta
@@
-from uk_bin_collection.uk_bin_collection.common import *
+from uk_bin_collection.uk_bin_collection.common import *  # date_format, check_uprn (kept for consistency)
+
+TIMEOUT = 20  # seconds
uk_bin_collection/uk_bin_collection/councils/RochdaleCouncil.py (6)

18-21: UPRN-only: make postcode optional; don’t fail when postcode is absent.

PR objective states Rochdale now relies on UPRN only. Remove mandatory postcode validation and include postcode in payload only if provided.

-        user_postcode = kwargs.get("postcode")
+        user_postcode = kwargs.get("postcode")
         user_uprn = kwargs.get("uprn")
-        check_postcode(user_postcode)
         check_uprn(user_uprn)

Build the form values conditionally:

-        data = {
+        payload = {
             "formValues": {
                 "Location details": {
                     "propertyUPRN": {
                         "value": user_uprn,
                     },
-                    "postcode_search": {
-                        "value": user_postcode,
-                    },
                     "bartecToken": {
                         "value": token,
                     },

Then append postcode only when present:

+        form_loc = payload["formValues"]["Location details"]
+        if user_postcode:
+            form_loc["postcode_search"] = {"value": user_postcode}

Please confirm the backend truly doesn’t require postcode.
Expected: UPRN-only works for all tested addresses.

Also applies to: 67-69


61-83: Variable naming: avoid reusing data/params; use distinct names.

Reusing data and params for different payloads reduces clarity. Rename to payload, token_params, lookup_params, and resp.

-        data = {
+        payload = {
             "formValues": {
@@
-        params = {
+        lookup_params = {
             "id": "686e9147a867e",
@@
-        r = s.post(API_URL, json=data, headers=headers, params=params)
+        r = s.post(API_URL, json=payload, headers=headers, params=lookup_params)
         r.raise_for_status()
-
-        data = r.json()
-        rows_data = data["integration"]["transformed"]["rows_data"]
+        resp = r.json()
+        rows_data = resp["integration"]["transformed"]["rows_data"]

Also applies to: 97-101


74-81: Use date.today() for date fields; drop time component concerns.

Dates are date-only; using datetime.now() is unnecessary and can cause TZ/DST confusion.

-                    "dateMinimum": {
-                        "value": datetime.now().strftime("%Y-%m-%d"),
-                    },
+                    "dateMinimum": {"value": date.today().strftime("%Y-%m-%d")},
@@
-                    "dateMaximum": {
-                        "value": (datetime.now() + timedelta(days=30)).strftime(
-                            "%Y-%m-%d"
-                        ),
-                    },
+                    "dateMaximum": {"value": (date.today() + timedelta(days=30)).strftime("%Y-%m-%d")},

57-59: Exception types and messages: prefer TypeError for type mismatches; keep messages brief.

Aligns with Ruff TRY004/TRY003 hints.

-        if not isinstance(rows_data, dict):
-            raise ValueError("Invalid data returned from API")
+        if not isinstance(rows_data, dict):
+            raise TypeError("rows_data must be dict")

@@
-        if not isinstance(rows_data, dict):
-            raise ValueError("Invalid data returned from API")
+        if not isinstance(rows_data, dict):
+            raise TypeError("rows_data must be dict")

Also applies to: 102-104


105-113: Unused loop var; resilient ISO date parsing (handles Z/offset).

Rename unused key and parse ISO timestamps that may include timezone.

-        for key, value in rows_data.items():
+        for _, value in rows_data.items():
             dict_data = {
                 "type": value["bartecBinType"],
-                "collectionDate": datetime.strptime(
-                    value["bartecBinStartDate"], "%Y-%m-%dT%H:%M:%S"
-                ).strftime(date_format),
+                "collectionDate": (
+                    # supports '...Z' or '+HH:MM' by normalizing to fromisoformat
+                    (datetime.fromisoformat(value["bartecBinStartDate"].replace("Z", "+00:00")))
+                ).strftime(date_format),
             }
             bindata["bins"].append(dict_data)

If some rows lack time zone and still fail, add a small fallback:

try:
    dt = datetime.fromisoformat(value["bartecBinStartDate"].replace("Z", "+00:00"))
except ValueError:
    dt = datetime.strptime(value["bartecBinStartDate"], "%Y-%m-%dT%H:%M:%S")

28-35: Header hygiene: set a project UA.

Optional: use a descriptive User-Agent for observability and to aid council ops.

-            "User-Agent": "Mozilla/5.0",
+            "User-Agent": "UKBinCollectionData/rochdale (requests)",
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 475e160 and 2d3efea.

📒 Files selected for processing (11)
  • uk_bin_collection/tests/input.json (5 hunks)
  • uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (2 hunks)
  • uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/DerbyCityCouncil.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/HartDistrictCouncil.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/LondonBoroughHarrow.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/LondonBoroughHounslow.py (2 hunks)
  • uk_bin_collection/uk_bin_collection/councils/RochdaleCouncil.py (2 hunks)
  • uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py (2 hunks)
  • wiki/Councils.md (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • create_webdriver (321-360)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • create_webdriver (321-360)
uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • create_webdriver (321-360)
🪛 Ruff (0.14.2)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py

34-34: create_webdriver may be undefined, or defined from star imports

(F405)


50-50: Do not use bare except

(E722)


50-51: try-except-pass detected, consider logging the exception

(S110)

uk_bin_collection/uk_bin_collection/councils/LondonBoroughHounslow.py

56-56: Prefer TypeError exception for invalid type

(TRY004)


56-56: Avoid specifying long messages outside the exception class

(TRY003)


69-69: datetime may be undefined, or defined from star imports

(F405)


72-72: datetime may be undefined, or defined from star imports

(F405)


72-72: timedelta may be undefined, or defined from star imports

(F405)


98-98: Prefer TypeError exception for invalid type

(TRY004)


98-98: Avoid specifying long messages outside the exception class

(TRY003)


100-100: json may be undefined, or defined from star imports

(F405)


105-105: datetime may be undefined, or defined from star imports

(F405)


107-107: date_format may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/LondonBoroughHarrow.py

28-28: Probable use of requests call without timeout

(S113)

uk_bin_collection/uk_bin_collection/councils/RochdaleCouncil.py

58-58: Prefer TypeError exception for invalid type

(TRY004)


58-58: Avoid specifying long messages outside the exception class

(TRY003)


74-74: datetime may be undefined, or defined from star imports

(F405)


77-77: datetime may be undefined, or defined from star imports

(F405)


77-77: timedelta may be undefined, or defined from star imports

(F405)


103-103: Prefer TypeError exception for invalid type

(TRY004)


103-103: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Loop control variable key not used within loop body

Rename unused key to _key

(B007)


108-108: datetime may be undefined, or defined from star imports

(F405)


110-110: date_format may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py

12-12: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


17-17: Unused method argument: page

(ARG002)


28-28: create_webdriver may be undefined, or defined from star imports

(F405)


45-45: Do not use bare except

(E722)


45-46: try-except-pass detected, consider logging the exception

(S110)


83-85: Abstract raise to an inner function

(TRY301)


83-85: Avoid specifying long messages outside the exception class

(TRY003)


102-102: date_format may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py

98-98: datetime may be undefined, or defined from star imports

(F405)


103-103: date_format may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py

33-33: Local variable uprn is assigned to but never used

Remove assignment to unused variable uprn

(F841)


38-38: create_webdriver may be undefined, or defined from star imports

(F405)

🔇 Additional comments (15)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (1)

1-2: Effective anti-bot strategy to fix Boston Borough runtime failures.

The addition of a custom user agent, Cloudflare bypass wait, and guarded cookie acceptance effectively addresses the Chrome/driver stacktrace issue mentioned in #1690. The approach is sound and consistent with similar fixes across other council modules in this PR.

Also applies to: 33-51

uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (3)

87-92: ICS event fetching setup looks good.

The 60-day window is appropriate for bin collection schedules, and the icalevents.events() call correctly filters events within the date range.


106-113: Exception handling and resource cleanup implemented correctly.

The try-except-finally pattern ensures the WebDriver is properly cleaned up, and exceptions are logged before being re-raised for upstream handling.


96-105: Multi-collection processing logic is correct and ready.

The code properly splits comma-separated collection types into individual bin entries (addressing issue #1688) and correctly strips whitespace. The approach handles single and multiple collections consistently with similar patterns elsewhere in the codebase.

uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py (3)

1-1: LGTM!

The BeautifulSoup import is correctly added to support the new HTML parsing approach.


106-114: LGTM!

The error handling pattern is well-structured. The finally block ensures the driver is always closed, preventing resource leaks, and the exception is properly re-raised to preserve error information for upstream handling.


98-105: Add defensive validation before indexing split results.

Lines 99 and 102 both split text and index results without verifying array length. Line 99 splits by space expecting index [1] to contain the date; line 102 splits by "(" expecting index [0] to exist. If the website format changes again (as evidenced by the recent fix on 2025-10-21), both will raise IndexError.

While the broad try-except catches these errors, defensive validation—as used in WestLindseyDistrictCouncil and ChorleyCouncil—is more robust. Consider checking len(split_result) > expected_index before accessing indexed elements.

wiki/Councils.md (3)

788-789: Brighton URL doc update matches scraper.
Looks good and aligns with the new enviroservices endpoint.


2299-2300: Hounslow command updated to new service path.
Consistent with the API-driven implementation.


2888-2889: Norwich protocol typo fixed.
The hhttps→https correction is right.

uk_bin_collection/tests/input.json (5)

317-317: Brighton test URL updated.
Matches the scraper’s fixed URL; OK.


1389-1389: Hounslow test URL switched to my.hounslow.gov.uk.
Aligned with new API flow.


1765-1765: North Tyneside wiki_name tweak.
No functional impact; fine.


1801-1801: Norwich https override fixed.
Good catch; aligns with wiki.


1961-1961:

Verify that RochdaleCouncil API behavior aligns with non-padded UPRN implementation.

The codebase shows RochdaleCouncil passes UPRN values directly without zero-padding, unlike several other councils (Wigan, Reigate, Wiltshire, Leicester, etc.) that explicitly pad to 12 digits using zfill(12). The implementation difference is clear, but API requirements cannot be confirmed from the codebase alone.

Confirm: Does the Rochdale webforms API accept non-padded UPRNs, or should padding be added for consistency with councils that use zfill(12)?
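For reference, the padding used by the councils named above is a plain `zfill(12)`; whether Rochdale's webforms API needs it remains unconfirmed. A minimal illustration:

```python
def pad_uprn(uprn: str) -> str:
    """Left-pad a UPRN with zeros to 12 digits, as Wigan, Wiltshire,
    Leicester, etc. do. Whether Rochdale requires this is unverified."""
    return str(uprn).zfill(12)

print(pad_uprn("100012345"))  # → "000100012345"
```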

Comment thread uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py Outdated
Comment thread uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py Outdated
Comment thread uk_bin_collection/uk_bin_collection/councils/DerbyCityCouncil.py
Comment thread uk_bin_collection/uk_bin_collection/councils/HartDistrictCouncil.py
Comment on lines +39 to +50
sid = session_data["auth-session"]
params = {
"id": "6846c784a46b5",
"repeat_against": "",
"noRetry": "true",
"getOnlyTokens": "undefined",
"log_id": "",
"app_name": "AF-Renderer::Self",
# unix_timestamp
"_": str(int(time.time() * 1000)),
"sid": sid,
}
Contributor


⚠️ Potential issue | 🟠 Major

Validate presence of session id before use.

Direct indexing can KeyError. Fail fast with a clearer message.

-        sid = session_data["auth-session"]
+        sid = session_data.get("auth-session")
+        if not sid:
+            raise RuntimeError("Missing 'auth-session' in session response from Rochdale")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sid = session_data["auth-session"]
params = {
"id": "6846c784a46b5",
"repeat_against": "",
"noRetry": "true",
"getOnlyTokens": "undefined",
"log_id": "",
"app_name": "AF-Renderer::Self",
# unix_timestamp
"_": str(int(time.time() * 1000)),
"sid": sid,
}
sid = session_data.get("auth-session")
if not sid:
raise RuntimeError("Missing 'auth-session' in session response from Rochdale")
params = {
"id": "6846c784a46b5",
"repeat_against": "",
"noRetry": "true",
"getOnlyTokens": "undefined",
"log_id": "",
"app_name": "AF-Renderer::Self",
# unix_timestamp
"_": str(int(time.time() * 1000)),
"sid": sid,
}
🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/RochdaleCouncil.py around lines
39 to 50, the code directly indexes session_data["auth-session"] which can raise
a KeyError; check for the key before using it (e.g. use
session_data.get("auth-session") or an explicit if "auth-session" not in
session_data) and if missing raise or return a clear, specific error message (or
log and abort) so the failure is explicit and easier to debug, then populate
params["sid"] from the validated value.

Comment on lines +55 to 60
data = r.json()
rows_data = data["integration"]["transformed"]["rows_data"]["0"]
if not isinstance(rows_data, dict):
raise ValueError("Invalid data returned from API")
token = rows_data["bartecToken"]

🛠️ Refactor suggestion | 🟠 Major

Token extraction is brittle; guard nested keys and types.

Raw indexing into integration/transformed/rows_data/"0" will raise if the response shape drifts.

-        data = r.json()
-        rows_data = data["integration"]["transformed"]["rows_data"]["0"]
-        if not isinstance(rows_data, dict):
-            raise ValueError("Invalid data returned from API")
-        token = rows_data["bartecToken"]
+        resp = r.json()
+        try:
+            rows_container = resp["integration"]["transformed"]["rows_data"]
+            first_row = rows_container.get("0") if isinstance(rows_container, dict) else rows_container[0]
+            if not isinstance(first_row, dict):
+                raise TypeError("rows_data[0] must be a dict")
+            token = first_row["bartecToken"]
+        except (KeyError, IndexError, TypeError) as e:
+            raise RuntimeError("Unexpected token response shape from Rochdale API") from e
🧰 Tools
🪛 Ruff (0.14.2)

58-58: Prefer TypeError exception for invalid type

(TRY004)


58-58: Avoid specifying long messages outside the exception class

(TRY003)

Comment on lines 70 to 74
            collection_date_cards = WebDriverWait(driver, timeout).until(
                EC.presence_of_all_elements_located(
-                    (By.XPATH, '//div[@class = "card__content"]')
+                    (By.XPATH, '//div[@class = "card card--waste card--blue-light"]')
                )
            )
⚠️ Potential issue | 🟡 Minor

Unused variable assignment.

The collection_date_cards variable is assigned but never used before being shadowed at line 84. The wait itself is still valuable to ensure the page has loaded, but consider using _ to indicate the result is intentionally unused.

Apply this diff:

-            collection_date_cards = WebDriverWait(driver, timeout).until(
+            _ = WebDriverWait(driver, timeout).until(
                 EC.presence_of_all_elements_located(
                     (By.XPATH, '//div[@class = "card card--waste card--blue-light"]')
                 )
             )
🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py
around lines 70 to 74, the result of WebDriverWait is assigned to
collection_date_cards but that variable is never used and is shadowed later;
change the assignment to use a throwaway name (e.g., _ = WebDriverWait(driver,
timeout).until(...)) so the wait still occurs but the unused result is
explicitly ignored, avoiding the misleading variable and preventing shadowing at
line 84.

Comment thread uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py Outdated
Comment on lines +83 to +96
for collection_card in collection_cards:
collection_date_cards = collection_card.find_all(
"div", {"class": "card__content"}
)
dict_data = {
"type": waste_type.text,
"collectionDate": dt_collection_date.strftime(date_format),
}
data["bins"].append(dict_data)

for collection_date_card in collection_date_cards:

waste_type = collection_date_card.find(
"h3", {"class": "heading heading--sub heading--tiny"}
)

collection_date = collection_date_card.find(
"span", {"class": "card__date"}
)
⚠️ Potential issue | 🟠 Major

Add None checks for element lookups.

The find() calls on lines 90-92 and 94-96 return None if elements are not found. The subsequent code at lines 98-105 accesses .text without checking for None, which would raise AttributeError if the page structure doesn't match expectations.

Consider adding defensive checks:

                 for collection_date_card in collection_date_cards:
 
                     waste_type = collection_date_card.find(
                         "h3", {"class": "heading heading--sub heading--tiny"}
                     )
 
                     collection_date = collection_date_card.find(
                         "span", {"class": "card__date"}
                     )
+
+                    if not waste_type or not collection_date:
+                        continue
 
                     dt_collection_date = datetime.strptime(
                         collection_date.text.strip().split(" ")[1], source_date_format
                     )
🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py
around lines 83 to 96, the .find() calls for waste_type and collection_date can
return None and later code accesses .text causing AttributeError; update the
parsing to defensively check each result (e.g., if waste_type is None or
collection_date is None) and skip that card (continue) or provide a safe default
before accessing .text, or use conditional extraction (e.g.,
element.get_text(strip=True) only when element is truthy) so the loop safely
handles missing elements.

@m26dvd m26dvd marked this pull request as draft October 29, 2025 10:04

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d3efea and e2d12d7.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • create_webdriver (321-360)
🪛 Ruff (0.14.2)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py

39-39: create_webdriver may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
  • GitHub Check: Run Unit Tests (3.12, 1.8.4)
🔇 Additional comments (2)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (2)

1-40: Strong foundation for bot detection bypass.

The addition of a custom user agent and Selenium exception imports sets up robust handling for the Boston Borough Council site. The user agent string helps bypass bot detection mechanisms that were likely causing the runtime failures mentioned in issue #1690.


42-46: Effective Cloudflare bypass implementation.

The wait condition correctly handles Cloudflare's "Just a moment" challenge page with an appropriate 30-second timeout. The subsequent 3-second sleep allows the page to fully stabilize before interacting with form elements.
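The "page ready" condition described above can be expressed as a small predicate — a minimal sketch, assuming Cloudflare's interstitial title begins with "Just a moment" (the actual challenge title may vary by locale or Cloudflare version):

```python
def page_ready(title: str) -> bool:
    """Return True once the page title is non-empty and is no longer
    Cloudflare's interstitial. The "Just a moment" literal is an
    assumption about the challenge page's title."""
    title = title.strip()
    return bool(title) and not title.startswith("Just a moment")

# Selenium usage would look something like:
#   WebDriverWait(driver, 30).until(lambda d: page_ready(d.title))
```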

Removing the requirement for Selenium and moving to UPRN
@m26dvd m26dvd marked this pull request as ready for review October 30, 2025 09:21

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
uk_bin_collection/tests/input.json (1)

1555-1560: Update Middlesbrough instructions to drop Selenium requirement

The new Middlesbrough scraper no longer uses Selenium, but the wiki_note still tells users a webdriver is required. That will mislead anyone setting this up. Please amend the note (and, by extension, the generated wiki entry) to reflect the new API-based flow.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e2d12d7 and 87a3ad4.

📒 Files selected for processing (5)
  • uk_bin_collection/tests/input.json (7 hunks)
  • uk_bin_collection/uk_bin_collection/councils/DumfriesandGallowayCouncil.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/MiddlesbroughCouncil.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/NewportCityCouncil.py (1 hunks)
  • wiki/Councils.md (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
uk_bin_collection/uk_bin_collection/councils/MiddlesbroughCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/common.py (1)
  • check_paon (52-64)
uk_bin_collection/uk_bin_collection/councils/NewportCityCouncil.py (2)
uk_bin_collection/uk_bin_collection/common.py (1)
  • check_uprn (67-78)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/councils/DumfriesandGallowayCouncil.py (3)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/common.py (1)
  • check_uprn (67-78)
custom_components/uk_bin_collection/calendar.py (1)
  • event (54-63)
🪛 Ruff (0.14.2)
uk_bin_collection/uk_bin_collection/councils/MiddlesbroughCouncil.py

7-7: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


12-12: Unused method argument: page

(ARG002)


18-18: check_paon may be undefined, or defined from star imports

(F405)


30-30: Probable use of requests call without timeout

(S113)


39-39: f-string without any placeholders

Remove extraneous f prefix

(F541)


52-52: Probable use of requests call without timeout

(S113)


104-104: date_format may be undefined, or defined from star imports

(F405)


116-116: Consider moving this statement to an else block

(TRY300)

uk_bin_collection/uk_bin_collection/councils/DumfriesandGallowayCouncil.py

12-12: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


17-17: Unused method argument: page

(ARG002)


23-23: check_uprn may be undefined, or defined from star imports

(F405)


42-42: date_format may be undefined, or defined from star imports

(F405)

Comment on lines +33 to +40
for address in addresses:
if "place_id" in address:
place_id = address["place_id"]
break

if not place_id:
print(f"An error occurred: retrieving the address")
return

⚠️ Potential issue | 🔴 Critical

Handle missing place_id without crashing or returning the wrong type

When the address lookup comes back empty, place_id is never defined, so the if not place_id: branch raises UnboundLocalError and aborts the scraper. Even if it didn’t crash, the bare return would hand None back to the pipeline instead of the expected {"bins": []} structure. Please initialise place_id before the loop and return the empty data dict when nothing is found, e.g.:

+            place_id = None
             addresses = response.json()
             for address in addresses:
                 if "place_id" in address:
                     place_id = address["place_id"]
                     break

             if not place_id:
-                print(f"An error occurred: retrieving the address")
-                return
+                print("An error occurred: retrieving the address")
+                return data

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.2)

39-39: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/MiddlesbroughCouncil.py around
lines 33 to 40, the variable place_id may be referenced before assignment when
the address lookup returns no entries and the current bare return yields None;
initialize place_id (e.g. place_id = None) before the loop, keep scanning
addresses for a place_id, and if none is found return the expected empty data
structure (for example return {"bins": []}) instead of a bare return so the
scraper does not raise UnboundLocalError and the pipeline receives the correct
type.

@robbrad
Owner

robbrad commented Nov 5, 2025

@m26dvd - happy with 2 failed, 10 passed? OK to merge?

also some good suggestions from the rabbit above

…ow.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/LondonBoroughHarrow.py (2)

37-37: Consider removing debug print statement.

The print statement outputs collection times to stdout, which can clutter logs in production environments.

Apply this diff to remove it:

-            CollectTime = (collection["eventTime"]).split("T")[0]
-            print(CollectTime)
-
+            CollectTime = (collection["eventTime"]).split("T")[0]

28-31: Consider adding error handling for the HTTP request and JSON parsing.

The current implementation lacks error handling for network failures, non-200 responses, or invalid JSON, which could cause unhandled exceptions in Home Assistant integrations.

Consider adding status code validation and exception handling:

         headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}
         response = requests.get(URI, headers=headers, timeout=30)
+        response.raise_for_status()
 
         # Parse the JSON response
         bin_collection = response.json()

Alternatively, wrap the entire request block in a try-except for more granular error handling:

try:
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}
    response = requests.get(URI, headers=headers, timeout=30)
    response.raise_for_status()
    bin_collection = response.json()
except requests.exceptions.RequestException as e:
    # Handle network/HTTP errors
    raise Exception(f"Failed to retrieve bin data from Harrow: {e}")
except ValueError as e:
    # Handle JSON parsing errors
    raise Exception(f"Failed to parse bin data from Harrow: {e}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 87a3ad4 and 1baa677.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/LondonBoroughHarrow.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
  • GitHub Check: Run Unit Tests (3.12, 1.8.4)
🔇 Additional comments (1)
uk_bin_collection/uk_bin_collection/councils/LondonBoroughHarrow.py (1)

27-28: LGTM! Previous review concern addressed.

The addition of the timeout parameter and User-Agent header resolves the hanging request issue flagged in the previous review.

m26dvd and others added 2 commits November 5, 2025 09:11
…cil.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…ouncil.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (1)

81-81: UNFIXED: Hardcoded calendar year will break in April 2026.

This issue was previously flagged but remains unresolved. The hardcoded 2025-26 string will cause the scraper to fail when the council publishes the 2026-27 calendar (likely around April 2026).

Please implement the dynamic calendar year calculation suggested in the previous review, or add fallback logic to try multiple years if the URL returns 404.

Example implementation for dynamic year:

# Determine calendar year (Apr-Mar boundary)
current_year = datetime.now().year
current_month = datetime.now().month
if current_month >= 4:
    calendar_year = f"{current_year}-{str(current_year + 1)[2:]}"
else:
    calendar_year = f"{current_year - 1}-{str(current_year)[2:]}"
    
ics_url = f"https://www.chelmsford.gov.uk/media/t03c4mik/{day}-{letter}-{calendar_year}.ics"
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (1)

45-47: Consider removing unused exception variable.

The exception variable e is captured but never used. Since the exception is being silently suppressed (which may be acceptable for optional cookie banners), you can simplify to just except Exception:.

Apply this diff:

-            except Exception as e:
+            except Exception:
                 # Cookie banner not present or already accepted
                 pass
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1baa677 and 2715ff4.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • create_webdriver (321-360)
🪛 Ruff (0.14.3)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py

12-12: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


17-17: Unused method argument: page

(ARG002)


28-28: create_webdriver may be undefined, or defined from star imports

(F405)


45-47: try-except-pass detected, consider logging the exception

(S110)


45-45: Do not catch blind exception: Exception

(BLE001)


45-45: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


84-86: Abstract raise to an inner function

(TRY301)


84-86: Avoid specifying long messages outside the exception class

(TRY003)


103-103: date_format may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
🔇 Additional comments (2)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (2)

21-28: Previous issue fixed: user agent now correctly passed as string.

The user agent is now properly defined as a string variable and passed correctly to create_webdriver, resolving the previous critical issue where a dict was incorrectly passed.


95-106: Excellent: Multi-bin split logic correctly implements PR objective.

The new logic that splits event.summary by comma and creates separate bin entries addresses issue #1688 perfectly. Each collection type now reports independently with its own entry, which aligns with the requirement to split bins into different sensors.
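The split behaviour described above can be sketched in isolation — a hypothetical summary value and entry shape for illustration, not the council's actual ICS payload, and assuming the repo's `"%d/%m/%Y"` output format:

```python
from datetime import date

def split_summary(summary: str, collection_date: date) -> list[dict]:
    """Emit one bin entry per comma-separated type in an event summary,
    so each collection type becomes its own sensor."""
    return [
        {"type": part.strip(), "collectionDate": collection_date.strftime("%d/%m/%Y")}
        for part in summary.split(",")
        if part.strip()
    ]

bins = split_summary("Recycling, Garden waste", date(2025, 11, 8))
# bins == [{"type": "Recycling", "collectionDate": "08/11/2025"},
#          {"type": "Garden waste", "collectionDate": "08/11/2025"}]
```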


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py (2)

70-74: Unused variable assignment.

The collection_date_cards variable is assigned but never used before being shadowed at line 84. The wait itself is valuable for ensuring page load, but the variable should use _ to indicate the result is intentionally unused.

Apply this diff:

-            collection_date_cards = WebDriverWait(driver, timeout).until(
+            _ = WebDriverWait(driver, timeout).until(
                 EC.presence_of_all_elements_located(
                     (By.XPATH, '//div[@class = "card card--waste card--blue-light"]')
                 )
             )

87-104: Add None checks for element lookups to prevent AttributeError.

The find() calls at lines 89-95 can return None if elements are not found. Lines 98 and 101 access .text without checking for None, which will raise AttributeError if the page structure doesn't match expectations. This is especially critical given that issue #1641 specifically mentions addressing "no-such-element" errors for this council.

Apply this diff to add defensive checks:

                 for collection_date_card in collection_date_cards:
 
                     waste_type = collection_date_card.find(
                         "h3", {"class": "heading heading--sub heading--tiny"}
                     )
 
                     collection_date = collection_date_card.find(
                         "span", {"class": "card__date"}
                     )
+
+                    if not waste_type or not collection_date:
+                        continue
 
                     dt_collection_date = datetime.strptime(
                         collection_date.text.strip().split(" ")[1], source_date_format
                     )
                     dict_data = {
                         "type": waste_type.text.strip().split("(")[0].strip(),
                         "collectionDate": dt_collection_date.strftime(date_format),
                     }
                     data["bins"].append(dict_data)
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py (1)

7-7: Consider explicit imports instead of star import.

The star import at line 7 makes it unclear what symbols are being imported (e.g., datetime, date_format). While this pattern appears consistent across the codebase, explicit imports improve code clarity and help avoid potential naming conflicts.

Example of explicit imports:

from uk_bin_collection.uk_bin_collection.common import (
    check_paon,
    check_postcode,
    create_webdriver,
    date_format,
    datetime,
)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2715ff4 and 7ba0cc9.

📒 Files selected for processing (2)
  • uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (2 hunks)
  • uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • create_webdriver (321-360)
🪛 Ruff (0.14.3)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py

39-39: create_webdriver may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py

98-98: datetime may be undefined, or defined from star imports

(F405)


103-103: date_format may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
🔇 Additional comments (5)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (3)

1-8: LGTM! Imports support the enhanced error handling.

The added time module and specific Selenium exceptions are all used appropriately in the cookie banner handling and page load stabilization logic below.


38-46: LGTM! Cloudflare bypass and user agent properly implemented.

The user agent string and Cloudflare protection bypass logic are well-structured. The 30-second timeout is appropriate for waiting out Cloudflare checks (distinct from the faster cookie banner timeout below), and the title condition correctly waits for a non-empty title that's not "Just a moment".


48-61: Past review comment addressed; implementation looks solid.

The timeout has been reduced to 10 seconds as suggested in the previous review, and the exception handling correctly covers all relevant failure modes (timeout, element not found, click intercepted). The graceful fallback with pass ensures the scraper continues even when the cookie banner is absent.

uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py (2)

1-1: LGTM: BeautifulSoup import added.

The new import is necessary for the updated parsing strategy and correctly replaces the previous Selenium-only approach.


76-80: LGTM: BeautifulSoup parsing setup.

The parsing setup correctly uses the driver's page source and targets the same elements that were waited for in the previous step.

@m26dvd
Contributor Author

m26dvd commented Nov 5, 2025

@m26dvd - happy with 2 failed, 10 passed? OK to merge?

also some good suggestions from the rabbit above

@robbrad Happy to merge now. I've accepted some of the suggestions from the rabbit, although it commits changes with incorrect lint messages


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/SouthamptonCityCouncil.py (1)

25-40: Review header values for accuracy and maintainability.

The headers include Chrome version "141" (line 32), which doesn't exist yet. Additionally, the headers are very detailed with browser fingerprinting information that may be brittle and require frequent updates.

Consider:

  • Using a realistic, current Chrome version
  • Simplifying headers to only those required by the server
  • Testing whether all these headers are necessary for successful requests

If only basic headers are needed, consider simplifying to:

 headers = {
-    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
-    "accept-language": "en-GB,en;q=0.9",
-    "cache-control": "max-age=0",
-    "dnt": "1",
-    "priority": "u=0, i",
+    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
     "referer": "https://www.southampton.gov.uk",
-    "sec-ch-ua": '"Google Chrome";v="141", "Not?A_Brand";v="8", "Chromium";v="141"',
-    "sec-ch-ua-mobile": "?0",
-    "sec-ch-ua-platform": '"Windows"',
-    "sec-fetch-dest": "document",
-    "sec-fetch-mode": "navigate",
-    "sec-fetch-site": "same-origin",
-    "upgrade-insecure-requests": "1",
-    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36",
 }

Test to ensure the simplified headers still work with the server.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ba0cc9 and 75cbf30.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/SouthamptonCityCouncil.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.3)
uk_bin_collection/uk_bin_collection/councils/SouthamptonCityCouncil.py

46-46: Probable use of requests call without timeout

(S113)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)

Comment on lines +42 to +44
params = {
"UPRN": {user_uprn},
}

⚠️ Potential issue | 🔴 Critical

Critical: Fix set syntax in params dictionary.

Line 43 uses {user_uprn} which creates a Python set, not a string value. This will cause the UPRN to be sent as a string representation of a set (e.g., "{'12345'}") instead of the actual UPRN value, breaking the API request.

Apply this diff to fix the syntax:

 params = {
-    "UPRN": {user_uprn},
+    "UPRN": user_uprn,
 }
🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/SouthamptonCityCouncil.py around
lines 42 to 44, the params dict uses {user_uprn} which creates a Python set
rather than passing the UPRN value; replace the set literal with the actual
value (e.g., use user_uprn or str(user_uprn) if the API expects a string) so the
"UPRN" key holds the correct scalar/string value instead of a set.

Comment on lines +46 to 50
r = requests.get(
"https://www.southampton.gov.uk/whereilive/waste-calendar",
params=params,
headers=headers,
)

⚠️ Potential issue | 🟠 Major

Add timeout to prevent indefinite hangs.

The requests.get call is missing a timeout parameter, which can cause the request to hang indefinitely if the server becomes unresponsive. This is especially problematic in automated systems like Home Assistant.

Apply this diff to add a timeout:

 r = requests.get(
     "https://www.southampton.gov.uk/whereilive/waste-calendar",
     params=params,
     headers=headers,
+    timeout=30,
 )

Based on static analysis hints.

🧰 Tools
🪛 Ruff (0.14.3)

46-46: Probable use of requests call without timeout

(S113)

🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/SouthamptonCityCouncil.py around
lines 46 to 50, the requests.get call lacks a timeout which can cause indefinite
hangs; add a timeout parameter (for example timeout=10) to the requests.get(...)
call to bound the request duration and prevent blocking, and update any similar
requests in this file to include the same timeout.

@robbrad robbrad changed the base branch from master to November_release November 8, 2025 09:02
@robbrad robbrad merged commit d6ecb0b into robbrad:November_release Nov 8, 2025
12 of 14 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request May 1, 2026