
fix: Council Fix Pack - November 2025 #1679

Merged
robbrad merged 23 commits into robbrad:November_release from m26dvd:master
Nov 8, 2025

Conversation

@m26dvd
Contributor

@m26dvd m26dvd commented Oct 21, 2025

New Councils
fix: #1229 - Dumfries and Galloway Council

BREAKING FIXES
fix: #1675, #1259 - Rochdale Council - Updated scraper that relies on the UPRN only; the bin types may now be named differently
fix: #1688 - Chelmsford City Council - Bins are now split into different sensors instead of all in one
Newport City Council - Removing the need for Selenium and moving to UPRN based lookup

Fixes
fix: #1653 - Norwich City Council
fix: #1641 - Wokingham Borough Council
fix: #1621 - London Borough of Harrow
fix: #1625 - Hart District Council
fix: #1685 - Brighton & Hove - New URL
fix: #1683 - London Borough of Hounslow
fix: #1676 - Derby City Council
fix: #1690 - Boston Borough Council
fix: #1382 - Middlesbrough Council - Removed the requirement for Selenium
fix: #1698 - Southampton City Council

Fixed previously but Issues not closed off
fix: #1468 - West Oxfordshire Council - fixed by #1527
fix: #1454 - Fixed by #1495

Summary by CodeRabbit

  • New Features

    • Added Dumfries & Galloway council parser.
  • Bug Fixes / Changes

    • Updated Brighton & Hove, Hounslow and Norwich URLs; corrected Rochdale identifier.
    • Newport now uses UPRN-based lookup.
    • Multiple councils (Middlesbrough, Hounslow, Rochdale, Newport, others) moved to API-driven retrieval, reducing browser automation.
    • Parse improvements: emit separate entries for multiple bin types; improved page-load and cookie handling; request parameter adjustments for Southampton.
  • Documentation

    • Updated council lookup examples and notes.

@coderabbitai
Contributor

coderabbitai Bot commented Oct 21, 2025

Walkthrough

Multiple council parsers, tests, and docs updated: new Dumfries & Galloway parser added; several councils migrated from Selenium/HTML scraping to API/ICS/encrypted HTTP flows; multiple URLs and UPRNs corrected; cookie/Cloudflare waits, request headers, and multi-bin splitting logic added.

Changes

Cohort / File(s) Summary
Test & Docs
uk_bin_collection/tests/input.json, wiki/Councils.md
Added Dumfries & Galloway entry; corrected Brighton & Hove, Hounslow URLs; fixed Norwich typo; updated CLI examples/notes; adjusted UPRNs and removed some web_driver flags; minor formatting tweaks.
New Council (ICS)
uk_bin_collection/uk_bin_collection/councils/DumfriesandGallowayCouncil.py
New parser using ICS events: validates UPRN, fetches ICS, splits event summaries into multiple collection types, returns {"bins":[...]} entries.
Selenium reliability & cookies
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py
Added user-agent, Cloudflare title wait, guarded cookie-accept flow with try/except and sleeps to stabilise interactions.
Hardcoded URL update
uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py
Replaced dynamic URL retrieval with hardcoded enviroservices.brighton-hove.gov.uk collections URL.
ICS / multi-collection handling
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py
Added icalevents usage and ICS parsing; supports multiple collections per ICS event by splitting summaries and emitting separate entries.
Endpoint path change
uk_bin_collection/uk_bin_collection/councils/DerbyCityCouncil.py
Updated Derby endpoint path from binday/Binday?search.PremisesId= to binday/BinDays/{uprn}.
API-driven refactors (token/session)
uk_bin_collection/uk_bin_collection/councils/LondonBoroughHounslow.py, uk_bin_collection/uk_bin_collection/councils/RochdaleCouncil.py
Replaced HTML/Soup/Selenium flows with session/token-based API calls and JSON parsing; builds {"bins":[...]} results.
Request header added
uk_bin_collection/uk_bin_collection/councils/LondonBoroughHarrow.py
Added User-Agent header and 30s timeout to GET requests.
Recollect API (no Selenium)
uk_bin_collection/uk_bin_collection/councils/MiddlesbroughCouncil.py
Replaced Selenium/BeautifulSoup with Recollect address-suggest and place_calendar.json API calls; parse JSON to build bins.
Encrypted HTTP flow (Newport)
uk_bin_collection/uk_bin_collection/councils/NewportCityCouncil.py
Rewrote to use AES-CBC encode/decode helpers and encrypted HTTP POSTs (NewportInput dataclass, encode/decode helpers); removed Selenium/HTML flow; added cryptography + network logic.
Selenium → BeautifulSoup parsing
uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py
Replaced Selenium DOM traversal with BeautifulSoup parsing of page source and updated selectors to iterate nested collection/date blocks.
Row multi-bin splitting
uk_bin_collection/uk_bin_collection/councils/HartDistrictCouncil.py
Refactored to split multi-type cells (using "&") and emit separate entries per bin type with the same collection date.
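The multi-bin splitting summarised in the table above (Hart's "&"-separated cells) can be sketched as a small pure function. This is an illustrative sketch, not the repository's code: the helper name `split_multi_bin_cell` and the exact output keys are assumptions modelled on the `{"bins": [...]}` shape used elsewhere in this PR.

```python
def split_multi_bin_cell(cell_text: str, collection_date: str) -> list[dict]:
    """Split a multi-type cell such as "Refuse & Recycling" into one
    entry per bin type, all sharing the same collection date.
    Hypothetical helper; name and keys are illustrative."""
    bins = []
    for bin_type in cell_text.split("&"):
        bin_type = bin_type.strip()
        if bin_type:  # skip empty fragments left by stray separators
            bins.append({"type": bin_type, "collectionDate": collection_date})
    return bins

print(split_multi_bin_cell("Refuse & Recycling", "21/11/2025"))
# → [{'type': 'Refuse', 'collectionDate': '21/11/2025'},
#    {'type': 'Recycling', 'collectionDate': '21/11/2025'}]
```

The same split-then-emit pattern applies wherever one source row or event names several collections.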

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant SessionAPI as Session/Token API (Hounslow/Rochdale)
    participant Recollect as Recollect API (Middlesbrough)
    participant ICS as ICS Service (Chelmsford/Dumfries)
    participant Encrypted as Newport Encrypted API

    Client->>SessionAPI: GET session page (extract token/session_id)
    SessionAPI->>SessionAPI: return token
    Client->>SessionAPI: POST with token + payload
    SessionAPI-->>Client: JSON response -> parse -> build {"bins":[...]}

    Client->>Recollect: POST address-suggest (paon/postcode)
    Recollect-->>Client: place_id
    Client->>Recollect: GET place_calendar.json(place_id)
    Recollect-->>Client: JSON -> parse -> {"bins":[...]}

    Client->>ICS: GET .ics URL (uprn)
    ICS-->>Client: calendar events
    Client->>Client: split event summaries -> {"bins":[...]}

    Client->>Encrypted: build NewportInput -> AES encrypt -> POST
    Encrypted-->>Client: hex response -> decrypt -> {"bins":[...]}
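The "JSON response → parse → build {"bins":[...]}" step in the session/token lanes above can be sketched as follows. The response shape (`integration → transformed → rows_data`) and the `bartecBinType`/`bartecBinStartDate` field names are taken from the review excerpts below; this is a sketch of the parsing step only, not verified against the live council APIs.

```python
from datetime import datetime

def parse_rows_data(resp: dict, date_format: str = "%d/%m/%Y") -> dict:
    """Turn a Bartec-style token-API response into the {"bins": [...]} shape.
    Sketch based on the Rochdale/Hounslow excerpts in this review."""
    rows_data = resp["integration"]["transformed"]["rows_data"]
    if not isinstance(rows_data, dict):
        raise TypeError("rows_data must be a dict")
    bins = []
    for row in rows_data.values():
        collected = datetime.strptime(row["bartecBinStartDate"], "%Y-%m-%dT%H:%M:%S")
        bins.append({"type": row["bartecBinType"],
                     "collectionDate": collected.strftime(date_format)})
    return {"bins": bins}

sample = {"integration": {"transformed": {"rows_data": {
    "0": {"bartecBinType": "Green Bin", "bartecBinStartDate": "2025-11-21T00:00:00"}
}}}}
print(parse_rows_data(sample))
# → {'bins': [{'type': 'Green Bin', 'collectionDate': '21/11/2025'}]}
```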

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Focus items:
    • Session/token lifecycle, headers, retries, and error handling in Hounslow and Rochdale API flows.
    • Newport AES encode/decode correctness (padding, key/IV, error handling).
    • ICS parsing and multi-collection splitting (Chelmsford, Dumfries) and consistent date formatting.
    • Selenium→BeautifulSoup and added sleeps/waits (Boston, Wokingham) for selector accuracy and flakiness.
    • Ensure hardcoded URLs and removed web_driver fields align with test inputs and docs.
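The ICS multi-collection splitting called out in the focus items (Chelmsford, Dumfries & Galloway) can be sketched as below: one calendar event whose summary names several collections becomes one entry per type. The comma separator, helper name, and output keys are assumptions modelled on this PR's `{"bins": [...]}` convention; the real parsers may differ.

```python
from datetime import date

def ics_event_to_bins(summary: str, event_date: date) -> list[dict]:
    """Split a single ICS event summary (e.g. "Food waste, Garden waste")
    into one bin entry per collection type. Illustrative sketch only."""
    return [
        {"type": part.strip(), "collectionDate": event_date.strftime("%d/%m/%Y")}
        for part in summary.split(",")
        if part.strip()  # ignore empty fragments
    ]

print(ics_event_to_bins("Food waste, Garden waste", date(2025, 11, 21)))
# → [{'type': 'Food waste', 'collectionDate': '21/11/2025'},
#    {'type': 'Garden waste', 'collectionDate': '21/11/2025'}]
```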

Possibly related PRs

Suggested reviewers

  • dp247

Poem

🐇
I hopped through tokens, ICS, and queues,
swapped browsers for APIs and tidy news,
I toasted cookies, fixed a URL race,
split multi-bins and cleaned the chase,
now bin days march home — with carrot-powered grace. 🥕🗑️

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'fix: Council Fix Pack - November 2025' accurately reflects the changeset as a comprehensive council-wide fixes bundle addressing multiple council parsers and configuration updates.
Linked Issues check ✅ Passed The PR addresses all linked issues: #1229 adds Dumfries & Galloway, #1675/#1259 fix Rochdale via API rewrite, #1688 splits Chelmsford bins, #1653 fixes Norwich URL typo, #1641 updates Wokingham selectors, #1621 adds Harrow AJAX support, #1625 fixes Hart bin_type error, #1685 updates Brighton URL, #1683 updates Hounslow endpoint, #1676 fixes Derby API path, #1690 hardens Boston error handling, #1382 updates Middlesbrough to API, and #1468/#1454 preserve prior fixes.
Out of Scope Changes check ✅ Passed All changes are within scope: council configuration updates (input.json, wiki), council parser fixes/rewrites (BostonBoroughCouncil, BrightonandHoveCityCouncil, Chelmsford, Derby, Hart, Harrow, Hounslow, Middlesbrough, Newport, Rochdale, Wokingham), and a new Dumfries & Galloway parser directly address the linked issues.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@m26dvd m26dvd changed the title from "fix: Council Fix Pack - October 2025" to "fix: Council Fix Pack - November 2025" Oct 21, 2025
@codecov

codecov Bot commented Oct 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.79%. Comparing base (475e160) to head (75cbf30).
⚠️ Report is 33 commits behind head on November_release.

Additional details and impacted files
@@                Coverage Diff                @@
##           November_release    #1679   +/-   ##
=================================================
  Coverage             86.79%   86.79%           
=================================================
  Files                     9        9           
  Lines                  1136     1136           
=================================================
  Hits                    986      986           
  Misses                  150      150           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@m26dvd m26dvd marked this pull request as ready for review October 29, 2025 09:39
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
uk_bin_collection/uk_bin_collection/councils/HartDistrictCouncil.py (3)

25-26: Add error handling for the API request.

The code lacks error handling for network failures and invalid JSON responses. If the API is unreachable or returns malformed data, this will cause unhandled exceptions.

Consider adding error handling:

-        response = requests.get(URI)
-        response_table = response.json()
+        try:
+            response = requests.get(URI, timeout=10)
+            response.raise_for_status()
+            response_table = response.json()
+        except requests.RequestException as e:
+            raise ValueError(f"Failed to fetch bin collection data: {e}")
+        except json.JSONDecodeError as e:
+            raise ValueError(f"Invalid JSON response: {e}")

28-28: Validate the JSON response structure.

Accessing response_table[0]["data"] without validation can raise IndexError or KeyError if the response structure is unexpected.

Add validation:

+        if not response_table or not isinstance(response_table, list):
+            raise ValueError("Invalid response structure: expected non-empty list")
+        if "data" not in response_table[0]:
+            raise ValueError("Invalid response structure: missing 'data' key")
+        
         soup = BeautifulSoup(response_table[0]["data"], "html.parser")

54-71: Add error handling for date parsing.

datetime.strptime will raise ValueError if the date string doesn't match the expected format "%d %B". This can happen if the council changes their date format or returns unexpected data.

Add error handling:

 def format_date(self, date_str):
     # Get the current date and year
     current_date = datetime.now()
     current_year = current_date.year
 
-    # Parse the provided date string (e.g. "23 January")
-    date_obj = datetime.strptime(date_str, "%d %B")
+    try:
+        # Parse the provided date string (e.g. "23 January")
+        date_obj = datetime.strptime(date_str, "%d %B")
+    except ValueError as e:
+        raise ValueError(f"Failed to parse date '{date_str}': {e}")
 
     # Check if the provided date has already passed this year
     if date_obj.replace(year=current_year) < current_date:
         # If the date has passed this year, assume the next year
         date_obj = date_obj.replace(year=current_year + 1)
     else:
         # Otherwise, use the current year
         date_obj = date_obj.replace(year=current_year)
 
     # Format the date in "DD/MM/YYYY" format
     return date_obj.strftime("%d/%m/%Y")
🧹 Nitpick comments (9)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (1)

83-85: Consider simplifying the exception message.

The error message includes user data (address) which could be simplified. However, including the address is useful for debugging, so this is optional.

If desired, you could simplify to:

-                raise ValueError(
-                    f"Could not find collection round for address: {user_paon}"
-                )
+                raise ValueError("Could not find collection round for the specified address")
uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py (1)

32-39: Use caller-provided URL when present; remove unused vars.

Hardcoding the URL blocks overrides and complicates testing. Prefer fallback to the fixed URL. Also, uprn and headers are unused here.

Apply:

-            url = "https://enviroservices.brighton-hove.gov.uk/link/collections"
+            url = kwargs.get("url") or "https://enviroservices.brighton-hove.gov.uk/link/collections"
             uprn = kwargs.get("uprn")
@@
-            driver.get(url)
+            driver.get(url)

Outside this hunk:

  • Drop the unused headers variable at Line 31.
  • If not used elsewhere in this file, remove import requests at Line 7 and the uprn local (or rename to _uprn) to satisfy lint.

Please confirm tests don’t rely on overriding the URL for this council.

uk_bin_collection/uk_bin_collection/councils/LondonBoroughHounslow.py (1)

1-4: Make imports explicit and define a request timeout constant.

Avoid relying on star-imports for stdlib and date symbols; define a single timeout source.

Apply:

 import time
+import json
+from datetime import datetime, timedelta
@@
-from uk_bin_collection.uk_bin_collection.common import *
+from uk_bin_collection.uk_bin_collection.common import *  # date_format, check_uprn (kept for consistency)
+
+TIMEOUT = 20  # seconds
uk_bin_collection/uk_bin_collection/councils/RochdaleCouncil.py (6)

18-21: UPRN-only: make postcode optional; don’t fail when postcode is absent.

PR objective states Rochdale now relies on UPRN only. Remove mandatory postcode validation and include postcode in payload only if provided.

-        user_postcode = kwargs.get("postcode")
+        user_postcode = kwargs.get("postcode")
         user_uprn = kwargs.get("uprn")
-        check_postcode(user_postcode)
         check_uprn(user_uprn)

Build the form values conditionally:

-        data = {
+        payload = {
             "formValues": {
                 "Location details": {
                     "propertyUPRN": {
                         "value": user_uprn,
                     },
-                    "postcode_search": {
-                        "value": user_postcode,
-                    },
                     "bartecToken": {
                         "value": token,
                     },

Then append postcode only when present:

+        form_loc = payload["formValues"]["Location details"]
+        if user_postcode:
+            form_loc["postcode_search"] = {"value": user_postcode}

Please confirm the backend truly doesn’t require postcode.
Expected: UPRN-only works for all tested addresses.

Also applies to: 67-69


61-83: Variable naming: avoid reusing data/params; use distinct names.

Reusing data and params for different payloads reduces clarity. Rename to payload, token_params, lookup_params, and resp.

-        data = {
+        payload = {
             "formValues": {
@@
-        params = {
+        lookup_params = {
             "id": "686e9147a867e",
@@
-        r = s.post(API_URL, json=data, headers=headers, params=params)
+        r = s.post(API_URL, json=payload, headers=headers, params=lookup_params)
         r.raise_for_status()
-
-        data = r.json()
-        rows_data = data["integration"]["transformed"]["rows_data"]
+        resp = r.json()
+        rows_data = resp["integration"]["transformed"]["rows_data"]

Also applies to: 97-101


74-81: Use date.today() for date fields; drop time component concerns.

Dates are date-only; using datetime.now() is unnecessary and can cause TZ/DST confusion.

-                    "dateMinimum": {
-                        "value": datetime.now().strftime("%Y-%m-%d"),
-                    },
+                    "dateMinimum": {"value": date.today().strftime("%Y-%m-%d")},
@@
-                    "dateMaximum": {
-                        "value": (datetime.now() + timedelta(days=30)).strftime(
-                            "%Y-%m-%d"
-                        ),
-                    },
+                    "dateMaximum": {"value": (date.today() + timedelta(days=30)).strftime("%Y-%m-%d")},

57-59: Exception types and messages: prefer TypeError for type mismatches; keep messages brief.

Aligns with Ruff TRY004/TRY003 hints.

-        if not isinstance(rows_data, dict):
-            raise ValueError("Invalid data returned from API")
+        if not isinstance(rows_data, dict):
+            raise TypeError("rows_data must be dict")

@@
-        if not isinstance(rows_data, dict):
-            raise ValueError("Invalid data returned from API")
+        if not isinstance(rows_data, dict):
+            raise TypeError("rows_data must be dict")

Also applies to: 102-104


105-113: Unused loop var; resilient ISO date parsing (handles Z/offset).

Rename unused key and parse ISO timestamps that may include timezone.

-        for key, value in rows_data.items():
+        for _, value in rows_data.items():
             dict_data = {
                 "type": value["bartecBinType"],
-                "collectionDate": datetime.strptime(
-                    value["bartecBinStartDate"], "%Y-%m-%dT%H:%M:%S"
-                ).strftime(date_format),
+                "collectionDate": (
+                    # supports '...Z' or '+HH:MM' by normalizing to fromisoformat
+                    (datetime.fromisoformat(value["bartecBinStartDate"].replace("Z", "+00:00")))
+                ).strftime(date_format),
             }
             bindata["bins"].append(dict_data)

If some rows lack time zone and still fail, add a small fallback:

try:
    dt = datetime.fromisoformat(value["bartecBinStartDate"].replace("Z", "+00:00"))
except ValueError:
    dt = datetime.strptime(value["bartecBinStartDate"], "%Y-%m-%dT%H:%M:%S")

28-35: Header hygiene: set a project UA.

Optional: use a descriptive User-Agent for observability and to aid council ops.

-            "User-Agent": "Mozilla/5.0",
+            "User-Agent": "UKBinCollectionData/rochdale (requests)",
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 475e160 and 2d3efea.

📒 Files selected for processing (11)
  • uk_bin_collection/tests/input.json (5 hunks)
  • uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (2 hunks)
  • uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/DerbyCityCouncil.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/HartDistrictCouncil.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/LondonBoroughHarrow.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/LondonBoroughHounslow.py (2 hunks)
  • uk_bin_collection/uk_bin_collection/councils/RochdaleCouncil.py (2 hunks)
  • uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py (2 hunks)
  • wiki/Councils.md (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • create_webdriver (321-360)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • create_webdriver (321-360)
uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • create_webdriver (321-360)
🪛 Ruff (0.14.2)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py

34-34: create_webdriver may be undefined, or defined from star imports

(F405)


50-50: Do not use bare except

(E722)


50-51: try-except-pass detected, consider logging the exception

(S110)

uk_bin_collection/uk_bin_collection/councils/LondonBoroughHounslow.py

56-56: Prefer TypeError exception for invalid type

(TRY004)


56-56: Avoid specifying long messages outside the exception class

(TRY003)


69-69: datetime may be undefined, or defined from star imports

(F405)


72-72: datetime may be undefined, or defined from star imports

(F405)


72-72: timedelta may be undefined, or defined from star imports

(F405)


98-98: Prefer TypeError exception for invalid type

(TRY004)


98-98: Avoid specifying long messages outside the exception class

(TRY003)


100-100: json may be undefined, or defined from star imports

(F405)


105-105: datetime may be undefined, or defined from star imports

(F405)


107-107: date_format may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/LondonBoroughHarrow.py

28-28: Probable use of requests call without timeout

(S113)

uk_bin_collection/uk_bin_collection/councils/RochdaleCouncil.py

58-58: Prefer TypeError exception for invalid type

(TRY004)


58-58: Avoid specifying long messages outside the exception class

(TRY003)


74-74: datetime may be undefined, or defined from star imports

(F405)


77-77: datetime may be undefined, or defined from star imports

(F405)


77-77: timedelta may be undefined, or defined from star imports

(F405)


103-103: Prefer TypeError exception for invalid type

(TRY004)


103-103: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Loop control variable key not used within loop body

Rename unused key to _key

(B007)


108-108: datetime may be undefined, or defined from star imports

(F405)


110-110: date_format may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py

12-12: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


17-17: Unused method argument: page

(ARG002)


28-28: create_webdriver may be undefined, or defined from star imports

(F405)


45-45: Do not use bare except

(E722)


45-46: try-except-pass detected, consider logging the exception

(S110)


83-85: Abstract raise to an inner function

(TRY301)


83-85: Avoid specifying long messages outside the exception class

(TRY003)


102-102: date_format may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py

98-98: datetime may be undefined, or defined from star imports

(F405)


103-103: date_format may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py

33-33: Local variable uprn is assigned to but never used

Remove assignment to unused variable uprn

(F841)


38-38: create_webdriver may be undefined, or defined from star imports

(F405)

🔇 Additional comments (15)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (1)

1-2: Effective anti-bot strategy to fix Boston Borough runtime failures.

The addition of a custom user agent, Cloudflare bypass wait, and guarded cookie acceptance effectively addresses the Chrome/driver stacktrace issue mentioned in #1690. The approach is sound and consistent with similar fixes across other council modules in this PR.

Also applies to: 33-51

uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (3)

87-92: ICS event fetching setup looks good.

The 60-day window is appropriate for bin collection schedules, and the icalevents.events() call correctly filters events within the date range.


106-113: Exception handling and resource cleanup implemented correctly.

The try-except-finally pattern ensures the WebDriver is properly cleaned up, and exceptions are logged before being re-raised for upstream handling.


96-105: Multi-collection processing logic is correct and ready.

The code properly splits comma-separated collection types into individual bin entries (addressing issue #1688) and correctly strips whitespace. The approach handles single and multiple collections consistently with similar patterns elsewhere in the codebase.

uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py (3)

1-1: LGTM!

The BeautifulSoup import is correctly added to support the new HTML parsing approach.


106-114: LGTM!

The error handling pattern is well-structured. The finally block ensures the driver is always closed, preventing resource leaks, and the exception is properly re-raised to preserve error information for upstream handling.


98-105: Add defensive validation before indexing split results.

Lines 99 and 102 both split text and index results without verifying array length. Line 99 splits by space expecting index [1] to contain the date; line 102 splits by "(" expecting index [0] to exist. If the website format changes again (as evidenced by the recent fix on 2025-10-21), both will raise IndexError.

While the broad try-except catches these errors, defensive validation—as used in WestLindseyDistrictCouncil and ChorleyCouncil—is more robust. Consider checking len(split_result) > expected_index before accessing indexed elements.

wiki/Councils.md (3)

788-789: Brighton URL doc update matches scraper.
Looks good and aligns with the new enviroservices endpoint.


2299-2300: Hounslow command updated to new service path.
Consistent with the API-driven implementation.


2888-2889: Norwich protocol typo fixed.
The hhttps→https correction is right.

uk_bin_collection/tests/input.json (5)

317-317: Brighton test URL updated.
Matches the scraper’s fixed URL; OK.


1389-1389: Hounslow test URL switched to my.hounslow.gov.uk.
Aligned with new API flow.


1765-1765: North Tyneside wiki_name tweak.
No functional impact; fine.


1801-1801: Norwich https override fixed.
Good catch; aligns with wiki.


1961-1961:

Verify that RochdaleCouncil API behavior aligns with non-padded UPRN implementation.

The codebase shows RochdaleCouncil passes UPRN values directly without zero-padding, unlike several other councils (Wigan, Reigate, Wiltshire, Leicester, etc.) that explicitly pad to 12 digits using zfill(12). The implementation difference is clear, but API requirements cannot be confirmed from the codebase alone.

Confirm: Does the Rochdale webforms API accept non-padded UPRNs, or should padding be added for consistency with councils that use zfill(12)?
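For reference, the padding used by the councils named above is a plain `zfill(12)`; whether Rochdale's webforms API needs it remains unconfirmed. A minimal illustration:

```python
def pad_uprn(uprn: str) -> str:
    """Left-pad a UPRN with zeros to 12 digits, as Wigan, Wiltshire,
    Leicester, etc. do. Whether Rochdale requires this is unverified."""
    return str(uprn).zfill(12)

print(pad_uprn("100012345"))  # → "000100012345"
```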

Comment thread uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py Outdated
Comment thread uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py Outdated
Comment thread uk_bin_collection/uk_bin_collection/councils/DerbyCityCouncil.py
Comment thread uk_bin_collection/uk_bin_collection/councils/HartDistrictCouncil.py
Comment on lines +39 to +50
sid = session_data["auth-session"]
params = {
"id": "6846c784a46b5",
"repeat_against": "",
"noRetry": "true",
"getOnlyTokens": "undefined",
"log_id": "",
"app_name": "AF-Renderer::Self",
# unix_timestamp
"_": str(int(time.time() * 1000)),
"sid": sid,
}
Contributor


⚠️ Potential issue | 🟠 Major

Validate presence of session id before use.

Direct indexing can KeyError. Fail fast with a clearer message.

-        sid = session_data["auth-session"]
+        sid = session_data.get("auth-session")
+        if not sid:
+            raise RuntimeError("Missing 'auth-session' in session response from Rochdale")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sid = session_data["auth-session"]
params = {
"id": "6846c784a46b5",
"repeat_against": "",
"noRetry": "true",
"getOnlyTokens": "undefined",
"log_id": "",
"app_name": "AF-Renderer::Self",
# unix_timestamp
"_": str(int(time.time() * 1000)),
"sid": sid,
}
sid = session_data.get("auth-session")
if not sid:
raise RuntimeError("Missing 'auth-session' in session response from Rochdale")
params = {
"id": "6846c784a46b5",
"repeat_against": "",
"noRetry": "true",
"getOnlyTokens": "undefined",
"log_id": "",
"app_name": "AF-Renderer::Self",
# unix_timestamp
"_": str(int(time.time() * 1000)),
"sid": sid,
}
🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/RochdaleCouncil.py around lines
39 to 50, the code directly indexes session_data["auth-session"] which can raise
a KeyError; check for the key before using it (e.g. use
session_data.get("auth-session") or an explicit if "auth-session" not in
session_data) and if missing raise or return a clear, specific error message (or
log and abort) so the failure is explicit and easier to debug, then populate
params["sid"] from the validated value.

Comment on lines +55 to 60
data = r.json()
rows_data = data["integration"]["transformed"]["rows_data"]["0"]
if not isinstance(rows_data, dict):
raise ValueError("Invalid data returned from API")
token = rows_data["bartecToken"]

🛠️ Refactor suggestion | 🟠 Major

Token extraction is brittle; guard nested keys and types.

Raw indexing into integration/transformed/rows_data/"0" will raise if the response shape drifts.

-        data = r.json()
-        rows_data = data["integration"]["transformed"]["rows_data"]["0"]
-        if not isinstance(rows_data, dict):
-            raise ValueError("Invalid data returned from API")
-        token = rows_data["bartecToken"]
+        resp = r.json()
+        try:
+            rows_container = resp["integration"]["transformed"]["rows_data"]
+            first_row = rows_container.get("0") if isinstance(rows_container, dict) else rows_container[0]
+            if not isinstance(first_row, dict):
+                raise TypeError("rows_data[0] must be a dict")
+            token = first_row["bartecToken"]
+        except (KeyError, IndexError, TypeError) as e:
+            raise RuntimeError("Unexpected token response shape from Rochdale API") from e
🧰 Tools
🪛 Ruff (0.14.2)

58-58: Prefer TypeError exception for invalid type

(TRY004)


58-58: Avoid specifying long messages outside the exception class

(TRY003)

Comment on lines 70 to 74
            collection_date_cards = WebDriverWait(driver, timeout).until(
                EC.presence_of_all_elements_located(
-                    (By.XPATH, '//div[@class = "card__content"]')
+                    (By.XPATH, '//div[@class = "card card--waste card--blue-light"]')
                )
            )
⚠️ Potential issue | 🟡 Minor

Unused variable assignment.

The collection_date_cards variable is assigned but never used before being shadowed at line 84. The wait itself is still valuable to ensure the page has loaded, but consider using _ to indicate the result is intentionally unused.

Apply this diff:

-            collection_date_cards = WebDriverWait(driver, timeout).until(
+            _ = WebDriverWait(driver, timeout).until(
                 EC.presence_of_all_elements_located(
                     (By.XPATH, '//div[@class = "card card--waste card--blue-light"]')
                 )
             )
🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py
around lines 70 to 74, the result of WebDriverWait is assigned to
collection_date_cards but that variable is never used and is shadowed later;
change the assignment to use a throwaway name (e.g., _ = WebDriverWait(driver,
timeout).until(...)) so the wait still occurs but the unused result is
explicitly ignored, avoiding the misleading variable and preventing shadowing at
line 84.

Comment thread uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py Outdated
Comment on lines +83 to +96
for collection_card in collection_cards:
collection_date_cards = collection_card.find_all(
"div", {"class": "card__content"}
)
dict_data = {
"type": waste_type.text,
"collectionDate": dt_collection_date.strftime(date_format),
}
data["bins"].append(dict_data)

for collection_date_card in collection_date_cards:

waste_type = collection_date_card.find(
"h3", {"class": "heading heading--sub heading--tiny"}
)

collection_date = collection_date_card.find(
"span", {"class": "card__date"}
)
⚠️ Potential issue | 🟠 Major

Add None checks for element lookups.

The find() calls on lines 90-92 and 94-96 return None if elements are not found. The subsequent code at lines 98-105 accesses .text without checking for None, which would raise AttributeError if the page structure doesn't match expectations.

Consider adding defensive checks:

                 for collection_date_card in collection_date_cards:
 
                     waste_type = collection_date_card.find(
                         "h3", {"class": "heading heading--sub heading--tiny"}
                     )
 
                     collection_date = collection_date_card.find(
                         "span", {"class": "card__date"}
                     )
+
+                    if not waste_type or not collection_date:
+                        continue
 
                     dt_collection_date = datetime.strptime(
                         collection_date.text.strip().split(" ")[1], source_date_format
                     )
🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py
around lines 83 to 96, the .find() calls for waste_type and collection_date can
return None and later code accesses .text causing AttributeError; update the
parsing to defensively check each result (e.g., if waste_type is None or
collection_date is None) and skip that card (continue) or provide a safe default
before accessing .text, or use conditional extraction (e.g.,
element.get_text(strip=True) only when element is truthy) so the loop safely
handles missing elements.

@m26dvd m26dvd marked this pull request as draft October 29, 2025 10:04

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d3efea and e2d12d7.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • create_webdriver (321-360)
🪛 Ruff (0.14.2)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py

39-39: create_webdriver may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
  • GitHub Check: Run Unit Tests (3.12, 1.8.4)
🔇 Additional comments (2)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (2)

1-40: Strong foundation for bot detection bypass.

The addition of a custom user agent and Selenium exception imports sets up robust handling for the Boston Borough Council site. The user agent string helps bypass bot detection mechanisms that were likely causing the runtime failures mentioned in issue #1690.


42-46: Effective Cloudflare bypass implementation.

The wait condition correctly handles Cloudflare's "Just a moment" challenge page with an appropriate 30-second timeout. The subsequent 3-second sleep allows the page to fully stabilize before interacting with form elements.
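The "page ready" condition described above can be expressed as a small predicate — a minimal sketch, assuming Cloudflare's interstitial title begins with "Just a moment" (the actual challenge title may vary by locale or Cloudflare version):

```python
def page_ready(title: str) -> bool:
    """Return True once the page title is non-empty and is no longer
    Cloudflare's interstitial. The "Just a moment" literal is an
    assumption about the challenge page's title."""
    title = title.strip()
    return bool(title) and not title.startswith("Just a moment")

# Selenium usage would look something like:
#   WebDriverWait(driver, 30).until(lambda d: page_ready(d.title))
```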

Removing the requirement for Selenium and moving to UPRN
@m26dvd m26dvd marked this pull request as ready for review October 30, 2025 09:21

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
uk_bin_collection/tests/input.json (1)

1555-1560: Update Middlesbrough instructions to drop Selenium requirement

The new Middlesbrough scraper no longer uses Selenium, but the wiki_note still tells users a webdriver is required. That will mislead anyone setting this up. Please amend the note (and, by extension, the generated wiki entry) to reflect the new API-based flow.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e2d12d7 and 87a3ad4.

📒 Files selected for processing (5)
  • uk_bin_collection/tests/input.json (7 hunks)
  • uk_bin_collection/uk_bin_collection/councils/DumfriesandGallowayCouncil.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/MiddlesbroughCouncil.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/NewportCityCouncil.py (1 hunks)
  • wiki/Councils.md (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
uk_bin_collection/uk_bin_collection/councils/MiddlesbroughCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/common.py (1)
  • check_paon (52-64)
uk_bin_collection/uk_bin_collection/councils/NewportCityCouncil.py (2)
uk_bin_collection/uk_bin_collection/common.py (1)
  • check_uprn (67-78)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/councils/DumfriesandGallowayCouncil.py (3)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/common.py (1)
  • check_uprn (67-78)
custom_components/uk_bin_collection/calendar.py (1)
  • event (54-63)
🪛 Ruff (0.14.2)
uk_bin_collection/uk_bin_collection/councils/MiddlesbroughCouncil.py

7-7: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


12-12: Unused method argument: page

(ARG002)


18-18: check_paon may be undefined, or defined from star imports

(F405)


30-30: Probable use of requests call without timeout

(S113)


39-39: f-string without any placeholders

Remove extraneous f prefix

(F541)


52-52: Probable use of requests call without timeout

(S113)


104-104: date_format may be undefined, or defined from star imports

(F405)


116-116: Consider moving this statement to an else block

(TRY300)

uk_bin_collection/uk_bin_collection/councils/DumfriesandGallowayCouncil.py

12-12: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


17-17: Unused method argument: page

(ARG002)


23-23: check_uprn may be undefined, or defined from star imports

(F405)


42-42: date_format may be undefined, or defined from star imports

(F405)

Comment on lines +33 to +40
for address in addresses:
if "place_id" in address:
place_id = address["place_id"]
break

if not place_id:
print(f"An error occurred: retrieving the address")
return

⚠️ Potential issue | 🔴 Critical

Handle missing place_id without crashing or returning the wrong type

When the address lookup comes back empty, place_id is never defined, so the if not place_id: branch raises UnboundLocalError and aborts the scraper. Even if it didn’t crash, the bare return would hand None back to the pipeline instead of the expected {"bins": []} structure. Please initialise place_id before the loop and return the empty data dict when nothing is found, e.g.:

+            place_id = None
             addresses = response.json()
             for address in addresses:
                 if "place_id" in address:
                     place_id = address["place_id"]
                     break

             if not place_id:
-                print(f"An error occurred: retrieving the address")
-                return
+                print("An error occurred: retrieving the address")
+                return data

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.2)

39-39: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/MiddlesbroughCouncil.py around
lines 33 to 40, the variable place_id may be referenced before assignment when
the address lookup returns no entries and the current bare return yields None;
initialize place_id (e.g. place_id = None) before the loop, keep scanning
addresses for a place_id, and if none is found return the expected empty data
structure (for example return {"bins": []}) instead of a bare return so the
scraper does not raise UnboundLocalError and the pipeline receives the correct
type.

@robbrad
Owner

robbrad commented Nov 5, 2025

@m26dvd - happy with 2 failed, 10 passed? OK to merge?

also some good suggestions from the rabbit above

…ow.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/LondonBoroughHarrow.py (2)

37-37: Consider removing debug print statement.

The print statement outputs collection times to stdout, which can clutter logs in production environments.

Apply this diff to remove it:

-            CollectTime = (collection["eventTime"]).split("T")[0]
-            print(CollectTime)
-
+            CollectTime = (collection["eventTime"]).split("T")[0]

28-31: Consider adding error handling for the HTTP request and JSON parsing.

The current implementation lacks error handling for network failures, non-200 responses, or invalid JSON, which could cause unhandled exceptions in Home Assistant integrations.

Consider adding status code validation and exception handling:

         headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}
         response = requests.get(URI, headers=headers, timeout=30)
+        response.raise_for_status()
 
         # Parse the JSON response
         bin_collection = response.json()

Alternatively, wrap the entire request block in a try-except for more granular error handling:

try:
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}
    response = requests.get(URI, headers=headers, timeout=30)
    response.raise_for_status()
    bin_collection = response.json()
except requests.exceptions.RequestException as e:
    # Handle network/HTTP errors
    raise Exception(f"Failed to retrieve bin data from Harrow: {e}")
except ValueError as e:
    # Handle JSON parsing errors
    raise Exception(f"Failed to parse bin data from Harrow: {e}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 87a3ad4 and 1baa677.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/LondonBoroughHarrow.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
  • GitHub Check: Run Unit Tests (3.12, 1.8.4)
🔇 Additional comments (1)
uk_bin_collection/uk_bin_collection/councils/LondonBoroughHarrow.py (1)

27-28: LGTM! Previous review concern addressed.

The addition of the timeout parameter and User-Agent header resolves the hanging request issue flagged in the previous review.

m26dvd and others added 2 commits November 5, 2025 09:11
…cil.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…ouncil.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (1)

81-81: UNFIXED: Hardcoded calendar year will break in April 2026.

This issue was previously flagged but remains unresolved. The hardcoded 2025-26 string will cause the scraper to fail when the council publishes the 2026-27 calendar (likely around April 2026).

Please implement the dynamic calendar year calculation suggested in the previous review, or add fallback logic to try multiple years if the URL returns 404.

Example implementation for dynamic year:

# Determine calendar year (Apr-Mar boundary)
current_year = datetime.now().year
current_month = datetime.now().month
if current_month >= 4:
    calendar_year = f"{current_year}-{str(current_year + 1)[2:]}"
else:
    calendar_year = f"{current_year - 1}-{str(current_year)[2:]}"
    
ics_url = f"https://www.chelmsford.gov.uk/media/t03c4mik/{day}-{letter}-{calendar_year}.ics"
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (1)

45-47: Consider removing unused exception variable.

The exception variable e is captured but never used. Since the exception is being silently suppressed (which may be acceptable for optional cookie banners), you can simplify to just except Exception:.

Apply this diff:

-            except Exception as e:
+            except Exception:
                 # Cookie banner not present or already accepted
                 pass
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1baa677 and 2715ff4.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • create_webdriver (321-360)
🪛 Ruff (0.14.3)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py

12-12: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


17-17: Unused method argument: page

(ARG002)


28-28: create_webdriver may be undefined, or defined from star imports

(F405)


45-47: try-except-pass detected, consider logging the exception

(S110)


45-45: Do not catch blind exception: Exception

(BLE001)


45-45: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


84-86: Abstract raise to an inner function

(TRY301)


84-86: Avoid specifying long messages outside the exception class

(TRY003)


103-103: date_format may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
🔇 Additional comments (2)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (2)

21-28: Previous issue fixed: user agent now correctly passed as string.

The user agent is now properly defined as a string variable and passed correctly to create_webdriver, resolving the previous critical issue where a dict was incorrectly passed.


95-106: Excellent: Multi-bin split logic correctly implements PR objective.

The new logic that splits event.summary by comma and creates separate bin entries addresses issue #1688 perfectly. Each collection type now reports independently with its own entry, which aligns with the requirement to split bins into different sensors.
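The split behaviour described above can be sketched in isolation — a hypothetical summary value and entry shape for illustration, not the council's actual ICS payload, and assuming the repo's `"%d/%m/%Y"` output format:

```python
from datetime import date

def split_summary(summary: str, collection_date: date) -> list[dict]:
    """Emit one bin entry per comma-separated type in an event summary,
    so each collection type becomes its own sensor."""
    return [
        {"type": part.strip(), "collectionDate": collection_date.strftime("%d/%m/%Y")}
        for part in summary.split(",")
        if part.strip()
    ]

bins = split_summary("Recycling, Garden waste", date(2025, 11, 8))
# bins == [{"type": "Recycling", "collectionDate": "08/11/2025"},
#          {"type": "Garden waste", "collectionDate": "08/11/2025"}]
```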


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py (2)

70-74: Unused variable assignment.

The collection_date_cards variable is assigned but never used before being shadowed at line 84. The wait itself is valuable for ensuring page load, but the variable should use _ to indicate the result is intentionally unused.

Apply this diff:

-            collection_date_cards = WebDriverWait(driver, timeout).until(
+            _ = WebDriverWait(driver, timeout).until(
                 EC.presence_of_all_elements_located(
                     (By.XPATH, '//div[@class = "card card--waste card--blue-light"]')
                 )
             )

87-104: Add None checks for element lookups to prevent AttributeError.

The find() calls at lines 89-95 can return None if elements are not found. Lines 98 and 101 access .text without checking for None, which will raise AttributeError if the page structure doesn't match expectations. This is especially critical given that issue #1641 specifically mentions addressing "no-such-element" errors for this council.

Apply this diff to add defensive checks:

                 for collection_date_card in collection_date_cards:
 
                     waste_type = collection_date_card.find(
                         "h3", {"class": "heading heading--sub heading--tiny"}
                     )
 
                     collection_date = collection_date_card.find(
                         "span", {"class": "card__date"}
                     )
+
+                    if not waste_type or not collection_date:
+                        continue
 
                     dt_collection_date = datetime.strptime(
                         collection_date.text.strip().split(" ")[1], source_date_format
                     )
                     dict_data = {
                         "type": waste_type.text.strip().split("(")[0].strip(),
                         "collectionDate": dt_collection_date.strftime(date_format),
                     }
                     data["bins"].append(dict_data)
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py (1)

7-7: Consider explicit imports instead of star import.

The star import at line 7 makes it unclear what symbols are being imported (e.g., datetime, date_format). While this pattern appears consistent across the codebase, explicit imports improve code clarity and help avoid potential naming conflicts.

Example of explicit imports:

from uk_bin_collection.uk_bin_collection.common import (
    check_paon,
    check_postcode,
    create_webdriver,
    date_format,
    datetime,
)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2715ff4 and 7ba0cc9.

📒 Files selected for processing (2)
  • uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (2 hunks)
  • uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • create_webdriver (321-360)
🪛 Ruff (0.14.3)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py

39-39: create_webdriver may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py

98-98: datetime may be undefined, or defined from star imports

(F405)


103-103: date_format may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
🔇 Additional comments (5)
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py (3)

1-8: LGTM! Imports support the enhanced error handling.

The added time module and specific Selenium exceptions are all used appropriately in the cookie banner handling and page load stabilization logic below.


38-46: LGTM! Cloudflare bypass and user agent properly implemented.

The user agent string and Cloudflare protection bypass logic are well-structured. The 30-second timeout is appropriate for waiting out Cloudflare checks (distinct from the faster cookie banner timeout below), and the title condition correctly waits for a non-empty title that's not "Just a moment".


48-61: Past review comment addressed; implementation looks solid.

The timeout has been reduced to 10 seconds as suggested in the previous review, and the exception handling correctly covers all relevant failure modes (timeout, element not found, click intercepted). The graceful fallback with pass ensures the scraper continues even when the cookie banner is absent.

uk_bin_collection/uk_bin_collection/councils/WokinghamBoroughCouncil.py (2)

1-1: LGTM: BeautifulSoup import added.

The new import is necessary for the updated parsing strategy and correctly replaces the previous Selenium-only approach.


76-80: LGTM: BeautifulSoup parsing setup.

The parsing setup correctly uses the driver's page source and targets the same elements that were waited for in the previous step.

@m26dvd
Contributor Author

m26dvd commented Nov 5, 2025

@m26dvd - happy with 2 failed, 10 passed? OK to merge?

also some good suggestions from the rabbit above

@robbrad Happy to merge now. I've accepted some of the suggestions from the rabbit, although it commits changes with incorrect lint messages


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/SouthamptonCityCouncil.py (1)

25-40: Review header values for accuracy and maintainability.

The headers include Chrome version "141" (line 32), which doesn't exist yet. Additionally, the headers are very detailed with browser fingerprinting information that may be brittle and require frequent updates.

Consider:

  • Using a realistic, current Chrome version
  • Simplifying headers to only those required by the server
  • Testing whether all these headers are necessary for successful requests

If only basic headers are needed, consider simplifying to:

 headers = {
-    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
-    "accept-language": "en-GB,en;q=0.9",
-    "cache-control": "max-age=0",
-    "dnt": "1",
-    "priority": "u=0, i",
+    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
     "referer": "https://www.southampton.gov.uk",
-    "sec-ch-ua": '"Google Chrome";v="141", "Not?A_Brand";v="8", "Chromium";v="141"',
-    "sec-ch-ua-mobile": "?0",
-    "sec-ch-ua-platform": '"Windows"',
-    "sec-fetch-dest": "document",
-    "sec-fetch-mode": "navigate",
-    "sec-fetch-site": "same-origin",
-    "upgrade-insecure-requests": "1",
-    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36",
 }

Test to ensure the simplified headers still work with the server.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ba0cc9 and 75cbf30.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/SouthamptonCityCouncil.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.3)
uk_bin_collection/uk_bin_collection/councils/SouthamptonCityCouncil.py

46-46: Probable use of requests call without timeout

(S113)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)

Comment on lines +42 to +44
params = {
"UPRN": {user_uprn},
}

⚠️ Potential issue | 🔴 Critical

Critical: Fix set syntax in params dictionary.

Line 43 uses {user_uprn} which creates a Python set, not a string value. This will cause the UPRN to be sent as a string representation of a set (e.g., "{'12345'}") instead of the actual UPRN value, breaking the API request.

Apply this diff to fix the syntax:

 params = {
-    "UPRN": {user_uprn},
+    "UPRN": user_uprn,
 }
🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/SouthamptonCityCouncil.py around
lines 42 to 44, the params dict uses {user_uprn} which creates a Python set
rather than passing the UPRN value; replace the set literal with the actual
value (e.g., use user_uprn or str(user_uprn) if the API expects a string) so the
"UPRN" key holds the correct scalar/string value instead of a set.

Comment on lines +46 to 50
r = requests.get(
"https://www.southampton.gov.uk/whereilive/waste-calendar",
params=params,
headers=headers,
)

⚠️ Potential issue | 🟠 Major

Add timeout to prevent indefinite hangs.

The requests.get call is missing a timeout parameter, which can cause the request to hang indefinitely if the server becomes unresponsive. This is especially problematic in automated systems like Home Assistant.

Apply this diff to add a timeout:

 r = requests.get(
     "https://www.southampton.gov.uk/whereilive/waste-calendar",
     params=params,
     headers=headers,
+    timeout=30,
 )

Based on static analysis hints.

🧰 Tools
🪛 Ruff (0.14.3)

46-46: Probable use of requests call without timeout

(S113)

🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/SouthamptonCityCouncil.py around
lines 46 to 50, the requests.get call lacks a timeout which can cause indefinite
hangs; add a timeout parameter (for example timeout=10) to the requests.get(...)
call to bound the request duration and prevent blocking, and update any similar
requests in this file to include the same timeout.

@robbrad robbrad changed the base branch from master to November_release November 8, 2025 09:02
@robbrad robbrad merged commit d6ecb0b into robbrad:November_release Nov 8, 2025
12 of 14 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request May 1, 2026