
May 2026 Release #1992

Merged
robbrad merged 144 commits into master from may-2026-release on May 1, 2026

Conversation

@robbrad (Owner) commented May 1, 2026

May 2026 Release — 60 PRs Consolidated

All open PRs merged into a single release branch with 7 merge conflicts resolved.

Council Fixes (53 PRs)

| PR | Council(s) | Change |
| --- | --- | --- |
| #1899 | Babergh, MidSuffolk, Rother, Wirral | Date parsing, null handling, input.json |
| #1909 | EastSuffolk | Fix collection parsing |
| #1916 | Southampton | Include food waste collections |
| #1918 | Blaby | Add User-Agent header (403 fix) |
| #1919 | Brent | Add User-Agent header |
| #1920 | Swale | Add user_agent to webdriver |
| #1921 | SouthHams | Init fcc_session_token before try |
| #1922 | Oxford | Handle multiple editor divs |
| #1923 | ReigateAndBanstead | Select populated span |
| #1924 | Shropshire | Use full link text for bin type |
| #1925 | Salford | Use lxml parser for malformed HTML |
| #1926 | Rushmoor | Handle raw JSON response |
| #1927 | Plymouth | Fix garden collection |
| #1929 | Crawley | Add XML response parsing fallback |
| #1930 | Newham | Updated card selectors, date format fallback |
| #1932 | Torridge | Parse new relative-date SOAP format |
| #1933 | Peterborough | user_agent and container selector |
| #1934 | Horsham | user_agent and postcode ID selector |
| #1935 | API server | Fix double-encoded JSON string |
| #1936 | NeathPortTalbot | user_agent and JS click for overlay |
| #1937 | Bexley | Replace Selenium with requests |
| #1938 | Stirling | Replace Selenium with Recollect API |
| #1939 | WindsorAndMaidenhead | Replace Selenium with requests |
| #1940 | EppingForest | Replace Selenium with ArcGIS REST API |
| #1941 | Moray | Rewrite for annual calendar |
| #1942 | EastLothian | Rewrite for Drupal migration |
| #1943 | Barnet | Rewrite for Jadu migration |
| #1944 | NorthNorfolk | Rewrite for X-Forms |
| #1947 | 63 councils | Add realistic user_agent to all Selenium scrapers |
| #1948 | Richmond | Rewrite for My Richmond page |
| #1949 | NorthYorkshire | Find HTML data dynamically in AJAX |
| #1950 | GreatYarmouth | Use current year instead of hardcoded 2025 |
| #1952 | Wyre | Skip non-bin divs, harden date parsing |
| #1953 | Chichester | Rewrite for AchieveForms V7 |
| #1954 | StocktonOnTees | Update form IDs to V2, dismiss cookie banner |
| #1955 | WestSuffolk | Send User-Agent header |
| #1956 | Midlothian | Rewrite for MyMidlothian Granicus |
| #1957 | SouthStaffordshire | Use objectId query param |
| #1958 | Chorley, Islington, Medway, Hastings, Conwy | Reliability sweep |
| #1971 | Tewkesbury | Handle new API response format |
| #1972 | Boston | Drive chosen.js widget |
| #1973 | Chelmsford | Dismiss email-signup modal |
| #1974 | ArmaghBanbridgeCraigavon | Full browser User-Agent for WAF |
| #1975 | NorthHertfordshire | Migrate to Netcall Liberty Create |
| #1978 | WestSuffolk | User-Agent and renamed h4 heading |
| #1979 | EnvironmentFirst, Eastbourne, Lewes | Handle restructured page |
| #1980 | MidSussex | Migrate to Whitespace portal |
| #1981 | Rotherham | Rewrite for Imactivate API |
| #1983 | SouthOxfordshire | Session establishment and bin type cleanup |
| #1984 | Chelmsford | Replace Selenium with requests |
| #1985 | MidSuffolk | Replace sleep with WebDriverWait |
| #1987 | NorthKesteven | Make own HTTP request |
| #1988 | Fareham | Switch to calendar page |
| #1989 | EastRiding | Replace Selenium with JSON API |
| #1990 | Edinburgh | Accept real postcode and house number |
| #1991 | Thurrock | Accept real postcode and house number |
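Several of the fixes above (#1918, #1919, #1947, #1955, #1974) apply the same one-line pattern: send a realistic browser User-Agent so a council WAF stops rejecting the default python-requests identity with a 403. A minimal sketch of the idea, not the project's actual helper code (the header string and `fetch` name are illustrative):

```python
# Hypothetical sketch: councils behind a WAF often 403 the default
# "python-requests/x.y" User-Agent, so send a realistic browser string.
import requests

# Any plausible desktop browser UA works; this exact string is illustrative.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}

def fetch(url: str) -> requests.Response:
    # timeout keeps a stalled council site from hanging the scraper
    return requests.get(url, headers=HEADERS, timeout=30)
```

The Selenium scrapers touched by #1947 do the equivalent by setting the same kind of string via the browser options rather than a request header.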

Dependencies (4 PRs)

| PR | Change |
| --- | --- |
| #1963 | Bump pillow 12.1.1 to 12.2.0 |
| #1967 | Bump mako 1.3.10 to 1.3.11 |
| #1976 | Bump python-dotenv 1.1.0 to 1.2.2 |
| #1977 | Bump lxml 5.3.2 to 6.1.0 |

Merge Conflicts Resolved (7)

Issues Resolved (20)

This release fixes the following open issues:

Closes #1555, #1616, #1725, #1768, #1804, #1889, #1890, #1893, #1902, #1904, #1906, #1908, #1910, #1911, #1914, #1931, #1951, #1962, #1970, #1982

Code Review Notes

  • No malicious code found — all URLs point to council .gov.uk sites or standard libraries
  • No eval/exec/subprocess/os.system calls added
  • No hardcoded credentials or API keys
  • Fixed indentation bug in server.py introduced by the #1935 merge ("Fix: API server returning double-encoded JSON string"): the try block was at column 0
  • A few verify=False SSL calls exist but are consistent with existing codebase patterns for councils with bad certs
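For context on the #1935 fix: double-encoding happens when an already-serialised JSON string is passed through `json.dumps` a second time, so clients decode a quoted string instead of an object. An illustrative reconstruction with made-up payload data, not the actual server.py code:

```python
import json

# Illustrative payload; the real API returns council bin data.
payload = {"bins": [{"type": "Recycling", "collectionDate": "01/05/2026"}]}

encoded = json.dumps(payload)  # correct: JSON object text
double = json.dumps(encoded)   # bug: a quoted, escaped JSON *string*

# A client decoding the buggy response gets a str, not a dict:
assert isinstance(json.loads(double), str)
# Decoding the correct response round-trips the data:
assert json.loads(encoded) == payload
```

The fix is simply to serialise the response dict exactly once before writing it out.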

robbrad and others added 30 commits March 28, 2026 10:49
…tCouncil, WirralCouncil - date parsing, null handling, input.json config
Signed-off-by: Tim Collins <tim@thecollins.team>
Fix double-encoded JSON output in API server
@coderabbitai (Bot) left a comment

Actionable comments posted: 3

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (19)
uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py (1)

52-52: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused headers variable.

The headers dictionary is defined but never used anywhere in the function. This is dead code.

🧹 Proposed fix
-            headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py`
at line 52, The file defines an unused variable headers = {"User-Agent":
"Mozilla/5.0 (Windows NT 6.1; Win64; x64)"} in the BrightonandHoveCityCouncil
code; remove that dead assignment (search for the symbol headers in the
BrightonandHoveCityCouncil module or the surrounding function where it appears)
so the unused variable is deleted, and run tests/lint to ensure no other
references remain; if the User-Agent was intended for HTTP requests instead,
replace the direct requests call to pass headers rather than keeping the unused
variable.
uk_bin_collection/uk_bin_collection/councils/TewkesburyBoroughCouncil.py (1)

27-34: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent ValueError suppression hides API format regressions.

Both the legacy and new-format branches swallow ValueError with continue, silently dropping bins whose dates can't be parsed. Per the project's established convention, parsers should raise explicitly so format changes are caught immediately rather than producing a quietly empty result.

💡 Proposed fix — re-raise instead of silently skipping
-                    except ValueError:
-                        continue
+                    except ValueError as e:
+                        raise ValueError(
+                            f"Unexpected date format in legacy response for '{bin_type}': {date_str!r}"
+                        ) from e
-                except ValueError:
-                    continue
+                except ValueError as e:
+                    raise ValueError(
+                        f"Unexpected nextCollectionDate format for '{key}': {date_str!r}"
+                    ) from e

Based on learnings: "prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors … ensures format changes are detected early."

Also applies to: 50-58

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/TewkesburyBoroughCouncil.py`
around lines 27 - 34, The ValueError from datetime.strptime in
TewkesburyBoroughCouncil.py is being silently swallowed (the try/except uses
continue), so change the handlers in both parsing branches (the new-format block
around date_str -> datetime.strptime and the legacy-format block around lines
~50-58) to re-raise the exception (or remove the except) instead of continuing;
ensure the error includes context (e.g., the offending date_str and bin_type) so
unexpected API format changes surface immediately rather than silently dropping
bins.
uk_bin_collection/uk_bin_collection/councils/NorthDevonCountyCouncil.py (1)

130-132: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Unchecked find() return values will raise AttributeError on missing spans, aborting the entire parse.

All three find(...) calls are outside the try/except block (which starts at line 135), so any missing span propagates to the outer handler and raises, producing an empty result. Council sites frequently change their HTML structure.

🛡️ Proposed fix — guard each span lookup
-                        bin_type = li.find("span", class_="wasteType").text.strip()
-                        day = li.find("span", class_="wasteDay").text.strip()
-                        weekday = li.find("span", class_="wasteName").text.strip()
+                        waste_type_span = li.find("span", class_="wasteType")
+                        waste_day_span = li.find("span", class_="wasteDay")
+                        if not waste_type_span or not waste_day_span:
+                            continue
+                        bin_type = waste_type_span.text.strip()
+                        day = waste_day_span.text.strip()
+                        weekday_span = li.find("span", class_="wasteName")
+                        weekday = weekday_span.text.strip() if weekday_span else ""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/NorthDevonCountyCouncil.py`
around lines 130 - 132, The three span lookups that set bin_type, day and
weekday in NorthDevonCountyCouncil.py currently call .text on the result of
li.find(...) and can raise AttributeError if a span is missing; wrap each lookup
in a safe guard (e.g., assign the result of li.find("span", class_="wasteType")
to a temp, check for None, then use .get_text(strip=True) or a default like ""),
or move them inside the existing try/except block and check for None before
accessing .text so bin_type, day and weekday are always set to a safe default
rather than raising.
uk_bin_collection/uk_bin_collection/councils/BroxtoweBoroughCouncil.py (1)

84-87: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

cells[0] and cells[3] are accessed without a bounds check — rows with fewer <td> elements raise IndexError.

cells[0] on line 85 is unconditional. The cells[3] in the if condition on line 87 looks like a guard, but Python list indexing raises IndexError when the index is out of range rather than returning a falsy value, so it provides no protection for short rows (e.g. a header <tr> containing only <th> elements, or an empty row).

🛡️ Proposed fix
                     for row in bins_table.find_all("tr"):
                         cells = row.find_all("td")
-                        bin_type = cells[0].get_text(strip=True)
-                        # Skip header row
-                        if bin_type and cells[3] and bin_type != "Bin Type":
+                        if len(cells) < 4:
+                            continue
+                        bin_type = cells[0].get_text(strip=True)
+                        # Skip header row
+                        if bin_type and bin_type != "Bin Type":
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/BroxtoweBoroughCouncil.py`
around lines 84 - 87, The code reads cells = row.find_all("td") and then
accesses cells[0] and cells[3] without ensuring the list has enough elements,
which can raise IndexError; update the loop in BroxtoweBoroughCouncil (the block
using row, cells and bin_type) to first check len(cells) >= 4 before accessing
cells[0] or cells[3] and skip the row if it’s shorter (or treat header rows
separately), and adjust the bin_type extraction/condition to only run when the
length check passes so short/empty/header rows are safely ignored.
uk_bin_collection/uk_bin_collection/councils/LewesDistrictCouncil.py (1)

20-28: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Backwards-compatible url fallback is blocked by the upfront UPRN validation.

If a caller only passes url, check_uprn(user_uprn) raises before Lines 24-26 execute, so the fallback path is unreachable in practice.

Suggested fix
         try:
             user_uprn = kwargs.get("uprn")
-            check_uprn(user_uprn)
-            url = f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}"
-            if not user_uprn:
-                # Backwards compatibility for users who stored a URL.
-                url = kwargs.get("url")
+            if user_uprn:
+                check_uprn(user_uprn)
+                url = f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}"
+            else:
+                # Backwards compatibility for users who stored a URL.
+                url = kwargs.get("url")
+                if not url:
+                    raise ValueError("Either 'uprn' or 'url' must be provided")
         except Exception as e:
             raise ValueError(f"Error getting identifier: {str(e)}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/LewesDistrictCouncil.py` around
lines 20 - 28, The current logic calls check_uprn(user_uprn) before falling back
to kwargs.get("url"), so callers that provide only url never reach the fallback;
fix by first assigning user_uprn = kwargs.get("uprn") and url =
kwargs.get("url") (or assign url after fallback) and only call
check_uprn(user_uprn) if user_uprn is truthy, otherwise keep the provided url;
update the block around check_uprn, user_uprn and url to perform conditional
validation so the backwards-compatible url path is reachable.
uk_bin_collection/uk_bin_collection/councils/EastbourneBoroughCouncil.py (1)

20-28: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Backwards-compatible url fallback never actually runs.

check_uprn(user_uprn) executes before Lines 24-26, so callers that only provide url still fail validation and never reach the fallback branch. That breaks the compatibility path this change is trying to preserve.

Suggested fix
         try:
             user_uprn = kwargs.get("uprn")
-            check_uprn(user_uprn)
-            url = f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}"
-            if not user_uprn:
-                # Backwards compatibility for users who stored a URL.
-                url = kwargs.get("url")
+            if user_uprn:
+                check_uprn(user_uprn)
+                url = f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}"
+            else:
+                # Backwards compatibility for users who stored a URL.
+                url = kwargs.get("url")
+                if not url:
+                    raise ValueError("Either 'uprn' or 'url' must be provided")
         except Exception as e:
             raise ValueError(f"Error getting identifier: {str(e)}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/EastbourneBoroughCouncil.py`
around lines 20 - 28, The fallback URL branch never runs because
check_uprn(user_uprn) is called before testing for a missing uprn; modify the
logic in the handler that uses user_uprn, check_uprn and url so you first read
user_uprn = kwargs.get("uprn") and if user_uprn is truthy call
check_uprn(user_uprn) and build url =
f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}", otherwise set url
= kwargs.get("url") for backward compatibility; keep the existing exception
handling (raising ValueError with the error string) unchanged and reference the
symbols user_uprn, check_uprn and url when making the change.
uk_bin_collection/uk_bin_collection/councils/SwaleBoroughCouncil.py (1)

99-106: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace the "No future date found" sentinel with an explicit None-guard.

If future_collection_date_tag is None but future_bins is non-empty, parse_collection_date("No future date found") produces a confusing ValueError: time data 'No future date found' does not match format '%A, %d %B' rather than a clear signal that the date element itself was absent. Per the project's explicit-failure convention, the cause should be surfaced directly.

🛡️ Proposed fix
-future_collection_date = (
-    future_collection_date_tag.text.split("starting from")[-1].strip()
-    if future_collection_date_tag
-    else "No future date found"
-)
+future_collection_date = (
+    future_collection_date_tag.text.split("starting from")[-1].strip()
+    if future_collection_date_tag
+    else None
+)

Then inside the loop:

 for bin in future_bins:
+    if future_collection_date is None:
+        raise ValueError(
+            "Future bins were found but no 'starting from' date element is present on the page"
+        )
     collection_date = parse_collection_date(future_collection_date)

Based on learnings: in uk_bin_collection/**/*.py, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors; use clear exception types and document error causes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/SwaleBoroughCouncil.py` around
lines 99 - 106, The code currently assigns "No future date found" to
future_collection_date which causes parse_collection_date to raise an unclear
ValueError; change the assignment so future_collection_date is None when
future_collection_date_tag is missing (i.e., future_collection_date =
future_collection_date_tag.text... if future_collection_date_tag else None) and
then inside the loop that iterates future_bins, check if future_collection_date
is None and, if so, raise a clear exception (e.g., ValueError with a message
referencing future_collection_date_tag/future_bins) before calling
parse_collection_date so the absence of the element is surfaced explicitly.
uk_bin_collection/uk_bin_collection/councils/Hillingdon.py (2)

266-268: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Silent except Exception: continue masks page-format changes

If the council changes its HTML structure, the selector/parse errors here are printed and swallowed, returning a silently incomplete data["bins"] with no signal to operators. The same pattern exists in get_bank_holiday_changes (lines 88–90 and 92–93).

Prefer re-raising (or at minimum propagating a recognisable exception) so that format regressions are caught immediately rather than producing quietly wrong data. Based on learnings, this project explicitly favours explicit failures over silent defaults or swallowed errors.

♻️ Suggested refactor
-                except Exception as e:
-                    print(f"Error processing item: {e}")
-                    continue
+                except Exception as e:
+                    raise ValueError(
+                        f"Failed to parse bin row; page format may have changed: {e}"
+                    ) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/Hillingdon.py` around lines 266
- 268, The except block in the item-processing loop (and the similar handlers in
get_bank_holiday_changes) currently swallows all exceptions with print(...) and
continue, hiding HTML/selector regressions; change these handlers to re-raise
the caught exception (or raise a custom descriptive exception) instead of
continuing so failures surface—locate the try/except in the function that
populates data["bins"] (and the try/except blocks inside
get_bank_holiday_changes) and replace the except Exception as e: print(...);
continue with either: raise or raise RuntimeError(f"Hillingdon parser error
processing item: {e}") to propagate a clear error to callers/operators.

235-268: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Duplicate bin entries: hardcoded fallback and scraped rows are both unconditionally appended

Lines 235–242 always append three hardcoded entries (General Waste, Recycling, Food Waste) before lines 244–268 parse and also append the actual scraped div.bin--row entries. When the HTML scrape succeeds, data["bins"] ends up with each bin type twice — once with the day-of-week–computed date and once with the date parsed from the page — which will produce incorrect output for any downstream consumer.

Either remove the hardcoded block (let the scraper stand alone) or guard it as a genuine fallback that fires only when the div.bin--row parse yields nothing:

🐛 Proposed fix
-            # Add collection dates for each bin type
-            bin_types = ["General Waste", "Recycling", "Food Waste"]
-            for bin_type in bin_types:
-                data["bins"].append(
-                    {
-                        "type": bin_type,
-                        "collectionDate": next_collection.strftime("%d/%m/%Y"),
-                    }
-                )
-
             # Process collection details
             bin_rows = soup.select("div.bin--row:not(:first-child)")
             for row in bin_rows:
                 try:
                     bin_type = row.select_one("div.col-md-3").text.strip()
                     collection_dates_div = row.select("div.col-md-3")[1]
                     next_collection_text = "".join(
                         collection_dates_div.find_all(text=True, recursive=False)
                     ).strip()
                     cleaned_date_text = remove_ordinal_indicator_from_date_string(
                         next_collection_text
                     )
                     parsed_date = parse(cleaned_date_text, fuzzy=True)
                     bin_date = parsed_date.strftime("%d/%m/%Y")

                     if bin_type and bin_date:
                         data["bins"].append(
                             {
                                 "type": bin_type,
                                 "collectionDate": bin_date,
                             }
                         )
                 except Exception as e:
                     print(f"Error processing item: {e}")
                     continue

+            # Fallback: if no bin rows were scraped, use day-of-week computed date
+            if not data["bins"]:
+                bin_types = ["General Waste", "Recycling", "Food Waste"]
+                for bin_type in bin_types:
+                    data["bins"].append(
+                        {
+                            "type": bin_type,
+                            "collectionDate": next_collection.strftime("%d/%m/%Y"),
+                        }
+                    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/Hillingdon.py` around lines 235
- 268, The code currently always appends three hardcoded entries via bin_types
into data["bins"] and then also appends parsed rows from bin_rows, causing
duplicates; change the logic so the hardcoded fallback (the loop that iterates
bin_types and appends to data["bins"]) only runs when no parsed rows are found
(i.e., when bin_rows is empty or when parsing yields no valid entries), or
remove that fallback entirely—update the block around bin_types and the
subsequent bin_rows parsing so that either parsed results are used
preferentially (append hardcoded entries only if parsed results list is empty)
and ensure references like data["bins"], bin_types, and bin_rows remain
consistent.
uk_bin_collection/uk_bin_collection/councils/ChorleyCouncil.py (2)

142-142: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent fallback may mask JS-mapping format changes.

If the JS parsing above yields no entries, every bin type silently resolves to its raw table-cell text (svc). Per the project's convention of preferring explicit failures, consider at least logging a warning when svc is not found in a non-empty map, so format changes are surfaced quickly.

-                    "type": bin_type_map.get(svc, svc),
+                    "type": bin_type_map.get(svc) if not bin_type_map or svc in bin_type_map else (
+                        # bin_type_map is populated but this key is absent — surface it
+                        (_ for _ in ()).throw(ValueError(f"Unknown bin service key: {svc!r}"))
+                    ) if bin_type_map else svc,

A cleaner approach is a small helper or an explicit check before the loop:

if bin_type_map:
    mapped_type = bin_type_map.get(svc)
    if mapped_type is None:
        raise ValueError(f"Bin service key {svc!r} not found in JS type map")
else:
    mapped_type = svc   # JS block absent entirely — use raw name

Based on learnings: the project prefers explicit failures (raising exceptions on unexpected formats) over silent defaults, to ensure format changes are detected early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/ChorleyCouncil.py` at line 142,
Replace the silent fallback in the ChorleyCouncil mapping (currently using
"type": bin_type_map.get(svc, svc)) with an explicit check: if bin_type_map is
non-empty try to get mapped_type = bin_type_map.get(svc) and raise a ValueError
(or log a warning and raise) when mapped_type is None so unexpected JS-format
changes fail fast; if bin_type_map is empty (JS block absent) set mapped_type =
svc and use that for the "type" field. Ensure you update the code paths in the
loop that reference svc to use mapped_type and include the unique symbols
bin_type_map and svc in your changes.

129-130: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

table.find("tbody") can return None, causing AttributeError.

Python's html.parser does not always auto-insert <tbody>; if the server's HTML omits it, the chained .find_all("tr") call will crash.

🛡️ Proposed fix
-        for row in table.find("tbody").find_all("tr"):
+        tbody = table.find("tbody") or table
+        for row in tbody.find_all("tr"):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/ChorleyCouncil.py` around lines
129 - 130, The loop in ChorleyCouncil.py uses table.find("tbody").find_all("tr")
which can raise AttributeError if tbody is missing; update the code in the
function/method that iterates over table (look for the for row in
table.find("tbody").find_all("tr") block) to first assign tbody =
table.find("tbody") and then set rows = tbody.find_all("tr") if tbody is not
None else table.find_all("tr"), and iterate over rows so the parser works
whether or not a <tbody> element is present.
uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale copy-paste comment references the wrong council.

Line 1 reads # This script pulls (in one hit) the data from Bromley Council Bins Data but this file is for Calderdale.

✏️ Proposed fix
-# This script pulls (in one hit) the data from Bromley Council Bins Data
+# This script pulls the bin collection data for Calderdale Council
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` at line 1,
Update the incorrect top-line comment that mentions "Bromley Council" to
reference Calderdale instead; locate the module/comment at the top of
CalderdaleCouncil.py (the CalderdaleCouncil class/module) and replace the stale
copy-paste comment so it accurately describes that the script pulls Calderdale
council bin data.
uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py (2)

32-33: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Dead/broken else branch produces "None, {postcode}" as the search string.

When house_number is None (absent paon), the condition is falsy and the else branch executes f"{house_number}, {postcode}" where house_number is still None, yielding "None, EC1A 1BB". The comment above even says "Use house_number as full address since it contains the complete address" — so the else clause is never intended to be reached, or it should fall back to just the postcode.

🐛 Proposed fix
-# Use house_number as full address since it contains the complete address
-full_address = house_number if house_number else f"{house_number}, {postcode}"
+if not house_number:
+    raise ValueError("No house number/address (paon) provided.")
+full_address = house_number
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py`
around lines 32 - 33, The current full_address assignment in
CotswoldDistrictCouncil.py uses f"{house_number}, {postcode}" when house_number
is falsy which produces "None, {postcode}"; change the logic in the full_address
assignment so that when house_number is present use house_number, otherwise fall
back to just postcode (i.e. set full_address based on house_number if truthy,
else postcode) to avoid embedding the literal "None" into the search string.

223-226: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent empty-data return violates the explicit-failure convention used across this codebase.

When the page yields no results, the scraper returns {"bins": []} silently. Downstream consumers will see an empty collection and have no way to tell whether there genuinely are no upcoming collections or whether the page structure changed. A ValueError/RuntimeError here would surface scraper breakage immediately. Based on learnings, this project prefers explicit failures over silent defaults.

🐛 Proposed fix
-# If no data found, add a debug entry
 if not data["bins"]:
-    print("No bin collection data found. Page source:")
-    print(driver.page_source[:1000])  # Print first 1000 chars for debugging
+    raise ValueError(
+        "No bin collection data found. The page structure may have changed."
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py`
around lines 223 - 226, The current silent return when no bins are found
(checking data["bins"]) should be replaced with an explicit error: in the
CotswoldDistrictCouncil scraper (the block that inspects data["bins"] and uses
driver.page_source), raise a RuntimeError or ValueError instead of printing and
returning {"bins": []}; include a short context message and a truncated
page-source snippet (e.g., first 1000 chars) in the exception to aid debugging,
and remove the print statements so the scraper surfaces a clear failure when the
page yields no results.
uk_bin_collection/uk_bin_collection/councils/HyndburnBoroughCouncil.py (1)

132-132: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove debug print(bin_data) from the production code path.

This emits the full parsed bin collection to stdout on every successful invocation.

🐛 Proposed fix
-print(bin_data)
-
 return bin_data
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/HyndburnBoroughCouncil.py` at
line 132, Remove the debug print that emits full parsed bin data to stdout by
deleting the print(bin_data) call in HyndburnBoroughCouncil.py; instead, if
diagnostic information is required use the module's logger (e.g.,
logger.debug(...) or processLogger.debug(...)) inside the same function where
print(bin_data) appears so production runs no longer print sensitive/verbose
data to stdout.
uk_bin_collection/uk_bin_collection/councils/MidUlsterDistrictCouncil.py (1)

101-110: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Year-rollover bug: collection dates in January will be assigned the wrong year in December.

current_year is appended unconditionally. If a user runs this in late November/December and the next collection falls in January, the resulting date will be dated to the current year, not the following one. Compare with HyndburnBoroughCouncil.py in the same PR which correctly adds 1 to the year when the parsed date is already in the past.

🐛 Proposed fix
-current_year = datetime.now().year
-full_date = f"{date_text} {current_year}"  # e.g., "18 Apr 2025"
-collection_date = datetime.strptime(full_date, "%d %b %Y").strftime(
-    date_format
-)
+current_year = datetime.now().year
+full_date = f"{date_text} {current_year}"
+parsed = datetime.strptime(full_date, "%d %b %Y")
+if parsed.date() < datetime.now().date():
+    parsed = parsed.replace(year=current_year + 1)
+collection_date = parsed.strftime(date_format)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/MidUlsterDistrictCouncil.py`
around lines 101 - 110, The date parsing logic in MidUlsterDistrictCouncil.py
unconditionally appends current_year to date_text (variables current_year,
full_date, collection_date), causing January dates parsed in December to be
assigned the wrong year; change the flow to build a tentative datetime from
date_text + current_year, then if that tentative date is before today (use
datetime.today() or datetime.now().date()), add 1 to current_year before
re-parsing so the final collection_date uses the next year; update the try block
around current_year/full_date/collection_date to compute tentative_date and bump
the year when needed, mirroring the approach used in HyndburnBoroughCouncil.py.
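The rollover guard described above can be isolated into a small, testable helper. The `%d %b` input format is taken from the proposed fix; the function name and the injected `today` parameter are illustrative:

```python
from datetime import datetime

def resolve_collection_year(date_text: str, today: datetime) -> datetime:
    """Parse a day-month string like '18 Apr' and pick the correct year.

    If the tentative date has already passed, assume it refers to next
    year — the pattern HyndburnBoroughCouncil.py uses per the review.
    """
    parsed = datetime.strptime(f"{date_text} {today.year}", "%d %b %Y")
    if parsed.date() < today.date():
        parsed = parsed.replace(year=today.year + 1)
    return parsed
```

Passing `today` explicitly (rather than calling `datetime.now()` inside) makes the December/January behaviour easy to exercise in tests.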
uk_bin_collection/uk_bin_collection/councils/NorthEastDerbyshireDistrictCouncil.py (1)

76-76: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove debug print statements from the hot path.

Lines 76, 100, 117, and 118 emit parsing progress and the full parsed result to stdout on every successful call. These should be removed (or at minimum guarded behind a logging.debug call) before merging.

🐛 Proposed fix
-print("Parsing HTML content...")
-print(f"Found collection: {collection_date} - {bin_types}")
-print(f"Found {len(data['bins'])} collections")
-print(f"Final data: {data}")

Also applies to: 100-100, 117-118

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@uk_bin_collection/uk_bin_collection/councils/NorthEastDerbyshireDistrictCouncil.py`
at line 76, Remove the debug print statements that write parsing
progress/results to stdout in the NorthEastDerbyshireDistrictCouncil class;
replace them with logging.debug calls (or remove entirely) so the hot path
doesn’t print to stdout. Locate the prints (e.g., "Parsing HTML content..." and
the subsequent prints of the parsed result) inside the HTML parsing method(s) of
NorthEastDerbyshireDistrictCouncil, import Python’s logging if not present, and
change each print(...) to logging.debug(...) with a short descriptive message
and the variable (if needed) or drop the output entirely.
uk_bin_collection/uk_bin_collection/councils/PeterboroughCityCouncil.py (1)

108-154: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when the new panel selector stops matching.

If waste-service-grid matches nothing, or one matched panel no longer contains the expected heading/date rows, this currently falls through as a successful scrape with partial or empty data. Add an explicit selector check and let malformed panels raise instead of print+continue.

🔎 Suggested tightening
             # Each bin section is within a waste-service-grid div
             collection_panels = soup.find_all("div", class_="waste-service-grid")
+            if not collection_panels:
+                raise ValueError("Expected waste-service-grid collection panels")

             for panel in collection_panels:
-                try:
-                    # Bin type
-                    bin_type_tag = panel.find("h3", class_="waste-service-name")
-                    if not bin_type_tag:
-                        continue
-                    bin_type = bin_type_tag.get_text(strip=True)
+                # Bin type
+                bin_type_tag = panel.find("h3", class_="waste-service-name")
+                if not bin_type_tag:
+                    raise ValueError("Expected waste-service-name heading in collection panel")
+                bin_type = bin_type_tag.get_text(strip=True)

-                    # Get 'Next collection' date
-                    rows = panel.find_all("div", class_="govuk-summary-list__row")
-                    next_collection = None
-                    for row in rows:
-                        key = row.find("dt", class_="govuk-summary-list__key")
-                        value = row.find("dd", class_="govuk-summary-list__value")
-                        if key and value and "Next collection" in key.get_text():
-                            raw_date = " ".join(value.get_text().split())
+                # Get 'Next collection' date
+                rows = panel.find_all("div", class_="govuk-summary-list__row")
+                next_collection = None
+                for row in rows:
+                    key = row.find("dt", class_="govuk-summary-list__key")
+                    value = row.find("dd", class_="govuk-summary-list__value")
+                    if key and value and "Next collection" in key.get_text():
+                        raw_date = " ".join(value.get_text().split())

-                            cleaned_date = re.sub(
-                                r"(\d{1,2})(st|nd|rd|th)", r"\1", raw_date
-                            )
-                            next_collection = cleaned_date
-                            break
+                        cleaned_date = re.sub(
+                            r"(\d{1,2})(st|nd|rd|th)", r"\1", raw_date
+                        )
+                        next_collection = cleaned_date
+                        break

-                    if not next_collection:
-                        continue
+                if not next_collection:
+                    raise ValueError(f"Expected 'Next collection' row for {bin_type}")

-                    print(f"Found next collection for {bin_type}: '{next_collection}'")
+                print(f"Found next collection for {bin_type}: '{next_collection}'")

-                    parsed_date = datetime.strptime(next_collection, input_date_format)
-                    formatted_date = parsed_date.strftime(output_date_format)
+                parsed_date = datetime.strptime(next_collection, input_date_format)
+                formatted_date = parsed_date.strftime(output_date_format)

-                    data["bins"].append(
-                        {
-                            "type": bin_type,
-                            "collectionDate": formatted_date,
-                        }
-                    )
-
-                except Exception as e:
-                    print(
-                        f"Error processing panel for bin '{bin_type if 'bin_type' in locals() else 'unknown'}': {e}"
-                    )
+                data["bins"].append(
+                    {
+                        "type": bin_type,
+                        "collectionDate": formatted_date,
+                    }
+                )

Based on learnings: in uk_bin_collection/**/*.py, prefer explicit failures over silent parsing fallbacks so council format changes are detected early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/PeterboroughCityCouncil.py`
around lines 108 - 154, The parser currently silently skips when
collection_panels or individual panels lack expected elements; change this to
fail fast: after computing collection_panels in the PeterboroughCityCouncil
scraper, if collection_panels is empty raise a ValueError (include context like
"no waste-service-grid panels found"); inside the for panel loop, if
bin_type_tag is missing or next_collection remains None, raise a ValueError
indicating the malformed panel (mention the panel or bin_type if available)
instead of continue/print; replace the broad try/except that swallows errors
(the except block around the panel processing) with either no catch or re-raise
the exception after logging so malformed HTML surfaces quickly (references:
collection_panels, bin_type_tag, rows, next_collection and the except block).
uk_bin_collection/tests/input.json (1)

853-858: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Refresh these wiki notes to the new real-address inputs.

The examples now use actual house_number/postcode values, but the notes still tell users to pass weekday/week or day/round placeholders. That will steer CLI users to the old calling convention.

Also applies to: 2517-2523

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/tests/input.json` around lines 853 - 858, The wiki_note
text still instructs users to pass DAY/WEEK placeholders even though this record
now uses real address fields (house_number and postcode); update the "wiki_note"
value to describe the new real-address input format (e.g., state that
house_number should contain the house number and postcode the postcode, and
remove any guidance about passing weekday/week placeholders), and make the same
change for the other matching records that currently contain the placeholder
guidance (the entries with the same "wiki_note" pattern).
🟡 Minor comments (12)
uk_bin_collection/uk_bin_collection/councils/EppingForestDistrictCouncil.py-56-65 (1)

56-65: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Raise on unparseable dates instead of silently continuing.

except ValueError: continue silently drops a bin type if its date string has an unexpected format. This means API-side format changes go undetected at runtime — the bin entry simply disappears from output with no error. Based on learnings, the project convention is to prefer explicit failures so format changes surface immediately rather than going unnoticed.

🐛 Proposed fix
-                try:
-                    parsed = datetime.strptime(date_str.strip(), "%d/%m/%Y")
+                parsed = datetime.strptime(date_str.strip(), "%d/%m/%Y")
                    data["bins"].append(
                        {
                            "type": bin_type,
                            "collectionDate": parsed.strftime(date_format),
                        }
                    )
-                except ValueError:
-                    continue

Based on learnings: in uk_bin_collection/**/*.py, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors, so format changes are detected early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/EppingForestDistrictCouncil.py`
around lines 56 - 65, The code in EppingForestDistrictCouncil where parsed =
datetime.strptime(date_str.strip(), "%d/%m/%Y") silently swallows ValueError and
continues; change this to raise an explicit exception (or re-raise with a clear
message) instead of continue so parse failures fail fast; update the exception
handling in the method (the block that builds data["bins"] with "type": bin_type
and "collectionDate") to raise a ValueError that includes the offending
date_str, the expected format "%d/%m/%Y", and the bin_type to make debugging and
upstream detection immediate.
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py-140-146 (1)

140-146: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

XPath breaks if user_paon contains a single quote.

The user_paon value is directly interpolated into the XPath string. If the address contains an apostrophe (e.g., "St John's Close"), the XPath will be malformed and raise an exception or match nothing.

Use concat() or escape quotes to handle this edge case:

🛠️ Proposed fix using XPath escaping
-                desired_option = WebDriverWait(driver, 10).until(
-                    EC.element_to_be_clickable((
-                        By.XPATH,
-                        f"//li[contains(@class, 'active-result') and contains(., '{user_paon}')]"
-                    ))
-                )
+                # Escape single quotes in user_paon for XPath
+                if "'" in user_paon:
+                    # Use concat() to handle quotes: concat('St John', "'", 's Close')
+                    escaped_paon = "concat('" + user_paon.replace("'", "',\"'\",'") + "')"
+                    xpath_expr = f"//li[contains(@class, 'active-result') and contains(., {escaped_paon})]"
+                else:
+                    xpath_expr = f"//li[contains(@class, 'active-result') and contains(., '{user_paon}')]"
+                desired_option = WebDriverWait(driver, 10).until(
+                    EC.element_to_be_clickable((By.XPATH, xpath_expr))
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py` around
lines 140 - 146, The XPath used to locate the PAON option interpolates user_paon
directly and will break if user_paon contains a single quote; update the locator
construction in the block that uses WebDriverWait / desired_option / driver to
escape quotes or build a safe XPath (e.g., use XPath concat() to join pieces
around any single quotes or alternatively switch to a locator that matches text
via normalize-space() combined with translate() or use a CSS selector). Ensure
the code that forms f"//li...contains(., '{user_paon}')" is replaced with a
function that returns a quoted-safe XPath string before passing it to
WebDriverWait and keep the click on desired_option the same.
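One way to package the quote-safe construction the prompt asks for is a standalone helper. The function name is illustrative; the `concat()` splitting mirrors the approach discussed above:

```python
def xpath_string_literal(value: str) -> str:
    """Return an XPath string literal that safely represents `value`.

    Values without a single quote can be quoted directly; otherwise the
    pieces are stitched together with concat() and a double-quoted "'".
    """
    if "'" not in value:
        return f"'{value}'"
    parts = value.split("'")
    pieces = []
    for i, part in enumerate(parts):
        if i:
            pieces.append('"\'"')  # a double-quoted single quote
        if part:
            pieces.append(f"'{part}'")
    if len(pieces) == 1:
        return pieces[0]
    return "concat(" + ", ".join(pieces) + ")"
```

Usage would then be `f"//li[contains(@class, 'active-result') and contains(., {xpath_string_literal(user_paon)})]"`.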
uk_bin_collection/uk_bin_collection/councils/TewkesburyBoroughCouncil.py-36-58 (1)

36-58: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Unrecognized dict schema silently returns empty bins.

The elif isinstance(json_data, dict) branch matches any dict that isn't the legacy format — including a completely unexpected API response. If none of the type_map keys (refuse, recycling, garden, food) are present, the function returns {"bins": []} with no diagnostic, making it indistinguishable from a valid "no collections due" response.

Consider adding a guard after the loop (or at entry to the branch) to raise when the response matches neither expected schema:

💡 Proposed fix
         elif isinstance(json_data, dict):
+            if not any(k in json_data for k in type_map):
+                raise ValueError(
+                    f"Unrecognised API response format; no expected bin-type keys found: {list(json_data.keys())}"
+                )
             for key, display_name in type_map.items():

Based on learnings: "prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/TewkesburyBoroughCouncil.py`
around lines 36 - 58, The branch handling elif isinstance(json_data, dict)
silently returns an empty data["bins"] when the dict doesn't contain any
expected keys; update this branch (using json_data and the existing type_map) to
detect whether any of the type_map keys were present/processed (e.g., track a
matched flag or count while iterating), and if none were matched raise a clear
exception (or log and raise) describing "unexpected API schema" and include the
raw json_data or its keys so callers can surface the error instead of treating
it as "no collections".
uk_bin_collection/uk_bin_collection/councils/MidAndEastAntrimBoroughCouncil.py-35-36 (1)

35-36: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

UA Chrome version is 10 major releases behind current stable (Chrome 148)

Chrome 148 is the current stable version, released May 5, 2026. The hardcoded string Chrome/138.0.0.0 is ~10 versions stale. While most council sites don't enforce strict UA version checks, bot-detection middleware increasingly flags very outdated Chrome versions. Easy to bump.

🔧 Proposed fix
-            user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
+            user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@uk_bin_collection/uk_bin_collection/councils/MidAndEastAntrimBoroughCouncil.py`
around lines 35 - 36, The user-agent string in MidAndEastAntrimBoroughCouncil.py
is hardcoded to an outdated Chrome version; update the user_agent variable used
in the create_webdriver(web_driver, headless, user_agent, __name__) call to a
current Chrome stable string (e.g. replace "Chrome/138.0.0.0" with
"Chrome/148.0.0.0") or make it configurable via a constant/setting so future
bumps are easy (update the user_agent assignment and any related constant used
by create_webdriver).
uk_bin_collection/uk_bin_collection/councils/BroxtoweBoroughCouncil.py-34-35 (1)

34-35: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

User-agent version Chrome/138 is already ~10 versions behind the current stable release.

Chrome 148 is the current stable version as of May 5, 2026. The hardcoded Chrome/138.0.0.0 string is 10 major versions stale, which could trigger bot-detection heuristics on sites that check for suspiciously outdated browser signatures.

🔧 Suggested update
-            user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
+            user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/BroxtoweBoroughCouncil.py`
around lines 34 - 35, The hardcoded user_agent string in
BroxtoweBoroughCouncil.py (variable user_agent) uses Chrome/138 which is
outdated; update it to a current stable Chrome version (e.g., Chrome/148.0.0.0)
or replace the hardcoded string with a dynamically generated/maintained value
(for example reading from a CHROME_MAJOR_VERSION constant or using a user-agent
helper) and pass that to create_webdriver(web_driver, headless, user_agent,
__name__); ensure you update the user_agent variable rather than only changing
the create_webdriver call so tests and other modules receive the corrected UA.
uk_bin_collection/uk_bin_collection/councils/BarnetCouncil.py-93-97 (1)

93-97: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fallback PAON substring match can select the wrong address for short numeric house numbers.

The fallback at line 94 uses paon_str in option.text.upper() which is a plain substring check. If user_paon is "1", it will match option texts like "10 HIGH STREET", "11 HIGH STREET", "21 ...", etc. and select the first such match, silently returning bin data for the wrong property.

A safer fallback is to match against word boundaries, or at minimum require a space or comma immediately after the PAON:

🛡️ Proposed fix
             if not selected:
                 # Fallback: select first option containing the paon
                 for option in dropdown.options:
-                    if paon_str in option.text.upper():
+                    option_text_fb = option.text.upper().strip()
+                    if (
+                        option_text_fb.startswith(paon_str + " ")
+                        or option_text_fb.startswith(paon_str + ",")
+                        or f", {paon_str} " in option_text_fb
+                        or f", {paon_str}," in option_text_fb
+                    ):
                         dropdown.select_by_value(option.get_attribute("value"))
                         selected = True
                         break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/BarnetCouncil.py` around lines
93 - 97, The current fallback substring check "paon_str in option.text.upper()"
(used in the loop over dropdown.options, setting selected via
dropdown.select_by_value(...)) can falsely match short numeric PAONs like "1"
against "10 HIGH STREET"; replace this with a word-boundary-aware match: build a
regex from paon_str (escape it) and test option.text.upper() against r'\b' +
PAON + r'(?:\b|[ ,])' (or split option.text into tokens and compare whole-token
equality) so only whole PAON matches (or followed by space/comma) are accepted;
ensure you import re and keep setting selected and breaking only on a true
whole-word match.
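The whole-token match the prompt describes can be factored into a testable predicate; the function name is hypothetical and only the lookaround idea comes from the finding above:

```python
import re

def paon_matches(paon: str, option_text: str) -> bool:
    """True only when `paon` appears as a whole token in the option text.

    Avoids the substring pitfall where '1' matches '10 HIGH STREET':
    the PAON must not be flanked by other word characters.
    """
    pattern = r"(?<!\w)" + re.escape(paon.upper()) + r"(?!\w)"
    return re.search(pattern, option_text.upper()) is not None
```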
uk_bin_collection/uk_bin_collection/councils/Hillingdon.py-112-112 (1)

112-112: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Chrome/138.0.0.0 in the user-agent is ~10–11 months stale

Chrome 138 was in beta as of 28 May 2025, and ChromeOS LTS 144 was released on April 21, 2026, meaning the current stable channel is around Chrome 144–145. Presenting an outdated version may increase bot-detection risk on the target site.


🔧 Suggested update
-            user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
+            user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/144.0.0.0 Safari/537.36"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/Hillingdon.py` at line 112, The
user_agent string in Hillingdon.py is stale (Chrome/138.0.0.0); update the
user_agent variable to a current, realistic Chrome stable identifier (e.g.,
Chrome/145.0.0.0 or use a more generic recent UA) to reduce bot-detection risk;
locate the user_agent assignment in the Hillingdon class or module and replace
the hard-coded value accordingly (or implement a small helper to generate/rotate
a recent UA if preferred).
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py-105-108 (1)

105-108: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent None return for invalid-but-supplied paon produces a misleading error

When a caller provides a non-None paon that fails the digit/length test (e.g., a regular house number like "42" or a 9-digit string), _pid_from_paon returns None. Combined with no URL-based PID, the caller then hits:

ValueError: Richmond: supply a URL with ?pid=... OR put PID in the House Number field.

This is confusing because the user did supply something in the House Number field. Per the project's preference for explicit failures, an out-of-range-but-non-empty paon should raise a descriptive error rather than silently fall through.

🛠️ Proposed fix
     def _pid_from_paon(self, paon):
-        if paon and str(paon).isdigit() and 10 <= len(str(paon)) <= 14:
-            return str(paon)
-        return None
+        if not paon:
+            return None
+        s = str(paon)
+        if s.isdigit() and 10 <= len(s) <= 14:
+            return s
+        raise ValueError(
+            f"Richmond: House Number '{s}' does not look like a valid PID "
+            "(expected a 10–14 digit UPRN). Supply a URL with ?pid=... instead."
+        )

Based on learnings: in uk_bin_collection/**/*.py, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors, so format changes are detected early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`
around lines 105 - 108, The function _pid_from_paon currently returns None
silently when paon is provided but doesn't meet the digit/length check; instead,
change its behavior to raise a ValueError with a clear message (e.g., "Richmond:
invalid PID in House Number; expected 10–14 digit PID") whenever paon is
non-empty but fails the str(paon).isdigit() or length check so callers get an
explicit error; keep returning the string when valid and only return None when
paon is truly empty/None.
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py-25-29 (1)

25-29: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

"pid=" in url substring check can give a false positive

If base_url contains any parameter whose name happens to include pid= as a substring (e.g., &upid=..., &notpid=...), the condition evaluates to False, so the correctly-resolved pid is never appended to the URL, and the request goes out with the wrong query string.

parse_qs is already imported and used in _pid_from_url — use it here for a consistent, correct check:

🛠️ Proposed fix
-        if "pid=" not in (base_url or ""):
+        if not parse_qs(urlparse(base_url or "").query).get("pid"):
             sep = "&" if "?" in (base_url or "") else "?"
             target_url = f"{base_url}{sep}pid={pid}"
         else:
             target_url = base_url
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`
around lines 25 - 29, The substring check "pid=" on base_url can give false
positives; instead parse the URL query and check for a real pid parameter. In
the block that sets target_url (referencing variables base_url, pid, target_url
in LondonBoroughOfRichmondUponThames.py), call urlparse(base_url).query and then
parse_qs(...) to get params, test whether 'pid' is a key in that dict, and only
append "?pid=..." or "&pid=..." when 'pid' is absent; otherwise leave target_url
as base_url. Use the same parse_qs usage pattern as in the existing
_pid_from_url helper to keep behavior consistent.
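A minimal sketch of the `parse_qs`-based check; the helper name is illustrative, and the append logic mirrors the proposed fix above:

```python
from urllib.parse import urlparse, parse_qs

def ensure_pid(base_url: str, pid: str) -> str:
    """Append pid only when the URL lacks a real `pid` query parameter.

    parse_qs inspects actual parameter names, so `upid=` or `notpid=`
    no longer masks a missing `pid` the way a substring test does.
    """
    if parse_qs(urlparse(base_url or "").query).get("pid"):
        return base_url
    sep = "&" if "?" in (base_url or "") else "?"
    return f"{base_url}{sep}pid={pid}"
```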
uk_bin_collection/uk_bin_collection/councils/GreatYarmouthBoroughCouncil.py-109-111 (1)

109-111: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Year-rollover causes upcoming Jan–Mar dates to be silently dropped when scraped in Oct–Dec.

With datetime.today().year applied unconditionally, any date the council returns for January/February/March (which appear in the schedule during Oct/Nov/Dec of the preceding year) is parsed with the current year, immediately compared as a past date, and skipped via continue. Users scraping in November or December would silently receive no data for their next collections.

NorthumberlandCouncil.py in the same PR applies an explicit rollover guard. Apply the same fix here:

🐛 Proposed fix — mirror the NorthumberlandCouncil year-rollover guard
+                today = datetime.today()
+                current_year = today.year
+                current_month = today.month
                 try:
                     # Parse the date text
-                    date_obj = datetime.strptime(date_text + " " + str(datetime.today().year), "%A %d %B %Y")
-                    if date_obj.date() < datetime.today().date():
+                    date_obj = datetime.strptime(date_text + " " + str(current_year), "%A %d %B %Y")
+                    # If Oct–Dec and collection falls in Jan–Mar, it belongs to next year
+                    if current_month >= 10 and date_obj.month in [1, 2, 3]:
+                        date_obj = date_obj.replace(year=current_year + 1)
+                    if date_obj.date() < today.date():
                         continue  # Skip past dates
                 except ValueError:
                     continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/GreatYarmouthBoroughCouncil.py`
around lines 109 - 111, The date parsing always appends datetime.today().year
which causes Jan–Mar dates scraped in Oct–Dec to be treated as past and skipped;
update the logic around the date parsing in GreatYarmouthBoroughCouncil.py
(where date_obj is created from date_text) to apply a year-rollover guard like
NorthumberlandCouncil.py: parse the month from date_text (or a temp date with
the current year), and if that month is less than the current month, add 1 to
the year before constructing date_obj so Jan–Mar entries scraped late in the
year are assigned the next year instead of being dropped.
uk_bin_collection/uk_bin_collection/councils/ThurrockCouncil.py-15-15 (1)

15-15: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include the full alphabet in LETTERS.

X and Z are missing here, so streets starting with those letters fall back to the A page and cannot resolve correctly.
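This comment carries no proposed-fix block; one minimal sketch is to derive the constant from the stdlib so it cannot be hand-truncated again:

```python
import string

# Full A-Z, rather than a hand-typed string that can silently drop letters
LETTERS = string.ascii_uppercase
```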

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/ThurrockCouncil.py` at line 15,
The LETTERS constant in ThurrockCouncil.py is missing 'X' and 'Z', causing
streets starting with those letters to map incorrectly; update the LETTERS
variable (used in the ThurrockCouncil class/file) to include the full alphabet
(A–Z) so all initial letters are handled rather than the truncated
"ABCDEFGHIJKLMNOPQRSTUVWY".
uk_bin_collection/tests/generate_map_test_results.py-80-81 (1)

80-81: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the usage string.

The help text still references generate_test_results.py, but this script is generate_map_test_results.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/tests/generate_map_test_results.py` around lines 80 - 81,
Update the usage/help string printed in the startup argument check so it
references the correct script name; in the block that tests len(sys.argv) < 2
(the startup argument validation), replace "Usage: python
generate_test_results.py <junit.xml path> [--detailed]" with "Usage: python
generate_map_test_results.py <junit.xml path> [--detailed]" (or an equivalent
accurate invocation string).

Comment on lines +30 to +32
# Create the Selenium WebDriver
driver = create_webdriver(web_driver, headless, None, __name__)
user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
driver = create_webdriver(web_driver, headless, user_agent, __name__)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="$(fd -p 'ThanetDistrictCouncil.py' | head -n1)"
echo "Inspecting: $file"
sed -n '28,35p' "$file"

python - <<'PY'
from pathlib import Path
import ast

path = next(Path('.').rglob('ThanetDistrictCouncil.py'))
source = path.read_text()

try:
    ast.parse(source)
    print("AST parse succeeded")
except SyntaxError as exc:
    print(f"SyntaxError: {exc.msg} at line {exc.lineno}, column {exc.offset}")
    raise
PY

Repository: robbrad/UKBinCollectionData

Length of output: 941


🏁 Script executed:

#!/bin/bash
# Check broader context of the problematic section
file="$(fd -p 'ThanetDistrictCouncil.py' | head -n1)"
echo "=== Full context around the error (lines 20-55) ==="
sed -n '20,55p' "$file"

Repository: robbrad/UKBinCollectionData

Length of output: 1588


🏁 Script executed:

#!/bin/bash
# Check for the finally/except blocks
file="$(fd -p 'ThanetDistrictCouncil.py' | head -n1)"
echo "=== Looking for try/except/finally structure (lines 34-100) ==="
sed -n '34,100p' "$file"

Repository: robbrad/UKBinCollectionData

Length of output: 1743


Fix the mis-indented WebDriver assignment.

Line 32 is indented one level deeper than the surrounding block, causing an IndentationError: unexpected indent that prevents the module from parsing. Move the driver assignment inside the try block at the correct indentation level, initialize driver = None before the try, and guard the finally cleanup with an if driver: check.

🐛 Proposed fix
     def parse_data(self, page: str, **kwargs) -> dict:
         user_uprn = kwargs.get("uprn")
         check_uprn(user_uprn)
         bindata = {"bins": []}
+        driver = None

         url = f"https://www.thanet.gov.uk/wp-content/mu-plugins/collection-day/incl/mu-collection-day-calls.php?pAddress={user_uprn}"
         web_driver = kwargs.get("web_driver")
         headless = kwargs.get("headless")

-        # Create the Selenium WebDriver
-        user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
-            driver = create_webdriver(web_driver, headless, user_agent, __name__)
-
         try:
+            # Create the Selenium WebDriver
+            user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
+            driver = create_webdriver(web_driver, headless, user_agent, __name__)
             print(f"Navigating to URL: {url}")
             driver.get(url)
         finally:
             print("Cleaning up WebDriver...")
-            driver.quit()
+            if driver:
+                driver.quit()
🧰 Tools
🪛 Ruff (0.15.12)

[warning] 32-32: Unexpected indentation

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/ThanetDistrictCouncil.py` around
lines 30-32: the driver assignment is mis-indented, causing a parse error;
initialize driver = None before the try block, move the call to
create_webdriver(web_driver, headless, user_agent, __name__) into the try at the
same indentation level as the rest of the block (assigning to driver), and in
the finally section only call driver.quit() or similar if driver is truthy (if
driver:) to avoid errors when construction fails; the symbols to adjust are the
driver variable, the create_webdriver(...) call, and the surrounding try/finally
that performs cleanup.
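The driver-lifecycle pattern the suggested fix applies can be sketched in isolation. `FakeDriver` and `scrape` below are illustrative stand-ins for the Selenium WebDriver and the council scraper, not code from this repository:

```python
# Sketch of the safe acquire/release pattern: initialise the handle to None
# before the try, acquire it inside the try, and guard the finally so a
# failed construction does not raise AttributeError on quit().

class FakeDriver:
    """Stand-in for a Selenium WebDriver whose construction may fail."""

    def __init__(self, fail: bool):
        if fail:
            raise RuntimeError("webdriver failed to start")

    def quit(self):
        pass


def scrape(fail: bool) -> str:
    driver = None
    try:
        driver = FakeDriver(fail)  # construction may raise
        return "ok"
    except RuntimeError:
        return "failed"
    finally:
        # Only clean up if the driver was actually created
        if driver:
            driver.quit()


print(scrape(fail=False))  # ok
print(scrape(fail=True))   # failed
```

Because `driver` is bound before the `try`, the `finally` block can always reference it safely even when `create_webdriver` itself raises.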

"(KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36"
)
}
response = requests.get(api_url, headers=headers)
Contributor


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Add a timeout to `requests.get`; the call currently blocks with no deadline.

Without a timeout this call can hang the worker thread indefinitely if West Suffolk's IIS is slow or unresponsive, which can exhaust the thread pool in a multi-scraper run.

🔧 Proposed fix
-        response = requests.get(api_url, headers=headers)
+        response = requests.get(api_url, headers=headers, timeout=60)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
response = requests.get(api_url, headers=headers)
response = requests.get(api_url, headers=headers, timeout=60)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py` at line
26: the requests.get call using api_url and headers in WestSuffolkCouncil should
include a timeout to avoid blocking indefinitely; update the
requests.get(api_url, headers=headers) invocation to pass a sensible timeout
(e.g. timeout=10 or a configurable constant) and ensure the surrounding code
handles requests.exceptions.Timeout and other requests.RequestException errors
so the scraper fails fast and logs the error instead of hanging the worker
thread.
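The suggested change can be generalized into a small helper that pairs the timeout with the error handling the prompt describes. The `fetch_json` name and the 60-second budget are assumptions for illustration, not part of the scraper:

```python
# Minimal sketch: bound the request with a timeout and convert transport
# failures into a single exception type the caller can handle.
import requests


def fetch_json(api_url: str, headers: dict, timeout: float = 60.0) -> dict:
    try:
        # timeout applies to both the connect and the read phase
        response = requests.get(api_url, headers=headers, timeout=timeout)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.Timeout:
        raise RuntimeError(f"{api_url} did not respond within {timeout}s")
    except requests.exceptions.RequestException as exc:
        raise RuntimeError(f"request to {api_url} failed: {exc}")
```

With this shape the scraper fails fast with a clear message instead of tying up a worker thread when the council's server is unresponsive.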

@robbrad robbrad merged commit fe0a245 into master May 1, 2026
7 of 11 checks passed
@robbrad robbrad deleted the may-2026-release branch May 1, 2026 21:06
This was referenced May 1, 2026


Development

Successfully merging this pull request may close these issues.

Calendar setup async_config_entry_first_refresh causing unavailable sensors

6 participants