
May 2026 Release #1992

Merged
robbrad merged 144 commits into master from may-2026-release on May 1, 2026

Conversation

@robbrad (Owner) commented May 1, 2026

May 2026 Release — 60 PRs Consolidated

All open PRs merged into a single release branch with 7 merge conflicts resolved.

Council Fixes (53 PRs)

| PR | Council(s) | Change |
| --- | --- | --- |
| #1899 | Babergh, MidSuffolk, Rother, Wirral | Date parsing, null handling, input.json |
| #1909 | EastSuffolk | Fix collection parsing |
| #1916 | Southampton | Include food waste collections |
| #1918 | Blaby | Add User-Agent header (403 fix) |
| #1919 | Brent | Add User-Agent header |
| #1920 | Swale | Add user_agent to webdriver |
| #1921 | SouthHams | Init fcc_session_token before try |
| #1922 | Oxford | Handle multiple editor divs |
| #1923 | ReigateAndBanstead | Select populated span |
| #1924 | Shropshire | Use full link text for bin type |
| #1925 | Salford | Use lxml parser for malformed HTML |
| #1926 | Rushmoor | Handle raw JSON response |
| #1927 | Plymouth | Fix garden collection |
| #1929 | Crawley | Add XML response parsing fallback |
| #1930 | Newham | Updated card selectors, date format fallback |
| #1932 | Torridge | Parse new relative-date SOAP format |
| #1933 | Peterborough | user_agent and container selector |
| #1934 | Horsham | user_agent and postcode ID selector |
| #1935 | API server | Fix double-encoded JSON string |
| #1936 | NeathPortTalbot | user_agent and JS click for overlay |
| #1937 | Bexley | Replace Selenium with requests |
| #1938 | Stirling | Replace Selenium with Recollect API |
| #1939 | WindsorAndMaidenhead | Replace Selenium with requests |
| #1940 | EppingForest | Replace Selenium with ArcGIS REST API |
| #1941 | Moray | Rewrite for annual calendar |
| #1942 | EastLothian | Rewrite for Drupal migration |
| #1943 | Barnet | Rewrite for Jadu migration |
| #1944 | NorthNorfolk | Rewrite for X-Forms |
| #1947 | 63 councils | Add realistic user_agent to all Selenium scrapers |
| #1948 | Richmond | Rewrite for My Richmond page |
| #1949 | NorthYorkshire | Find HTML data dynamically in AJAX |
| #1950 | GreatYarmouth | Use current year instead of hardcoded 2025 |
| #1952 | Wyre | Skip non-bin divs, harden date parsing |
| #1953 | Chichester | Rewrite for AchieveForms V7 |
| #1954 | StocktonOnTees | Update form IDs to V2, dismiss cookie banner |
| #1955 | WestSuffolk | Send User-Agent header |
| #1956 | Midlothian | Rewrite for MyMidlothian Granicus |
| #1957 | SouthStaffordshire | Use objectId query param |
| #1958 | Chorley, Islington, Medway, Hastings, Conwy | Reliability sweep |
| #1971 | Tewkesbury | Handle new API response format |
| #1972 | Boston | Drive chosen.js widget |
| #1973 | Chelmsford | Dismiss email-signup modal |
| #1974 | ArmaghBanbridgeCraigavon | Full browser User-Agent for WAF |
| #1975 | NorthHertfordshire | Migrate to Netcall Liberty Create |
| #1978 | WestSuffolk | User-Agent and renamed h4 heading |
| #1979 | EnvironmentFirst, Eastbourne, Lewes | Handle restructured page |
| #1980 | MidSussex | Migrate to Whitespace portal |
| #1981 | Rotherham | Rewrite for Imactivate API |
| #1983 | SouthOxfordshire | Session establishment and bin type cleanup |
| #1984 | Chelmsford | Replace Selenium with requests |
| #1985 | MidSuffolk | Replace sleep with WebDriverWait |
| #1987 | NorthKesteven | Make own HTTP request |
| #1988 | Fareham | Switch to calendar page |
| #1989 | EastRiding | Replace Selenium with JSON API |
| #1990 | Edinburgh | Accept real postcode and house number |
| #1991 | Thurrock | Accept real postcode and house number |
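Several of the fixes above (#1918, #1919, #1947, #1955, #1974) apply the same one-line pattern: send a realistic browser User-Agent so a council WAF stops rejecting the default python-requests identity with a 403. A minimal sketch of the idea, not the project's actual helper code (the header string and `fetch` name are illustrative):

```python
# Hypothetical sketch: councils behind a WAF often 403 the default
# "python-requests/x.y" User-Agent, so send a realistic browser string.
import requests

# Any plausible desktop browser UA works; this exact string is illustrative.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}

def fetch(url: str) -> requests.Response:
    # timeout keeps a stalled council site from hanging the scraper
    return requests.get(url, headers=HEADERS, timeout=30)
```

The Selenium scrapers touched by #1947 do the equivalent by setting the same kind of string via the browser options rather than a request header.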

Dependencies (4 PRs)

| PR | Change |
| --- | --- |
| #1963 | Bump pillow 12.1.1 to 12.2.0 |
| #1967 | Bump mako 1.3.10 to 1.3.11 |
| #1976 | Bump python-dotenv 1.1.0 to 1.2.2 |
| #1977 | Bump lxml 5.3.2 to 6.1.0 |

Merge Conflicts Resolved (7)

Issues Resolved (20)

This release fixes the following open issues:

Closes #1555, #1616, #1725, #1768, #1804, #1889, #1890, #1893, #1902, #1904, #1906, #1908, #1910, #1911, #1914, #1931, #1951, #1962, #1970, #1982

Code Review Notes

  • No malicious code found — all URLs point to council .gov.uk sites or standard libraries
  • No eval/exec/subprocess/os.system calls added
  • No hardcoded credentials or API keys
  • Fixed indentation bug in server.py introduced by the #1935 merge ("Fix: API server returning double-encoded JSON string"): the try block was at column 0
  • A few verify=False SSL calls exist but are consistent with existing codebase patterns for councils with bad certs
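For context on the #1935 fix: double-encoding happens when an already-serialised JSON string is passed through `json.dumps` a second time, so clients decode a quoted string instead of an object. An illustrative reconstruction with made-up payload data, not the actual server.py code:

```python
import json

# Illustrative payload; the real API returns council bin data.
payload = {"bins": [{"type": "Recycling", "collectionDate": "01/05/2026"}]}

encoded = json.dumps(payload)  # correct: JSON object text
double = json.dumps(encoded)   # bug: a quoted, escaped JSON *string*

# A client decoding the buggy response gets a str, not a dict:
assert isinstance(json.loads(double), str)
# Decoding the correct response round-trips the data:
assert json.loads(encoded) == payload
```

The fix is simply to serialise the response dict exactly once before writing it out.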

robbrad and others added 30 commits March 28, 2026 10:49
…tCouncil, WirralCouncil - date parsing, null handling, input.json config
Signed-off-by: Tim Collins <tim@thecollins.team>
Fix double-encoded JSON output in API server
@coderabbitai (Bot) left a comment

Actionable comments posted: 3

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (19)
uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py (1)

52-52: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused headers variable.

The headers dictionary is defined but never used anywhere in the function. This is dead code.

🧹 Proposed fix
-            headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py`
at line 52, The file defines an unused variable headers = {"User-Agent":
"Mozilla/5.0 (Windows NT 6.1; Win64; x64)"} in the BrightonandHoveCityCouncil
code; remove that dead assignment (search for the symbol headers in the
BrightonandHoveCityCouncil module or the surrounding function where it appears)
so the unused variable is deleted, and run tests/lint to ensure no other
references remain; if the User-Agent was intended for HTTP requests instead,
replace the direct requests call to pass headers rather than keeping the unused
variable.
uk_bin_collection/uk_bin_collection/councils/TewkesburyBoroughCouncil.py (1)

27-34: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent ValueError suppression hides API format regressions.

Both the legacy and new-format branches swallow ValueError with continue, silently dropping bins whose dates can't be parsed. Per the project's established convention, parsers should raise explicitly so format changes are caught immediately rather than producing a quietly empty result.

💡 Proposed fix — re-raise instead of silently skipping
-                    except ValueError:
-                        continue
+                    except ValueError as e:
+                        raise ValueError(
+                            f"Unexpected date format in legacy response for '{bin_type}': {date_str!r}"
+                        ) from e
-                except ValueError:
-                    continue
+                except ValueError as e:
+                    raise ValueError(
+                        f"Unexpected nextCollectionDate format for '{key}': {date_str!r}"
+                    ) from e

Based on learnings: "prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors … ensures format changes are detected early."

Also applies to: 50-58

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/TewkesburyBoroughCouncil.py`
around lines 27 - 34, The ValueError from datetime.strptime in
TewkesburyBoroughCouncil.py is being silently swallowed (the try/except uses
continue), so change the handlers in both parsing branches (the new-format block
around date_str -> datetime.strptime and the legacy-format block around lines
~50-58) to re-raise the exception (or remove the except) instead of continuing;
ensure the error includes context (e.g., the offending date_str and bin_type) so
unexpected API format changes surface immediately rather than silently dropping
bins.
uk_bin_collection/uk_bin_collection/councils/NorthDevonCountyCouncil.py (1)

130-132: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Unchecked find() return values will raise AttributeError on missing spans, aborting the entire parse.

All three find(...) calls are outside the try/except block (which starts at line 135), so any missing span propagates to the outer handler and raises, producing an empty result. Council sites frequently change their HTML structure.

🛡️ Proposed fix — guard each span lookup
-                        bin_type = li.find("span", class_="wasteType").text.strip()
-                        day = li.find("span", class_="wasteDay").text.strip()
-                        weekday = li.find("span", class_="wasteName").text.strip()
+                        waste_type_span = li.find("span", class_="wasteType")
+                        waste_day_span = li.find("span", class_="wasteDay")
+                        if not waste_type_span or not waste_day_span:
+                            continue
+                        bin_type = waste_type_span.text.strip()
+                        day = waste_day_span.text.strip()
+                        weekday_span = li.find("span", class_="wasteName")
+                        weekday = weekday_span.text.strip() if weekday_span else ""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/NorthDevonCountyCouncil.py`
around lines 130 - 132, The three span lookups that set bin_type, day and
weekday in NorthDevonCountyCouncil.py currently call .text on the result of
li.find(...) and can raise AttributeError if a span is missing; wrap each lookup
in a safe guard (e.g., assign the result of li.find("span", class_="wasteType")
to a temp, check for None, then use .get_text(strip=True) or a default like ""),
or move them inside the existing try/except block and check for None before
accessing .text so bin_type, day and weekday are always set to a safe default
rather than raising.
uk_bin_collection/uk_bin_collection/councils/BroxtoweBoroughCouncil.py (1)

84-87: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

cells[0] and cells[3] are accessed without a bounds check — rows with fewer <td> elements raise IndexError.

cells[0] on line 85 is unconditional. The cells[3] in the if condition on line 87 looks like a guard, but Python list indexing raises IndexError when the index is out of range rather than returning a falsy value, so it provides no protection for short rows (e.g. a header <tr> containing only <th> elements, or an empty row).

🛡️ Proposed fix
                     for row in bins_table.find_all("tr"):
                         cells = row.find_all("td")
-                        bin_type = cells[0].get_text(strip=True)
-                        # Skip header row
-                        if bin_type and cells[3] and bin_type != "Bin Type":
+                        if len(cells) < 4:
+                            continue
+                        bin_type = cells[0].get_text(strip=True)
+                        # Skip header row
+                        if bin_type and bin_type != "Bin Type":
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/BroxtoweBoroughCouncil.py`
around lines 84 - 87, The code reads cells = row.find_all("td") and then
accesses cells[0] and cells[3] without ensuring the list has enough elements,
which can raise IndexError; update the loop in BroxtoweBoroughCouncil (the block
using row, cells and bin_type) to first check len(cells) >= 4 before accessing
cells[0] or cells[3] and skip the row if it’s shorter (or treat header rows
separately), and adjust the bin_type extraction/condition to only run when the
length check passes so short/empty/header rows are safely ignored.
uk_bin_collection/uk_bin_collection/councils/LewesDistrictCouncil.py (1)

20-28: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Backwards-compatible url fallback is blocked by the upfront UPRN validation.

If a caller only passes url, check_uprn(user_uprn) raises before Lines 24-26 execute, so the fallback path is unreachable in practice.

Suggested fix
         try:
             user_uprn = kwargs.get("uprn")
-            check_uprn(user_uprn)
-            url = f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}"
-            if not user_uprn:
-                # Backwards compatibility for users who stored a URL.
-                url = kwargs.get("url")
+            if user_uprn:
+                check_uprn(user_uprn)
+                url = f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}"
+            else:
+                # Backwards compatibility for users who stored a URL.
+                url = kwargs.get("url")
+                if not url:
+                    raise ValueError("Either 'uprn' or 'url' must be provided")
         except Exception as e:
             raise ValueError(f"Error getting identifier: {str(e)}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/LewesDistrictCouncil.py` around
lines 20 - 28, The current logic calls check_uprn(user_uprn) before falling back
to kwargs.get("url"), so callers that provide only url never reach the fallback;
fix by first assigning user_uprn = kwargs.get("uprn") and url =
kwargs.get("url") (or assign url after fallback) and only call
check_uprn(user_uprn) if user_uprn is truthy, otherwise keep the provided url;
update the block around check_uprn, user_uprn and url to perform conditional
validation so the backwards-compatible url path is reachable.
uk_bin_collection/uk_bin_collection/councils/EastbourneBoroughCouncil.py (1)

20-28: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Backwards-compatible url fallback never actually runs.

check_uprn(user_uprn) executes before Lines 24-26, so callers that only provide url still fail validation and never reach the fallback branch. That breaks the compatibility path this change is trying to preserve.

Suggested fix
         try:
             user_uprn = kwargs.get("uprn")
-            check_uprn(user_uprn)
-            url = f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}"
-            if not user_uprn:
-                # Backwards compatibility for users who stored a URL.
-                url = kwargs.get("url")
+            if user_uprn:
+                check_uprn(user_uprn)
+                url = f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}"
+            else:
+                # Backwards compatibility for users who stored a URL.
+                url = kwargs.get("url")
+                if not url:
+                    raise ValueError("Either 'uprn' or 'url' must be provided")
         except Exception as e:
             raise ValueError(f"Error getting identifier: {str(e)}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/EastbourneBoroughCouncil.py`
around lines 20 - 28, The fallback URL branch never runs because
check_uprn(user_uprn) is called before testing for a missing uprn; modify the
logic in the handler that uses user_uprn, check_uprn and url so you first read
user_uprn = kwargs.get("uprn") and if user_uprn is truthy call
check_uprn(user_uprn) and build url =
f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}", otherwise set url
= kwargs.get("url") for backward compatibility; keep the existing exception
handling (raising ValueError with the error string) unchanged and reference the
symbols user_uprn, check_uprn and url when making the change.
uk_bin_collection/uk_bin_collection/councils/SwaleBoroughCouncil.py (1)

99-106: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace the "No future date found" sentinel with an explicit None-guard.

If future_collection_date_tag is None but future_bins is non-empty, parse_collection_date("No future date found") produces a confusing ValueError: time data 'No future date found' does not match format '%A, %d %B' rather than a clear signal that the date element itself was absent. Per the project's explicit-failure convention, the cause should be surfaced directly.

🛡️ Proposed fix
-future_collection_date = (
-    future_collection_date_tag.text.split("starting from")[-1].strip()
-    if future_collection_date_tag
-    else "No future date found"
-)
+future_collection_date = (
+    future_collection_date_tag.text.split("starting from")[-1].strip()
+    if future_collection_date_tag
+    else None
+)

Then inside the loop:

 for bin in future_bins:
+    if future_collection_date is None:
+        raise ValueError(
+            "Future bins were found but no 'starting from' date element is present on the page"
+        )
     collection_date = parse_collection_date(future_collection_date)

Based on learnings: in uk_bin_collection/**/*.py, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors; use clear exception types and document error causes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/SwaleBoroughCouncil.py` around
lines 99 - 106, The code currently assigns "No future date found" to
future_collection_date which causes parse_collection_date to raise an unclear
ValueError; change the assignment so future_collection_date is None when
future_collection_date_tag is missing (i.e., future_collection_date =
future_collection_date_tag.text... if future_collection_date_tag else None) and
then inside the loop that iterates future_bins, check if future_collection_date
is None and, if so, raise a clear exception (e.g., ValueError with a message
referencing future_collection_date_tag/future_bins) before calling
parse_collection_date so the absence of the element is surfaced explicitly.
uk_bin_collection/uk_bin_collection/councils/Hillingdon.py (2)

266-268: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Silent except Exception: continue masks page-format changes

If the council changes its HTML structure, the selector/parse errors here are printed and swallowed, returning a silently incomplete data["bins"] with no signal to operators. The same pattern exists in get_bank_holiday_changes (lines 88–90 and 92–93).

Prefer re-raising (or at minimum propagating a recognisable exception) so that format regressions are caught immediately rather than producing quietly wrong data. Based on learnings, this project explicitly favours explicit failures over silent defaults or swallowed errors.

♻️ Suggested refactor
-                except Exception as e:
-                    print(f"Error processing item: {e}")
-                    continue
+                except Exception as e:
+                    raise ValueError(
+                        f"Failed to parse bin row; page format may have changed: {e}"
+                    ) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/Hillingdon.py` around lines 266
- 268, The except block in the item-processing loop (and the similar handlers in
get_bank_holiday_changes) currently swallows all exceptions with print(...) and
continue, hiding HTML/selector regressions; change these handlers to re-raise
the caught exception (or raise a custom descriptive exception) instead of
continuing so failures surface—locate the try/except in the function that
populates data["bins"] (and the try/except blocks inside
get_bank_holiday_changes) and replace the except Exception as e: print(...);
continue with either: raise or raise RuntimeError(f"Hillingdon parser error
processing item: {e}") to propagate a clear error to callers/operators.

235-268: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Duplicate bin entries: hardcoded fallback and scraped rows are both unconditionally appended

Lines 235–242 always append three hardcoded entries (General Waste, Recycling, Food Waste) before lines 244–268 parse and also append the actual scraped div.bin--row entries. When the HTML scrape succeeds, data["bins"] ends up with each bin type twice — once with the day-of-week–computed date and once with the date parsed from the page — which will produce incorrect output for any downstream consumer.

Either remove the hardcoded block (let the scraper stand alone) or guard it as a genuine fallback that fires only when the div.bin--row parse yields nothing:

🐛 Proposed fix
-            # Add collection dates for each bin type
-            bin_types = ["General Waste", "Recycling", "Food Waste"]
-            for bin_type in bin_types:
-                data["bins"].append(
-                    {
-                        "type": bin_type,
-                        "collectionDate": next_collection.strftime("%d/%m/%Y"),
-                    }
-                )
-
             # Process collection details
             bin_rows = soup.select("div.bin--row:not(:first-child)")
             for row in bin_rows:
                 try:
                     bin_type = row.select_one("div.col-md-3").text.strip()
                     collection_dates_div = row.select("div.col-md-3")[1]
                     next_collection_text = "".join(
                         collection_dates_div.find_all(text=True, recursive=False)
                     ).strip()
                     cleaned_date_text = remove_ordinal_indicator_from_date_string(
                         next_collection_text
                     )
                     parsed_date = parse(cleaned_date_text, fuzzy=True)
                     bin_date = parsed_date.strftime("%d/%m/%Y")

                     if bin_type and bin_date:
                         data["bins"].append(
                             {
                                 "type": bin_type,
                                 "collectionDate": bin_date,
                             }
                         )
                 except Exception as e:
                     print(f"Error processing item: {e}")
                     continue

+            # Fallback: if no bin rows were scraped, use day-of-week computed date
+            if not data["bins"]:
+                bin_types = ["General Waste", "Recycling", "Food Waste"]
+                for bin_type in bin_types:
+                    data["bins"].append(
+                        {
+                            "type": bin_type,
+                            "collectionDate": next_collection.strftime("%d/%m/%Y"),
+                        }
+                    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/Hillingdon.py` around lines 235
- 268, The code currently always appends three hardcoded entries via bin_types
into data["bins"] and then also appends parsed rows from bin_rows, causing
duplicates; change the logic so the hardcoded fallback (the loop that iterates
bin_types and appends to data["bins"]) only runs when no parsed rows are found
(i.e., when bin_rows is empty or when parsing yields no valid entries), or
remove that fallback entirely—update the block around bin_types and the
subsequent bin_rows parsing so that either parsed results are used
preferentially (append hardcoded entries only if parsed results list is empty)
and ensure references like data["bins"], bin_types, and bin_rows remain
consistent.
uk_bin_collection/uk_bin_collection/councils/ChorleyCouncil.py (2)

142-142: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent fallback may mask JS-mapping format changes.

If the JS parsing above yields no entries, every bin type silently resolves to its raw table-cell text (svc). Per the project's convention of preferring explicit failures, consider at least logging a warning when svc is not found in a non-empty map, so format changes are surfaced quickly.

-                    "type": bin_type_map.get(svc, svc),
+                    "type": bin_type_map.get(svc) if not bin_type_map or svc in bin_type_map else (
+                        # bin_type_map is populated but this key is absent — surface it
+                        (_ for _ in ()).throw(ValueError(f"Unknown bin service key: {svc!r}"))
+                    ) if bin_type_map else svc,

A cleaner approach is a small helper or an explicit check before the loop:

if bin_type_map:
    mapped_type = bin_type_map.get(svc)
    if mapped_type is None:
        raise ValueError(f"Bin service key {svc!r} not found in JS type map")
else:
    mapped_type = svc   # JS block absent entirely — use raw name

Based on learnings: the project prefers explicit failures (raising exceptions on unexpected formats) over silent defaults, to ensure format changes are detected early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/ChorleyCouncil.py` at line 142,
Replace the silent fallback in the ChorleyCouncil mapping (currently using
"type": bin_type_map.get(svc, svc)) with an explicit check: if bin_type_map is
non-empty try to get mapped_type = bin_type_map.get(svc) and raise a ValueError
(or log a warning and raise) when mapped_type is None so unexpected JS-format
changes fail fast; if bin_type_map is empty (JS block absent) set mapped_type =
svc and use that for the "type" field. Ensure you update the code paths in the
loop that reference svc to use mapped_type and include the unique symbols
bin_type_map and svc in your changes.

129-130: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

table.find("tbody") can return None, causing AttributeError.

Python's html.parser does not always auto-insert <tbody>; if the server's HTML omits it, the chained .find_all("tr") call will crash.

🛡️ Proposed fix
-        for row in table.find("tbody").find_all("tr"):
+        tbody = table.find("tbody") or table
+        for row in tbody.find_all("tr"):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/ChorleyCouncil.py` around lines
129 - 130, The loop in ChorleyCouncil.py uses table.find("tbody").find_all("tr")
which can raise AttributeError if tbody is missing; update the code in the
function/method that iterates over table (look for the for row in
table.find("tbody").find_all("tr") block) to first assign tbody =
table.find("tbody") and then set rows = tbody.find_all("tr") if tbody is not
None else table.find_all("tr"), and iterate over rows so the parser works
whether or not a <tbody> element is present.
uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale copy-paste comment references the wrong council.

Line 1 reads # This script pulls (in one hit) the data from Bromley Council Bins Data but this file is for Calderdale.

✏️ Proposed fix
-# This script pulls (in one hit) the data from Bromley Council Bins Data
+# This script pulls the bin collection data for Calderdale Council
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` at line 1,
Update the incorrect top-line comment that mentions "Bromley Council" to
reference Calderdale instead; locate the module/comment at the top of
CalderdaleCouncil.py (the CalderdaleCouncil class/module) and replace the stale
copy-paste comment so it accurately describes that the script pulls Calderdale
council bin data.
uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py (2)

32-33: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Dead/broken else branch produces "None, {postcode}" as the search string.

When house_number is None (absent paon), the condition is falsy and the else branch executes f"{house_number}, {postcode}" where house_number is still None, yielding "None, EC1A 1BB". The comment above even says "Use house_number as full address since it contains the complete address" — so the else clause is never intended to be reached, or it should fall back to just the postcode.

🐛 Proposed fix
-# Use house_number as full address since it contains the complete address
-full_address = house_number if house_number else f"{house_number}, {postcode}"
+if not house_number:
+    raise ValueError("No house number/address (paon) provided.")
+full_address = house_number
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py`
around lines 32 - 33, The current full_address assignment in
CotswoldDistrictCouncil.py uses f"{house_number}, {postcode}" when house_number
is falsy which produces "None, {postcode}"; change the logic in the full_address
assignment so that when house_number is present use house_number, otherwise fall
back to just postcode (i.e. set full_address based on house_number if truthy,
else postcode) to avoid embedding the literal "None" into the search string.

223-226: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent empty-data return violates the explicit-failure convention used across this codebase.

When the page yields no results, the scraper returns {"bins": []} silently. Downstream consumers will see an empty collection and have no way to tell whether there genuinely are no upcoming collections or whether the page structure changed. A ValueError/RuntimeError here would surface scraper breakage immediately. Based on learnings, this project prefers explicit failures over silent defaults.

🐛 Proposed fix
-# If no data found, add a debug entry
 if not data["bins"]:
-    print("No bin collection data found. Page source:")
-    print(driver.page_source[:1000])  # Print first 1000 chars for debugging
+    raise ValueError(
+        "No bin collection data found. The page structure may have changed."
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py`
around lines 223 - 226, The current silent return when no bins are found
(checking data["bins"]) should be replaced with an explicit error: in the
CotswoldDistrictCouncil scraper (the block that inspects data["bins"] and uses
driver.page_source), raise a RuntimeError or ValueError instead of printing and
returning {"bins": []}; include a short context message and a truncated
page-source snippet (e.g., first 1000 chars) in the exception to aid debugging,
and remove the print statements so the scraper surfaces a clear failure when the
page yields no results.
uk_bin_collection/uk_bin_collection/councils/HyndburnBoroughCouncil.py (1)

132-132: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove debug print(bin_data) from the production code path.

This emits the full parsed bin collection to stdout on every successful invocation.

🐛 Proposed fix
-print(bin_data)
-
 return bin_data
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/HyndburnBoroughCouncil.py` at
line 132, Remove the debug print that emits full parsed bin data to stdout by
deleting the print(bin_data) call in HyndburnBoroughCouncil.py; instead, if
diagnostic information is required use the module's logger (e.g.,
logger.debug(...) or processLogger.debug(...)) inside the same function where
print(bin_data) appears so production runs no longer print sensitive/verbose
data to stdout.
uk_bin_collection/uk_bin_collection/councils/MidUlsterDistrictCouncil.py (1)

101-110: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Year-rollover bug: collection dates in January will be assigned the wrong year in December.

current_year is appended unconditionally. If a user runs this in late November/December and the next collection falls in January, the resulting date will be dated to the current year, not the following one. Compare with HyndburnBoroughCouncil.py in the same PR which correctly adds 1 to the year when the parsed date is already in the past.

🐛 Proposed fix
-current_year = datetime.now().year
-full_date = f"{date_text} {current_year}"  # e.g., "18 Apr 2025"
-collection_date = datetime.strptime(full_date, "%d %b %Y").strftime(
-    date_format
-)
+current_year = datetime.now().year
+full_date = f"{date_text} {current_year}"
+parsed = datetime.strptime(full_date, "%d %b %Y")
+if parsed.date() < datetime.now().date():
+    parsed = parsed.replace(year=current_year + 1)
+collection_date = parsed.strftime(date_format)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/MidUlsterDistrictCouncil.py`
around lines 101 - 110, The date parsing logic in MidUlsterDistrictCouncil.py
unconditionally appends current_year to date_text (variables current_year,
full_date, collection_date), causing January dates parsed in December to be
assigned the wrong year; change the flow to build a tentative datetime from
date_text + current_year, then if that tentative date is before today (use
datetime.today() or datetime.now().date()), add 1 to current_year before
re-parsing so the final collection_date uses the next year; update the try block
around current_year/full_date/collection_date to compute tentative_date and bump
the year when needed, mirroring the approach used in HyndburnBoroughCouncil.py.
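The rollover guard described above can be isolated into a small, testable helper. The `%d %b` input format is taken from the proposed fix; the function name and the injected `today` parameter are illustrative:

```python
from datetime import datetime

def resolve_collection_year(date_text: str, today: datetime) -> datetime:
    """Parse a day-month string like '18 Apr' and pick the correct year.

    If the tentative date has already passed, assume it refers to next
    year — the pattern HyndburnBoroughCouncil.py uses per the review.
    """
    parsed = datetime.strptime(f"{date_text} {today.year}", "%d %b %Y")
    if parsed.date() < today.date():
        parsed = parsed.replace(year=today.year + 1)
    return parsed
```

Passing `today` explicitly (rather than calling `datetime.now()` inside) makes the December/January behaviour easy to exercise in tests.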
uk_bin_collection/uk_bin_collection/councils/NorthEastDerbyshireDistrictCouncil.py (1)

76-76: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove debug print statements from the hot path.

Lines 76, 100, 117, and 118 emit parsing progress and the full parsed result to stdout on every successful call. These should be removed (or at minimum guarded behind a logging.debug call) before merging.

🐛 Proposed fix
-print("Parsing HTML content...")
-print(f"Found collection: {collection_date} - {bin_types}")
-print(f"Found {len(data['bins'])} collections")
-print(f"Final data: {data}")

Also applies to: 100-100, 117-118

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@uk_bin_collection/uk_bin_collection/councils/NorthEastDerbyshireDistrictCouncil.py`
at line 76, Remove the debug print statements that write parsing
progress/results to stdout in the NorthEastDerbyshireDistrictCouncil class;
replace them with logging.debug calls (or remove entirely) so the hot path
doesn’t print to stdout. Locate the prints (e.g., "Parsing HTML content..." and
the subsequent prints of the parsed result) inside the HTML parsing method(s) of
NorthEastDerbyshireDistrictCouncil, import Python’s logging if not present, and
change each print(...) to logging.debug(...) with a short descriptive message
and the variable (if needed) or drop the output entirely.
uk_bin_collection/uk_bin_collection/councils/PeterboroughCityCouncil.py (1)

108-154: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when the new panel selector stops matching.

If waste-service-grid matches nothing, or one matched panel no longer contains the expected heading/date rows, this currently falls through as a successful scrape with partial or empty data. Add an explicit selector check and let malformed panels raise instead of print+continue.

🔎 Suggested tightening
             # Each bin section is within a waste-service-grid div
             collection_panels = soup.find_all("div", class_="waste-service-grid")
+            if not collection_panels:
+                raise ValueError("Expected waste-service-grid collection panels")

             for panel in collection_panels:
-                try:
-                    # Bin type
-                    bin_type_tag = panel.find("h3", class_="waste-service-name")
-                    if not bin_type_tag:
-                        continue
-                    bin_type = bin_type_tag.get_text(strip=True)
+                # Bin type
+                bin_type_tag = panel.find("h3", class_="waste-service-name")
+                if not bin_type_tag:
+                    raise ValueError("Expected waste-service-name heading in collection panel")
+                bin_type = bin_type_tag.get_text(strip=True)

-                    # Get 'Next collection' date
-                    rows = panel.find_all("div", class_="govuk-summary-list__row")
-                    next_collection = None
-                    for row in rows:
-                        key = row.find("dt", class_="govuk-summary-list__key")
-                        value = row.find("dd", class_="govuk-summary-list__value")
-                        if key and value and "Next collection" in key.get_text():
-                            raw_date = " ".join(value.get_text().split())
+                # Get 'Next collection' date
+                rows = panel.find_all("div", class_="govuk-summary-list__row")
+                next_collection = None
+                for row in rows:
+                    key = row.find("dt", class_="govuk-summary-list__key")
+                    value = row.find("dd", class_="govuk-summary-list__value")
+                    if key and value and "Next collection" in key.get_text():
+                        raw_date = " ".join(value.get_text().split())

-                            cleaned_date = re.sub(
-                                r"(\d{1,2})(st|nd|rd|th)", r"\1", raw_date
-                            )
-                            next_collection = cleaned_date
-                            break
+                        cleaned_date = re.sub(
+                            r"(\d{1,2})(st|nd|rd|th)", r"\1", raw_date
+                        )
+                        next_collection = cleaned_date
+                        break

-                    if not next_collection:
-                        continue
+                if not next_collection:
+                    raise ValueError(f"Expected 'Next collection' row for {bin_type}")

-                    print(f"Found next collection for {bin_type}: '{next_collection}'")
+                print(f"Found next collection for {bin_type}: '{next_collection}'")

-                    parsed_date = datetime.strptime(next_collection, input_date_format)
-                    formatted_date = parsed_date.strftime(output_date_format)
+                parsed_date = datetime.strptime(next_collection, input_date_format)
+                formatted_date = parsed_date.strftime(output_date_format)

-                    data["bins"].append(
-                        {
-                            "type": bin_type,
-                            "collectionDate": formatted_date,
-                        }
-                    )
-
-                except Exception as e:
-                    print(
-                        f"Error processing panel for bin '{bin_type if 'bin_type' in locals() else 'unknown'}': {e}"
-                    )
+                data["bins"].append(
+                    {
+                        "type": bin_type,
+                        "collectionDate": formatted_date,
+                    }
+                )

Based on learnings: in uk_bin_collection/**/*.py, prefer explicit failures over silent parsing fallbacks so council format changes are detected early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/PeterboroughCityCouncil.py`
around lines 108 - 154, The parser currently silently skips when
collection_panels or individual panels lack expected elements; change this to
fail fast: after computing collection_panels in the PeterboroughCityCouncil
scraper, if collection_panels is empty raise a ValueError (include context like
"no waste-service-grid panels found"); inside the for panel loop, if
bin_type_tag is missing or next_collection remains None, raise a ValueError
indicating the malformed panel (mention the panel or bin_type if available)
instead of continue/print; replace the broad try/except that swallows errors
(the except block around the panel processing) with either no catch or re-raise
the exception after logging so malformed HTML surfaces quickly (references:
collection_panels, bin_type_tag, rows, next_collection and the except block).
uk_bin_collection/tests/input.json (1)

853-858: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Refresh these wiki notes to the new real-address inputs.

The examples now use actual house_number/postcode values, but the notes still tell users to pass weekday/week or day/round placeholders. That will steer CLI users to the old calling convention.

Also applies to: 2517-2523

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/tests/input.json` around lines 853 - 858, The wiki_note
text still instructs users to pass DAY/WEEK placeholders even though this record
now uses real address fields (house_number and postcode); update the "wiki_note"
value to describe the new real-address input format (e.g., state that
house_number should contain the house number and postcode the postcode, and
remove any guidance about passing weekday/week placeholders), and make the same
change for the other matching records that currently contain the placeholder
guidance (the entries with the same "wiki_note" pattern).
🟡 Minor comments (12)
uk_bin_collection/uk_bin_collection/councils/EppingForestDistrictCouncil.py-56-65 (1)

56-65: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Raise on unparseable dates instead of silently continuing.

except ValueError: continue silently drops a bin type if its date string has an unexpected format. This means API-side format changes go undetected at runtime — the bin entry simply disappears from output with no error. Based on learnings, the project convention is to prefer explicit failures so format changes surface immediately rather than going unnoticed.

🐛 Proposed fix
-                try:
-                    parsed = datetime.strptime(date_str.strip(), "%d/%m/%Y")
+                parsed = datetime.strptime(date_str.strip(), "%d/%m/%Y")
                    data["bins"].append(
                        {
                            "type": bin_type,
                            "collectionDate": parsed.strftime(date_format),
                        }
                    )
-                except ValueError:
-                    continue

Based on learnings: in uk_bin_collection/**/*.py, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors, so format changes are detected early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/EppingForestDistrictCouncil.py`
around lines 56 - 65, The code in EppingForestDistrictCouncil where parsed =
datetime.strptime(date_str.strip(), "%d/%m/%Y") silently swallows ValueError and
continues; change this to raise an explicit exception (or re-raise with a clear
message) instead of continue so parse failures fail fast; update the exception
handling in the method (the block that builds data["bins"] with "type": bin_type
and "collectionDate") to raise a ValueError that includes the offending
date_str, the expected format "%d/%m/%Y", and the bin_type to make debugging and
upstream detection immediate.
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py-140-146 (1)

140-146: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

XPath breaks if user_paon contains a single quote.

The user_paon value is directly interpolated into the XPath string. If the address contains an apostrophe (e.g., "St John's Close"), the XPath will be malformed and raise an exception or match nothing.

Use concat() or escape quotes to handle this edge case:

🛠️ Proposed fix using XPath escaping
-                desired_option = WebDriverWait(driver, 10).until(
-                    EC.element_to_be_clickable((
-                        By.XPATH,
-                        f"//li[contains(@class, 'active-result') and contains(., '{user_paon}')]"
-                    ))
-                )
+                # Escape single quotes in user_paon for XPath
+                if "'" in user_paon:
+                    # Use concat() to handle quotes: concat('St John', "'", 's Close')
+                    escaped_paon = "concat('" + user_paon.replace("'", "',\"'\",'") + "')"
+                    xpath_expr = f"//li[contains(@class, 'active-result') and contains(., {escaped_paon})]"
+                else:
+                    xpath_expr = f"//li[contains(@class, 'active-result') and contains(., '{user_paon}')]"
+                desired_option = WebDriverWait(driver, 10).until(
+                    EC.element_to_be_clickable((By.XPATH, xpath_expr))
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py` around
lines 140 - 146, The XPath used to locate the PAON option interpolates user_paon
directly and will break if user_paon contains a single quote; update the locator
construction in the block that uses WebDriverWait / desired_option / driver to
escape quotes or build a safe XPath (e.g., use XPath concat() to join pieces
around any single quotes or alternatively switch to a locator that matches text
via normalize-space() combined with translate() or use a CSS selector). Ensure
the code that forms f"//li...contains(., '{user_paon}')" is replaced with a
function that returns a quoted-safe XPath string before passing it to
WebDriverWait and keep the click on desired_option the same.
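One way to package the quote-safe construction the prompt asks for is a standalone helper. The function name is illustrative; the `concat()` splitting mirrors the approach discussed above:

```python
def xpath_string_literal(value: str) -> str:
    """Return an XPath string literal that safely represents `value`.

    Values without a single quote can be quoted directly; otherwise the
    pieces are stitched together with concat() and a double-quoted "'".
    """
    if "'" not in value:
        return f"'{value}'"
    parts = value.split("'")
    pieces = []
    for i, part in enumerate(parts):
        if i:
            pieces.append('"\'"')  # a double-quoted single quote
        if part:
            pieces.append(f"'{part}'")
    if len(pieces) == 1:
        return pieces[0]
    return "concat(" + ", ".join(pieces) + ")"
```

Usage would then be `f"//li[contains(@class, 'active-result') and contains(., {xpath_string_literal(user_paon)})]"`.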
uk_bin_collection/uk_bin_collection/councils/TewkesburyBoroughCouncil.py-36-58 (1)

36-58: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Unrecognized dict schema silently returns empty bins.

The elif isinstance(json_data, dict) branch matches any dict that isn't the legacy format — including a completely unexpected API response. If none of the type_map keys (refuse, recycling, garden, food) are present, the function returns {"bins": []} with no diagnostic, making it indistinguishable from a valid "no collections due" response.

Consider adding a guard after the loop (or at entry to the branch) to raise when the response matches neither expected schema:

💡 Proposed fix
         elif isinstance(json_data, dict):
+            if not any(k in json_data for k in type_map):
+                raise ValueError(
+                    f"Unrecognised API response format; no expected bin-type keys found: {list(json_data.keys())}"
+                )
             for key, display_name in type_map.items():

Based on learnings: "prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/TewkesburyBoroughCouncil.py`
around lines 36 - 58, The branch handling elif isinstance(json_data, dict)
silently returns an empty data["bins"] when the dict doesn't contain any
expected keys; update this branch (using json_data and the existing type_map) to
detect whether any of the type_map keys were present/processed (e.g., track a
matched flag or count while iterating), and if none were matched raise a clear
exception (or log and raise) describing "unexpected API schema" and include the
raw json_data or its keys so callers can surface the error instead of treating
it as "no collections".
uk_bin_collection/uk_bin_collection/councils/MidAndEastAntrimBoroughCouncil.py-35-36 (1)

35-36: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

UA Chrome version is 10 major releases behind current stable (Chrome 148)

Chrome 148 is the current stable version, released May 5, 2026. The hardcoded string Chrome/138.0.0.0 is ~10 versions stale. While most council sites don't enforce strict UA version checks, bot-detection middleware increasingly flags very outdated Chrome versions. Easy to bump.

🔧 Proposed fix
-            user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
+            user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@uk_bin_collection/uk_bin_collection/councils/MidAndEastAntrimBoroughCouncil.py`
around lines 35 - 36, The user-agent string in MidAndEastAntrimBoroughCouncil.py
is hardcoded to an outdated Chrome version; update the user_agent variable used
in the create_webdriver(web_driver, headless, user_agent, __name__) call to a
current Chrome stable string (e.g. replace "Chrome/138.0.0.0" with
"Chrome/148.0.0.0") or make it configurable via a constant/setting so future
bumps are easy (update the user_agent assignment and any related constant used
by create_webdriver).
uk_bin_collection/uk_bin_collection/councils/BroxtoweBoroughCouncil.py-34-35 (1)

34-35: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

User-agent version Chrome/138 is already ~10 versions behind the current stable release.

Chrome 148 is the current stable version as of May 5, 2026. The hardcoded Chrome/138.0.0.0 string is 10 major versions stale, which could trigger bot-detection heuristics on sites that check for suspiciously outdated browser signatures.

🔧 Suggested update
-            user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
+            user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/BroxtoweBoroughCouncil.py`
around lines 34 - 35, The hardcoded user_agent string in
BroxtoweBoroughCouncil.py (variable user_agent) uses Chrome/138 which is
outdated; update it to a current stable Chrome version (e.g., Chrome/148.0.0.0)
or replace the hardcoded string with a dynamically generated/maintained value
(for example reading from a CHROME_MAJOR_VERSION constant or using a user-agent
helper) and pass that to create_webdriver(web_driver, headless, user_agent,
__name__); ensure you update the user_agent variable rather than only changing
the create_webdriver call so tests and other modules receive the corrected UA.
uk_bin_collection/uk_bin_collection/councils/BarnetCouncil.py-93-97 (1)

93-97: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fallback PAON substring match can select the wrong address for short numeric house numbers.

The fallback at line 94 uses paon_str in option.text.upper() which is a plain substring check. If user_paon is "1", it will match option texts like "10 HIGH STREET", "11 HIGH STREET", "21 ...", etc. and select the first such match, silently returning bin data for the wrong property.

A safer fallback is to match against word boundaries, or at minimum require a space or comma immediately after the PAON:

🛡️ Proposed fix
             if not selected:
                 # Fallback: select first option containing the paon
                 for option in dropdown.options:
-                    if paon_str in option.text.upper():
+                    option_text_fb = option.text.upper().strip()
+                    if (
+                        option_text_fb.startswith(paon_str + " ")
+                        or option_text_fb.startswith(paon_str + ",")
+                        or f", {paon_str} " in option_text_fb
+                        or f", {paon_str}," in option_text_fb
+                    ):
                         dropdown.select_by_value(option.get_attribute("value"))
                         selected = True
                         break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/BarnetCouncil.py` around lines
93 - 97, The current fallback substring check "paon_str in option.text.upper()"
(used in the loop over dropdown.options, setting selected via
dropdown.select_by_value(...)) can falsely match short numeric PAONs like "1"
against "10 HIGH STREET"; replace this with a word-boundary-aware match: build a
regex from paon_str (escape it) and test option.text.upper() against r'\b' +
PAON + r'(?:\b|[ ,])' (or split option.text into tokens and compare whole-token
equality) so only whole PAON matches (or followed by space/comma) are accepted;
ensure you import re and keep setting selected and breaking only on a true
whole-word match.
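The whole-token match the prompt describes can be factored into a testable predicate; the function name is hypothetical and only the lookaround idea comes from the finding above:

```python
import re

def paon_matches(paon: str, option_text: str) -> bool:
    """True only when `paon` appears as a whole token in the option text.

    Avoids the substring pitfall where '1' matches '10 HIGH STREET':
    the PAON must not be flanked by other word characters.
    """
    pattern = r"(?<!\w)" + re.escape(paon.upper()) + r"(?!\w)"
    return re.search(pattern, option_text.upper()) is not None
```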
uk_bin_collection/uk_bin_collection/councils/Hillingdon.py-112-112 (1)

112-112: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Chrome/138.0.0.0 in the user-agent is ~10–11 months stale

Chrome 138 was in beta as of 28 May 2025, and ChromeOS LTS 144 was released on April 21, 2026, meaning the current stable channel is around Chrome 144–145. Presenting an outdated version may increase bot-detection risk on the target site.


🔧 Suggested update
-            user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
+            user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/144.0.0.0 Safari/537.36"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/Hillingdon.py` at line 112, The
user_agent string in Hillingdon.py is stale (Chrome/138.0.0.0); update the
user_agent variable to a current, realistic Chrome stable identifier (e.g.,
Chrome/145.0.0.0 or use a more generic recent UA) to reduce bot-detection risk;
locate the user_agent assignment in the Hillingdon class or module and replace
the hard-coded value accordingly (or implement a small helper to generate/rotate
a recent UA if preferred).
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py-105-108 (1)

105-108: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent None return for invalid-but-supplied paon produces a misleading error

When a caller provides a non-None paon that fails the digit/length test (e.g., a regular house number like "42" or a 9-digit string), _pid_from_paon returns None. Combined with no URL-based PID, the caller then hits:

ValueError: Richmond: supply a URL with ?pid=... OR put PID in the House Number field.

This is confusing because the user did supply something in the House Number field. Per the project's preference for explicit failures, an out-of-range-but-non-empty paon should raise a descriptive error rather than silently fall through.

🛠️ Proposed fix
     def _pid_from_paon(self, paon):
-        if paon and str(paon).isdigit() and 10 <= len(str(paon)) <= 14:
-            return str(paon)
-        return None
+        if not paon:
+            return None
+        s = str(paon)
+        if s.isdigit() and 10 <= len(s) <= 14:
+            return s
+        raise ValueError(
+            f"Richmond: House Number '{s}' does not look like a valid PID "
+            "(expected a 10–14 digit UPRN). Supply a URL with ?pid=... instead."
+        )

Based on learnings: in uk_bin_collection/**/*.py, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors, so format changes are detected early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`
around lines 105 - 108, The function _pid_from_paon currently returns None
silently when paon is provided but doesn't meet the digit/length check; instead,
change its behavior to raise a ValueError with a clear message (e.g., "Richmond:
invalid PID in House Number; expected 10–14 digit PID") whenever paon is
non-empty but fails the str(paon).isdigit() or length check so callers get an
explicit error; keep returning the string when valid and only return None when
paon is truly empty/None.
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py-25-29 (1)

25-29: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

"pid=" in url substring check can give a false positive

If base_url contains any parameter whose name happens to include pid= as a substring (e.g., &upid=..., &notpid=...), the condition evaluates to False, so the correctly-resolved pid is never appended to the URL, and the request goes out with the wrong query string.

parse_qs is already imported and used in _pid_from_url — use it here for a consistent, correct check:

🛠️ Proposed fix
-        if "pid=" not in (base_url or ""):
+        if not parse_qs(urlparse(base_url or "").query).get("pid"):
             sep = "&" if "?" in (base_url or "") else "?"
             target_url = f"{base_url}{sep}pid={pid}"
         else:
             target_url = base_url
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`
around lines 25 - 29, The substring check "pid=" on base_url can give false
positives; instead parse the URL query and check for a real pid parameter. In
the block that sets target_url (referencing variables base_url, pid, target_url
in LondonBoroughOfRichmondUponThames.py), call urlparse(base_url).query and then
parse_qs(...) to get params, test whether 'pid' is a key in that dict, and only
append "?pid=..." or "&pid=..." when 'pid' is absent; otherwise leave target_url
as base_url. Use the same parse_qs usage pattern as in the existing
_pid_from_url helper to keep behavior consistent.
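A minimal sketch of the `parse_qs`-based check; the helper name is illustrative, and the append logic mirrors the proposed fix above:

```python
from urllib.parse import urlparse, parse_qs

def ensure_pid(base_url: str, pid: str) -> str:
    """Append pid only when the URL lacks a real `pid` query parameter.

    parse_qs inspects actual parameter names, so `upid=` or `notpid=`
    no longer masks a missing `pid` the way a substring test does.
    """
    if parse_qs(urlparse(base_url or "").query).get("pid"):
        return base_url
    sep = "&" if "?" in (base_url or "") else "?"
    return f"{base_url}{sep}pid={pid}"
```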
uk_bin_collection/uk_bin_collection/councils/GreatYarmouthBoroughCouncil.py-109-111 (1)

109-111: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Year-rollover causes upcoming Jan–Mar dates to be silently dropped when scraped in Oct–Dec.

With datetime.today().year applied unconditionally, any date the council returns for January/February/March (which appear in the schedule during Oct/Nov/Dec of the preceding year) is parsed with the current year, immediately compared as a past date, and skipped via continue. Users scraping in November or December would silently receive no data for their next collections.

NorthumberlandCouncil.py in the same PR applies an explicit rollover guard. Apply the same fix here:

🐛 Proposed fix — mirror the NorthumberlandCouncil year-rollover guard
+                today = datetime.today()
+                current_year = today.year
+                current_month = today.month
                 try:
                     # Parse the date text
-                    date_obj = datetime.strptime(date_text + " " + str(datetime.today().year), "%A %d %B %Y")
-                    if date_obj.date() < datetime.today().date():
+                    date_obj = datetime.strptime(date_text + " " + str(current_year), "%A %d %B %Y")
+                    # If Oct–Dec and collection falls in Jan–Mar, it belongs to next year
+                    if current_month >= 10 and date_obj.month in [1, 2, 3]:
+                        date_obj = date_obj.replace(year=current_year + 1)
+                    if date_obj.date() < today.date():
                         continue  # Skip past dates
                 except ValueError:
                     continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/GreatYarmouthBoroughCouncil.py`
around lines 109 - 111, The date parsing always appends datetime.today().year
which causes Jan–Mar dates scraped in Oct–Dec to be treated as past and skipped;
update the logic around the date parsing in GreatYarmouthBoroughCouncil.py
(where date_obj is created from date_text) to apply a year-rollover guard like
NorthumberlandCouncil.py: parse the month from date_text (or a temp date with
the current year), and if that month is less than the current month, add 1 to
the year before constructing date_obj so Jan–Mar entries scraped late in the
year are assigned the next year instead of being dropped.
uk_bin_collection/uk_bin_collection/councils/ThurrockCouncil.py-15-15 (1)

15-15: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include the full alphabet in LETTERS.

X and Z are missing here, so streets starting with those letters fall back to the A page and cannot resolve correctly.
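This comment carries no proposed-fix block; one minimal sketch is to derive the constant from the stdlib so it cannot be hand-truncated again:

```python
import string

# Full A-Z, rather than a hand-typed string that can silently drop letters
LETTERS = string.ascii_uppercase
```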

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/ThurrockCouncil.py` at line 15,
The LETTERS constant in ThurrockCouncil.py is missing 'X' and 'Z', causing
streets starting with those letters to map incorrectly; update the LETTERS
variable (used in the ThurrockCouncil class/file) to include the full alphabet
(A–Z) so all initial letters are handled rather than the truncated
"ABCDEFGHIJKLMNOPQRSTUVWY".
uk_bin_collection/tests/generate_map_test_results.py-80-81 (1)

80-81: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the usage string.

The help text still references generate_test_results.py, but this script is generate_map_test_results.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/tests/generate_map_test_results.py` around lines 80 - 81,
Update the usage/help string printed in the startup argument check so it
references the correct script name; in the block that tests len(sys.argv) < 2
(the startup argument validation), replace "Usage: python
generate_test_results.py <junit.xml path> [--detailed]" with "Usage: python
generate_map_test_results.py <junit.xml path> [--detailed]" (or an equivalent
accurate invocation string).

Comment on lines +30 to +32
# Create the Selenium WebDriver
driver = create_webdriver(web_driver, headless, None, __name__)
user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
driver = create_webdriver(web_driver, headless, user_agent, __name__)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="$(fd -p 'ThanetDistrictCouncil.py' | head -n1)"
echo "Inspecting: $file"
sed -n '28,35p' "$file"

python - <<'PY'
from pathlib import Path
import ast

path = next(Path('.').rglob('ThanetDistrictCouncil.py'))
source = path.read_text()

try:
    ast.parse(source)
    print("AST parse succeeded")
except SyntaxError as exc:
    print(f"SyntaxError: {exc.msg} at line {exc.lineno}, column {exc.offset}")
    raise
PY

Repository: robbrad/UKBinCollectionData

Length of output: 941


🏁 Script executed:

#!/bin/bash
# Check broader context of the problematic section
file="$(fd -p 'ThanetDistrictCouncil.py' | head -n1)"
echo "=== Full context around the error (lines 20-55) ==="
sed -n '20,55p' "$file"

Repository: robbrad/UKBinCollectionData

Length of output: 1588


🏁 Script executed:

#!/bin/bash
# Check for the finally/except blocks
file="$(fd -p 'ThanetDistrictCouncil.py' | head -n1)"
echo "=== Looking for try/except/finally structure (lines 34-100) ==="
sed -n '34,100p' "$file"

Repository: robbrad/UKBinCollectionData

Length of output: 1743


Fix the mis-indented WebDriver assignment.

Line 32 is indented one level deeper than the surrounding block, causing an IndentationError: unexpected indent that prevents the module from parsing. Move the driver assignment inside the try block at the correct indentation level, initialize driver = None before the try, and guard the finally cleanup with an if driver: check.

🐛 Proposed fix
     def parse_data(self, page: str, **kwargs) -> dict:
         user_uprn = kwargs.get("uprn")
         check_uprn(user_uprn)
         bindata = {"bins": []}
+        driver = None

         url = f"https://www.thanet.gov.uk/wp-content/mu-plugins/collection-day/incl/mu-collection-day-calls.php?pAddress={user_uprn}"
         web_driver = kwargs.get("web_driver")
         headless = kwargs.get("headless")

-        # Create the Selenium WebDriver
-        user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
-            driver = create_webdriver(web_driver, headless, user_agent, __name__)
-
         try:
+            # Create the Selenium WebDriver
+            user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
+            driver = create_webdriver(web_driver, headless, user_agent, __name__)
             print(f"Navigating to URL: {url}")
             driver.get(url)
         finally:
             print("Cleaning up WebDriver...")
-            driver.quit()
+            if driver:
+                driver.quit()
🧰 Tools
🪛 Ruff (0.15.12)

[warning] 32-32: Unexpected indentation

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/ThanetDistrictCouncil.py` around
lines 30-32: the driver assignment is mis-indented, causing a parse error;
initialize driver = None before the try block, move the call to
create_webdriver(web_driver, headless, user_agent, __name__) into the try at the
same indentation level as the rest of the block (assigning to driver), and in
the finally section only call driver.quit() or similar if driver is truthy (if
driver:) to avoid errors when construction fails; the symbols to adjust are the
driver variable, the create_webdriver(...) call, and the surrounding try/finally
that performs cleanup.
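The driver-lifecycle pattern the suggested fix applies can be sketched in isolation. `FakeDriver` and `scrape` below are illustrative stand-ins for the Selenium WebDriver and the council scraper, not code from this repository:

```python
# Sketch of the safe acquire/release pattern: initialise the handle to None
# before the try, acquire it inside the try, and guard the finally so a
# failed construction does not raise AttributeError on quit().

class FakeDriver:
    """Stand-in for a Selenium WebDriver whose construction may fail."""

    def __init__(self, fail: bool):
        if fail:
            raise RuntimeError("webdriver failed to start")

    def quit(self):
        pass


def scrape(fail: bool) -> str:
    driver = None
    try:
        driver = FakeDriver(fail)  # construction may raise
        return "ok"
    except RuntimeError:
        return "failed"
    finally:
        # Only clean up if the driver was actually created
        if driver:
            driver.quit()


print(scrape(fail=False))  # ok
print(scrape(fail=True))   # failed
```

Because `driver` is bound before the `try`, the `finally` block can always reference it safely even when `create_webdriver` itself raises.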

"(KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36"
)
}
response = requests.get(api_url, headers=headers)
Contributor


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Add a timeout to `requests.get`; the call currently blocks with no deadline.

Without a timeout this call can hang the worker thread indefinitely if West Suffolk's IIS is slow or unresponsive, which can exhaust the thread pool in a multi-scraper run.

🔧 Proposed fix
-        response = requests.get(api_url, headers=headers)
+        response = requests.get(api_url, headers=headers, timeout=60)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
response = requests.get(api_url, headers=headers)
response = requests.get(api_url, headers=headers, timeout=60)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py` at line
26: the requests.get call using api_url and headers in WestSuffolkCouncil should
include a timeout to avoid blocking indefinitely; update the
requests.get(api_url, headers=headers) invocation to pass a sensible timeout
(e.g. timeout=10 or a configurable constant) and ensure the surrounding code
handles requests.exceptions.Timeout and other requests.RequestException errors
so the scraper fails fast and logs the error instead of hanging the worker
thread.
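The suggested change can be generalized into a small helper that pairs the timeout with the error handling the prompt describes. The `fetch_json` name and the 60-second budget are assumptions for illustration, not part of the scraper:

```python
# Minimal sketch: bound the request with a timeout and convert transport
# failures into a single exception type the caller can handle.
import requests


def fetch_json(api_url: str, headers: dict, timeout: float = 60.0) -> dict:
    try:
        # timeout applies to both the connect and the read phase
        response = requests.get(api_url, headers=headers, timeout=timeout)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.Timeout:
        raise RuntimeError(f"{api_url} did not respond within {timeout}s")
    except requests.exceptions.RequestException as exc:
        raise RuntimeError(f"request to {api_url} failed: {exc}")
```

With this shape the scraper fails fast with a clear message instead of tying up a worker thread when the council's server is unresponsive.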

@robbrad robbrad merged commit fe0a245 into master May 1, 2026
7 of 11 checks passed
@robbrad robbrad deleted the may-2026-release branch May 1, 2026 21:06
This was referenced May 1, 2026


Development

Successfully merging this pull request may close these issues.

Calendar setup async_config_entry_first_refresh causing unavailable sensors

6 participants