May 2026 Release #1992
Conversation
…tCouncil, WirralCouncil - date parsing, null handling, input.json config
Signed-off-by: Tim Collins <tim@thecollins.team>
This reverts commit 1648b2f.
Fix double-encoded JSON output in API server
Actionable comments posted: 3
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (19)
uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py (1)
52-52: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused `headers` variable.

The `headers` dictionary is defined but never used anywhere in the function. This is dead code.

🧹 Proposed fix

```diff
- headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/BrightonandHoveCityCouncil.py` at line 52, The file defines an unused variable headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"} in the BrightonandHoveCityCouncil code; remove that dead assignment (search for the symbol headers in the BrightonandHoveCityCouncil module or the surrounding function where it appears) so the unused variable is deleted, and run tests/lint to ensure no other references remain; if the User-Agent was intended for HTTP requests instead, replace the direct requests call to pass headers rather than keeping the unused variable.

uk_bin_collection/uk_bin_collection/councils/TewkesburyBoroughCouncil.py (1)
27-34: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent `ValueError` suppression hides API format regressions.

Both the legacy and new-format branches swallow `ValueError` with `continue`, silently dropping bins whose dates can't be parsed. Per the project's established convention, parsers should raise explicitly so format changes are caught immediately rather than producing a quietly empty result.

💡 Proposed fix — re-raise instead of silently skipping

```diff
- except ValueError:
-     continue
+ except ValueError as e:
+     raise ValueError(
+         f"Unexpected date format in legacy response for '{bin_type}': {date_str!r}"
+     ) from e
```

```diff
- except ValueError:
-     continue
+ except ValueError as e:
+     raise ValueError(
+         f"Unexpected nextCollectionDate format for '{key}': {date_str!r}"
+     ) from e
```

Based on learnings: "prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors … ensures format changes are detected early."
Also applies to: 50-58
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/TewkesburyBoroughCouncil.py` around lines 27 - 34, The ValueError from datetime.strptime in TewkesburyBoroughCouncil.py is being silently swallowed (the try/except uses continue), so change the handlers in both parsing branches (the new-format block around date_str -> datetime.strptime and the legacy-format block around lines ~50-58) to re-raise the exception (or remove the except) instead of continuing; ensure the error includes context (e.g., the offending date_str and bin_type) so unexpected API format changes surface immediately rather than silently dropping bins.

uk_bin_collection/uk_bin_collection/councils/NorthDevonCountyCouncil.py (1)
130-132: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Unchecked `find()` return values will raise `AttributeError` on missing spans, aborting the entire parse.

All three `find(...)` calls are outside the `try/except` block (which starts at line 135), so any missing span propagates to the outer handler and raises, producing an empty result. Council sites frequently change their HTML structure.

🛡️ Proposed fix — guard each span lookup

```diff
- bin_type = li.find("span", class_="wasteType").text.strip()
- day = li.find("span", class_="wasteDay").text.strip()
- weekday = li.find("span", class_="wasteName").text.strip()
+ waste_type_span = li.find("span", class_="wasteType")
+ waste_day_span = li.find("span", class_="wasteDay")
+ if not waste_type_span or not waste_day_span:
+     continue
+ bin_type = waste_type_span.text.strip()
+ day = waste_day_span.text.strip()
+ weekday_span = li.find("span", class_="wasteName")
+ weekday = weekday_span.text.strip() if weekday_span else ""
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/NorthDevonCountyCouncil.py` around lines 130 - 132, The three span lookups that set bin_type, day and weekday in NorthDevonCountyCouncil.py currently call .text on the result of li.find(...) and can raise AttributeError if a span is missing; wrap each lookup in a safe guard (e.g., assign the result of li.find("span", class_="wasteType") to a temp, check for None, then use .get_text(strip=True) or a default like ""), or move them inside the existing try/except block and check for None before accessing .text so bin_type, day and weekday are always set to a safe default rather than raising.

uk_bin_collection/uk_bin_collection/councils/BroxtoweBoroughCouncil.py (1)
84-87: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

`cells[0]` and `cells[3]` are accessed without a bounds check — rows with fewer `<td>` elements raise `IndexError`.

`cells[0]` on line 85 is unconditional. The `cells[3]` in the `if` condition on line 87 looks like a guard, but Python list indexing raises `IndexError` when the index is out of range rather than returning a falsy value, so it provides no protection for short rows (e.g. a header `<tr>` containing only `<th>` elements, or an empty row).
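To make the failure mode concrete, here is a standalone sketch (the `cells` data is hypothetical, not from the Broxtowe scraper) showing why indexing past the end raises instead of acting as a falsy guard:

```python
cells = ["only-cell"]  # e.g. a header row that yielded a single cell

# A truthiness-style "guard" never gets a chance to evaluate:
try:
    if cells[3]:  # IndexError is raised before the `if` can see a value
        pass
except IndexError as exc:
    print(f"short row blew up: {exc}")

# An explicit length check is the only safe guard:
if len(cells) >= 4:
    bin_type = cells[0]
else:
    print("row too short — skipped")
```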
🛡️ Proposed fix

```diff
  for row in bins_table.find_all("tr"):
      cells = row.find_all("td")
-     bin_type = cells[0].get_text(strip=True)
-     # Skip header row
-     if bin_type and cells[3] and bin_type != "Bin Type":
+     if len(cells) < 4:
+         continue
+     bin_type = cells[0].get_text(strip=True)
+     # Skip header row
+     if bin_type and bin_type != "Bin Type":
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/BroxtoweBoroughCouncil.py` around lines 84 - 87, The code reads cells = row.find_all("td") and then accesses cells[0] and cells[3] without ensuring the list has enough elements, which can raise IndexError; update the loop in BroxtoweBoroughCouncil (the block using row, cells and bin_type) to first check len(cells) >= 4 before accessing cells[0] or cells[3] and skip the row if it’s shorter (or treat header rows separately), and adjust the bin_type extraction/condition to only run when the length check passes so short/empty/header rows are safely ignored.

uk_bin_collection/uk_bin_collection/councils/LewesDistrictCouncil.py (1)
20-28: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Backwards-compatible `url` fallback is blocked by the upfront UPRN validation.

If a caller only passes `url`, `check_uprn(user_uprn)` raises before Lines 24-26 execute, so the fallback path is unreachable in practice.

Suggested fix

```diff
  try:
      user_uprn = kwargs.get("uprn")
-     check_uprn(user_uprn)
-     url = f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}"
-     if not user_uprn:
-         # Backwards compatibility for users who stored a URL.
-         url = kwargs.get("url")
+     if user_uprn:
+         check_uprn(user_uprn)
+         url = f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}"
+     else:
+         # Backwards compatibility for users who stored a URL.
+         url = kwargs.get("url")
+         if not url:
+             raise ValueError("Either 'uprn' or 'url' must be provided")
  except Exception as e:
      raise ValueError(f"Error getting identifier: {str(e)}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/LewesDistrictCouncil.py` around lines 20 - 28, The current logic calls check_uprn(user_uprn) before falling back to kwargs.get("url"), so callers that provide only url never reach the fallback; fix by first assigning user_uprn = kwargs.get("uprn") and url = kwargs.get("url") (or assign url after fallback) and only call check_uprn(user_uprn) if user_uprn is truthy, otherwise keep the provided url; update the block around check_uprn, user_uprn and url to perform conditional validation so the backwards-compatible url path is reachable.

uk_bin_collection/uk_bin_collection/councils/EastbourneBoroughCouncil.py (1)
20-28: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Backwards-compatible `url` fallback never actually runs.

`check_uprn(user_uprn)` executes before Lines 24-26, so callers that only provide `url` still fail validation and never reach the fallback branch. That breaks the compatibility path this change is trying to preserve.

Suggested fix

```diff
  try:
      user_uprn = kwargs.get("uprn")
-     check_uprn(user_uprn)
-     url = f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}"
-     if not user_uprn:
-         # Backwards compatibility for users who stored a URL.
-         url = kwargs.get("url")
+     if user_uprn:
+         check_uprn(user_uprn)
+         url = f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}"
+     else:
+         # Backwards compatibility for users who stored a URL.
+         url = kwargs.get("url")
+         if not url:
+             raise ValueError("Either 'uprn' or 'url' must be provided")
  except Exception as e:
      raise ValueError(f"Error getting identifier: {str(e)}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/EastbourneBoroughCouncil.py` around lines 20 - 28, The fallback URL branch never runs because check_uprn(user_uprn) is called before testing for a missing uprn; modify the logic in the handler that uses user_uprn, check_uprn and url so you first read user_uprn = kwargs.get("uprn") and if user_uprn is truthy call check_uprn(user_uprn) and build url = f"https://environmentfirst.co.uk/house.php?uprn={user_uprn}", otherwise set url = kwargs.get("url") for backward compatibility; keep the existing exception handling (raising ValueError with the error string) unchanged and reference the symbols user_uprn, check_uprn and url when making the change.

uk_bin_collection/uk_bin_collection/councils/SwaleBoroughCouncil.py (1)
99-106: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace the `"No future date found"` sentinel with an explicit `None`-guard.

If `future_collection_date_tag` is `None` but `future_bins` is non-empty, `parse_collection_date("No future date found")` produces a confusing `ValueError: time data 'No future date found' does not match format '%A, %d %B'` rather than a clear signal that the date element itself was absent. Per the project's explicit-failure convention, the cause should be surfaced directly.

🛡️ Proposed fix

```diff
-future_collection_date = (
-    future_collection_date_tag.text.split("starting from")[-1].strip()
-    if future_collection_date_tag
-    else "No future date found"
-)
+future_collection_date = (
+    future_collection_date_tag.text.split("starting from")[-1].strip()
+    if future_collection_date_tag
+    else None
+)
```

Then inside the loop:

```diff
 for bin in future_bins:
+    if future_collection_date is None:
+        raise ValueError(
+            "Future bins were found but no 'starting from' date element is present on the page"
+        )
     collection_date = parse_collection_date(future_collection_date)
```

Based on learnings: in `uk_bin_collection/**/*.py`, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors; use clear exception types and document error causes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/SwaleBoroughCouncil.py` around lines 99 - 106, The code currently assigns "No future date found" to future_collection_date which causes parse_collection_date to raise an unclear ValueError; change the assignment so future_collection_date is None when future_collection_date_tag is missing (i.e., future_collection_date = future_collection_date_tag.text... if future_collection_date_tag else None) and then inside the loop that iterates future_bins, check if future_collection_date is None and, if so, raise a clear exception (e.g., ValueError with a message referencing future_collection_date_tag/future_bins) before calling parse_collection_date so the absence of the element is surfaced explicitly.

uk_bin_collection/uk_bin_collection/councils/Hillingdon.py (2)
266-268: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Silent `except Exception: continue` masks page-format changes

If the council changes its HTML structure, the selector/parse errors here are printed and swallowed, returning a silently incomplete `data["bins"]` with no signal to operators. The same pattern exists in `get_bank_holiday_changes` (Line 88–90 and Line 92–93).

Prefer re-raising (or at minimum propagating a recognisable exception) so that format regressions are caught immediately rather than producing quietly wrong data. Based on learnings, this project explicitly favours explicit failures over silent defaults or swallowed errors.

♻️ Suggested refactor

```diff
- except Exception as e:
-     print(f"Error processing item: {e}")
-     continue
+ except Exception as e:
+     raise ValueError(
+         f"Failed to parse bin row; page format may have changed: {e}"
+     ) from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/Hillingdon.py` around lines 266 - 268, The except block in the item-processing loop (and the similar handlers in get_bank_holiday_changes) currently swallows all exceptions with print(...) and continue, hiding HTML/selector regressions; change these handlers to re-raise the caught exception (or raise a custom descriptive exception) instead of continuing so failures surface—locate the try/except in the function that populates data["bins"] (and the try/except blocks inside get_bank_holiday_changes) and replace the except Exception as e: print(...); continue with either: raise or raise RuntimeError(f"Hillingdon parser error processing item: {e}") to propagate a clear error to callers/operators.
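As a standalone illustration of the re-raise pattern recommended above (`parse_bin_row` and its input are hypothetical, not code from Hillingdon.py), `raise … from e` keeps the root cause attached while adding operator-facing context:

```python
def parse_bin_row(row: dict) -> str:
    # Wrap the low-level parsing error in a contextual one, chaining the original.
    try:
        return row["collectionDate"]
    except KeyError as e:
        raise ValueError(
            f"Failed to parse bin row; page format may have changed: {e}"
        ) from e

try:
    parse_bin_row({"type": "Recycling"})  # malformed row: date key missing
except ValueError as err:
    print(err)                  # contextual message for operators
    print(type(err.__cause__))  # original KeyError preserved via `from e`
```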
235-268: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Duplicate bin entries: hardcoded fallback and scraped rows are both unconditionally appended

Lines 235–242 always append three hardcoded entries (`General Waste`, `Recycling`, `Food Waste`) before lines 244–268 parse and also append the actual scraped `div.bin--row` entries. When the HTML scrape succeeds, `data["bins"]` ends up with each bin type twice — once with the day-of-week–computed date and once with the date parsed from the page — which will produce incorrect output for any downstream consumer.

Either remove the hardcoded block (let the scraper stand alone) or guard it as a genuine fallback that fires only when the `div.bin--row` parse yields nothing:

🐛 Proposed fix

```diff
-# Add collection dates for each bin type
-bin_types = ["General Waste", "Recycling", "Food Waste"]
-for bin_type in bin_types:
-    data["bins"].append(
-        {
-            "type": bin_type,
-            "collectionDate": next_collection.strftime("%d/%m/%Y"),
-        }
-    )
-
 # Process collection details
 bin_rows = soup.select("div.bin--row:not(:first-child)")
 for row in bin_rows:
     try:
         bin_type = row.select_one("div.col-md-3").text.strip()
         collection_dates_div = row.select("div.col-md-3")[1]
         next_collection_text = "".join(
             collection_dates_div.find_all(text=True, recursive=False)
         ).strip()
         cleaned_date_text = remove_ordinal_indicator_from_date_string(
             next_collection_text
         )
         parsed_date = parse(cleaned_date_text, fuzzy=True)
         bin_date = parsed_date.strftime("%d/%m/%Y")
         if bin_type and bin_date:
             data["bins"].append(
                 {
                     "type": bin_type,
                     "collectionDate": bin_date,
                 }
             )
     except Exception as e:
         print(f"Error processing item: {e}")
         continue
+
+# Fallback: if no bin rows were scraped, use day-of-week computed date
+if not data["bins"]:
+    bin_types = ["General Waste", "Recycling", "Food Waste"]
+    for bin_type in bin_types:
+        data["bins"].append(
+            {
+                "type": bin_type,
+                "collectionDate": next_collection.strftime("%d/%m/%Y"),
+            }
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/Hillingdon.py` around lines 235 - 268, The code currently always appends three hardcoded entries via bin_types into data["bins"] and then also appends parsed rows from bin_rows, causing duplicates; change the logic so the hardcoded fallback (the loop that iterates bin_types and appends to data["bins"]) only runs when no parsed rows are found (i.e., when bin_rows is empty or when parsing yields no valid entries), or remove that fallback entirely—update the block around bin_types and the subsequent bin_rows parsing so that either parsed results are used preferentially (append hardcoded entries only if parsed results list is empty) and ensure references like data["bins"], bin_types, and bin_rows remain consistent.

uk_bin_collection/uk_bin_collection/councils/ChorleyCouncil.py (2)
142-142: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent fallback may mask JS-mapping format changes.

If the JS parsing above yields no entries, every bin type silently resolves to its raw table-cell text (`svc`). Per the project's convention of preferring explicit failures, consider at least logging a warning when `svc` is not found in a non-empty map, so format changes are surfaced quickly.

```diff
- "type": bin_type_map.get(svc, svc),
+ "type": bin_type_map.get(svc) if not bin_type_map or svc in bin_type_map else (
+     # bin_type_map is populated but this key is absent — surface it
+     (_ for _ in ()).throw(ValueError(f"Unknown bin service key: {svc!r}"))
+ ) if bin_type_map else svc,
```

A cleaner approach is a small helper or an explicit check before the loop:

```python
if bin_type_map:
    mapped_type = bin_type_map.get(svc)
    if mapped_type is None:
        raise ValueError(f"Bin service key {svc!r} not found in JS type map")
else:
    mapped_type = svc  # JS block absent entirely — use raw name
```

Based on learnings: the project prefers explicit failures (raising exceptions on unexpected formats) over silent defaults, to ensure format changes are detected early.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/ChorleyCouncil.py` at line 142, Replace the silent fallback in the ChorleyCouncil mapping (currently using "type": bin_type_map.get(svc, svc)) with an explicit check: if bin_type_map is non-empty try to get mapped_type = bin_type_map.get(svc) and raise a ValueError (or log a warning and raise) when mapped_type is None so unexpected JS-format changes fail fast; if bin_type_map is empty (JS block absent) set mapped_type = svc and use that for the "type" field. Ensure you update the code paths in the loop that reference svc to use mapped_type and include the unique symbols bin_type_map and svc in your changes.
129-130: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

`table.find("tbody")` can return `None`, causing `AttributeError`.

Python's `html.parser` does not always auto-insert `<tbody>`; if the server's HTML omits it, the chained `.find_all("tr")` call will crash.
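A minimal reproduction of the `<tbody>` pitfall, assuming BeautifulSoup is available (the markup here is illustrative, not Chorley's actual HTML):

```python
from bs4 import BeautifulSoup

# html.parser preserves the markup as-is: no <tbody> is synthesised.
table = BeautifulSoup(
    "<table><tr><td>Recycling</td></tr></table>", "html.parser"
).find("table")
print(table.find("tbody"))  # None — chaining .find_all("tr") onto it would crash

# Falling back to the table itself keeps row iteration working either way:
tbody = table.find("tbody") or table
for row in tbody.find_all("tr"):
    print(row.get_text(strip=True))
```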
🛡️ Proposed fix

```diff
- for row in table.find("tbody").find_all("tr"):
+ tbody = table.find("tbody") or table
+ for row in tbody.find_all("tr"):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/ChorleyCouncil.py` around lines 129 - 130, The loop in ChorleyCouncil.py uses table.find("tbody").find_all("tr") which can raise AttributeError if tbody is missing; update the code in the function/method that iterates over table (look for the for row in table.find("tbody").find_all("tr") block) to first assign tbody = table.find("tbody") and then set rows = tbody.find_all("tr") if tbody is not None else table.find_all("tr"), and iterate over rows so the parser works whether or not a <tbody> element is present.

uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py (1)
1-1: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale copy-paste comment references the wrong council.

Line 1 reads `# This script pulls (in one hit) the data from Bromley Council Bins Data` but this file is for Calderdale.

✏️ Proposed fix

```diff
-# This script pulls (in one hit) the data from Bromley Council Bins Data
+# This script pulls the bin collection data for Calderdale Council
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` at line 1, Update the incorrect top-line comment that mentions "Bromley Council" to reference Calderdale instead; locate the module/comment at the top of CalderdaleCouncil.py (the CalderdaleCouncil class/module) and replace the stale copy-paste comment so it accurately describes that the script pulls Calderdale council bin data.

uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py (2)
32-33: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Dead/broken else branch produces `"None, {postcode}"` as the search string.

When `house_number` is `None` (absent `paon`), the condition is falsy and the else branch executes `f"{house_number}, {postcode}"` where `house_number` is still `None`, yielding `"None, EC1A 1BB"`. The comment above even says "Use house_number as full address since it contains the complete address" — so the else clause is never intended to be reached, or it should fall back to just the postcode.

🐛 Proposed fix

```diff
-# Use house_number as full address since it contains the complete address
-full_address = house_number if house_number else f"{house_number}, {postcode}"
+if not house_number:
+    raise ValueError("No house number/address (paon) provided.")
+full_address = house_number
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py` around lines 32 - 33, The current full_address assignment in CotswoldDistrictCouncil.py uses f"{house_number}, {postcode}" when house_number is falsy which produces "None, {postcode}"; change the logic in the full_address assignment so that when house_number is present use house_number, otherwise fall back to just postcode (i.e. set full_address based on house_number if truthy, else postcode) to avoid embedding the literal "None" into the search string.
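The broken else branch described above is easy to demonstrate in isolation (the postcode value here is illustrative):

```python
house_number = None
postcode = "EC1A 1BB"

# Formatting None into an f-string yields the literal text "None",
# so the fallback builds a bogus search string:
broken = house_number if house_number else f"{house_number}, {postcode}"
print(broken)  # "None, EC1A 1BB"

# A safe fallback uses the postcode alone:
fixed = house_number if house_number else postcode
print(fixed)  # "EC1A 1BB"
```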
223-226: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent empty-data return violates the explicit-failure convention used across this codebase.

When the page yields no results, the scraper returns `{"bins": []}` silently. Downstream consumers will see an empty collection and have no way to tell whether there genuinely are no upcoming collections or whether the page structure changed. A `ValueError`/`RuntimeError` here would surface scraper breakage immediately. Based on learnings, this project prefers explicit failures over silent defaults.

🐛 Proposed fix

```diff
-# If no data found, add a debug entry
 if not data["bins"]:
-    print("No bin collection data found. Page source:")
-    print(driver.page_source[:1000])  # Print first 1000 chars for debugging
+    raise ValueError(
+        "No bin collection data found. The page structure may have changed."
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py` around lines 223 - 226, The current silent return when no bins are found (checking data["bins"]) should be replaced with an explicit error: in the CotswoldDistrictCouncil scraper (the block that inspects data["bins"] and uses driver.page_source), raise a RuntimeError or ValueError instead of printing and returning {"bins": []}; include a short context message and a truncated page-source snippet (e.g., first 1000 chars) in the exception to aid debugging, and remove the print statements so the scraper surfaces a clear failure when the page yields no results.

uk_bin_collection/uk_bin_collection/councils/HyndburnBoroughCouncil.py (1)
132-132: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove debug `print(bin_data)` from the production code path.

This emits the full parsed bin collection to stdout on every successful invocation.

🐛 Proposed fix

```diff
-print(bin_data)
-
 return bin_data
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/HyndburnBoroughCouncil.py` at line 132, Remove the debug print that emits full parsed bin data to stdout by deleting the print(bin_data) call in HyndburnBoroughCouncil.py; instead, if diagnostic information is required use the module's logger (e.g., logger.debug(...) or processLogger.debug(...)) inside the same function where print(bin_data) appears so production runs no longer print sensitive/verbose data to stdout.

uk_bin_collection/uk_bin_collection/councils/MidUlsterDistrictCouncil.py (1)
101-110: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Year-rollover bug: collection dates in January will be assigned the wrong year in December.

`current_year` is appended unconditionally. If a user runs this in late November/December and the next collection falls in January, the resulting date will be dated to the current year, not the following one. Compare with `HyndburnBoroughCouncil.py` in the same PR which correctly adds 1 to the year when the parsed date is already in the past.
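The year-bump logic can be sketched as a small standalone helper (`resolve_collection_date` is illustrative, not the PR's code):

```python
from datetime import datetime

def resolve_collection_date(date_text: str, today: datetime) -> datetime:
    """Attach a year to a day-month string, rolling past dates into next year."""
    parsed = datetime.strptime(f"{date_text} {today.year}", "%d %b %Y")
    if parsed.date() < today.date():
        # The day/month has already passed this year, so it must mean next year.
        parsed = parsed.replace(year=today.year + 1)
    return parsed

# Run "in December": an April date rolls into the following year.
print(resolve_collection_date("18 Apr", datetime(2025, 12, 1)).date())  # 2026-04-18
# A date still ahead in the current year keeps its year.
print(resolve_collection_date("18 Dec", datetime(2025, 12, 1)).date())  # 2025-12-18
```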
🐛 Proposed fix

```diff
-current_year = datetime.now().year
-full_date = f"{date_text} {current_year}"  # e.g., "18 Apr 2025"
-collection_date = datetime.strptime(full_date, "%d %b %Y").strftime(
-    date_format
-)
+current_year = datetime.now().year
+full_date = f"{date_text} {current_year}"
+parsed = datetime.strptime(full_date, "%d %b %Y")
+if parsed.date() < datetime.now().date():
+    parsed = parsed.replace(year=current_year + 1)
+collection_date = parsed.strftime(date_format)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/MidUlsterDistrictCouncil.py` around lines 101 - 110, The date parsing logic in MidUlsterDistrictCouncil.py unconditionally appends current_year to date_text (variables current_year, full_date, collection_date), causing January dates parsed in December to be assigned the wrong year; change the flow to build a tentative datetime from date_text + current_year, then if that tentative date is before today (use datetime.today() or datetime.now().date()), add 1 to current_year before re-parsing so the final collection_date uses the next year; update the try block around current_year/full_date/collection_date to compute tentative_date and bump the year when needed, mirroring the approach used in HyndburnBoroughCouncil.py.

uk_bin_collection/uk_bin_collection/councils/NorthEastDerbyshireDistrictCouncil.py (1)
76-76: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove debug print statements from the hot path.

Lines 76, 100, 117, and 118 emit parsing progress and the full parsed result to stdout on every successful call. These should be removed (or at minimum guarded behind a `logging.debug` call) before merging.
🐛 Proposed fix

```diff
-print("Parsing HTML content...")
```

```diff
-print(f"Found collection: {collection_date} - {bin_types}")
```

```diff
-print(f"Found {len(data['bins'])} collections")
-print(f"Final data: {data}")
```

Also applies to: 100-100, 117-118
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/NorthEastDerbyshireDistrictCouncil.py` at line 76, Remove the debug print statements that write parsing progress/results to stdout in the NorthEastDerbyshireDistrictCouncil class; replace them with logging.debug calls (or remove entirely) so the hot path doesn’t print to stdout. Locate the prints (e.g., "Parsing HTML content..." and the subsequent prints of the parsed result) inside the HTML parsing method(s) of NorthEastDerbyshireDistrictCouncil, import Python’s logging if not present, and change each print(...) to logging.debug(...) with a short descriptive message and the variable (if needed) or drop the output entirely.

uk_bin_collection/uk_bin_collection/councils/PeterboroughCityCouncil.py (1)
108-154: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when the new panel selector stops matching.

If `waste-service-grid` matches nothing, or one matched panel no longer contains the expected heading/date rows, this currently falls through as a successful scrape with partial or empty data. Add an explicit selector check and let malformed panels raise instead of `continue`.

🔎 Suggested tightening

```diff
  # Each bin section is within a waste-service-grid div
  collection_panels = soup.find_all("div", class_="waste-service-grid")
+ if not collection_panels:
+     raise ValueError("Expected waste-service-grid collection panels")
  for panel in collection_panels:
-     try:
-         # Bin type
-         bin_type_tag = panel.find("h3", class_="waste-service-name")
-         if not bin_type_tag:
-             continue
-         bin_type = bin_type_tag.get_text(strip=True)
+     # Bin type
+     bin_type_tag = panel.find("h3", class_="waste-service-name")
+     if not bin_type_tag:
+         raise ValueError("Expected waste-service-name heading in collection panel")
+     bin_type = bin_type_tag.get_text(strip=True)

-         # Get 'Next collection' date
-         rows = panel.find_all("div", class_="govuk-summary-list__row")
-         next_collection = None
-         for row in rows:
-             key = row.find("dt", class_="govuk-summary-list__key")
-             value = row.find("dd", class_="govuk-summary-list__value")
-             if key and value and "Next collection" in key.get_text():
-                 raw_date = " ".join(value.get_text().split())
+     # Get 'Next collection' date
+     rows = panel.find_all("div", class_="govuk-summary-list__row")
+     next_collection = None
+     for row in rows:
+         key = row.find("dt", class_="govuk-summary-list__key")
+         value = row.find("dd", class_="govuk-summary-list__value")
+         if key and value and "Next collection" in key.get_text():
+             raw_date = " ".join(value.get_text().split())

-                 cleaned_date = re.sub(
-                     r"(\d{1,2})(st|nd|rd|th)", r"\1", raw_date
-                 )
-                 next_collection = cleaned_date
-                 break
+             cleaned_date = re.sub(
+                 r"(\d{1,2})(st|nd|rd|th)", r"\1", raw_date
+             )
+             next_collection = cleaned_date
+             break

-         if not next_collection:
-             continue
+     if not next_collection:
+         raise ValueError(f"Expected 'Next collection' row for {bin_type}")

-         print(f"Found next collection for {bin_type}: '{next_collection}'")
+     print(f"Found next collection for {bin_type}: '{next_collection}'")

-         parsed_date = datetime.strptime(next_collection, input_date_format)
-         formatted_date = parsed_date.strftime(output_date_format)
+     parsed_date = datetime.strptime(next_collection, input_date_format)
+     formatted_date = parsed_date.strftime(output_date_format)

-         data["bins"].append(
-             {
-                 "type": bin_type,
-                 "collectionDate": formatted_date,
-             }
-         )
-
-     except Exception as e:
-         print(
-             f"Error processing panel for bin '{bin_type if 'bin_type' in locals() else 'unknown'}': {e}"
-         )
+     data["bins"].append(
+         {
+             "type": bin_type,
+             "collectionDate": formatted_date,
+         }
+     )
```

Based on learnings: in `uk_bin_collection/**/*.py`, prefer explicit failures over silent parsing fallbacks so council format changes are detected early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/PeterboroughCityCouncil.py` around lines 108 - 154, The parser currently silently skips when collection_panels or individual panels lack expected elements; change this to fail fast: after computing collection_panels in the PeterboroughCityCouncil scraper, if collection_panels is empty raise a ValueError (include context like "no waste-service-grid panels found"); inside the for panel loop, if bin_type_tag is missing or next_collection remains None, raise a ValueError indicating the malformed panel (mention the panel or bin_type if available) instead of continue/print; replace the broad try/except that swallows errors (the except block around the panel processing) with either no catch or re-raise the exception after logging so malformed HTML surfaces quickly (references: collection_panels, bin_type_tag, rows, next_collection and the except block).

uk_bin_collection/tests/input.json (1)
853-858: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Refresh these wiki notes to the new real-address inputs.
The examples now use actual house_number/postcode values, but the notes still tell users to pass weekday/week or day/round placeholders. That will steer CLI users to the old calling convention.
Also applies to: 2517-2523
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/tests/input.json` around lines 853 - 858, The wiki_note text still instructs users to pass DAY/WEEK placeholders even though this record now uses real address fields (house_number and postcode); update the "wiki_note" value to describe the new real-address input format (e.g., state that house_number should contain the house number and postcode the postcode, and remove any guidance about passing weekday/week placeholders), and make the same change for the other matching records that currently contain the placeholder guidance (the entries with the same "wiki_note" pattern).
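A lightweight consistency check can catch wiki notes that drift from their inputs. The following is a hypothetical helper, not part of the repo's test suite; the house_number/postcode keys and the STALE_MARKERS list are assumptions for illustration:

```python
import json

# Hypothetical placeholder terms that indicate stale guidance.
STALE_MARKERS = ("DAY", "WEEK", "ROUND")


def find_stale_wiki_notes(input_json_text):
    # Flag councils whose entry now carries real address fields but whose
    # wiki_note still mentions the old placeholder calling convention.
    councils = json.loads(input_json_text)
    stale = []
    for name, entry in councils.items():
        note = entry.get("wiki_note", "")
        has_real_address = "house_number" in entry or "postcode" in entry
        if has_real_address and any(m in note.upper() for m in STALE_MARKERS):
            stale.append(name)
    return sorted(stale)
```

Run over the full input.json, this would list every entry still carrying the old guidance.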
🟡 Minor comments (12)
uk_bin_collection/uk_bin_collection/councils/EppingForestDistrictCouncil.py-56-65 (1)
56-65: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Raise on unparseable dates instead of silently continuing.
except ValueError: continue silently drops a bin type if its date string has an unexpected format. This means API-side format changes go undetected at runtime — the bin entry simply disappears from output with no error. Based on learnings, the project convention is to prefer explicit failures so format changes surface immediately rather than going unnoticed.
🐛 Proposed fix
-            try:
-                parsed = datetime.strptime(date_str.strip(), "%d/%m/%Y")
+            parsed = datetime.strptime(date_str.strip(), "%d/%m/%Y")
             data["bins"].append(
                 {
                     "type": bin_type,
                     "collectionDate": parsed.strftime(date_format),
                 }
             )
-            except ValueError:
-                continue
Based on learnings: in uk_bin_collection/**/*.py, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors, so format changes are detected early.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/EppingForestDistrictCouncil.py` around lines 56 - 65, The code in EppingForestDistrictCouncil where parsed = datetime.strptime(date_str.strip(), "%d/%m/%Y") silently swallows ValueError and continues; change this to raise an explicit exception (or re-raise with a clear message) instead of continue so parse failures fail fast; update the exception handling in the method (the block that builds data["bins"] with "type": bin_type and "collectionDate") to raise a ValueError that includes the offending date_str, the expected format "%d/%m/%Y", and the bin_type to make debugging and upstream detection immediate.
uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py-140-146 (1)
140-146: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
XPath breaks if user_paon contains a single quote.
The user_paon value is directly interpolated into the XPath string. If the address contains an apostrophe (e.g., "St John's Close"), the XPath will be malformed and raise an exception or match nothing.
Use concat() or escape quotes to handle this edge case:
🛠️ Proposed fix using XPath escaping
-        desired_option = WebDriverWait(driver, 10).until(
-            EC.element_to_be_clickable((
-                By.XPATH,
-                f"//li[contains(@class, 'active-result') and contains(., '{user_paon}')]"
-            ))
-        )
+        # Escape single quotes in user_paon for XPath
+        if "'" in user_paon:
+            # Use concat() to handle quotes: concat("St John", "'", "s Close")
+            escaped_paon = "concat('" + user_paon.replace("'", "',\"'\",'") + "')"
+            xpath_expr = f"//li[contains(@class, 'active-result') and contains(., {escaped_paon})]"
+        else:
+            xpath_expr = f"//li[contains(@class, 'active-result') and contains(., '{user_paon}')]"
+        desired_option = WebDriverWait(driver, 10).until(
+            EC.element_to_be_clickable((By.XPATH, xpath_expr))
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/BostonBoroughCouncil.py` around lines 140 - 146, The XPath used to locate the PAON option interpolates user_paon directly and will break if user_paon contains a single quote; update the locator construction in the block that uses WebDriverWait / desired_option / driver to escape quotes or build a safe XPath (e.g., use XPath concat() to join pieces around any single quotes or alternatively switch to a locator that matches text via normalize-space() combined with translate() or use a CSS selector). Ensure the code that forms f"//li...contains(., '{user_paon}')" is replaced with a function that returns a quoted-safe XPath string before passing it to WebDriverWait and keep the click on desired_option the same.
uk_bin_collection/uk_bin_collection/councils/TewkesburyBoroughCouncil.py-36-58 (1)
36-58: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Unrecognized dict schema silently returns empty bins.
The elif isinstance(json_data, dict) branch matches any dict that isn't the legacy format — including a completely unexpected API response. If none of the type_map keys (refuse, recycling, garden, food) are present, the function returns {"bins": []} with no diagnostic, making it indistinguishable from a valid "no collections due" response.
Consider adding a guard after the loop (or at entry to the branch) to raise when the response matches neither expected schema:
💡 Proposed fix
     elif isinstance(json_data, dict):
+        if not any(k in json_data for k in type_map):
+            raise ValueError(
+                f"Unrecognised API response format; no expected bin-type keys found: {list(json_data.keys())}"
+            )
         for key, display_name in type_map.items():
Based on learnings: "prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors."
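The guard above can be exercised in isolation. Here is a minimal, self-contained sketch; the type_map values and the parse_bins helper are illustrative, not the council module's actual code:

```python
# Keys mirror those named in the comment (refuse, recycling, garden, food);
# the display names are assumptions for this sketch.
type_map = {
    "refuse": "Refuse Bin",
    "recycling": "Recycling Bin",
    "garden": "Garden Waste Bin",
    "food": "Food Caddy",
}


def parse_bins(json_data: dict) -> dict:
    # Fail fast when the response matches neither expected schema,
    # instead of returning an empty (and ambiguous) bin list.
    if not any(k in json_data for k in type_map):
        raise ValueError(
            f"Unrecognised API response; keys: {sorted(json_data.keys())}"
        )
    bins = []
    for key, display_name in type_map.items():
        if key in json_data:
            bins.append({"type": display_name, "collectionDate": json_data[key]})
    return {"bins": bins}
```

With the guard in place, an unexpected payload raises instead of masquerading as "no collections due".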
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/TewkesburyBoroughCouncil.py` around lines 36 - 58, The branch handling elif isinstance(json_data, dict) silently returns an empty data["bins"] when the dict doesn't contain any expected keys; update this branch (using json_data and the existing type_map) to detect whether any of the type_map keys were present/processed (e.g., track a matched flag or count while iterating), and if none were matched raise a clear exception (or log and raise) describing "unexpected API schema" and include the raw json_data or its keys so callers can surface the error instead of treating it as "no collections".
uk_bin_collection/uk_bin_collection/councils/MidAndEastAntrimBoroughCouncil.py-35-36 (1)
35-36: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
UA Chrome version is 10 major releases behind current stable (Chrome 148)
Chrome 148 is the current stable version, released May 5, 2026. The hardcoded string Chrome/138.0.0.0 is ~10 versions stale. While most council sites don't enforce strict UA version checks, bot-detection middleware increasingly flags very outdated Chrome versions. Easy to bump.
🔧 Proposed fix
-    user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
+    user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/MidAndEastAntrimBoroughCouncil.py` around lines 35 - 36, The user-agent string in MidAndEastAntrimBoroughCouncil.py is hardcoded to an outdated Chrome version; update the user_agent variable used in the create_webdriver(web_driver, headless, user_agent, __name__) call to a current Chrome stable string (e.g. replace "Chrome/138.0.0.0" with "Chrome/148.0.0.0") or make it configurable via a constant/setting so future bumps are easy (update the user_agent assignment and any related constant used by create_webdriver).
uk_bin_collection/uk_bin_collection/councils/BroxtoweBoroughCouncil.py-34-35 (1)
34-35: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
User-agent version Chrome/138 is already ~10 versions behind the current stable release.
Chrome 148 is the current stable version as of May 5, 2026. The hardcoded Chrome/138.0.0.0 string is 10 major versions stale, which could trigger bot-detection heuristics on sites that check for suspiciously outdated browser signatures.
🔧 Suggested update
-    user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
+    user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/BroxtoweBoroughCouncil.py` around lines 34 - 35, The hardcoded user_agent string in BroxtoweBoroughCouncil.py (variable user_agent) uses Chrome/138 which is outdated; update it to a current stable Chrome version (e.g., Chrome/148.0.0.0) or replace the hardcoded string with a dynamically generated/maintained value (for example reading from a CHROME_MAJOR_VERSION constant or using a user-agent helper) and pass that to create_webdriver(web_driver, headless, user_agent, __name__); ensure you update the user_agent variable rather than only changing the create_webdriver call so tests and other modules receive the corrected UA.
uk_bin_collection/uk_bin_collection/councils/BarnetCouncil.py-93-97 (1)
93-97: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Fallback PAON substring match can select the wrong address for short numeric house numbers.
The fallback at line 94 uses paon_str in option.text.upper(), which is a plain substring check. If user_paon is "1", it will match option texts like "10 HIGH STREET", "11 HIGH STREET", "21 ...", etc. and select the first such match, silently returning bin data for the wrong property.
A safer fallback is to match against word boundaries, or at minimum require a space or comma immediately after the PAON:
🛡️ Proposed fix
     if not selected:
         # Fallback: select first option containing the paon
         for option in dropdown.options:
-            if paon_str in option.text.upper():
+            option_text_fb = option.text.upper().strip()
+            if (
+                option_text_fb.startswith(paon_str + " ")
+                or option_text_fb.startswith(paon_str + ",")
+                or f", {paon_str} " in option_text_fb
+                or f", {paon_str}," in option_text_fb
+            ):
                 dropdown.select_by_value(option.get_attribute("value"))
                 selected = True
                 break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/BarnetCouncil.py` around lines 93 - 97, The current fallback substring check "paon_str in option.text.upper()" (used in the loop over dropdown.options, setting selected via dropdown.select_by_value(...)) can falsely match short numeric PAONs like "1" against "10 HIGH STREET"; replace this with a word-boundary-aware match: build a regex from paon_str (escape it) and test option.text.upper() against r'\b' + PAON + r'(?:\b|[ ,])' (or split option.text into tokens and compare whole-token equality) so only whole PAON matches (or followed by space/comma) are accepted; ensure you import re and keep setting selected and breaking only on a true whole-word match.
uk_bin_collection/uk_bin_collection/councils/Hillingdon.py-112-112 (1)
112-112: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Chrome/138.0.0.0 in the user-agent is ~10–11 months stale
Chrome 138 was in beta as of 28 May, 2025, and ChromeOS LTS 144 was released on April 21, 2026, meaning the current stable channel is around Chrome 144–145. Presenting an outdated version may increase bot-detection risk on the target site.
🔧 Suggested update
-    user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
+    user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/144.0.0.0 Safari/537.36"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/Hillingdon.py` at line 112, The user_agent string in Hillingdon.py is stale (Chrome/138.0.0.0); update the user_agent variable to a current, realistic Chrome stable identifier (e.g., Chrome/145.0.0.0 or use a more generic recent UA) to reduce bot-detection risk; locate the user_agent assignment in the Hillingdon class or module and replace the hard-coded value accordingly (or implement a small helper to generate/rotate a recent UA if preferred).
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py-105-108 (1)
105-108: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Silent None return for invalid-but-supplied paon produces a misleading error
When a caller provides a non-None paon that fails the digit/length test (e.g., a regular house number like "42" or a 9-digit string), _pid_from_paon returns None. Combined with no URL-based PID, the caller then hits:
ValueError: Richmond: supply a URL with ?pid=... OR put PID in the House Number field.
This is confusing because the user did supply something in the House Number field. Per the project's preference for explicit failures, an out-of-range-but-non-empty paon should raise a descriptive error rather than silently fall through.
🛠️ Proposed fix
     def _pid_from_paon(self, paon):
-        if paon and str(paon).isdigit() and 10 <= len(str(paon)) <= 14:
-            return str(paon)
-        return None
+        if not paon:
+            return None
+        s = str(paon)
+        if s.isdigit() and 10 <= len(s) <= 14:
+            return s
+        raise ValueError(
+            f"Richmond: House Number '{s}' does not look like a valid PID "
+            "(expected a 10–14 digit UPRN). Supply a URL with ?pid=... instead."
+        )
Based on learnings: in uk_bin_collection/**/*.py, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors, so format changes are detected early.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py` around lines 105 - 108, The function _pid_from_paon currently returns None silently when paon is provided but doesn't meet the digit/length check; instead, change its behavior to raise a ValueError with a clear message (e.g., "Richmond: invalid PID in House Number; expected 10–14 digit PID") whenever paon is non-empty but fails the str(paon).isdigit() or length check so callers get an explicit error; keep returning the string when valid and only return None when paon is truly empty/None.
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py-25-29 (1)
25-29: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
"pid=" in url substring check can give a false positive
If base_url contains any parameter whose name happens to include pid= as a substring (e.g., &upid=..., &notpid=...), the condition evaluates to False, so the correctly-resolved pid is never appended to the URL, and the request goes out with the wrong query string.
parse_qs is already imported and used in _pid_from_url — use it here for a consistent, correct check:
🛠️ Proposed fix
-        if "pid=" not in (base_url or ""):
+        if not parse_qs(urlparse(base_url or "").query).get("pid"):
             sep = "&" if "?" in (base_url or "") else "?"
             target_url = f"{base_url}{sep}pid={pid}"
         else:
             target_url = base_url
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py` around lines 25 - 29, The substring check "pid=" on base_url can give false positives; instead parse the URL query and check for a real pid parameter. In the block that sets target_url (referencing variables base_url, pid, target_url in LondonBoroughOfRichmondUponThames.py), call urlparse(base_url).query and then parse_qs(...) to get params, test whether 'pid' is a key in that dict, and only append "?pid=..." or "&pid=..." when 'pid' is absent; otherwise leave target_url as base_url. Use the same parse_qs usage pattern as in the existing _pid_from_url helper to keep behavior consistent.
uk_bin_collection/uk_bin_collection/councils/GreatYarmouthBoroughCouncil.py-109-111 (1)
109-111: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Year-rollover causes upcoming Jan–Mar dates to be silently dropped when scraped in Oct–Dec.
With datetime.today().year applied unconditionally, any date the council returns for January/February/March (which appear in the schedule during Oct/Nov/Dec of the preceding year) is parsed with the current year, immediately compared as a past date, and skipped via continue. Users scraping in November or December would silently receive no data for their next collections.
NorthumberlandCouncil.py in the same PR applies an explicit rollover guard. Apply the same fix here:
🐛 Proposed fix — mirror the NorthumberlandCouncil year-rollover guard
+        today = datetime.today()
+        current_year = today.year
+        current_month = today.month
         try:
             # Parse the date text
-            date_obj = datetime.strptime(date_text + " " + str(datetime.today().year), "%A %d %B %Y")
-            if date_obj.date() < datetime.today().date():
+            date_obj = datetime.strptime(date_text + " " + str(current_year), "%A %d %B %Y")
+            # If Oct–Dec and collection falls in Jan–Mar, it belongs to next year
+            if current_month >= 10 and date_obj.month in [1, 2, 3]:
+                date_obj = date_obj.replace(year=current_year + 1)
+            if date_obj.date() < today.date():
                 continue  # Skip past dates
         except ValueError:
             continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/GreatYarmouthBoroughCouncil.py` around lines 109 - 111, The date parsing always appends datetime.today().year which causes Jan–Mar dates scraped in Oct–Dec to be treated as past and skipped; update the logic around the date parsing in GreatYarmouthBoroughCouncil.py (where date_obj is created from date_text) to apply a year-rollover guard like NorthumberlandCouncil.py: parse the month from date_text (or a temp date with the current year), and if that month is less than the current month, add 1 to the year before constructing date_obj so Jan–Mar entries scraped late in the year are assigned the next year instead of being dropped.
uk_bin_collection/uk_bin_collection/councils/ThurrockCouncil.py-15-15 (1)
15-15: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Include the full alphabet in LETTERS.
X and Z are missing here, so streets starting with those letters fall back to the A page and cannot resolve correctly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/ThurrockCouncil.py` at line 15, The LETTERS constant in ThurrockCouncil.py is missing 'X' and 'Z', causing streets starting with those letters to map incorrectly; update the LETTERS variable (used in the ThurrockCouncil class/file) to include the full alphabet (A–Z) so all initial letters are handled rather than the truncated "ABCDEFGHIJKLMNOPQRSTUVWY".
uk_bin_collection/tests/generate_map_test_results.py-80-81 (1)
80-81: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Fix the usage string.
The help text still references generate_test_results.py, but this script is generate_map_test_results.py.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/tests/generate_map_test_results.py` around lines 80 - 81, Update the usage/help string printed in the startup argument check so it references the correct script name; in the block that tests len(sys.argv) < 2 (the startup argument validation), replace "Usage: python generate_test_results.py <junit.xml path> [--detailed]" with "Usage: python generate_map_test_results.py <junit.xml path> [--detailed]" (or an equivalent accurate invocation string).
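One way to prevent this class of drift is to derive the usage line from sys.argv[0] rather than hardcoding the filename. This is a minimal sketch, not the script's actual argument handling; usage_line is a hypothetical helper:

```python
import os
import sys


def usage_line(argv):
    # Derive the script name from argv[0] so the help text can never
    # drift from the actual filename after a rename.
    script = os.path.basename(argv[0]) if argv else "script.py"
    return f"Usage: python {script} <junit.xml path> [--detailed]"


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(usage_line(sys.argv))
        sys.exit(1)
```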
# Create the Selenium WebDriver
driver = create_webdriver(web_driver, headless, None, __name__)
user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
driver = create_webdriver(web_driver, headless, user_agent, __name__)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="$(fd -p 'ThanetDistrictCouncil.py' | head -n1)"
echo "Inspecting: $file"
sed -n '28,35p' "$file"
python - <<'PY'
from pathlib import Path
import ast
path = next(Path('.').rglob('ThanetDistrictCouncil.py'))
source = path.read_text()
try:
ast.parse(source)
print("AST parse succeeded")
except SyntaxError as exc:
print(f"SyntaxError: {exc.msg} at line {exc.lineno}, column {exc.offset}")
raise
PY
Repository: robbrad/UKBinCollectionData
Length of output: 941
🏁 Script executed:
#!/bin/bash
# Check broader context of the problematic section
file="$(fd -p 'ThanetDistrictCouncil.py' | head -n1)"
echo "=== Full context around the error (lines 20-55) ==="
sed -n '20,55p' "$file"
Repository: robbrad/UKBinCollectionData
Length of output: 1588
🏁 Script executed:
#!/bin/bash
# Check for the finally/except blocks
file="$(fd -p 'ThanetDistrictCouncil.py' | head -n1)"
echo "=== Looking for try/except/finally structure (lines 34-100) ==="
sed -n '34,100p' "$file"
Repository: robbrad/UKBinCollectionData
Length of output: 1743
Fix the mis-indented WebDriver assignment.
Line 32 is indented one level deeper than the surrounding block, causing an IndentationError: unexpected indent that prevents the module from parsing. Move the driver assignment inside the try block at the correct indentation level, initialize driver = None before the try, and guard the finally cleanup with an if driver: check.
🐛 Proposed fix
def parse_data(self, page: str, **kwargs) -> dict:
user_uprn = kwargs.get("uprn")
check_uprn(user_uprn)
bindata = {"bins": []}
+ driver = None
url = f"https://www.thanet.gov.uk/wp-content/mu-plugins/collection-day/incl/mu-collection-day-calls.php?pAddress={user_uprn}"
web_driver = kwargs.get("web_driver")
headless = kwargs.get("headless")
- # Create the Selenium WebDriver
- user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
- driver = create_webdriver(web_driver, headless, user_agent, __name__)
-
try:
+ # Create the Selenium WebDriver
+ user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
+ driver = create_webdriver(web_driver, headless, user_agent, __name__)
print(f"Navigating to URL: {url}")
driver.get(url)
finally:
print("Cleaning up WebDriver...")
- driver.quit()
+ if driver:
+ driver.quit()
🧰 Tools
🪛 Ruff (0.15.12)
[warning] 32-32: Unexpected indentation
(invalid-syntax)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/uk_bin_collection/councils/ThanetDistrictCouncil.py` around
lines 30 - 32, The driver assignment is mis-indented causing a parse error;
initialize driver = None before the try block, move the call to
create_webdriver(web_driver, headless, user_agent, __name__) into the try at the
same indentation level as the rest of the block (assigning to driver), and in
the finally section only call driver.quit() or similar if driver is truthy (if
driver:) to avoid errors when construction fails; the symbols to adjust are the
driver variable, the create_webdriver(...) call, and the surrounding try/finally
that performs cleanup.
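The guarded-cleanup shape in the proposed fix generalises to a small helper. The following is an illustrative sketch: run_with_driver and FakeDriver are hypothetical, not project code, and the fake stands in for a real Selenium driver so the pattern can be shown offline.

```python
def run_with_driver(create_driver, work):
    # Mirror of the pattern above: initialise to None, construct inside
    # the try, and only quit a driver that was actually created.
    driver = None
    try:
        driver = create_driver()
        return work(driver)
    finally:
        if driver is not None:
            driver.quit()


# Minimal stand-in for a Selenium WebDriver, used only to demonstrate
# that cleanup runs on success and is skipped when construction fails.
class FakeDriver:
    def __init__(self):
        self.quit_called = False

    def quit(self):
        self.quit_called = True
```

If create_driver raises, the finally block sees driver still None and does not call quit on a half-built object.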
"(KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36"
)
}
response = requests.get(api_url, headers=headers)
Add a timeout to requests.get — blocking call with no deadline.
Without a timeout this call can hang the worker thread indefinitely if West Suffolk's IIS is slow or unresponsive, which can exhaust the thread pool in a multi-scraper run.
🔧 Proposed fix
- response = requests.get(api_url, headers=headers)
+ response = requests.get(api_url, headers=headers, timeout=60)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- response = requests.get(api_url, headers=headers)
+ response = requests.get(api_url, headers=headers, timeout=60)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py` at line
26, The requests.get call using api_url and headers in WestSuffolkCouncil should
include a timeout to avoid blocking indefinitely; update the
requests.get(api_url, headers=headers) invocation to pass a sensible timeout
(e.g. timeout=10 or a configurable constant) and ensure the surrounding code
handles requests.exceptions.Timeout and other requests.RequestException errors
so the scraper fails fast and logs the error instead of hanging the worker
thread.
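The timeout-plus-error-handling shape described in the prompt can be sketched generically. fetch_json and its injectable http argument are illustrative, not the council module's actual API; the injection exists purely so the helper can be exercised without a network:

```python
import requests


def fetch_json(api_url, headers, timeout=60, http=requests):
    # Explicit timeout so a slow endpoint cannot hang the worker thread;
    # raise_for_status surfaces HTTP errors instead of parsing error pages.
    try:
        response = http.get(api_url, headers=headers, timeout=timeout)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.Timeout as exc:
        raise RuntimeError(f"Timed out after {timeout}s fetching {api_url}") from exc
```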
…ndentation errors from PR #1947 merge
May 2026 Release — 60 PRs Consolidated
All open PRs merged into a single release branch with 7 merge conflicts resolved.
Council Fixes (53 PRs)
Dependencies (4 PRs)
Merge Conflicts Resolved (7)
Issues Resolved (20)
This release fixes the following open issues:
Closes #1555, #1616, #1725, #1768, #1804, #1889, #1890, #1893, #1902, #1904, #1906, #1908, #1910, #1911, #1914, #1931, #1951, #1962, #1970, #1982
Code Review Notes
verify=False SSL calls exist but are consistent with existing codebase patterns for councils with bad certs
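Where verify=False genuinely cannot be avoided, suppressing only urllib3's InsecureRequestWarning keeps logs readable without hiding other warnings. This is a sketch under that assumption, not a recommendation to widen the pattern; fetch_insecure and the URL are hypothetical:

```python
import requests
import urllib3

# Silence only the warning triggered by verify=False, not all warnings.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def fetch_insecure(url):
    # verify=False mirrors the existing codebase pattern for councils
    # with broken certificate chains; keep an explicit timeout regardless.
    return requests.get(url, verify=False, timeout=10)
```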