fix: Kingston-upon-Thames website HTML format change #1828
Signed-off-by: Andy Piper <andypiper@users.noreply.github.com>
📝 Walkthrough
Parsing for Kingston Upon Thames Council was changed to iterate over […]
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##           master    #1828   +/-   ##
=======================================
  Coverage   86.67%   86.67%
=======================================
  Files           9        9
  Lines        1141     1141
=======================================
  Hits          989      989
  Misses        152      152
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@uk_bin_collection/uk_bin_collection/councils/KingstonUponThamesCouncil.py`:
- Line 75: Replace the use of .capitalize() on service_name in the dict
construction (the line that sets the "type" key) so the original casing is
preserved; in KingstonUponThamesCouncil.py stop calling
service_name.capitalize() and assign service_name directly to the "type" field
(keeping the rest of the code that builds the service dict intact), ensuring any
user-visible labels remain as provided by the source.
- Around line 47-48: The parser currently uses soup.find_all("div", {"class":
"waste-service-grid"}) and then continues silently if service_grids is empty;
modify the code around the service_grids variable (the result of find_all) to
check if not service_grids and immediately raise a descriptive exception (e.g.,
ValueError or a custom ParseError) with a message like "No waste-service-grid
elements found on KingstonUponThamesCouncil page" so failures are explicit;
ensure the check sits before any iteration over service_grids (the for grid in
service_grids loop) so the function fails fast when the page structure changes.
- Around line 65-69: The code that extracts the "next collection" date (inside
KingstonUponThamesCouncil.py where dt = row.find("dt") and collection_date is
built using row.find("dd")) does not guard against a missing <dd> and will raise
an AttributeError; update the logic to check that dd = row.find("dd") is not
None before calling get_text(), and if it is None raise a clear, descriptive
error (or handle gracefully) that includes context (e.g., the offending row or
dt text); ensure you still call
remove_ordinal_indicator_from_date_string(dd.get_text()) only when dd is
present.
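The three points above can be sketched roughly as follows. This is a minimal illustration, not the council module itself: it uses the standard library's `xml.etree.ElementTree` on a well-formed fragment as a stand-in for the parser's BeautifulSoup calls, and the `h3` service heading, the `SAMPLE_PAGE` markup, and the helper names `has_class` and `next_collections` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. Only the class names and dt/dd tags come
# from the review comments; the <h3> heading is an assumption.
SAMPLE_PAGE = """<main>
  <div class="waste-service-grid">
    <h3>Mixed Recycling</h3>
    <dl class="govuk-summary-list">
      <div class="govuk-summary-list__row">
        <dt>Next collection</dt>
        <dd>Tuesday, 3rd February</dd>
      </div>
    </dl>
  </div>
</main>"""


def has_class(element: ET.Element, name: str) -> bool:
    """Return True if the element's class attribute contains `name`."""
    return name in element.get("class", "").split()


def next_collections(page: str) -> list[dict[str, str]]:
    root = ET.fromstring(page)
    grids = [d for d in root.iter("div") if has_class(d, "waste-service-grid")]
    # Fail fast: an empty result most likely means the page structure changed.
    if not grids:
        raise ValueError(
            "No waste-service-grid elements found on KingstonUponThamesCouncil page"
        )

    bins = []
    for grid in grids:
        heading = grid.find(".//h3")  # assumed location of the service name
        if heading is None:
            raise ValueError("Kingston parser: service grid has no heading")
        # Keep the casing exactly as the source provides it (no .capitalize()).
        service_name = "".join(heading.itertext()).strip()

        for row in grid.iter("div"):
            if not has_class(row, "govuk-summary-list__row"):
                continue
            dt = row.find("dt")
            if dt is None or "".join(dt.itertext()).strip().lower() != "next collection":
                continue
            dd = row.find("dd")
            # Guard against a missing <dd> before reading its text.
            if dd is None:
                raise ValueError(
                    f"Kingston parser: missing dd element for 'next collection' in {service_name}"
                )
            bins.append(
                {"type": service_name, "collectionDate": "".join(dd.itertext()).strip()}
            )
    return bins
```

Calling `next_collections(SAMPLE_PAGE)` yields one entry whose `type` keeps the source casing, while a page with no grids, or a "next collection" row without a `<dd>`, raises `ValueError` instead of returning partial data.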
Signed-off-by: Andy Piper <andypiper@users.noreply.github.com>
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
uk_bin_collection/uk_bin_collection/councils/KingstonUponThamesCouncil.py (1)
67-94: Fail fast if no “next collection” row is found per service.
Right now, if a grid lacks a “next collection” row, the parser silently skips that bin. That can yield partial data without surfacing a format change. Suggest tracking whether a match was found and raising when absent. Based on learnings, explicit failures are preferred.
🔧 Proposed fix
- rows = summary_list.find_all("div", {"class": "govuk-summary-list__row"})
+ rows = summary_list.find_all("div", {"class": "govuk-summary-list__row"})
+ found_next_collection = False
  for row in rows:
      dt = row.find("dt")
      if dt and dt.get_text().strip().lower() == "next collection":
+         found_next_collection = True
          dd = row.find("dd")
          if not dd:
              raise ValueError(
                  f"Kingston parser: missing dd element for 'next collection' in {service_name}"
              )
          collection_date = remove_ordinal_indicator_from_date_string(
              dd.get_text()
          ).strip().replace(" (In progress)", "")
          # strip out any text inside of the date string
          collection_date = re.sub(
              r"\n\s*\(this.*?\)", "", collection_date
          )
          dict_data = {
              "type": service_name,
              "collectionDate": get_next_occurrence_from_day_month(
                  datetime.strptime(
                      collection_date + " " + datetime.now().strftime("%Y"),
                      "%A, %d %B %Y",
                  )
              ).strftime(date_format),
          }
          data["bins"].append(dict_data)
+ if not found_next_collection:
+     raise ValueError(
+         f"Kingston parser: no 'next collection' row found for {service_name}"
+     )
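The date clean-up inside the proposed fix can be tried in isolation. The sketch below mirrors the same steps (drop the ordinal suffix, strip the " (In progress)" marker, remove a trailing "(this ...)" note, then parse with a year appended). `strip_ordinal` is a hypothetical stand-in for the project's `remove_ordinal_indicator_from_date_string`, whose real implementation may differ, and the year is passed in explicitly rather than read from `datetime.now()` so the example is deterministic. The roll-forward step done by `get_next_occurrence_from_day_month` is left out.

```python
import re
from datetime import datetime


def strip_ordinal(date_string: str) -> str:
    """Hypothetical stand-in: turn '3rd' into '3', '21st' into '21', etc."""
    return re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", date_string)


def clean_collection_date(raw: str) -> str:
    """Apply the same clean-up steps as the proposed fix, in the same order."""
    cleaned = strip_ordinal(raw).strip().replace(" (In progress)", "")
    # Remove any note on a following line that starts with "(this ...".
    return re.sub(r"\n\s*\(this.*?\)", "", cleaned)


def parse_collection_date(raw: str, year: int) -> datetime:
    """Parse a cleaned 'Weekday, D Month' string for the given year."""
    return datetime.strptime(f"{clean_collection_date(raw)} {year}", "%A, %d %B %Y")
```

For instance, `clean_collection_date("Tuesday, 3rd February (In progress)")` gives `"Tuesday, 3 February"`, which `parse_collection_date` can then read once a year is supplied.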
This PR has been merged into the February 2026 consolidated release PR #1837. Thank you for your contribution!
The Kingston upon Thames page layout changed slightly: it now uses a `div.waste-service-grid` container that wraps both the service name and the collection details. This was breaking the data retrieval.
While I was in there, I also updated the comment containing the URL of the Kingston website's help page on bin collection (the old link returned a 404).
Tested locally on Home Assistant Green / 2026.1.3; the integration now returns a calendar again.
Fixes #1824