Skip to content

fix: Kingston-upon-Thames website HTML format change#1828

Closed
andypiper wants to merge 2 commits into
robbrad:masterfrom
andypiper:fix-kingston-council
Closed

fix: Kingston-upon-Thames website HTML format change#1828
andypiper wants to merge 2 commits into
robbrad:masterfrom
andypiper:fix-kingston-council

Conversation

@andypiper
Copy link
Copy Markdown
Contributor

@andypiper andypiper commented Jan 25, 2026

The Kingston upon Thames page layout slightly changed to a structure using a div.waste-service-grid container that now wraps both the service name and the collection details. This was breaking the data retrieval.

While I was in there I also updated the comment with the URL for the help page on the Kingston website about bin collection (was 404).

Tested locally on Home Assistant Green / 2026.1.3 and this successfully now returns a calendar again.

Fixes #1824

Summary by CodeRabbit

  • Bug Fixes
    • Improved Kingston Upon Thames bin collection parsing for greater resilience to website layout changes, with clearer messages when data is missing and more accurate next-collection date detection and formatting.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Andy Piper <andypiper@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 25, 2026

📝 Walkthrough

Walkthrough

Parsing for Kingston Upon Thames Council was changed to iterate over div.waste-service-grid elements, extracting h3.waste-service-name and dl.govuk-summary-list per grid with explicit validation and descriptive ValueError raises when expected DOM elements are missing.

Changes

Cohort / File(s) Summary
DOM Parsing Logic Refactor
uk_bin_collection/uk_bin_collection/councils/KingstonUponThamesCouncil.py
Replaced previous sibling-based DOM traversal with iteration over div.waste-service-grid. Extracts service type from inner h3.waste-service-name, finds collection info in dl.govuk-summary-list rows, constructs date string including current year, uses get_next_occurrence_from_day_month, and adds explicit validation with descriptive ValueError messages for missing elements.

Sequence Diagram(s)

(Skipped — changes are localized parsing updates and do not introduce multi-component sequential flows.)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

  • dp247

Poem

🐰 I hopped through grids where h3s hide,
Found names and dates with checks beside.
No more NoneType scares in the night,
Kingston's bins will pop up right! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the main change: fixing Kingston Council's parser due to an HTML format change on their website.
Linked Issues check ✅ Passed The code changes directly address issue #1824 by updating the parser to handle the new div.waste-service-grid structure, fixing the AttributeError and restoring bin collection data retrieval.
Out of Scope Changes check ✅ Passed All changes are scoped to the Kingston Council parser to address the HTML structure change; the help-page URL comment update is a necessary correction directly related to the parsing issue.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jan 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (274152f) to head (b057b08).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1828   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@uk_bin_collection/uk_bin_collection/councils/KingstonUponThamesCouncil.py`:
- Line 75: Replace the use of .capitalize() on service_name in the dict
construction (the line that sets the "type" key) so the original casing is
preserved; in KingstonUponThamesCouncil.py stop calling
service_name.capitalize() and assign service_name directly to the "type" field
(keeping the rest of the code that builds the service dict intact), ensuring any
user-visible labels remain as provided by the source.
- Around line 47-48: The parser currently uses soup.find_all("div", {"class":
"waste-service-grid"}) and then continues silently if service_grids is empty;
modify the code around the service_grids variable (the result of find_all) to
check if not service_grids and immediately raise a descriptive exception (e.g.,
ValueError or a custom ParseError) with a message like "No waste-service-grid
elements found on KingstonUponThamesCouncil page" so failures are explicit;
ensure the check sits before any iteration over service_grids (the for grid in
service_grids loop) so the function fails fast when the page structure changes.
- Around line 65-69: The code that extracts the "next collection" date (inside
KingstonUponThamesCouncil.py where dt = row.find("dt") and collection_date is
built using row.find("dd")) does not guard against a missing <dd> and will raise
an AttributeError; update the logic to check that dd = row.find("dd") is not
None before calling get_text(), and if it is None raise a clear, descriptive
error (or handle gracefully) that includes context (e.g., the offending row or
dt text); ensure you still call
remove_ordinal_indicator_from_date_string(dd.get_text()) only when dd is
present.

Comment thread uk_bin_collection/uk_bin_collection/councils/KingstonUponThamesCouncil.py Outdated
Signed-off-by: Andy Piper <andypiper@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
uk_bin_collection/uk_bin_collection/councils/KingstonUponThamesCouncil.py (1)

67-94: Fail fast if no “next collection” row is found per service.
Right now, if a grid lacks a “next collection” row, the parser silently skips that bin. That can yield partial data without surfacing a format change. Suggest tracking whether a match was found and raising when absent. Based on learnings, explicit failures are preferred.

🔧 Proposed fix
-                rows = summary_list.find_all("div", {"class": "govuk-summary-list__row"})
+                rows = summary_list.find_all("div", {"class": "govuk-summary-list__row"})
+                found_next_collection = False
                 for row in rows:
                     dt = row.find("dt")
                     if dt and dt.get_text().strip().lower() == "next collection":
+                        found_next_collection = True
                         dd = row.find("dd")
                         if not dd:
                             raise ValueError(
                                 f"Kingston parser: missing dd element for 'next collection' in {service_name}"
                             )
                         collection_date = remove_ordinal_indicator_from_date_string(
                             dd.get_text()
                         ).strip().replace(" (In progress)", "")
                         # strip out any text inside of the date string
                         collection_date = re.sub(
                             r"\n\s*\(this.*?\)", "", collection_date
                         )
                         dict_data = {
                             "type": service_name,
                             "collectionDate": get_next_occurrence_from_day_month(
                                 datetime.strptime(
                                     collection_date
                                     + " "
                                     + datetime.now().strftime("%Y"),
                                     "%A, %d %B %Y",
                                 )
                             ).strftime(date_format),
                         }
                         data["bins"].append(dict_data)
+                if not found_next_collection:
+                    raise ValueError(
+                        f"Kingston parser: no 'next collection' row found for {service_name}"
+                    )

@robbrad
Copy link
Copy Markdown
Owner

robbrad commented Feb 1, 2026

This PR has been merged into the February 2026 consolidated release PR #1837.

Thank you for your contribution!

@robbrad robbrad closed this Feb 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

error: 'NoneType' object has no attribute 'find_all'

2 participants