fix: Update NorthTynesideCouncil to reflect changes to collection schedule pages#1659
Conversation
…tract schedule from the UPRN linked page
WalkthroughReplaced North Tyneside Council's multi-step postcode form flow with a direct UPRN-based fetch of the waste-collection schedule page and implemented a new parser for the page's structured schedule block to produce chronological bin entries. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant OldFlow as Old Flow
participant NewFlow as New Flow
participant Website
rect rgb(230,240,255)
Note over OldFlow: Old: Multi-step form (postcode)
Client->>OldFlow: submit postcode
OldFlow->>Website: POST form (retrieve form_build_id)
Website->>OldFlow: form response (form_build_id)
OldFlow->>Website: POST with form_build_id
Website->>OldFlow: HTML response
OldFlow->>Client: compute bins (calendar logic)
end
rect rgb(230,255,240)
Note over NewFlow: New: Direct UPRN schedule fetch
Client->>NewFlow: provide UPRN
NewFlow->>Website: GET /waste-collection-schedule?uprn=...
Website->>NewFlow: HTML with waste-collection__schedule
NewFlow->>NewFlow: parse days (time, type, colour) ✓ / warn ✖
NewFlow->>Client: return chronological bins
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
uk_bin_collection/uk_bin_collection/councils/NorthTynesideCouncil.py (1)
76-84: Replace wildcard import with explicit imports includingdate_format.The
date_formatconstant is defined incommon.py(line 17:date_format = "%d/%m/%Y"), making it available via the current wildcard import. However, using explicit imports improves code clarity and maintainability. Replace line 5:from uk_bin_collection.uk_bin_collection.common import date_format, check_uprn(and include any other symbols needed from
common.py), removing the wildcard import.
🧹 Nitpick comments (5)
uk_bin_collection/tests/input.json (1)
1762-1764: Add a wiki_command_url_override to show the exact view endpoint that requires the UPRN.
- Current change looks good. To help users, add a command override that includes the UPRN placeholder.
Apply:
"NorthTynesideCouncil": { "skip_get_url": true, "uprn": "47097627", - "url": "https://www.northtyneside.gov.uk/waste-collection-schedule", + "url": "https://www.northtyneside.gov.uk/waste-collection-schedule", + "wiki_command_url_override": "https://www.northtyneside.gov.uk/waste-collection-schedule/view/XXXXXXXX", "wiki_name": "North Tyneside", - "wiki_note": "Pass the UPRN. You can find the UPRN using [FindMyAddress](https://www.findmyaddress.co.uk/search).", + "wiki_note": "Pass only the UPRN (no postcode). Example view URL: /waste-collection-schedule/view/UPRN.", "LAD24CD": "E08000022" },uk_bin_collection/uk_bin_collection/councils/NorthTynesideCouncil.py (4)
13-17: Silence unused-arg warning forpage.
- Keep signature but mark/handle unused param.
Apply:
def parse_data(self, page: str, **kwargs) -> dict: user_uprn = kwargs.get("uprn") check_uprn(user_uprn) + # `page` is unused because we construct the view URL directly. + # del page # noqa: F841 # (alternatively use ruff: # noqa: ARG002 on def line)As per coding guidelines.
18-20: Avoid redundant global warning toggles here.
- You already set
verify=Falseon requests or can rely on AbstractGetBinDataClass.get_data (which disables warnings centrally). This local toggle viarequests.packages...is unnecessary.Apply:
- # diable warnings so that we can ignore cert verification - requests.packages.urllib3.disable_warnings() + # TLS warning suppression is handled in AbstractGetBinDataClass.get_data.
30-34: Prefer a specific exception type and include URL context.Apply:
- if schedule is None: - raise Exception("No waste-collection schedule info found - has the page changed?") + if schedule is None: + raise ValueError("No waste-collection schedule found. The page structure may have changed.")
56-75: Harden parsing and avoid brittle "NoneType" string checks.
- Guard each element fetch and continue when missing.
- Catch narrow exceptions instead of generic
Exception.Apply:
- for day in schedule.find_all("li", {"class": "waste-collection__day"}): - try: - # extract the date, bin type and colour - collection_date = datetime.strptime(day.find("time")["datetime"], "%Y-%m-%d") - - # for the collection type we only want the text before any nested span - type_span = day.find("span", {"class": "waste-collection__day--type"}) - bin_type = next(type_span.strings).strip() - - bin_colour = day.find("span", {"class": "waste-collection__day--colour"}).text.strip() - - collections.append((f'{bin_type} ({bin_colour})', collection_date)) - - except Exception as e: - # here NoneType typically suggests parsing errors so report them and continue - if "NoneType" in str(e): - logger.warning(f'Error while processing {day}: {e}') - continue - raise + for day in schedule.find_all("li", {"class": "waste-collection__day"}): + try: + time_el = day.find("time") + if not time_el or not time_el.get("datetime"): + logger.warning("Skipping day: missing time/datetime") + continue + collection_date = datetime.strptime(time_el["datetime"], "%Y-%m-%d") + + type_span = day.find("span", {"class": "waste-collection__day--type"}) + # Direct text only (exclude nested spans, e.g., bank-holiday note) + bin_type_text = type_span.find(text=True, recursive=False) if type_span else None + if not bin_type_text: + logger.warning("Skipping day: missing type") + continue + bin_type = bin_type_text.strip() + + colour_span = day.find("span", {"class": "waste-collection__day--colour"}) + if not colour_span: + logger.warning("Skipping day: missing colour") + continue + bin_colour = colour_span.get_text(strip=True) + + collections.append((f"{bin_type} ({bin_colour})", collection_date)) + except (AttributeError, KeyError, TypeError, ValueError) as e: + logger.warning(f"Skipping unparsable day node: {e}") + continue
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
uk_bin_collection/tests/input.json(1 hunks)uk_bin_collection/uk_bin_collection/councils/NorthTynesideCouncil.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthTynesideCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass(43-146)uk_bin_collection/uk_bin_collection/common.py (1)
check_uprn(67-78)
🪛 Ruff (0.14.0)
uk_bin_collection/uk_bin_collection/councils/NorthTynesideCouncil.py
5-5: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
13-13: Unused method argument: page
(ARG002)
16-16: check_uprn may be undefined, or defined from star imports
(F405)
19-19: requests may be undefined, or defined from star imports
(F405)
22-22: requests may be undefined, or defined from star imports
(F405)
33-33: Create your own exception
(TRY002)
33-33: Avoid specifying long messages outside the exception class
(TRY003)
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/NorthTynesideCouncil.py (1)
59-65: Schedule extraction logic is sound.The code correctly validates the presence of the expected structure and provides a helpful error message for debugging. The static analysis hint (TRY003) about the long exception message is a minor style preference—the descriptive message is actually valuable for users when the page structure changes.
If you prefer to address the TRY003 hint, you could define a custom exception class, but this is optional:
class ScheduleNotFoundError(ValueError): """Raised when waste collection schedule is not found on the page.""" passThen on line 64:
- raise ValueError("No waste-collection schedule found. The page structure may have changed.") + raise ScheduleNotFoundError("The page structure may have changed.")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
uk_bin_collection/tests/input.json(1 hunks)uk_bin_collection/uk_bin_collection/councils/NorthTynesideCouncil.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- uk_bin_collection/tests/input.json
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthTynesideCouncil.py (2)
uk_bin_collection/uk_bin_collection/common.py (1)
check_uprn(67-78)uk_bin_collection/uk_bin_collection/get_bin_data.py (2)
AbstractGetBinDataClass(43-146)get_data(110-128)
🪛 Ruff (0.14.0)
uk_bin_collection/uk_bin_collection/councils/NorthTynesideCouncil.py
64-64: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (4)
uk_bin_collection/uk_bin_collection/councils/NorthTynesideCouncil.py (4)
1-8: LGTM! Import concerns from previous review are resolved.The explicit imports on line 5 address the previous wildcard import concern. The previous request to add
import requestsis now obsolete since the code correctly usesself.get_data()(inherited from the parent class) instead of callingrequests.get()directly.
50-56: LGTM! HTTP request handling correctly addresses previous review.The code now uses
self.get_data()which includes proper headers, timeout, and SSL verification settings. The defensiveraise_for_status()check ensures fast failure on HTTP errors.
114-122: LGTM! Return structure is correct.The chronological sorting and date formatting using the shared
date_formatensure consistent output across the codebase.
87-112: Well-structured parsing loop with good defensive coding.The parsing logic correctly handles the documented HTML structure, including the bank holiday notification edge case (lines 95-101). The
recursive=Falsepattern on line 97 elegantly extracts only the direct text content while skipping nested spans. Exception handling with logging ensures invalid entries don't break the entire parse.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1659 +/- ##
=======================================
Coverage 86.79% 86.79%
=======================================
Files 9 9
Lines 1136 1136
=======================================
Hits 986 986
Misses 150 150 ☔ View full report in Codecov by Sentry. |
Summary:
Changes:
https://my.northtyneside.gov.uk/category/81/bin-collection-datestohttps://www.northtyneside.gov.uk/waste-collection-scheduleFixes: #1654
Summary by CodeRabbit