
Fix MertonCouncil for new website URL #1716

Merged

robbrad merged 2 commits into robbrad:dec_release from davida72:merton-fix on Dec 7, 2025
Conversation

@davida72 (Contributor) commented Nov 14, 2025

The Merton Council website has changed from myneighbourhood.merton.gov.uk to fixmystreet.merton.gov.uk. Updated the script to:

  • Use the new FixMyStreet-based website
  • Poll the page until JavaScript loads collection data
  • Accept UPRN parameter instead of full URL
  • Skip booking services (Bulky waste, Garden waste)

Fixes #1664
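The "poll the page until JavaScript loads collection data" step can be sketched as a plain retry loop. This is a minimal illustration only, not the actual MertonCouncil.py code: `poll_until_loaded`, `fetch_page`, and `is_loaded` are hypothetical names, and the constants are the values mentioned later in the review, injected here so the loop can be demonstrated without a network call.

```python
import time

# Assumed values; the real constants live in MertonCouncil.py
MAX_POLLING_ATTEMPTS = 10
POLLING_SLEEP_SECONDS = 2

def poll_until_loaded(fetch_page, is_loaded, sleep=time.sleep):
    """Re-fetch a page until its JS-rendered data appears or attempts run out."""
    for attempt in range(1, MAX_POLLING_ATTEMPTS + 1):
        html = fetch_page()
        if is_loaded(html):
            return html
        if attempt < MAX_POLLING_ATTEMPTS:
            sleep(POLLING_SLEEP_SECONDS)
    raise RuntimeError("Timeout waiting for bin collection data to load")

# Demo with a fake fetcher that becomes "ready" on the third call:
responses = iter([
    "<p>Loading...</p>",
    "<p>Loading...</p>",
    "<div class='waste__collections'>...</div>",
])
html = poll_until_loaded(
    fetch_page=lambda: next(responses),
    is_loaded=lambda h: "waste__collections" in h,
    sleep=lambda s: None,  # skip real sleeping in the demo
)
print(html)  # → <div class='waste__collections'>...</div>
```

Injecting the fetch and sleep callables keeps the loop testable; the real parser would pass a `requests.get` wrapper instead.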

Summary by CodeRabbit

  • Bug Fixes

    • Merton Council waste collection now uses the FixMyStreet waste page and properly handles dynamic (JS-loaded) content, improving reliability and accuracy of bin schedules.
    • Skips non-scheduled services (e.g., bulky/garden waste) and robustly parses next-collection dates for correct ordering.
  • Documentation

    • Simplified instructions for obtaining your UPRN via FindMyAddress; a UPRN has been added for Merton for testing.

@coderabbitai (Bot) commented Nov 14, 2025

Walkthrough

Updated the Merton Council integration and test config: the test input now includes a UPRN and skip flag; the council parser switched from static HTML table parsing to fetching a UPRN-specific FixMyStreet page, polling for JS-loaded content, extracting collection types and "Next collection" dates, and returning sorted bins with formatted dates.

Changes

  • Test Configuration — uk_bin_collection/tests/input.json
    Added uprn: "4328213" and skip_get_url: true for MertonCouncil; updated url to https://fixmystreet.merton.gov.uk/waste/; removed wiki_command_url_override; simplified wiki_note to use FindMyAddress.
  • Council Implementation — uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py
    Replaced static BS4 table parsing with a UPRN-based GET to FixMyStreet and polling for JS-loaded content. Added UPRN validation, polling constants (MAX_POLLING_ATTEMPTS, POLLING_SLEEP_SECONDS), request headers/timeouts, and a retry loop. New selectors target div.waste__collections, h3.waste-service-name, and GOV.UK summary lists to extract "Next collection" dates, with multi-format date parsing, year-rollover handling, service-type filtering, and output as data["bins"] sorted by date. Added error handling for missing containers and timeouts.
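As a rough illustration of the selector-based extraction, the snippet below pulls service names and "Next collection" values out of a simplified, hypothetical fragment of the FixMyStreet waste page. It uses regular expressions purely to keep the demo dependency-free; the real parser uses BeautifulSoup against div.waste__collections.

```python
import re

# Simplified, hypothetical fragment of the FixMyStreet waste page markup
SAMPLE_HTML = """
<div class="waste__collections">
  <h3 class="waste-service-name">Food waste</h3>
  <dl class="govuk-summary-list">
    <dt>Next collection</dt><dd>Saturday 15 November</dd>
  </dl>
  <h3 class="waste-service-name">Bulky waste</h3>
</div>
"""

# Service headings correspond to h3.waste-service-name in the real selectors
services = re.findall(r'<h3 class="waste-service-name">([^<]+)</h3>', SAMPLE_HTML)
# "Next collection" rows live in the GOV.UK summary list
dates = re.findall(r"<dt>Next collection</dt><dd>([^<]+)</dd>", SAMPLE_HTML)

print(services)  # → ['Food waste', 'Bulky waste']
print(dates)     # → ['Saturday 15 November']
```

Note how "Bulky waste" has a heading but no "Next collection" row, which is why the parser also filters out booking-only services.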

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Merton as MertonCouncil.py
    participant FMS as fixmystreet.merton.gov.uk
    participant Parser as HTML Parser

    User->>Merton: fetch(uprn, ...)
    Merton->>Merton: validate uprn (numeric)
    Merton->>FMS: GET /waste/up/{uprn} (User-Agent, timeout)
    FMS-->>Merton: HTML response

    rect rgb(230,245,255)
    Note over Merton,FMS: Polling loop until JS data appears or attempts exhausted
    loop while not ready and attempts < MAX_POLLING_ATTEMPTS
        Merton->>FMS: GET page (retry)
        FMS-->>Merton: HTML
        Merton->>Merton: check loading indicator gone
        Merton->>Merton: sleep POLLING_SLEEP_SECONDS
    end
    end

    Merton->>Parser: parse div.waste__collections
    Parser->>Parser: for each h3.waste-service-name -> skip non-scheduled
    Parser->>Parser: extract "Next collection" from summary list
    Parser->>Parser: parse date (multi-format, year-rollover)
    Parser-->>Merton: list of (bin_type, collectionDate)
    Merton->>Merton: sort & format -> data["bins"]
    Merton-->>User: return collections
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas to focus on:
    • Polling logic, timeout and retry behavior (MAX_POLLING_ATTEMPTS, POLLING_SLEEP_SECONDS)
    • UPRN validation and error messages
    • Robustness of CSS selectors and handling when structure differs
    • Date parsing (formats supported, year-rollover edge cases)
    • Tests/input.json changes and any test expectations impacted

Possibly related PRs

Suggested reviewers

  • dp247

Poem

🐇 From old myneighbourhood to fixmystreet we hop,

UPRN in paw, we poll till the JS stops,
Bins lined up neatly with dates in a row,
Hooray for parsed collections — off we go! 🥕✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
  • Description check — Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check — Passed: the title accurately describes the main change: fixing MertonCouncil for a new website URL, which aligns with the primary objective in the linked issue.
  • Linked Issues check — Passed: the PR implements all coding requirements from issue #1664: updated URL to fixmystreet.merton.gov.uk/waste, implemented JS polling, added UPRN support, and skips non-scheduled services.
  • Out of Scope Changes check — Passed: all changes are directly related to fixing MertonCouncil for the new FixMyStreet URL; modifications to input.json and MertonCouncil.py align with the stated objectives.
  • Docstring Coverage check — Passed: docstring coverage is 100.00%, above the required threshold of 80.00%.

@codecov (Bot) commented Nov 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.79%. Comparing base (37c8a80) to head (c836362).
⚠️ Report is 97 commits behind head on dec_release.

Additional details and impacted files
```
@@             Coverage Diff              @@
##           dec_release    #1716   +/-   ##
============================================
  Coverage        86.79%   86.79%
============================================
  Files                9        9
  Lines             1136     1136
============================================
  Hits               986      986
  Misses             150      150
```

@coderabbitai (Bot) left a review comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (2)

7-7: Consider explicit imports instead of star import.

The star import from common makes it difficult to verify which symbols are used and could lead to namespace pollution. Static analysis tools flag datetime usage as potentially undefined (F405).

Consider replacing the star import with explicit imports:

```diff
-from uk_bin_collection.uk_bin_collection.common import *
+from uk_bin_collection.uk_bin_collection.common import datetime, date_format
```

35-49: Consider parameterizing polling configuration.

The polling mechanism uses hardcoded values for max_attempts (10) and sleep duration (2 seconds). While these values are reasonable, making them configurable would improve flexibility for different network conditions or slower-loading pages.

Additionally, static analysis suggests creating custom exception classes rather than using generic Exception with long messages (TRY002, TRY003), though this is a minor style consideration.
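One hedged sketch of what such parameterization could look like: class-level defaults that individual instances may override. The class and attribute names here are illustrative, not the actual MertonCouncil.py API.

```python
class PollingParser:
    """Illustrative parser base with overridable polling configuration."""

    # Class-level defaults, easy to tune in one place or in subclasses
    MAX_POLLING_ATTEMPTS = 10
    POLLING_SLEEP_SECONDS = 2

    def __init__(self, max_attempts=None, sleep_seconds=None):
        # Per-instance overrides for slow networks or slow-loading pages
        self.max_attempts = max_attempts or self.MAX_POLLING_ATTEMPTS
        self.sleep_seconds = sleep_seconds or self.POLLING_SLEEP_SECONDS

default = PollingParser()
slow_network = PollingParser(max_attempts=20, sleep_seconds=5)
print(default.max_attempts, slow_network.max_attempts)  # → 10 20
```

The merged change took the middle road the reviewer suggested: named class constants rather than magic numbers, without adding constructor plumbing.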

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37c8a80 and 4474633.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py

  • 22: Avoid specifying long messages outside the exception class (TRY003)
  • 48: Create your own exception (TRY002); avoid specifying long messages outside the exception class (TRY003)
  • 54: Create your own exception (TRY002); avoid specifying long messages outside the exception class (TRY003)
  • 95: datetime may be undefined, or defined from star imports (F405)
  • 100: datetime may be undefined, or defined from star imports (F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Unit Tests (3.12, 1.8.4)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
🔇 Additional comments (2)
uk_bin_collection/tests/input.json (1)

1511-1515: LGTM! Test data correctly updated for new implementation.

The changes properly align with the updated MertonCouncil.py implementation:

  • Added uprn parameter for the new UPRN-based flow
  • Updated URL to the new FixMyStreet site
  • Added skip_get_url: true to match the new data fetching approach
  • Updated wiki_note to guide users on obtaining UPRN via FindMyAddress
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (1)

64-72: LGTM! Correctly skips booking services.

The logic to skip "Bulky waste" and "Garden waste" correctly implements the PR objective to filter out booking services that aren't regular collections.
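A minimal sketch of that filter follows. The service names come from the PR description; the helper name and set are illustrative stand-ins for the merged implementation.

```python
# Booking-only services that have no regular schedule, per the PR description
SKIP_SERVICES = {"Bulky waste", "Garden waste"}

def is_scheduled(service_name: str) -> bool:
    """Return True for regular collections, False for book-on-demand services."""
    return service_name.strip() not in SKIP_SERVICES

names = ["Food waste", "Bulky waste", "Recycling", "Garden waste"]
print([n for n in names if is_scheduled(n)])  # → ['Food waste', 'Recycling']
```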

@davida72 (Contributor, Author) commented:

Wow. That CodeRabbit is something. I'll take a look at the things it's raised.

This commit addresses all issues raised in the CodeRabbit review:

1. Replace star import with explicit imports
   - Changed from `import *` to explicit imports of `datetime` and `date_format`
   - Eliminates namespace pollution and F405 static analysis warnings

2. Add UPRN format validation
   - Validates that UPRN contains only numeric characters
   - Prevents potential security/stability issues from invalid input

3. Fix year-boundary date parsing bug
   - Checks if parsed date is in the past and adjusts to next year
   - Fixes December-to-January collection date parsing issues

4. Parameterize polling configuration
   - Moved hardcoded values to class constants (MAX_POLLING_ATTEMPTS, POLLING_SLEEP_SECONDS)
   - Improves flexibility for varying network conditions

5. Add comprehensive docstrings
   - Added detailed class and method docstrings
   - Includes Args, Returns, Raises, Examples, and Notes sections
   - Significantly improves docstring coverage

All changes maintain backward compatibility while improving code quality,
security, and maintainability.
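The year-boundary fix in point 3 can be illustrated with a self-contained sketch. This is not the actual MertonCouncil.py code: the function name is hypothetical, and the reference date is injected as a parameter so the December-to-January rollover is easy to demonstrate.

```python
from datetime import datetime, date

def parse_next_collection(date_str: str, today: date) -> date:
    """Parse a string like 'Saturday 15 November' and roll past dates into next year."""
    # The site omits the year, so try the current year first
    parsed = datetime.strptime(f"{date_str} {today.year}", "%A %d %B %Y").date()
    # Year-boundary handling: a date already in the past must mean next year
    if parsed < today:
        parsed = parsed.replace(year=today.year + 1)
    return parsed

# A January date seen in late December belongs to the following year
print(parse_next_collection("Friday 2 January", date(2026, 12, 20)))  # → 2027-01-02
```

Note that `strptime` does not check the weekday token against the day/month/year it parses, so a template change in the weekday name would not break this logic.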

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai (Bot) left a review comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (1)

68-82: Normalize UPRN before validation to make input handling more robust

Nice improvement adding numeric validation. You can make this a bit safer and more user-friendly by normalizing first (and reusing the normalized value for the URL):

  • Strip surrounding whitespace before checking isdigit().
  • Treat None and empty-string UPRNs distinctly.
  • Use the normalized string when constructing the URL.

Example diff:

```diff
-        uprn = kwargs.get("uprn")
-        if not uprn:
-            raise ValueError("uprn is required")
-
-        # Validate UPRN format (must be numeric only)
-        if not str(uprn).isdigit():
-            raise ValueError("uprn must contain only numeric characters")
+        uprn_value = kwargs.get("uprn")
+        if uprn_value is None:
+            raise ValueError("uprn is required")
+
+        uprn_str = str(uprn_value).strip()
+        # Validate UPRN format (must be numeric only)
+        if not uprn_str or not uprn_str.isdigit():
+            raise ValueError("uprn must contain only numeric characters")
+
+        uprn = uprn_str
```

This keeps the behavior strict while avoiding surprises from accidental whitespace.

🧹 Nitpick comments (3)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (3)

34-47: Clarify unused page parameter to address ARG002 and future readers

parse_data no longer uses the page argument (by design) and Ruff flags this as ARG002. Since you’ve documented that page is unused for interface compatibility, consider either renaming it to _page or adding a # noqa: ARG002 comment so this is explicit and lint-clean.


87-105: Handle HTTP errors explicitly and avoid raising bare Exception

The polling loop works, but a couple of tweaks would make failures easier to reason about and align with the Ruff TRY002/TRY003 hints:

  • If the server returns a non-2xx status, you currently fall through and eventually raise “Collections div not found” or a timeout. Consider calling response.raise_for_status() after requests.get(...) so HTTP errors are clearly surfaced.
  • Replace bare Exception with a more specific type (e.g. RuntimeError or a small module-level custom exception) and slightly more contextual messages.

For example:

```diff
-        for attempt in range(1, self.MAX_POLLING_ATTEMPTS + 1):
-            response = requests.get(url, headers=headers, timeout=10)
+        for attempt in range(1, self.MAX_POLLING_ATTEMPTS + 1):
+            response = requests.get(url, headers=headers, timeout=10)
+            response.raise_for_status()
@@
-                else:
-                    raise Exception("Timeout waiting for bin collection data to load")
+                else:
+                    raise RuntimeError("Timeout waiting for bin collection data to load")
@@
-        collections_div = soup.find("div", class_="waste__collections")
-        if not collections_div:
-            raise Exception("Collections div not found")
+        collections_div = soup.find("div", class_="waste__collections")
+        if not collections_div:
+            raise RuntimeError("Collections div not found in waste page HTML")
```

This keeps the polling behavior but produces clearer, more actionable errors.


110-163: Tighten and simplify Next collection date parsing

The year-boundary handling looks good, but the current parsing has a couple of rough edges:

  • possible_formats includes "%A %d %B %Y", but date_str is built as "day month year", so that format will never match.
  • Splitting collection_date_str and taking date_parts[1] and [2] assumes the format is always "Weekday DD Month", and doesn’t cope with minor template changes (extra text, bracketed notes).

You can both simplify and make this more robust by:

  • Using the entire (optionally normalised) date string and appending the year.
  • Letting possible_formats reflect the strings you actually parse.
  • Reusing a today value so you don’t call datetime.now() repeatedly.

For example:

```diff
-        possible_formats = [
-            "%d %B %Y",
-            "%A %d %B %Y",
-        ]
+        today = datetime.now().date()
+        possible_formats = [
+            "%A %d %B %Y",  # e.g. "Saturday 15 November"
+            "%d %B %Y",     # e.g. "15 November"
+        ]
@@
-                        if key and value and "Next collection" in key.get_text():
-                            collection_date_str = value.get_text().strip()
-
-                            # Parse the date - format is like "Saturday 15 November"
-                            collectionDate = None
-                            # Try with day of week
-                            date_parts = collection_date_str.split()
-                            if len(date_parts) >= 3:
-                                # Try parsing with day name, day, month
-                                day = date_parts[1]
-                                month = date_parts[2]
-                                year = datetime.now().year
-                                date_str = f"{day} {month} {year}"
-
-                                for format in possible_formats:
-                                    try:
-                                        collectionDate = datetime.strptime(
-                                            date_str, format
-                                        )
-                                        # Handle year boundary: if parsed date is in the past, assume next year
-                                        if collectionDate.date() < datetime.now().date():
-                                            collectionDate = collectionDate.replace(year=year + 1)
-                                        break
-                                    except ValueError:
-                                        continue
+                        if key and value and "Next collection" in key.get_text():
+                            raw_value = value.get_text().strip()
+
+                            # Parse the date - values tend to look like "Saturday 15 November"
+                            # Drop any bracketed notes, e.g. "(In progress)"
+                            collection_date_str = raw_value.split("(", 1)[0].strip()
+                            collectionDate = None
+                            if collection_date_str:
+                                year = datetime.now().year
+                                for fmt in possible_formats:
+                                    try:
+                                        date_str = f"{collection_date_str} {year}"
+                                        collectionDate = datetime.strptime(date_str, fmt)
+                                        # Handle year boundary: if parsed date is in the past, assume next year
+                                        if collectionDate.date() < today:
+                                            collectionDate = collectionDate.replace(year=year + 1)
+                                        break
+                                    except ValueError:
+                                        collectionDate = None
+                                        continue
```

That keeps the intent but uses the possible_formats list effectively and gives you a bit more resilience if the FixMyStreet template changes slightly (similar to the Kingston-upon-Thames parser).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4474633 and c836362.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (2)
  • AbstractGetBinDataClass (43-146)
  • parse_data (131-136)
uk_bin_collection/uk_bin_collection/councils/KingstonUponThamesCouncil.py (1)
  • parse_data (27-86)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py

  • 34: Unused method argument: page (ARG002)
  • 70: Avoid specifying long messages outside the exception class (TRY003)
  • 74: Avoid specifying long messages outside the exception class (TRY003)
  • 99: Create your own exception (TRY002); avoid specifying long messages outside the exception class (TRY003)
  • 105: Create your own exception (TRY002); avoid specifying long messages outside the exception class (TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)

@robbrad robbrad changed the base branch from master to dec_release December 7, 2025 10:17
@robbrad robbrad merged commit 2ca72dc into robbrad:dec_release Dec 7, 2025
12 of 14 checks passed


Development

Successfully merging this pull request may close these issues.

Change in URL - Merton

2 participants