
Fix MertonCouncil for new website URL #1716

Merged

robbrad merged 2 commits into robbrad:dec_release from davida72:merton-fix on Dec 7, 2025
Conversation

@davida72 (Contributor) commented Nov 14, 2025

The Merton Council website has changed from myneighbourhood.merton.gov.uk to fixmystreet.merton.gov.uk. Updated the script to:

  • Use the new FixMyStreet-based website
  • Poll the page until JavaScript loads collection data
  • Accept UPRN parameter instead of full URL
  • Skip booking services (Bulky waste, Garden waste)

Fixes #1664
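The "poll the page until JavaScript loads collection data" step can be sketched as a plain retry loop. This is a minimal illustration only, not the actual MertonCouncil.py code: `poll_until_loaded`, `fetch_page`, and `is_loaded` are hypothetical names, and the constants are the values mentioned later in the review, injected here so the loop can be demonstrated without a network call.

```python
import time

# Assumed values; the real constants live in MertonCouncil.py
MAX_POLLING_ATTEMPTS = 10
POLLING_SLEEP_SECONDS = 2

def poll_until_loaded(fetch_page, is_loaded, sleep=time.sleep):
    """Re-fetch a page until its JS-rendered data appears or attempts run out."""
    for attempt in range(1, MAX_POLLING_ATTEMPTS + 1):
        html = fetch_page()
        if is_loaded(html):
            return html
        if attempt < MAX_POLLING_ATTEMPTS:
            sleep(POLLING_SLEEP_SECONDS)
    raise RuntimeError("Timeout waiting for bin collection data to load")

# Demo with a fake fetcher that becomes "ready" on the third call:
responses = iter([
    "<p>Loading...</p>",
    "<p>Loading...</p>",
    "<div class='waste__collections'>...</div>",
])
html = poll_until_loaded(
    fetch_page=lambda: next(responses),
    is_loaded=lambda h: "waste__collections" in h,
    sleep=lambda s: None,  # skip real sleeping in the demo
)
print(html)  # → <div class='waste__collections'>...</div>
```

Injecting the fetch and sleep callables keeps the loop testable; the real parser would pass a `requests.get` wrapper instead.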

Summary by CodeRabbit

  • Bug Fixes

    • Merton Council waste collection now uses the FixMyStreet waste page and properly handles dynamic (JS-loaded) content, improving reliability and accuracy of bin schedules.
    • Skips non-scheduled services (e.g., bulky/garden waste) and robustly parses next-collection dates for correct ordering.
  • Documentation

    • Simplified instructions for obtaining your UPRN via FindMyAddress; a UPRN has been added for Merton for testing.

@coderabbitai (Bot) commented Nov 14, 2025

Walkthrough

Updated the Merton Council integration and test config: the test input now includes a UPRN and skip flag; the council parser switched from static HTML table parsing to fetching a UPRN-specific FixMyStreet page, polling for JS-loaded content, extracting collection types and "Next collection" dates, and returning sorted bins with formatted dates.

Changes

  • Test Configuration — uk_bin_collection/tests/input.json
    Added uprn: "4328213" and skip_get_url: true for MertonCouncil; updated url to https://fixmystreet.merton.gov.uk/waste/; removed wiki_command_url_override; simplified wiki_note to use FindMyAddress.
  • Council Implementation — uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py
    Replaced static BS4 table parsing with a UPRN-based GET to FixMyStreet and polling for JS-loaded content. Added UPRN validation, polling constants (MAX_POLLING_ATTEMPTS, POLLING_SLEEP_SECONDS), request headers/timeouts, and a retry loop. New selectors target div.waste__collections, h3.waste-service-name, and GOV.UK summary lists to extract "Next collection" dates, with multi-format date parsing, year-rollover handling, service-type filtering, and output as data["bins"] sorted by date. Added error handling for missing containers and timeouts.
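As a rough illustration of the selector-based extraction, the snippet below pulls service names and "Next collection" values out of a simplified, hypothetical fragment of the FixMyStreet waste page. It uses regular expressions purely to keep the demo dependency-free; the real parser uses BeautifulSoup against div.waste__collections.

```python
import re

# Simplified, hypothetical fragment of the FixMyStreet waste page markup
SAMPLE_HTML = """
<div class="waste__collections">
  <h3 class="waste-service-name">Food waste</h3>
  <dl class="govuk-summary-list">
    <dt>Next collection</dt><dd>Saturday 15 November</dd>
  </dl>
  <h3 class="waste-service-name">Bulky waste</h3>
</div>
"""

# Service headings correspond to h3.waste-service-name in the real selectors
services = re.findall(r'<h3 class="waste-service-name">([^<]+)</h3>', SAMPLE_HTML)
# "Next collection" rows live in the GOV.UK summary list
dates = re.findall(r"<dt>Next collection</dt><dd>([^<]+)</dd>", SAMPLE_HTML)

print(services)  # → ['Food waste', 'Bulky waste']
print(dates)     # → ['Saturday 15 November']
```

Note how "Bulky waste" has a heading but no "Next collection" row, which is why the parser also filters out booking-only services.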

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Merton as MertonCouncil.py
    participant FMS as fixmystreet.merton.gov.uk
    participant Parser as HTML Parser

    User->>Merton: fetch(uprn, ...)
    Merton->>Merton: validate uprn (numeric)
    Merton->>FMS: GET /waste/up/{uprn} (User-Agent, timeout)
    FMS-->>Merton: HTML response

    rect rgb(230,245,255)
    Note over Merton,FMS: Polling loop until JS data appears or attempts exhausted
    loop while not ready and attempts < MAX_POLLING_ATTEMPTS
        Merton->>FMS: GET page (retry)
        FMS-->>Merton: HTML
        Merton->>Merton: check loading indicator gone
        Merton->>Merton: sleep POLLING_SLEEP_SECONDS
    end
    end

    Merton->>Parser: parse div.waste__collections
    Parser->>Parser: for each h3.waste-service-name -> skip non-scheduled
    Parser->>Parser: extract "Next collection" from summary list
    Parser->>Parser: parse date (multi-format, year-rollover)
    Parser-->>Merton: list of (bin_type, collectionDate)
    Merton->>Merton: sort & format -> data["bins"]
    Merton-->>User: return collections
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas to focus on:
    • Polling logic, timeout and retry behavior (MAX_POLLING_ATTEMPTS, POLLING_SLEEP_SECONDS)
    • UPRN validation and error messages
    • Robustness of CSS selectors and handling when structure differs
    • Date parsing (formats supported, year-rollover edge cases)
    • Tests/input.json changes and any test expectations impacted

Possibly related PRs

Suggested reviewers

  • dp247

Poem

🐇 From old myneighbourhood to fixmystreet we hop,

UPRN in paw, we poll till the JS stops,
Bins lined up neatly with dates in a row,
Hooray for parsed collections — off we go! 🥕✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
  • Description check — Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check — Passed: the title accurately describes the main change: fixing MertonCouncil for a new website URL, which aligns with the primary objective in the linked issue.
  • Linked Issues check — Passed: the PR implements all coding requirements from issue #1664: updated URL to fixmystreet.merton.gov.uk/waste, implemented JS polling, added UPRN support, and skips non-scheduled services.
  • Out of Scope Changes check — Passed: all changes are directly related to fixing MertonCouncil for the new FixMyStreet URL; modifications to input.json and MertonCouncil.py align with the stated objectives.
  • Docstring Coverage check — Passed: docstring coverage is 100.00%, above the required threshold of 80.00%.

@codecov (Bot) commented Nov 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.79%. Comparing base (37c8a80) to head (c836362).
⚠️ Report is 97 commits behind head on dec_release.

Additional details and impacted files
```
@@             Coverage Diff              @@
##           dec_release    #1716   +/-   ##
============================================
  Coverage        86.79%   86.79%
============================================
  Files                9        9
  Lines             1136     1136
============================================
  Hits               986      986
  Misses             150      150
```

@coderabbitai (Bot) left a review comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (2)

7-7: Consider explicit imports instead of star import.

The star import from common makes it difficult to verify which symbols are used and could lead to namespace pollution. Static analysis tools flag datetime usage as potentially undefined (F405).

Consider replacing the star import with explicit imports:

```diff
-from uk_bin_collection.uk_bin_collection.common import *
+from uk_bin_collection.uk_bin_collection.common import datetime, date_format
```

35-49: Consider parameterizing polling configuration.

The polling mechanism uses hardcoded values for max_attempts (10) and sleep duration (2 seconds). While these values are reasonable, making them configurable would improve flexibility for different network conditions or slower-loading pages.

Additionally, static analysis suggests creating custom exception classes rather than using generic Exception with long messages (TRY002, TRY003), though this is a minor style consideration.
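One hedged sketch of what such parameterization could look like: class-level defaults that individual instances may override. The class and attribute names here are illustrative, not the actual MertonCouncil.py API.

```python
class PollingParser:
    """Illustrative parser base with overridable polling configuration."""

    # Class-level defaults, easy to tune in one place or in subclasses
    MAX_POLLING_ATTEMPTS = 10
    POLLING_SLEEP_SECONDS = 2

    def __init__(self, max_attempts=None, sleep_seconds=None):
        # Per-instance overrides for slow networks or slow-loading pages
        self.max_attempts = max_attempts or self.MAX_POLLING_ATTEMPTS
        self.sleep_seconds = sleep_seconds or self.POLLING_SLEEP_SECONDS

default = PollingParser()
slow_network = PollingParser(max_attempts=20, sleep_seconds=5)
print(default.max_attempts, slow_network.max_attempts)  # → 10 20
```

The merged change took the middle road the reviewer suggested: named class constants rather than magic numbers, without adding constructor plumbing.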

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37c8a80 and 4474633.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py

  • 22: Avoid specifying long messages outside the exception class (TRY003)
  • 48: Create your own exception (TRY002); avoid specifying long messages outside the exception class (TRY003)
  • 54: Create your own exception (TRY002); avoid specifying long messages outside the exception class (TRY003)
  • 95: datetime may be undefined, or defined from star imports (F405)
  • 100: datetime may be undefined, or defined from star imports (F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Unit Tests (3.12, 1.8.4)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
🔇 Additional comments (2)
uk_bin_collection/tests/input.json (1)

1511-1515: LGTM! Test data correctly updated for new implementation.

The changes properly align with the updated MertonCouncil.py implementation:

  • Added uprn parameter for the new UPRN-based flow
  • Updated URL to the new FixMyStreet site
  • Added skip_get_url: true to match the new data fetching approach
  • Updated wiki_note to guide users on obtaining UPRN via FindMyAddress
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (1)

64-72: LGTM! Correctly skips booking services.

The logic to skip "Bulky waste" and "Garden waste" correctly implements the PR objective to filter out booking services that aren't regular collections.
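A minimal sketch of that filter follows. The service names come from the PR description; the helper name and set are illustrative stand-ins for the merged implementation.

```python
# Booking-only services that have no regular schedule, per the PR description
SKIP_SERVICES = {"Bulky waste", "Garden waste"}

def is_scheduled(service_name: str) -> bool:
    """Return True for regular collections, False for book-on-demand services."""
    return service_name.strip() not in SKIP_SERVICES

names = ["Food waste", "Bulky waste", "Recycling", "Garden waste"]
print([n for n in names if is_scheduled(n)])  # → ['Food waste', 'Recycling']
```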

@davida72 (Contributor, Author) commented:

Wow. That CodeRabbit is something. I'll take a look at the things it's raised.

This commit addresses all issues raised in the CodeRabbit review:

1. Replace star import with explicit imports
   - Changed from `import *` to explicit imports of `datetime` and `date_format`
   - Eliminates namespace pollution and F405 static analysis warnings

2. Add UPRN format validation
   - Validates that UPRN contains only numeric characters
   - Prevents potential security/stability issues from invalid input

3. Fix year-boundary date parsing bug
   - Checks if parsed date is in the past and adjusts to next year
   - Fixes December-to-January collection date parsing issues

4. Parameterize polling configuration
   - Moved hardcoded values to class constants (MAX_POLLING_ATTEMPTS, POLLING_SLEEP_SECONDS)
   - Improves flexibility for varying network conditions

5. Add comprehensive docstrings
   - Added detailed class and method docstrings
   - Includes Args, Returns, Raises, Examples, and Notes sections
   - Significantly improves docstring coverage

All changes maintain backward compatibility while improving code quality,
security, and maintainability.
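The year-boundary fix in point 3 can be illustrated with a self-contained sketch. This is not the actual MertonCouncil.py code: the function name is hypothetical, and the reference date is injected as a parameter so the December-to-January rollover is easy to demonstrate.

```python
from datetime import datetime, date

def parse_next_collection(date_str: str, today: date) -> date:
    """Parse a string like 'Saturday 15 November' and roll past dates into next year."""
    # The site omits the year, so try the current year first
    parsed = datetime.strptime(f"{date_str} {today.year}", "%A %d %B %Y").date()
    # Year-boundary handling: a date already in the past must mean next year
    if parsed < today:
        parsed = parsed.replace(year=today.year + 1)
    return parsed

# A January date seen in late December belongs to the following year
print(parse_next_collection("Friday 2 January", date(2026, 12, 20)))  # → 2027-01-02
```

Note that `strptime` does not check the weekday token against the day/month/year it parses, so a template change in the weekday name would not break this logic.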

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai (Bot) left a review comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (1)

68-82: Normalize UPRN before validation to make input handling more robust

Nice improvement adding numeric validation. You can make this a bit safer and more user-friendly by normalizing first (and reusing the normalized value for the URL):

  • Strip surrounding whitespace before checking isdigit().
  • Treat None and empty-string UPRNs distinctly.
  • Use the normalized string when constructing the URL.

Example diff:

```diff
-        uprn = kwargs.get("uprn")
-        if not uprn:
-            raise ValueError("uprn is required")
-
-        # Validate UPRN format (must be numeric only)
-        if not str(uprn).isdigit():
-            raise ValueError("uprn must contain only numeric characters")
+        uprn_value = kwargs.get("uprn")
+        if uprn_value is None:
+            raise ValueError("uprn is required")
+
+        uprn_str = str(uprn_value).strip()
+        # Validate UPRN format (must be numeric only)
+        if not uprn_str or not uprn_str.isdigit():
+            raise ValueError("uprn must contain only numeric characters")
+
+        uprn = uprn_str
```

This keeps the behavior strict while avoiding surprises from accidental whitespace.

🧹 Nitpick comments (3)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (3)

34-47: Clarify unused page parameter to address ARG002 and future readers

parse_data no longer uses the page argument (by design) and Ruff flags this as ARG002. Since you’ve documented that page is unused for interface compatibility, consider either renaming it to _page or adding a # noqa: ARG002 comment so this is explicit and lint-clean.


87-105: Handle HTTP errors explicitly and avoid raising bare Exception

The polling loop works, but a couple of tweaks would make failures easier to reason about and align with the Ruff TRY002/TRY003 hints:

  • If the server returns a non-2xx status, you currently fall through and eventually raise “Collections div not found” or a timeout. Consider calling response.raise_for_status() after requests.get(...) so HTTP errors are clearly surfaced.
  • Replace bare Exception with a more specific type (e.g. RuntimeError or a small module-level custom exception) and slightly more contextual messages.

For example:

```diff
-        for attempt in range(1, self.MAX_POLLING_ATTEMPTS + 1):
-            response = requests.get(url, headers=headers, timeout=10)
+        for attempt in range(1, self.MAX_POLLING_ATTEMPTS + 1):
+            response = requests.get(url, headers=headers, timeout=10)
+            response.raise_for_status()
@@
-                else:
-                    raise Exception("Timeout waiting for bin collection data to load")
+                else:
+                    raise RuntimeError("Timeout waiting for bin collection data to load")
@@
-        collections_div = soup.find("div", class_="waste__collections")
-        if not collections_div:
-            raise Exception("Collections div not found")
+        collections_div = soup.find("div", class_="waste__collections")
+        if not collections_div:
+            raise RuntimeError("Collections div not found in waste page HTML")
```

This keeps the polling behavior but produces clearer, more actionable errors.


110-163: Tighten and simplify Next collection date parsing

The year-boundary handling looks good, but the current parsing has a couple of rough edges:

  • possible_formats includes "%A %d %B %Y", but date_str is built as "day month year", so that format will never match.
  • Splitting collection_date_str and taking date_parts[1] and [2] assumes the format is always "Weekday DD Month", and doesn’t cope with minor template changes (extra text, bracketed notes).

You can both simplify and make this more robust by:

  • Using the entire (optionally normalised) date string and appending the year.
  • Letting possible_formats reflect the strings you actually parse.
  • Reusing a today value so you don’t call datetime.now() repeatedly.

For example:

```diff
-        possible_formats = [
-            "%d %B %Y",
-            "%A %d %B %Y",
-        ]
+        today = datetime.now().date()
+        possible_formats = [
+            "%A %d %B %Y",  # e.g. "Saturday 15 November"
+            "%d %B %Y",     # e.g. "15 November"
+        ]
@@
-                        if key and value and "Next collection" in key.get_text():
-                            collection_date_str = value.get_text().strip()
-
-                            # Parse the date - format is like "Saturday 15 November"
-                            collectionDate = None
-                            # Try with day of week
-                            date_parts = collection_date_str.split()
-                            if len(date_parts) >= 3:
-                                # Try parsing with day name, day, month
-                                day = date_parts[1]
-                                month = date_parts[2]
-                                year = datetime.now().year
-                                date_str = f"{day} {month} {year}"
-
-                                for format in possible_formats:
-                                    try:
-                                        collectionDate = datetime.strptime(
-                                            date_str, format
-                                        )
-                                        # Handle year boundary: if parsed date is in the past, assume next year
-                                        if collectionDate.date() < datetime.now().date():
-                                            collectionDate = collectionDate.replace(year=year + 1)
-                                        break
-                                    except ValueError:
-                                        continue
+                        if key and value and "Next collection" in key.get_text():
+                            raw_value = value.get_text().strip()
+
+                            # Parse the date - values tend to look like "Saturday 15 November"
+                            # Drop any bracketed notes, e.g. "(In progress)"
+                            collection_date_str = raw_value.split("(", 1)[0].strip()
+                            collectionDate = None
+                            if collection_date_str:
+                                year = datetime.now().year
+                                for fmt in possible_formats:
+                                    try:
+                                        date_str = f"{collection_date_str} {year}"
+                                        collectionDate = datetime.strptime(date_str, fmt)
+                                        # Handle year boundary: if parsed date is in the past, assume next year
+                                        if collectionDate.date() < today:
+                                            collectionDate = collectionDate.replace(year=year + 1)
+                                        break
+                                    except ValueError:
+                                        collectionDate = None
+                                        continue
```

That keeps the intent but uses the possible_formats list effectively and gives you a bit more resilience if the FixMyStreet template changes slightly (similar to the Kingston-upon-Thames parser).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4474633 and c836362.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (2)
  • AbstractGetBinDataClass (43-146)
  • parse_data (131-136)
uk_bin_collection/uk_bin_collection/councils/KingstonUponThamesCouncil.py (1)
  • parse_data (27-86)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py

  • 34: Unused method argument: page (ARG002)
  • 70: Avoid specifying long messages outside the exception class (TRY003)
  • 74: Avoid specifying long messages outside the exception class (TRY003)
  • 99: Create your own exception (TRY002); avoid specifying long messages outside the exception class (TRY003)
  • 105: Create your own exception (TRY002); avoid specifying long messages outside the exception class (TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)

@robbrad robbrad changed the base branch from master to dec_release December 7, 2025 10:17
@robbrad robbrad merged commit 2ca72dc into robbrad:dec_release Dec 7, 2025
12 of 14 checks passed


Development

Successfully merging this pull request may close these issues.

Change in URL - Merton

2 participants