Fix MertonCouncil for new website URL#1716
Conversation
The Merton Council website has changed from myneighbourhood.merton.gov.uk to fixmystreet.merton.gov.uk. Updated the script to: - Use the new FixMyStreet-based website - Poll the page until JavaScript loads collection data - Accept UPRN parameter instead of full URL - Skip booking services (Bulky waste, Garden waste) Fixes #1664 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
WalkthroughUpdated Merton Council integration and test config: test input now includes UPRN and skip flag; council parser switched from static HTML table parsing to fetching an UPRN-specific FixMyStreet page, polling for JS-loaded content, extracting collection types and "Next collection" dates, and returning sorted bins with formatted dates. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Merton as MertonCouncil.py
participant FMS as fixmystreet.merton.gov.uk
participant Parser as HTML Parser
User->>Merton: fetch(uprn, ...)
Merton->>Merton: validate uprn (numeric)
Merton->>FMS: GET /waste/up/{uprn} (User-Agent, timeout)
FMS-->>Merton: HTML response
rect rgb(230,245,255)
Note over Merton,FMS: Polling loop until JS data appears or attempts exhausted
loop while not ready and attempts < MAX_POLLING_ATTEMPTS
Merton->>FMS: GET page (retry)
FMS-->>Merton: HTML
Merton->>Merton: check loading indicator gone
Merton->>Merton: sleep POLLING_SLEEP_SECONDS
end
end
Merton->>Parser: parse div.waste__collections
Parser->>Parser: for each h3.waste-service-name -> skip non-scheduled
Parser->>Parser: extract "Next collection" from summary list
Parser->>Parser: parse date (multi-format, year-rollover)
Parser-->>Merton: list of (bin_type, collectionDate)
Merton->>Merton: sort & format -> data["bins"]
Merton-->>User: return collections
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## dec_release #1716 +/- ##
============================================
Coverage 86.79% 86.79%
============================================
Files 9 9
Lines 1136 1136
============================================
Hits 986 986
Misses 150 150 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (2)
7-7: Consider explicit imports instead of star import.The star import from
commonmakes it difficult to verify which symbols are used and could lead to namespace pollution. Static analysis tools flagdatetimeusage as potentially undefined (F405).Consider replacing the star import with explicit imports:
-from uk_bin_collection.uk_bin_collection.common import * +from uk_bin_collection.uk_bin_collection.common import datetime, date_format
35-49: Consider parameterizing polling configuration.The polling mechanism uses hardcoded values for
max_attempts(10) and sleep duration (2 seconds). While these values are reasonable, making them configurable would improve flexibility for different network conditions or slower-loading pages.Additionally, static analysis suggests creating custom exception classes rather than using generic
Exceptionwith long messages (TRY002, TRY003), though this is a minor style consideration.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
uk_bin_collection/tests/input.json(1 hunks)uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py
22-22: Avoid specifying long messages outside the exception class
(TRY003)
48-48: Create your own exception
(TRY002)
48-48: Avoid specifying long messages outside the exception class
(TRY003)
54-54: Create your own exception
(TRY002)
54-54: Avoid specifying long messages outside the exception class
(TRY003)
95-95: datetime may be undefined, or defined from star imports
(F405)
100-100: datetime may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Unit Tests (3.12, 1.8.4)
- GitHub Check: Run Integration Tests (3.12, 1.8.4)
🔇 Additional comments (2)
uk_bin_collection/tests/input.json (1)
1511-1515: LGTM! Test data correctly updated for new implementation.The changes properly align with the updated MertonCouncil.py implementation:
- Added
uprnparameter for the new UPRN-based flow- Updated URL to the new FixMyStreet site
- Added
skip_get_url: trueto match the new data fetching approach- Updated
wiki_noteto guide users on obtaining UPRN via FindMyAddressuk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (1)
64-72: LGTM! Correctly skips booking services.The logic to skip "Bulky waste" and "Garden waste" correctly implements the PR objective to filter out booking services that aren't regular collections.
|
Wow. That CodeRabbit is something. I'll take a look at the things it's raised. |
This commit addresses all issues raised in the CodeRabbit review: 1. Replace star import with explicit imports - Changed from `import *` to explicit imports of `datetime` and `date_format` - Eliminates namespace pollution and F405 static analysis warnings 2. Add UPRN format validation - Validates that UPRN contains only numeric characters - Prevents potential security/stability issues from invalid input 3. Fix year-boundary date parsing bug - Checks if parsed date is in the past and adjusts to next year - Fixes December-to-January collection date parsing issues 4. Parameterize polling configuration - Moved hardcoded values to class constants (MAX_POLLING_ATTEMPTS, POLLING_SLEEP_SECONDS) - Improves flexibility for varying network conditions 5. Add comprehensive docstrings - Added detailed class and method docstrings - Includes Args, Returns, Raises, Examples, and Notes sections - Significantly improves docstring coverage All changes maintain backward compatibility while improving code quality, security, and maintainability. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (1)
68-82: Normalize UPRN before validation to make input handling more robustNice improvement adding numeric validation. You can make this a bit safer and more user-friendly by normalizing first (and reusing the normalized value for the URL):
- Strip surrounding whitespace before checking
isdigit().- Treat
Noneand empty-string UPRNs distinctly.- Use the normalized string when constructing the URL.
Example diff:
- uprn = kwargs.get("uprn") - if not uprn: - raise ValueError("uprn is required") - - # Validate UPRN format (must be numeric only) - if not str(uprn).isdigit(): - raise ValueError("uprn must contain only numeric characters") + uprn_value = kwargs.get("uprn") + if uprn_value is None: + raise ValueError("uprn is required") + + uprn_str = str(uprn_value).strip() + # Validate UPRN format (must be numeric only) + if not uprn_str or not uprn_str.isdigit(): + raise ValueError("uprn must contain only numeric characters") + + uprn = uprn_strThis keeps the behavior strict while avoiding surprises from accidental whitespace.
🧹 Nitpick comments (3)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (3)
34-47: Clarify unusedpageparameter to address ARG002 and future readers
parse_datano longer uses thepageargument (by design) and Ruff flags this as ARG002. Since you’ve documented thatpageis unused for interface compatibility, consider either renaming it to_pageor adding a# noqa: ARG002comment so this is explicit and lint-clean.
87-105: Handle HTTP errors explicitly and avoid raising bareExceptionThe polling loop works, but a couple of tweaks would make failures easier to reason about and align with the Ruff TRY002/TRY003 hints:
- If the server returns a non-2xx status, you currently fall through and eventually raise “Collections div not found” or a timeout. Consider calling
response.raise_for_status()afterrequests.get(...)so HTTP errors are clearly surfaced.- Replace bare
Exceptionwith a more specific type (e.g.RuntimeErroror a small module-level custom exception) and slightly more contextual messages.For example:
- for attempt in range(1, self.MAX_POLLING_ATTEMPTS + 1): - response = requests.get(url, headers=headers, timeout=10) + for attempt in range(1, self.MAX_POLLING_ATTEMPTS + 1): + response = requests.get(url, headers=headers, timeout=10) + response.raise_for_status() @@ - else: - raise Exception("Timeout waiting for bin collection data to load") + else: + raise RuntimeError("Timeout waiting for bin collection data to load") @@ - collections_div = soup.find("div", class_="waste__collections") - if not collections_div: - raise Exception("Collections div not found") + collections_div = soup.find("div", class_="waste__collections") + if not collections_div: + raise RuntimeError("Collections div not found in waste page HTML")This keeps the polling behavior but produces clearer, more actionable errors.
110-163: Tighten and simplifyNext collectiondate parsingThe year-boundary handling looks good, but the current parsing has a couple of rough edges:
possible_formatsincludes"%A %d %B %Y", butdate_stris built as"day month year", so that format will never match.- Splitting
collection_date_strand takingdate_parts[1]and[2]assumes the format is always"Weekday DD Month", and doesn’t cope with minor template changes (extra text, bracketed notes).You can both simplify and make this more robust by:
- Using the entire (optionally normalised) date string and appending the year.
- Letting
possible_formatsreflect the strings you actually parse.- Reusing a
todayvalue so you don’t calldatetime.now()repeatedly.For example:
- possible_formats = [ - "%d %B %Y", - "%A %d %B %Y", - ] + today = datetime.now().date() + possible_formats = [ + "%A %d %B %Y", # e.g. "Saturday 15 November" + "%d %B %Y", # e.g. "15 November" + ] @@ - if key and value and "Next collection" in key.get_text(): - collection_date_str = value.get_text().strip() - - # Parse the date - format is like "Saturday 15 November" - collectionDate = None - # Try with day of week - date_parts = collection_date_str.split() - if len(date_parts) >= 3: - # Try parsing with day name, day, month - day = date_parts[1] - month = date_parts[2] - year = datetime.now().year - date_str = f"{day} {month} {year}" - - for format in possible_formats: - try: - collectionDate = datetime.strptime( - date_str, format - ) - # Handle year boundary: if parsed date is in the past, assume next year - if collectionDate.date() < datetime.now().date(): - collectionDate = collectionDate.replace(year=year + 1) - break - except ValueError: - continue + if key and value and "Next collection" in key.get_text(): + raw_value = value.get_text().strip() + + # Parse the date - values tend to look like "Saturday 15 November" + # Drop any bracketed notes, e.g. "(In progress)" + collection_date_str = raw_value.split("(", 1)[0].strip() + collectionDate = None + if collection_date_str: + year = datetime.now().year + for fmt in possible_formats: + try: + date_str = f"{collection_date_str} {year}" + collectionDate = datetime.strptime(date_str, fmt) + # Handle year boundary: if parsed date is in the past, assume next year + if collectionDate.date() < today: + collectionDate = collectionDate.replace(year=year + 1) + break + except ValueError: + collectionDate = None + continueThat keeps the intent but uses the
possible_formatslist effectively and gives you a bit more resilience if the FixMyStreet template changes slightly (similar to the Kingston-upon-Thames parser).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (2)
AbstractGetBinDataClass(43-146)parse_data(131-136)uk_bin_collection/uk_bin_collection/councils/KingstonUponThamesCouncil.py (1)
parse_data(27-86)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/MertonCouncil.py
34-34: Unused method argument: page
(ARG002)
70-70: Avoid specifying long messages outside the exception class
(TRY003)
74-74: Avoid specifying long messages outside the exception class
(TRY003)
99-99: Create your own exception
(TRY002)
99-99: Avoid specifying long messages outside the exception class
(TRY003)
105-105: Create your own exception
(TRY002)
105-105: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Integration Tests (3.12, 1.8.4)
The Merton Council website has changed from myneighbourhood.merton.gov.uk to fixmystreet.merton.gov.uk. Updated the script to:
Fixes #1664
Summary by CodeRabbit
Bug Fixes
Documentation