fix: Cumberland Council#1764
Conversation
WalkthroughUpdated CumberlandCouncil integration from form-based bin collection retrieval to direct GET requests. Removed postcode from test data and simplified URL to official schedule page. Replaced form token parsing with direct HTML content extraction and line-based bin schedule parsing. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1764 +/- ##
=======================================
Coverage 86.79% 86.79%
=======================================
Files 9 9
Lines 1136 1136
=======================================
Hits 986 986
Misses 150 150 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py (2)
79-80: Silent exception swallowing may hide parsing failures.Catching
ValueErrorwith a barepassmeans malformed dates fail silently. Consider logging to aid debugging when collection dates aren't being captured correctly.except ValueError: - pass + # Log or handle malformed date gracefully + pass # Consider: logging.warning(f"Failed to parse date: {date_str}")
88-90: Use consistent date format reference.Line 76 uses
date_formatvariable, but line 89 hardcodes"%d/%m/%Y". For maintainability, use the same format constant in both places.bindata["bins"].sort( - key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y") + key=lambda x: datetime.strptime(x.get("collectionDate"), date_format) )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
uk_bin_collection/tests/input.json(1 hunks)uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py
24-24: Probable use of requests call without timeout
(S113)
54-54: datetime may be undefined, or defined from star imports
(F405)
72-72: datetime may be undefined, or defined from star imports
(F405)
76-76: date_format may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Integration Tests (3.12, 1.8.4)
- GitHub Check: Run Unit Tests (3.12, 1.8.4)
🔇 Additional comments (1)
uk_bin_collection/tests/input.json (1)
629-635: Approved: CumberlandCouncil test data updated correctly for direct-GET approach.The test configuration update correctly reflects the refactored CumberlandCouncil integration:
- URL simplified to the official schedule page (line 631)
- Postcode field removed (aligns with UPRN-only direct GET approach)
- UPRN retained (required for new parsing logic)
- Wiki guidance unchanged (still directs to FindMyAddress)
To confirm the changes work end-to-end, verify that the new URL structure at line 631 returns expected HTML content when accessed with the UPRN (
10009457328) and that the parser implementation inCumberlandCouncil.pycorrectly extracts bin schedule data from the page'sdiv.lgd-region--contentelement (as noted in the summary).
| response = requests.get(url) | ||
| soup = BeautifulSoup(response.content, "html.parser") |
There was a problem hiding this comment.
Add timeout and error handling for the HTTP request.
The requests.get() call lacks a timeout, which can cause indefinite hangs if the server is unresponsive. Additionally, there's no error handling for failed requests.
- response = requests.get(url)
- soup = BeautifulSoup(response.content, "html.parser")
+ response = requests.get(url, timeout=30)
+ response.raise_for_status()
+ soup = BeautifulSoup(response.content, "html.parser")If you prefer graceful degradation instead of raising on error:
response = requests.get(url, timeout=30)
if not response.ok:
return bindata🧰 Tools
🪛 Ruff (0.14.7)
24-24: Probable use of requests call without timeout
(S113)
🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py around
lines 24-25 the requests.get call has no timeout and no error handling; update
the code to call requests.get(url, timeout=30) (or another sensible timeout),
wrap the request in a try/except to catch requests.exceptions.RequestException,
and handle non-2xx responses by checking response.ok and either returning the
existing bindata for graceful degradation or raising a clear exception; also
ensure any caught exception is logged or re-raised with context so failures are
observable.
| # Determine the year range from the page header | ||
| year_2026 = "2026" in text_content | ||
|
|
||
| while i < len(lines): | ||
| line = lines[i] | ||
|
|
||
| # Check if this is a month name | ||
| if line in ["January", "February", "March", "April", "May", "June", | ||
| "July", "August", "September", "October", "November", "December"]: | ||
| current_month = line | ||
| # Determine year based on month and context | ||
| if year_2026: | ||
| current_year = "2026" if line in ["January", "February"] else "2025" | ||
| else: | ||
| current_year = str(datetime.now().year) |
There was a problem hiding this comment.
Year inference logic is brittle and will break over time.
The current approach has two issues:
- Checking
"2026" in text_contentmay match unrelated occurrences (addresses, IDs). - Hardcoded years (
"2025","2026") will produce incorrect dates after this period.
Consider deriving the year dynamically based on the current date and whether the month has passed:
- # Determine the year range from the page header
- year_2026 = "2026" in text_content
+ # Base year for date inference
+ today = datetime.now()
+ base_year = today.yearThen in the month handling section:
- # Determine year based on month and context
- if year_2026:
- current_year = "2026" if line in ["January", "February"] else "2025"
- else:
- current_year = str(datetime.now().year)
+ # Determine year: if month is earlier in year than current month,
+ # assume it's next year (for schedules spanning year boundary)
+ month_num = datetime.strptime(line, "%B").month
+ if month_num < today.month:
+ current_year = str(base_year + 1)
+ else:
+ current_year = str(base_year)🧰 Tools
🪛 Ruff (0.14.7)
54-54: datetime may be undefined, or defined from star imports
(F405)
|
Thanks for the fix, I got the logic and assume this works no problem. However, I have tried to setup the integration again and both entries for Cumberland Council (I don't know why there are still two) are requesting "House Number" and "Postcode" - this fixed suggests that this now just needs a UPRN. Has this been made live? |
Fixes issues #1456, #1620, and #1627.
Summary by CodeRabbit
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.