
fix: Cumberland Council #1764

Merged
robbrad merged 1 commit into master from feat_dec_fixes on Dec 8, 2025

Conversation

@robbrad
Owner

@robbrad robbrad commented Dec 8, 2025

Fixes issues #1456, #1620, and #1627.

Summary by CodeRabbit

Bug Fixes

  • Cumberland Council bin collection data is now retrieved from the official council schedule page.
  • Postcode field is no longer required for Cumberland Council queries.


@coderabbitai
Contributor

coderabbitai Bot commented Dec 8, 2025

Walkthrough

Updated CumberlandCouncil integration from form-based bin collection retrieval to direct GET requests. Removed postcode from test data and simplified URL to official schedule page. Replaced form token parsing with direct HTML content extraction and line-based bin schedule parsing.
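The line-based parsing described above might look roughly like the sketch below. It is not the merged implementation. It assumes the text extracted from the content div has already been split into stripped, non-empty lines, and that each month heading is followed by pairs of lines: a day number, then a bin type. The names `parse_schedule_lines`, `MONTHS`, and `DATE_FORMAT` are hypothetical, and the year is passed in explicitly so that year inference stays a separate concern.

```python
from datetime import datetime

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
DATE_FORMAT = "%d/%m/%Y"  # assumed to match the project's shared date format


def parse_schedule_lines(lines, year):
    """Turn month / day / bin-type lines into collection records.

    Assumed layout: a month name on its own line, then repeated pairs of
    a day-number line and a bin-type line. An impossible day (e.g. 31 in
    February) raises ValueError rather than being skipped silently.
    """
    bins = []
    current_month = None  # month number of the most recent month heading
    i = 0
    while i < len(lines):
        line = lines[i]
        if line in MONTHS:
            current_month = MONTHS.index(line) + 1
        elif line.isdigit() and current_month and i + 1 < len(lines):
            bin_type = lines[i + 1]
            i += 1  # consume the bin-type line as well
            collected = datetime(year, current_month, int(line))
            bins.append(
                {"type": bin_type, "collectionDate": collected.strftime(DATE_FORMAT)}
            )
        i += 1
    return bins
```

Keeping the tokenizer free of HTTP and HTML concerns means it can be exercised with a plain list of strings, which is one way to validate the new parsing against expected output.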

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CumberlandCouncil Test Data: uk_bin_collection/tests/input.json | Removed postcode field and updated URL from renderform endpoint to official Cumberland bin-collection schedule page; UPRN identifier preserved. |
| CumberlandCouncil Implementation: uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py | Refactored data retrieval from multi-step form-based POST/GET sequence to direct GET using UPRN. Replaced form token extraction with simplified HTML parsing targeting lgd-region--content div. Introduced month/year context detection and line-based bin type/date mapping. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Parsing logic changes: New month/year heuristic (2026 context) and line-based tokenization require validation against expected output
  • HTML structure dependency: Direct reliance on lgd-region--content class selector may be fragile; verify with council's current page structure
  • Test coverage: Confirm test data (input.json) aligns with parsing expectations and covers edge cases

Suggested reviewers

  • dp247

Poem

🐰 A form-based fetch now becomes a sprint,
Direct HTML parsing leaves no hint,
The UPRN guides us down the street,
Where bin schedules and logic sweetly meet!
No tokens or tokens, just content so neat! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Title check | ❓ Inconclusive | The title 'fix: Cumberland Council' is vague and does not clearly specify what aspect of Cumberland Council functionality is being fixed or improved. | Consider using a more specific title that describes the actual fix, such as 'fix: Replace form-based bin collection retrieval with direct UPRN lookup for Cumberland Council' or 'fix: Simplify Cumberland Council bin collection parsing logic'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@codecov

codecov Bot commented Dec 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.79%. Comparing base (7970654) to head (5849bac).
⚠️ Report is 14 commits behind head on master.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1764   +/-   ##
=======================================
  Coverage   86.79%   86.79%           
=======================================
  Files           9        9           
  Lines        1136     1136           
=======================================
  Hits          986      986           
  Misses        150      150           


@robbrad robbrad merged commit 0b13739 into master Dec 8, 2025
14 of 16 checks passed
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py (2)

79-80: Silent exception swallowing may hide parsing failures.

Catching ValueError with a bare pass means malformed dates fail silently. Consider logging to aid debugging when collection dates aren't being captured correctly.

                     except ValueError:
-                        pass
+                        # Log or handle malformed date gracefully
+                        pass  # Consider: logging.warning(f"Failed to parse date: {date_str}")
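As a rough illustration of the logging suggestion, the date parsing could be pulled into a small helper that warns instead of passing silently. The helper name `parse_collection_date`, the input format `"%d %B %Y"`, and the local `DATE_FORMAT` constant are assumptions for this sketch, not names taken from the merged code.

```python
import logging
from datetime import datetime

logger = logging.getLogger(__name__)

DATE_FORMAT = "%d/%m/%Y"  # assumed stand-in for the project's date_format


def parse_collection_date(date_str, input_format="%d %B %Y"):
    """Return the date reformatted as DATE_FORMAT, or None if it is malformed.

    A malformed or impossible date is logged at WARNING level so that
    missing collections can be traced back to the offending input.
    """
    try:
        return datetime.strptime(date_str, input_format).strftime(DATE_FORMAT)
    except ValueError:
        logger.warning("Failed to parse date: %r", date_str)
        return None
```

The caller would then skip `None` results, which keeps the existing behaviour of ignoring bad dates while leaving a trace in the logs.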

88-90: Use consistent date format reference.

Line 76 uses date_format variable, but line 89 hardcodes "%d/%m/%Y". For maintainability, use the same format constant in both places.

         bindata["bins"].sort(
-            key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y")
+            key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
         )
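A small sketch of the same idea in isolation, assuming a module-level `DATE_FORMAT` constant standing in for the project's `date_format` (the helper name `sort_bins` is hypothetical). Sorting on the parsed date matters here because day-first strings do not sort chronologically as plain text.

```python
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # assumed stand-in for the shared date_format


def sort_bins(bindata):
    """Sort bin entries chronologically using the single shared format."""
    bindata["bins"].sort(
        key=lambda b: datetime.strptime(b["collectionDate"], DATE_FORMAT)
    )
    return bindata
```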
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0a7f11 and 5849bac.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py

24-24: Probable use of requests call without timeout

(S113)


54-54: datetime may be undefined, or defined from star imports

(F405)


72-72: datetime may be undefined, or defined from star imports

(F405)


76-76: date_format may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
  • GitHub Check: Run Unit Tests (3.12, 1.8.4)
🔇 Additional comments (1)
uk_bin_collection/tests/input.json (1)

629-635: Approved: CumberlandCouncil test data updated correctly for direct-GET approach.

The test configuration update correctly reflects the refactored CumberlandCouncil integration:

  • URL simplified to the official schedule page (line 631)
  • Postcode field removed (aligns with UPRN-only direct GET approach)
  • UPRN retained (required for new parsing logic)
  • Wiki guidance unchanged (still directs to FindMyAddress)

To confirm the changes work end-to-end, verify that the new URL structure at line 631 returns expected HTML content when accessed with the UPRN (10009457328) and that the parser implementation in CumberlandCouncil.py correctly extracts bin schedule data from the page's div.lgd-region--content element (as noted in the summary).

Comment on lines +24 to +25
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

⚠️ Potential issue | 🟠 Major

Add timeout and error handling for the HTTP request.

The requests.get() call lacks a timeout, which can cause indefinite hangs if the server is unresponsive. Additionally, there's no error handling for failed requests.

-        response = requests.get(url)
-        soup = BeautifulSoup(response.content, "html.parser")
+        response = requests.get(url, timeout=30)
+        response.raise_for_status()
+        soup = BeautifulSoup(response.content, "html.parser")

If you prefer graceful degradation instead of raising on error:

response = requests.get(url, timeout=30)
if not response.ok:
    return bindata
🧰 Tools
🪛 Ruff (0.14.7)

24-24: Probable use of requests call without timeout

(S113)

🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py around
lines 24-25 the requests.get call has no timeout and no error handling; update
the code to call requests.get(url, timeout=30) (or another sensible timeout),
wrap the request in a try/except to catch requests.exceptions.RequestException,
and handle non-2xx responses by checking response.ok and either returning the
existing bindata for graceful degradation or raising a clear exception; also
ensure any caught exception is logged or re-raised with context so failures are
observable.

Comment on lines +40 to +54
        # Determine the year range from the page header
        year_2026 = "2026" in text_content

        while i < len(lines):
            line = lines[i]

            # Check if this is a month name
            if line in ["January", "February", "March", "April", "May", "June",
                        "July", "August", "September", "October", "November", "December"]:
                current_month = line
                # Determine year based on month and context
                if year_2026:
                    current_year = "2026" if line in ["January", "February"] else "2025"
                else:
                    current_year = str(datetime.now().year)

⚠️ Potential issue | 🟠 Major

Year inference logic is brittle and will break over time.

The current approach has two issues:

  1. Checking "2026" in text_content may match unrelated occurrences (addresses, IDs).
  2. Hardcoded years ("2025", "2026") will produce incorrect dates after this period.

Consider deriving the year dynamically based on the current date and whether the month has passed:

-        # Determine the year range from the page header
-        year_2026 = "2026" in text_content
+        # Base year for date inference
+        today = datetime.now()
+        base_year = today.year

Then in the month handling section:

-                # Determine year based on month and context
-                if year_2026:
-                    current_year = "2026" if line in ["January", "February"] else "2025"
-                else:
-                    current_year = str(datetime.now().year)
+                # Determine year: if month is earlier in year than current month,
+                # assume it's next year (for schedules spanning year boundary)
+                month_num = datetime.strptime(line, "%B").month
+                if month_num < today.month:
+                    current_year = str(base_year + 1)
+                else:
+                    current_year = str(base_year)
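The suggested rule can be sketched as a standalone function, which makes the year-boundary behaviour easy to check. The name `infer_year` and the explicit `today` parameter are assumptions added for testability; the rule itself (a month earlier than the current month is treated as next year) is the one proposed in the diff above.

```python
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def infer_year(month_name, today):
    """Guess the year for a month heading relative to today's date.

    Months earlier in the calendar than the current month are assumed to
    belong to next year; all other months are assumed to be this year.
    """
    month_num = MONTHS.index(month_name) + 1
    if month_num < today.month:
        return today.year + 1
    return today.year
```

For example, with `today = date(2025, 12, 8)` a "January" heading resolves to 2026 and a "December" heading to 2025. One caveat: if the page still lists a month that has already passed (say, November viewed in December), this rule would push it into next year, so it holds only if the schedule lists current and future months.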
🧰 Tools
🪛 Ruff (0.14.7)

54-54: datetime may be undefined, or defined from star imports

(F405)

@robbrad robbrad mentioned this pull request Dec 8, 2025
4 tasks
@jasperatus16

Thanks for the fix, I got the logic and assume this works no problem.

However, I have tried to set up the integration again, and both entries for Cumberland Council (I don't know why there are still two) are requesting "House Number" and "Postcode". This fix suggests that only a UPRN is now needed.

Has this been made live?
