Skip to content

fix: Islington, Worcester#1763

Merged
robbrad merged 1 commit into
masterfrom
feat_dec_fixes
Dec 8, 2025
Merged

fix: Islington, Worcester#1763
robbrad merged 1 commit into
masterfrom
feat_dec_fixes

Conversation

@robbrad
Copy link
Copy Markdown
Owner

@robbrad robbrad commented Dec 8, 2025

Summary by CodeRabbit

  • New Features

    • IslingtonCouncil now accepts postcode and UPRN as input parameters for collection queries.
  • Bug Fixes

    • IslingtonCouncil: Improved collection date parsing and calculation.
    • WorcesterCityCouncil: Added error handling to skip invalid date entries gracefully.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Dec 8, 2025

Walkthrough

Configuration and parser updates for multiple UK councils: FolkstoneandHytheDistrictCouncil adds web_driver field; IslingtonCouncil refactors URL handling with postcode parameter support and switches HTML parsing from table-based to list-based extraction; WorcesterCityCouncil adds defensive date parsing with error handling.

Changes

Cohort / File(s) Summary
Configuration Updates
uk_bin_collection/tests/input.json
Added web_driver field to FolkstoneandHytheDistrictCouncil entry. Refactored IslingtonCouncil entry: removed query-based URLs, introduced separate postcode field (N1 1XR), updated url and wiki_command_url_override to base URLs, updated wiki_note to guidance about passing postcode and UPRN as parameters.
IslingtonCouncil Parser
uk_bin_collection/uk_bin_collection/councils/IslingtonCouncil.py
Added postcode extraction and validation from kwargs. Updated API endpoint assembly to use real Postcode value instead of placeholder. Added User-Agent HTTP headers. Replaced table-based HTML parsing with list-based approach (now extracts from ul within m-toggle-content). Implemented day-of-week extraction and next collection date calculation. Removed dateutil parsing dependency.
WorcesterCityCouncil Parser
uk_bin_collection/uk_bin_collection/councils/WorcesterCityCouncil.py
Added defensive date parsing with try/except wrapper around "%A %d/%m/%Y" date parsing. Skips entries with invalid date formats while preserving "Not applicable" skip logic. Constructs collection data with type and collectionDate on successful parse.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

  • IslingtonCouncil.py: HTML parsing logic change from table-based to list-based extraction requires careful verification against live HTML structure; postcode parameter handling and date calculation logic need validation.
  • Configuration changes: Verify postcode value and URL overrides are correct for IslingtonCouncil; confirm web_driver field syntax for FolkstoneandHytheDistrictCouncil.
  • WorcesterCityCouncil.py: Defensive parsing approach is straightforward but needs verification that invalid date entries are appropriately skipped without breaking the overall data collection.

Possibly related PRs

  • Dec release #1754: Overlapping modifications to the same test configuration file and council parser modules (IslingtonCouncil.py and WorcesterCityCouncil.py) with related parser/configuration updates.

Suggested reviewers

  • dp247

Poem

🐰 Hark! The bins do march with postcode in tow,
Lists replace tables, and collection dates flow!
Headers keep 403s at bay,
While gentle try/except catch parsing decay—
A parser's new springtime, hop-hop-hooray!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'fix: Islington, Worcester' is vague and lacks descriptive detail about what was actually fixed in these councils. Provide a more descriptive title that indicates the specific issue being addressed, such as 'fix: Update Islington and Worcester council bin collection parsing' or 'fix: Refactor Islington postcode handling and Worcester date parsing'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat_dec_fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Dec 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.79%. Comparing base (15d8390) to head (7970654).
⚠️ Report is 15 commits behind head on master.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1763   +/-   ##
=======================================
  Coverage   86.79%   86.79%           
=======================================
  Files           9        9           
  Lines        1136     1136           
=======================================
  Hits          986      986           
  Misses        150      150           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@robbrad robbrad merged commit 3123581 into master Dec 8, 2025
14 of 16 checks passed
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
uk_bin_collection/uk_bin_collection/councils/WorcesterCityCouncil.py (1)

1-4: Add explicit datetime import to resolve static analysis warning.

The code uses datetime (lines 48, 62) but relies on star imports from the common module. This creates ambiguity and triggers static analysis warnings.

Apply this diff to add an explicit import:

 import requests
 from bs4 import BeautifulSoup
+from datetime import datetime
 
 from uk_bin_collection.uk_bin_collection.common import *
 from uk_bin_collection.uk_bin_collection.get_bin_data import AbstractGetBinDataClass
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/IslingtonCouncil.py (1)

73-73: Consider explicit import for date_format.

The code uses date_format from star imports, which triggers static analysis warnings. While this likely works correctly, explicit imports improve code clarity and maintainability.

-from uk_bin_collection.uk_bin_collection.common import *
+from uk_bin_collection.uk_bin_collection.common import (
+    check_uprn,
+    check_postcode,
+    date_format,
+    requests,
+)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5408ab9 and 7970654.

📒 Files selected for processing (3)
  • uk_bin_collection/tests/input.json (2 hunks)
  • uk_bin_collection/uk_bin_collection/councils/IslingtonCouncil.py (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/WorcesterCityCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/IslingtonCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (2)
  • check_uprn (67-78)
  • check_postcode (36-49)
🪛 Ruff (0.14.7)
uk_bin_collection/uk_bin_collection/councils/IslingtonCouncil.py

11-11: check_uprn may be undefined, or defined from star imports

(F405)


12-12: check_postcode may be undefined, or defined from star imports

(F405)


20-20: requests may be undefined, or defined from star imports

(F405)


73-73: date_format may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/WorcesterCityCouncil.py

48-48: datetime may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
  • GitHub Check: Run Unit Tests (3.12, 1.8.4)
🔇 Additional comments (3)
uk_bin_collection/uk_bin_collection/councils/IslingtonCouncil.py (2)

14-20: Good implementation of URL construction and anti-403 measures.

The code correctly constructs the API URL with query parameters and includes a User-Agent header to prevent 403 errors from the council's website.


32-46: Well-structured HTML parsing with proper null checks.

The transition from table-based to list-based parsing is implemented correctly with appropriate safety checks for element existence before accessing them.

uk_bin_collection/tests/input.json (1)

1291-1297: Configuration correctly updated to match code changes.

The IslingtonCouncil test configuration properly reflects the refactored parser that now requires postcode and UPRN as separate parameters instead of embedding them in the URL.

"skip_get_url": true,
"uprn": "50032097",
"url": "https://www.folkestone-hythe.gov.uk",
"web_driver": "http://localhost:4444",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check all Selenium web_driver configurations in input.json

echo "=== Counting web_driver URL patterns ==="
rg -o '"web_driver": "[^"]*"' uk_bin_collection/tests/input.json | sort | uniq -c

echo -e "\n=== Line 983 context (FolkstoneandHytheDistrictCouncil) ==="
sed -n '980,990p' uk_bin_collection/tests/input.json | cat -n

echo -e "\n=== Checking FolkstoneandHytheDistrictCouncil full entry ==="
rg -B 5 -A 5 'FolkstoneandHytheDistrictCouncil' uk_bin_collection/tests/input.json

Repository: robbrad/UKBinCollectionData

Length of output: 1377


FolkstoneandHytheDistrictCouncil has inconsistent Selenium web_driver configuration.

The web_driver is set to "http://localhost:4444" while 95 other councils use "http://selenium:4444". This is the only council with this non-standard URL and wasn't mentioned in the PR objectives (which cover only Islington and Worcester).

This inconsistency likely breaks tests if they depend on the Docker Compose Selenium container setup. Confirm whether this was intentional or should be corrected to "http://selenium:4444".

🤖 Prompt for AI Agents
In uk_bin_collection/tests/input.json around line 983, the
FolkstoneandHytheDistrictCouncil entry has web_driver set to
"http://localhost:4444"; change this value to the standard
"http://selenium:4444" to match the other councils, verify there are no other
stray localhost entries for web_driver, and run the test suite to confirm the
Docker Compose Selenium-based tests still pass (if this was intentional, add a
comment explaining the deviation instead of changing it).

Comment on lines +10 to +12
postcode = kwargs.get("postcode")
check_uprn(uprn)

api_url = f"https://www.islington.gov.uk/your-area?Postcode=unused&Uprn={uprn}"
response = requests.get(api_url)
check_postcode(postcode)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add null check for postcode parameter.

The code retrieves the postcode but doesn't validate it's not None or empty before passing to check_postcode. Unlike check_uprn (line 11), which has built-in None/empty validation, check_postcode will attempt an API call with None/empty values, resulting in a validation error rather than a clear message.

Apply this diff to add explicit validation:

 uprn = kwargs.get("uprn")
 postcode = kwargs.get("postcode")
 check_uprn(uprn)
+if postcode is None or postcode == "":
+    raise ValueError("Invalid postcode")
 check_postcode(postcode)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
postcode = kwargs.get("postcode")
check_uprn(uprn)
api_url = f"https://www.islington.gov.uk/your-area?Postcode=unused&Uprn={uprn}"
response = requests.get(api_url)
check_postcode(postcode)
postcode = kwargs.get("postcode")
check_uprn(uprn)
if postcode is None or postcode == "":
raise ValueError("Invalid postcode")
check_postcode(postcode)
🧰 Tools
🪛 Ruff (0.14.7)

11-11: check_uprn may be undefined, or defined from star imports

(F405)


12-12: check_postcode may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/IslingtonCouncil.py around lines
10 to 12, the postcode is retrieved but not validated before being passed to
check_postcode; add an explicit null/empty check (e.g., if not postcode or
postcode.strip() == "") and raise a clear ValueError or return an appropriate
error before calling check_postcode, or otherwise only call check_postcode when
postcode is present, so the function does not attempt an API call with
None/empty values.


if collection_day_name:
# Calculate next occurrence of this day
from datetime import datetime, timedelta
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Move datetime imports to module level.

Importing datetime and timedelta inside the loop (which runs for each list item) is inefficient and goes against Python best practices.

Apply this diff to move imports to the top:

 from bs4 import BeautifulSoup
+from datetime import datetime, timedelta
 
 from uk_bin_collection.uk_bin_collection.common import *
 from uk_bin_collection.uk_bin_collection.get_bin_data import AbstractGetBinDataClass

Then remove the import from line 57:

                         if collection_day_name:
                             # Calculate next occurrence of this day
-                            from datetime import datetime, timedelta
-                            
                             today = datetime.now()

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/IslingtonCouncil.py around line
57, the code imports datetime and timedelta inside a loop; move "from datetime
import datetime, timedelta" to the module/top of the file with other imports and
then remove the in-loop import at line 57 so the loop reuses the module-level
import.

Comment on lines +64 to +66
days_ahead = target_day - current_day
if days_ahead <= 0: # Target day already happened this week
days_ahead += 7
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Clarify same-day collection handling.

The logic if days_ahead <= 0: days_ahead += 7 treats same-day collections (when days_ahead == 0) as already completed, scheduling them for next week. This assumption may not be correct if the collection hasn't happened yet today. Consider:

  1. Adding a time-based check if same-day collections should be included
  2. Or documenting that same-day collections are intentionally scheduled for the following week
  3. The comment "Target day already happened this week" is misleading for the == 0 case

Suggested improvement:

                             # Calculate days until next occurrence
                             days_ahead = target_day - current_day
-                            if days_ahead <= 0:  # Target day already happened this week
+                            if days_ahead < 0:  # Target day already happened this week
+                                days_ahead += 7
+                            elif days_ahead == 0:  # Collection day is today - assume it's for next week
                                 days_ahead += 7
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
days_ahead = target_day - current_day
if days_ahead <= 0: # Target day already happened this week
days_ahead += 7
days_ahead = target_day - current_day
if days_ahead < 0: # Target day already happened this week
days_ahead += 7
elif days_ahead == 0: # Collection day is today - assume it's for next week
days_ahead += 7
🤖 Prompt for AI Agents
uk_bin_collection/uk_bin_collection/councils/IslingtonCouncil.py lines 64-66:
the current condition `if days_ahead <= 0: days_ahead += 7` treats same-day
(`days_ahead == 0`) as already passed; either handle same-day explicitly or make
the current behavior clear. Fix by either (A) changing logic to treat only past
days as next-week (`if days_ahead < 0: days_ahead += 7`) and add a comment that
same-day is considered upcoming, or (B) implement a time-based check for
same-day collections (compare current time to a collection cutoff/collection
datetime and only add 7 when the collection time has passed), or (C) if the
existing behavior is intentional, update the comment to say "Target day already
happened this week or collection time has passed; schedule for next week" so the
`== 0` case is documented. Choose one approach and apply it consistently,
updating the comment accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant