fix: DoverDistrictCouncil — rewrite for new portal.waste.dover.gov.uk API#2048
fix: DoverDistrictCouncil — rewrite for new portal.waste.dover.gov.uk API#2048InertiaUK wants to merge 1 commit into
Conversation
📝 WalkthroughWalkthroughDoverDistrictCouncil scraper rewritten to use portal.waste.dover.gov.uk JSON API instead of HTML parsing. Extracts postcode/UPRN/PAON from input, resolves a property point ID via getPropertySearch, fetches collection schedules via getCollectionDays, filters future dates, and returns formatted, sorted bin collections. Test configuration updated with new endpoint URL and postcode field. ChangesDover District Council API Migration
Sequence Diagram(s)sequenceDiagram
participant Caller
participant parse_data
participant _resolve_point_id
participant getPropertySearch_API
participant _get_collections
participant getCollectionDays_API
Caller->>parse_data: postcode / uprn / paon
parse_data->>_resolve_point_id: postcode, uprn, paon
_resolve_point_id->>getPropertySearch_API: POST getPropertySearch (councilId, searchQuery)
getPropertySearch_API-->>_resolve_point_id: properties list (ids, names, uprns)
_resolve_point_id-->>parse_data: pointId
parse_data->>_get_collections: pointId
_get_collections->>getCollectionDays_API: POST getCollectionDays (pointId)
getCollectionDays_API-->>_get_collections: activeServices with serviceSchedules
_get_collections-->>parse_data: bins with formatted future dates
parse_data-->>Caller: {"bins": [...sorted...] }
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/DoverDistrictCouncil.py (1)
5-5: ⚡ Quick winReplace wildcard import with explicit symbols used by this module.
This removes Ruff F403 noise and makes dependencies clear.
Suggested fix
-from uk_bin_collection.uk_bin_collection.common import * +from uk_bin_collection.uk_bin_collection.common import check_postcode, date_format🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/DoverDistrictCouncil.py` at line 5, The file currently uses a wildcard import ("from uk_bin_collection.uk_bin_collection.common import *") which should be replaced with explicit imports; find every symbol this module uses from common (e.g., classes, functions, constants referenced in DoverDistrictCouncil.py) and replace the wildcard line with a single explicit import list importing only those names (keep the module path uk_bin_collection.uk_bin_collection.common and import the exact identifiers used in this file).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/DoverDistrictCouncil.py`:
- Around line 46-58: The code currently falls back to returning
str(data[0]["id"]) when a provided uprn or paon does not match; change this to
raise an explicit error instead: in the UPRN check (the block that iterates over
data comparing prop.get("uprn")) if no match is found raise a ValueError or
LookupError including the requested uprn; likewise in the PAON block
(paon_lower, name.startswith checks) raise a clear exception including the paon
when no match is found; also replace the unconditional return str(data[0]["id"])
with either returning the first id only when no selector was provided or raising
an error if data is empty so callers never get a silently wrong address id.
---
Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/DoverDistrictCouncil.py`:
- Line 5: The file currently uses a wildcard import ("from
uk_bin_collection.uk_bin_collection.common import *") which should be replaced
with explicit imports; find every symbol this module uses from common (e.g.,
classes, functions, constants referenced in DoverDistrictCouncil.py) and replace
the wildcard line with a single explicit import list importing only those names
(keep the module path uk_bin_collection.uk_bin_collection.common and import the
exact identifiers used in this file).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d80fef67-61e4-446d-94a3-a0d38f58203f
📒 Files selected for processing (2)
uk_bin_collection/tests/input.jsonuk_bin_collection/uk_bin_collection/councils/DoverDistrictCouncil.py
| if uprn: | ||
| for prop in data: | ||
| if str(prop.get("uprn")) == str(uprn): | ||
| return str(prop["id"]) | ||
|
|
||
| if paon: | ||
| paon_lower = paon.strip().lower() | ||
| for prop in data: | ||
| name = prop.get("name", "").lower() | ||
| if name.startswith(paon_lower + " ") or name.startswith(paon_lower + ","): | ||
| return str(prop["id"]) | ||
|
|
||
| return str(data[0]["id"]) |
There was a problem hiding this comment.
Do not silently fall back to the first property when uprn/paon is provided but unmatched.
This can return collections for the wrong address. Raise explicitly when a requested selector is not found.
Suggested fix
if uprn:
for prop in data:
if str(prop.get("uprn")) == str(uprn):
return str(prop["id"])
+ raise ValueError(f"UPRN {uprn} not found for search query {search_query}")
if paon:
paon_lower = paon.strip().lower()
for prop in data:
name = prop.get("name", "").lower()
if name.startswith(paon_lower + " ") or name.startswith(paon_lower + ","):
return str(prop["id"])
+ raise ValueError(f"PAON {paon} not found for search query {search_query}")
return str(data[0]["id"])📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if uprn: | |
| for prop in data: | |
| if str(prop.get("uprn")) == str(uprn): | |
| return str(prop["id"]) | |
| if paon: | |
| paon_lower = paon.strip().lower() | |
| for prop in data: | |
| name = prop.get("name", "").lower() | |
| if name.startswith(paon_lower + " ") or name.startswith(paon_lower + ","): | |
| return str(prop["id"]) | |
| return str(data[0]["id"]) | |
| if uprn: | |
| for prop in data: | |
| if str(prop.get("uprn")) == str(uprn): | |
| return str(prop["id"]) | |
| raise ValueError(f"UPRN {uprn} not found") | |
| if paon: | |
| paon_lower = paon.strip().lower() | |
| for prop in data: | |
| name = prop.get("name", "").lower() | |
| if name.startswith(paon_lower + " ") or name.startswith(paon_lower + ","): | |
| return str(prop["id"]) | |
| raise ValueError(f"PAON {paon} not found") | |
| return str(data[0]["id"]) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/DoverDistrictCouncil.py` around
lines 46 - 58, The code currently falls back to returning str(data[0]["id"])
when a provided uprn or paon does not match; change this to raise an explicit
error instead: in the UPRN check (the block that iterates over data comparing
prop.get("uprn")) if no match is found raise a ValueError or LookupError
including the requested uprn; likewise in the PAON block (paon_lower,
name.startswith checks) raise a clear exception including the paon when no match
is found; also replace the unconditional return str(data[0]["id"]) with either
returning the first id only when no selector was provided or raising an error if
data is empty so callers never get a silently wrong address id.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2048 +/- ##
=======================================
Coverage 86.67% 86.67%
=======================================
Files 9 9
Lines 1141 1141
=======================================
Hits 989 989
Misses 152 152 ☔ View full report in Codecov by Sentry. |
3b5302e to
d83077e
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/DoverDistrictCouncil.py`:
- Line 5: Replace the wildcard import in this module with explicit imports to
satisfy Ruff F403: remove "from uk_bin_collection.uk_bin_collection.common
import *" and instead import only the symbols used here—import check_postcode
and date_format from uk_bin_collection.uk_bin_collection.common (these are
referenced by the DoverDistrictCouncil logic), ensuring the rest of the module
continues to reference check_postcode and date_format unchanged.
- Around line 72-80: The code currently uses permissive .get(..., []) calls on
result which can hide API schema changes; instead validate that resp.json()
returns a dict with an "activeServices" key holding a list and that each service
dict contains a "serviceName" (str) and "serviceSchedules" (list) and each
schedule contains "currentScheduledDate" (non-empty string) before processing;
if any of these checks fail raise a clear exception (e.g. ValueError) including
identifying context (resp.status_code or resp.text) so failures are explicit;
replace uses of result.get("activeServices", []) and
service.get("serviceSchedules", []) with explicit isinstance checks and explicit
error raises referencing result, activeServices, serviceSchedules,
service_name/currentScheduledDate.
- Around line 83-84: The parsed datetime from date_str (dt =
datetime.fromisoformat(date_str)) may be naive and causes TypeError when
compared to timezone-aware now; update the parsing to normalize to UTC like in
GosportBoroughCouncil.py: parse with
datetime.fromisoformat(date_str.replace('Z', '+00:00')), then if dt.tzinfo is
None set dt = dt.replace(tzinfo=timezone.utc) and/or call dt =
dt.astimezone(timezone.utc) before the comparison with now; adjust the code
around the dt, date_str and now usage in DoverDistrictCouncil (the
function/method handling currentScheduledDate) so all datetimes are
timezone-aware UTC.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 25f0e36e-1cf2-433c-ac93-0d66edd1c1c6
📒 Files selected for processing (2)
uk_bin_collection/tests/input.jsonuk_bin_collection/uk_bin_collection/councils/DoverDistrictCouncil.py
🚧 Files skipped from review as they are similar to previous changes (1)
- uk_bin_collection/tests/input.json
| import requests | ||
|
|
||
| from uk_bin_collection.uk_bin_collection.common import * # Consider specific imports | ||
| from uk_bin_collection.uk_bin_collection.common import * |
There was a problem hiding this comment.
Replace wildcard import to resolve Ruff F403 and make dependencies explicit.
from ...common import * is flagged by static analysis. Import only the symbols used in this module (check_postcode, date_format) to keep linting and symbol resolution deterministic.
Suggested change
-from uk_bin_collection.uk_bin_collection.common import *
+from uk_bin_collection.uk_bin_collection.common import check_postcode, date_format🧰 Tools
🪛 Ruff (0.15.12)
[error] 5-5: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/DoverDistrictCouncil.py` at line
5, Replace the wildcard import in this module with explicit imports to satisfy
Ruff F403: remove "from uk_bin_collection.uk_bin_collection.common import *" and
instead import only the symbols used here—import check_postcode and date_format
from uk_bin_collection.uk_bin_collection.common (these are referenced by the
DoverDistrictCouncil logic), ensuring the rest of the module continues to
reference check_postcode and date_format unchanged.
| result = resp.json() | ||
|
|
||
| now = datetime.now(timezone.utc) | ||
| data = {"bins": []} | ||
|
|
||
| for service in result.get("activeServices", []): | ||
| service_name = service.get("serviceName", "") | ||
| for schedule in service.get("serviceSchedules", []): | ||
| date_str = schedule.get("currentScheduledDate") |
There was a problem hiding this comment.
Fail fast on unexpected API response shape instead of silently returning empty bins.
Using .get(..., []) here can mask upstream API changes/errors as “no collections”. Validate required keys/types and raise a clear exception when the payload is not the expected schema.
Suggested hardening
resp.raise_for_status()
result = resp.json()
+ active_services = result.get("activeServices")
+ if not isinstance(active_services, list):
+ raise ValueError("Unexpected getCollectionDays response: missing/invalid 'activeServices'")
now = datetime.now(timezone.utc)
data = {"bins": []}
- for service in result.get("activeServices", []):
+ for service in active_services:
service_name = service.get("serviceName", "")
- for schedule in service.get("serviceSchedules", []):
+ schedules = service.get("serviceSchedules")
+ if not isinstance(schedules, list):
+ raise ValueError(f"Unexpected serviceSchedules for service '{service_name}'")
+ for schedule in schedules:Based on learnings: In uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/DoverDistrictCouncil.py` around
lines 72 - 80, The code currently uses permissive .get(..., []) calls on result
which can hide API schema changes; instead validate that resp.json() returns a
dict with an "activeServices" key holding a list and that each service dict
contains a "serviceName" (str) and "serviceSchedules" (list) and each schedule
contains "currentScheduledDate" (non-empty string) before processing; if any of
these checks fail raise a clear exception (e.g. ValueError) including
identifying context (resp.status_code or resp.text) so failures are explicit;
replace uses of result.get("activeServices", []) and
service.get("serviceSchedules", []) with explicit isinstance checks and explicit
error raises referencing result, activeServices, serviceSchedules,
service_name/currentScheduledDate.
| dt = datetime.fromisoformat(date_str) | ||
| if dt > now: |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify repo Python compatibility and current ISO parsing usage.
fd -i 'pyproject.toml' 'setup.cfg' 'setup.py' 'tox.ini' \
| xargs -r rg -n "requires-python|python_requires|envlist"
rg -n --type=py "fromisoformat\\("Repository: robbrad/UKBinCollectionData
Length of output: 1060
🏁 Script executed:
# Read the DoverDistrictCouncil.py file to see the full context around lines 83-84
head -n 100 uk_bin_collection/uk_bin_collection/councils/DoverDistrictCouncil.py | tail -n 30Repository: robbrad/UKBinCollectionData
Length of output: 1084
🏁 Script executed:
# Find and read Python version requirements
find . -maxdepth 2 -type f \( -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" \)Repository: robbrad/UKBinCollectionData
Length of output: 88
🏁 Script executed:
# Check the imports and beginning of DoverDistrictCouncil.py
head -n 20 uk_bin_collection/uk_bin_collection/councils/DoverDistrictCouncil.pyRepository: robbrad/UKBinCollectionData
Length of output: 684
🏁 Script executed:
# Read pyproject.toml to check Python version requirements
cat pyproject.tomlRepository: robbrad/UKBinCollectionData
Length of output: 2146
🏁 Script executed:
# Search for examples of what currentScheduledDate might contain
# Look for test data, API docs, or other clues about the date format
rg -A 3 -B 3 "currentScheduledDate" --type=pyRepository: robbrad/UKBinCollectionData
Length of output: 1821
🏁 Script executed:
# Check if there are any test files or fixtures that show what currentScheduledDate format is
find . -type f \( -name "*test*" -o -name "*fixture*" \) -path "*[Dd]over*" | head -20Repository: robbrad/UKBinCollectionData
Length of output: 53
🏁 Script executed:
# Search for any examples or documentation of the API response format
rg -i "currentScheduledDate" --type=json --type=yaml --type=txtRepository: robbrad/UKBinCollectionData
Length of output: 53
🏁 Script executed:
# Test what happens with Python 3.12+ fromisoformat behavior with Z suffix
python3 << 'EOF'
import sys
from datetime import datetime, timezone
print(f"Python version: {sys.version}")
# Test case 1: ISO string with Z suffix (common in API responses)
test_cases = [
"2024-01-15T10:30:00Z",
"2024-01-15T10:30:00+00:00",
"2024-01-15T10:30:00",
]
now = datetime.now(timezone.utc)
print(f"\nnow (aware UTC): {now} - tzinfo: {now.tzinfo}")
for test_str in test_cases:
print(f"\n--- Testing: {test_str} ---")
try:
# Try current approach (no Z handling)
dt = datetime.fromisoformat(test_str)
print(f" fromisoformat() result: {dt} - tzinfo: {dt.tzinfo}")
# Try comparison
result = dt > now
print(f" dt > now: {result}")
except Exception as e:
print(f" ERROR with fromisoformat(): {type(e).__name__}: {e}")
try:
# Try suggested approach
dt_fixed = datetime.fromisoformat(test_str.replace("Z", "+00:00"))
if dt_fixed.tzinfo is None:
dt_fixed = dt_fixed.replace(tzinfo=timezone.utc)
result = dt_fixed > now
print(f" Fixed approach: {dt_fixed} - tzinfo: {dt_fixed.tzinfo}, dt > now: {result}")
except Exception as e:
print(f" ERROR with fixed approach: {type(e).__name__}: {e}")
EOFRepository: robbrad/UKBinCollectionData
Length of output: 872
🏁 Script executed:
# Test Python 3.12+ Z handling to verify it works
python3.12 << 'EOF' 2>&1 || echo "Python 3.12 not available, but 3.11 test confirmed behavior"
import sys
from datetime import datetime, timezone
print(f"Python version: {sys.version}")
test_str = "2024-01-15T10:30:00Z"
dt = datetime.fromisoformat(test_str)
print(f"fromisoformat('...Z'): {dt} - tzinfo: {dt.tzinfo}")
EOFRepository: robbrad/UKBinCollectionData
Length of output: 180
🏁 Script executed:
# Check if there are any other references to the Dover API format or what the actual data looks like
rg -i "dove" --type=py -A 5 -B 5 | head -50Repository: robbrad/UKBinCollectionData
Length of output: 1222
🏁 Script executed:
# Look for any existing handling of currentScheduledDate in tests or examples
find . -type f -name "*.py" | xargs rg -l "currentScheduledDate" | xargs -I {} grep -C 10 "currentScheduledDate" {}Repository: robbrad/UKBinCollectionData
Length of output: 1881
Normalize parsed schedule datetimes to timezone-aware UTC before comparison.
The now variable is timezone-aware UTC, but datetime.fromisoformat(date_str) can produce naive datetimes if currentScheduledDate lacks timezone info. This causes TypeError on comparison. Add explicit Z-to-offset replacement and tzinfo normalization.
Suggested fix
- dt = datetime.fromisoformat(date_str)
+ dt = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
+ if dt.tzinfo is None:
+ dt = dt.replace(tzinfo=timezone.utc)(This pattern is already used in GosportBoroughCouncil.py.)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/DoverDistrictCouncil.py` around
lines 83 - 84, The parsed datetime from date_str (dt =
datetime.fromisoformat(date_str)) may be naive and causes TypeError when
compared to timezone-aware now; update the parsing to normalize to UTC like in
GosportBoroughCouncil.py: parse with
datetime.fromisoformat(date_str.replace('Z', '+00:00')), then if dt.tzinfo is
None set dt = dt.replace(tzinfo=timezone.utc) and/or call dt =
dt.astimezone(timezone.utc) before the comparison with now; adjust the code
around the dt, date_str and now usage in DoverDistrictCouncil (the
function/method handling currentScheduledDate) so all datetimes are
timezone-aware UTC.
Dover migrated from
collections.dover.gov.uk(server-rendered HTML withresults-table-wrapperdivs) toportal.waste.dover.gov.uk(Next.js SPA). The old URL redirects to the new domain and the HTML tables no longer exist.Rewritten as pure
requestsagainst the new JSON API:/api/getPropertySearch— postcode search withcouncilId: "39", returns property list withpointIdand UPRN/api/getCollectionDays— takespointId+pointType: "PointAddress"+councilId, returns active services withserviceSchedulescontainingcurrentScheduledDateNo Selenium needed. Updated
input.jsonwith postcode and new URL.Summary by CodeRabbit
Improvements
Tests