feat: CalderdaleCouncil - add postcode + house number lookup, remove Selenium#2061
feat: CalderdaleCouncil - add postcode + house number lookup, remove Selenium#2061InertiaUK wants to merge 1 commit into
Conversation
📝 WalkthroughWalkthroughCalderdale scraper replaced Selenium with requests + BeautifulSoup, added a UPRN/PAON address-matching helper, performs two POSTs to fetch address and collection table, parses bin types and next collection dates, and updated tests to remove web_driver configuration. ChangesCalderdale Selenium-to-HTTP Migration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2061 +/- ##
=======================================
Coverage 86.67% 86.67%
=======================================
Files 9 9
Lines 1141 1141
=======================================
Hits 989 989
Misses 152 152 ☔ View full report in Codecov by Sentry. |
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py (1)
1-1: 💤 Low valueRemove unused import.
The
remodule is imported but not used anywhere in this file.-import re🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` at line 1, Remove the unused import of the re module from CalderdaleCouncil.py: delete the line "import re" (the module is not referenced anywhere in the file, so simply removing that import from the top of the file and running tests/lint will resolve the warning).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py`:
- Around line 108-117: The date parsing block currently swallows ValueError from
datetime.strptime for date_text which can hide format changes; update the code
in CalderdaleCouncil (the parsing loop that uses datetime.strptime, date_text,
date_format, bin_type and appends to data["bins"]) to record parse failures
instead of silently continuing — either log a warning with context (include the
failing date_text and bin_type) or collect failures and raise a descriptive
exception after the loop so callers notice; ensure the chosen logger or
exception includes which date_text and expected format (date_format) caused the
failure.
- Around line 24-34: The current PAON/UPRN matching logic (variables paon,
paon_norm and list valid) silently returns valid[0][0] when no match is found;
change it so that if the caller supplied a user identifier (paon or UPRN) and no
matching entry in valid is found, the function raises a clear error (e.g.,
ValueError or a custom exception) indicating no match for the provided
identifier, and only fall back to returning valid[0][0] when neither paon nor
UPRN were supplied (i.e., both are falsy). Locate the matching block that
iterates over valid and update control flow to perform the raise on missing
user-supplied matches and preserve the existing default behavior only for truly
absent identifiers.
---
Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py`:
- Line 1: Remove the unused import of the re module from CalderdaleCouncil.py:
delete the line "import re" (the module is not referenced anywhere in the file,
so simply removing that import from the top of the file and running tests/lint
will resolve the warning).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 846cae2c-2bfc-4f5a-9545-4b8b1b0cac6e
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py
| if paon: | ||
| paon_norm = str(paon).strip().upper() | ||
| for val, text in valid: | ||
| text_upper = text.upper() | ||
| if text_upper.startswith(paon_norm + " ") or text_upper.startswith(paon_norm + ","): | ||
| return val | ||
| for val, text in valid: | ||
| if paon_norm in text.upper(): | ||
| return val | ||
|
|
||
| return valid[0][0] |
There was a problem hiding this comment.
Silent fallback may mask misconfigurations.
When a user-supplied UPRN or PAON fails to match any option, the function silently returns the first address. This could lead to returning incorrect collection data without any warning.
Consider raising an error when a user-provided identifier fails to match, and only defaulting to the first address when neither UPRN nor PAON were supplied:
🛡️ Proposed fix
if paon:
paon_norm = str(paon).strip().upper()
for val, text in valid:
text_upper = text.upper()
if text_upper.startswith(paon_norm + " ") or text_upper.startswith(paon_norm + ","):
return val
for val, text in valid:
if paon_norm in text.upper():
return val
+ raise ValueError(f"No address matching house number/name '{paon}' found in dropdown")
- return valid[0][0]
+ if uprn:
+ raise ValueError(f"UPRN '{uprn}' not found in dropdown")
+
+ # No identifier provided - default to first address
+ return valid[0][0]Based on learnings: In uk_bin_collection/**/*.py, prefer explicit failures over silent defaults to ensure format changes are detected early.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` around
lines 24 - 34, The current PAON/UPRN matching logic (variables paon, paon_norm
and list valid) silently returns valid[0][0] when no match is found; change it
so that if the caller supplied a user identifier (paon or UPRN) and no matching
entry in valid is found, the function raises a clear error (e.g., ValueError or
a custom exception) indicating no match for the provided identifier, and only
fall back to returning valid[0][0] when neither paon nor UPRN were supplied
(i.e., both are falsy). Locate the matching block that iterates over valid and
update control flow to perform the raise on missing user-supplied matches and
preserve the existing default behavior only for truly absent identifiers.
| try: | ||
| parsed = datetime.strptime(date_text, "%A %d %B %Y") | ||
| data["bins"].append( | ||
| { | ||
| "type": bin_type, | ||
| "collectionDate": parsed.strftime(date_format), | ||
| } | ||
| ) | ||
| except ValueError: | ||
| continue |
There was a problem hiding this comment.
Silent date parse failures may hide format changes.
When datetime.strptime fails, the exception is caught and silently discarded. If the council changes their date format, this code will silently return incomplete data rather than alerting to the issue.
Consider logging a warning or raising after accumulating failures:
🛡️ Proposed fix with warning
+import logging
+
+logger = logging.getLogger(__name__)
+
# In the parsing loop:
try:
parsed = datetime.strptime(date_text, "%A %d %B %Y")
data["bins"].append(
{
"type": bin_type,
"collectionDate": parsed.strftime(date_format),
}
)
except ValueError:
- continue
+ logger.warning(f"Failed to parse date '{date_text}' for bin type '{bin_type}'")
+ continueBased on learnings: In uk_bin_collection/**/*.py, prefer explicit failures over silent defaults or swallowed errors to ensure format changes are detected early.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` around
lines 108 - 117, The date parsing block currently swallows ValueError from
datetime.strptime for date_text which can hide format changes; update the code
in CalderdaleCouncil (the parsing loop that uses datetime.strptime, date_text,
date_format, bin_type and appends to data["bins"]) to record parse failures
instead of silently continuing — either log a warning with context (include the
failing date_text and bin_type) or collect failures and raise a descriptive
exception after the loop so callers notice; ensure the chosen logger or
exception includes which date_text and expected format (date_format) caused the
failure.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py`:
- Around line 54-62: The POST requests using session.post (the call that posts
base_url with data including "postcode": user_postcode and the second POST later
in the file) lack a timeout and can hang; update both session.post invocations
to pass a sensible timeout argument (e.g., timeout=10) so requests will raise on
slow/unresponsive servers, keep using resp.raise_for_status() afterward, and
ensure the change references the same variables (base_url, user_postcode) and
response handling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7751eb98-b133-4987-9a32-c76acef6d9a3
📒 Files selected for processing (2)
uk_bin_collection/tests/input.jsonuk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py
| resp = session.post( | ||
| base_url, | ||
| data={ | ||
| "postcode": user_postcode, | ||
| "email-address": "", | ||
| "find": "Find an address for this postcode", | ||
| }, | ||
| ) | ||
| resp.raise_for_status() |
There was a problem hiding this comment.
Add timeout to HTTP requests to prevent hanging.
The session.post() call has no timeout. If the council's server is slow or unresponsive, this will block indefinitely.
🛡️ Proposed fix to add timeout
resp = session.post(
base_url,
data={
"postcode": user_postcode,
"email-address": "",
"find": "Find an address for this postcode",
},
+ timeout=30,
)Also apply to the second POST at line 73-83.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| resp = session.post( | |
| base_url, | |
| data={ | |
| "postcode": user_postcode, | |
| "email-address": "", | |
| "find": "Find an address for this postcode", | |
| }, | |
| ) | |
| resp.raise_for_status() | |
| resp = session.post( | |
| base_url, | |
| data={ | |
| "postcode": user_postcode, | |
| "email-address": "", | |
| "find": "Find an address for this postcode", | |
| }, | |
| timeout=30, | |
| ) | |
| resp.raise_for_status() |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` around
lines 54 - 62, The POST requests using session.post (the call that posts
base_url with data including "postcode": user_postcode and the second POST later
in the file) lack a timeout and can hang; update both session.post invocations
to pass a sensible timeout argument (e.g., timeout=10) so requests will raise on
slow/unresponsive servers, keep using resp.raise_for_status() afterward, and
ensure the change references the same variables (base_url, user_postcode) and
response handling.
UPRN is no longer required. Users can provide postcode + house number as an alternative. Matches by house number in address dropdown text. Removes Selenium dependency - uses pure HTTP POST to JSP form. Backward compatible - existing UPRN lookups still work.
02f3700 to
004741e
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py (1)
1-1: 💤 Low valueRemove unused
import re.The
remodule is imported but never used in this file.-import re from datetime import datetime🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` at line 1, Remove the unused import by deleting the standalone "import re" statement at the top of the file (the unused import in CalderdaleCouncil.py); ensure no other references to the "re" module exist in functions or classes in that file (e.g., any methods in the CalderdaleCouncil class) before removing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py`:
- Line 1: Remove the unused import by deleting the standalone "import re"
statement at the top of the file (the unused import in CalderdaleCouncil.py);
ensure no other references to the "re" module exist in functions or classes in
that file (e.g., any methods in the CalderdaleCouncil class) before removing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cfd36d21-e774-4ab6-a1fd-5166f4eb277f
📒 Files selected for processing (2)
uk_bin_collection/tests/input.jsonuk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py
🚧 Files skipped from review as they are similar to previous changes (1)
- uk_bin_collection/tests/input.json
Summary
How it works
Testing
HX3 5EQ+ UPRN100051326778✅HX3 5EQ+ paon95✅Summary by CodeRabbit
Bug Fixes
Documentation