feat: CalderdaleCouncil - add postcode + house number lookup, remove Selenium #2061

Open
InertiaUK wants to merge 1 commit into robbrad:master from InertiaUK:feat/calderdale-postcode-lookup

Conversation

@InertiaUK
Contributor

@InertiaUK InertiaUK commented May 12, 2026

Summary

  • Users can now provide either UPRN or postcode + house number — whichever they have.
  • UPRN takes priority when provided (backward compatible with existing users).
  • Removes Selenium dependency — uses pure HTTP POST to the JSP form. The reCAPTCHA was JavaScript-only on the Selenium path; the server-side JSP endpoint has no captcha.
  • When only postcode + house number is given, the scraper matches by house number/name in the address dropdown text.
  • Falls back to the first address in the dropdown if no match is found.

How it works

  1. POST postcode → get address dropdown
  2. If UPRN provided → match by UPRN value in dropdown options
  3. If no UPRN → match by house number/name in option text
  4. POST matched UPRN → get collection table
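
Steps 1 and 2 depend on reading the address dropdown out of the first response. The sketch below shows one way to do that, assuming the dropdown is a `<select id="uprn">` whose options carry the UPRN in `value`. The PR itself uses BeautifulSoup; the standard-library `html.parser` is swapped in here only so the example is self-contained. The class name, the sample markup, the street name, and the UPRN values are all made-up placeholders.

```python
from html.parser import HTMLParser


class UprnOptionParser(HTMLParser):
    """Collect (value, text) pairs from <option> tags inside <select id="uprn">."""

    def __init__(self):
        super().__init__()
        self.options = []        # finished (value, text) pairs
        self._in_select = False  # True while inside the target <select>
        self._value = None       # value of the <option> currently being read
        self._text = []          # text chunks of the current <option>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == "uprn":
            self._in_select = True
        elif tag == "option" and self._in_select:
            self._flush()  # tolerate a missing </option>
            self._value = attrs.get("value") or ""
            self._text = []

    def handle_endtag(self, tag):
        if tag == "option":
            self._flush()
        elif tag == "select":
            self._flush()
            self._in_select = False

    def handle_data(self, data):
        if self._value is not None:
            self._text.append(data)

    def _flush(self):
        # Store the current option, skipping placeholders with an empty value.
        if self._value:
            self.options.append((self._value, "".join(self._text).strip()))
        self._value = None
        self._text = []


# Placeholder markup standing in for the response to the postcode POST.
SAMPLE_HTML = """
<select id="uprn" name="uprn">
  <option value="">Select an address</option>
  <option value="000000000001">95 Example Road, Halifax</option>
  <option value="000000000002">97 Example Road, Halifax</option>
</select>
"""

parser = UprnOptionParser()
parser.feed(SAMPLE_HTML)
options = parser.options
print(options)
```

The resulting list of `(value, text)` pairs is the shape that the matching in steps 2 and 3 would work on.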

Testing

  • UPRN path (backward compat): HX3 5EQ + UPRN 100051326778
  • Postcode + house number: HX3 5EQ + paon 95
  • API end-to-end via bin-resolve-v2.php ✅

Summary by CodeRabbit

  • Bug Fixes

    • Calderdale bin collection now fetches data via direct HTTP (no browser automation), improving reliability.
    • Address matching enhanced: postcode + house number are sufficient; UPRN accepted but optional.
  • Documentation

    • Test data and notes updated to reflect the new Calderdale configuration and required inputs.

@coderabbitai
Contributor

coderabbitai Bot commented May 12, 2026

📝 Walkthrough


The Calderdale scraper replaces Selenium with requests + BeautifulSoup, adds a UPRN/PAON address-matching helper, performs two POSTs to fetch the address dropdown and the collection table, parses bin types and next collection dates, and updates the tests to remove the web_driver configuration.

Changes

Calderdale Selenium-to-HTTP Migration

| Layer / File(s) | Summary |
| --- | --- |
| Import and address matching setup (uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py) | Replaced Selenium imports with requests and BeautifulSoup. Added _match_address() to select dropdown options by exact zero-padded UPRN or case-insensitive PAON prefix/substring matching, raising on no options. |
| HTTP-based collection data parsing (uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py) | Rewrote parse_data() to use two sequential HTTP POSTs: first to get the address select#uprn and determine matched UPRN via _match_address(), second to retrieve table#collection. Parses bin type and next collection date from paragraphs containing "will be your next collection", sorts by parsed date, and uses resp.raise_for_status() plus ValueError checks for missing elements. |
| Test configuration update (uk_bin_collection/tests/input.json) | Updated CalderdaleCouncil test entry with new example postcode, house_number, and uprn; removed web_driver/Selenium config; revised wiki_note to state postcode + house number are sufficient and UPRN is accepted but not required. |
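
The "sorts by parsed date" step in the summary matters because collection dates are stored as strings, and day-first strings do not sort chronologically. A minimal sketch of the idea follows; the `DATE_FORMAT` value, the `sort_bins` helper name, and the bin types are assumptions for illustration and are not taken from the PR diff.

```python
from datetime import datetime

# Assumption: dates are stored day-first, e.g. "26/05/2026".
DATE_FORMAT = "%d/%m/%Y"


def sort_bins(bins):
    """Return bins ordered by their collection date, earliest first."""
    return sorted(
        bins,
        key=lambda b: datetime.strptime(b["collectionDate"], DATE_FORMAT),
    )


# Hypothetical entries: a plain string sort would put 02/06 before 26/05.
bins = [
    {"type": "Recycling", "collectionDate": "02/06/2026"},
    {"type": "General waste", "collectionDate": "26/05/2026"},
]
ordered = sort_bins(bins)
print([b["type"] for b in ordered])
```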

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • dp247

Poem

🐰 I hopped from Selenium to HTTP light,
POSTs and BeautifulSoup parse through the night.
UPRN or house number, the match is now neat,
Dates sorted and tidy — no browser to meet.
Hooray for clean code and a faster bin beat!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding postcode + house number lookup capability and removing the Selenium dependency from CalderdaleCouncil. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@codecov

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (004741e).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2061   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           


@coderabbitai
Contributor

coderabbitai Bot commented May 12, 2026

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{}

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py (1)

1-1: 💤 Low value

Remove unused import.

The re module is imported but not used anywhere in this file.

-import re
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` at line 1,
Remove the unused import of the re module from CalderdaleCouncil.py: delete the
line "import re" (the module is not referenced anywhere in the file, so simply
removing that import from the top of the file and running tests/lint will
resolve the warning).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 846cae2c-2bfc-4f5a-9545-4b8b1b0cac6e

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 601d10f.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py

Comment on lines +24 to +34
    if paon:
        paon_norm = str(paon).strip().upper()
        for val, text in valid:
            text_upper = text.upper()
            if text_upper.startswith(paon_norm + " ") or text_upper.startswith(paon_norm + ","):
                return val
        for val, text in valid:
            if paon_norm in text.upper():
                return val

    return valid[0][0]

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Silent fallback may mask misconfigurations.

When a user-supplied UPRN or PAON fails to match any option, the function silently returns the first address. This could lead to returning incorrect collection data without any warning.

Consider raising an error when a user-provided identifier fails to match, and only defaulting to the first address when neither UPRN nor PAON were supplied:

🛡️ Proposed fix
     if paon:
         paon_norm = str(paon).strip().upper()
         for val, text in valid:
             text_upper = text.upper()
             if text_upper.startswith(paon_norm + " ") or text_upper.startswith(paon_norm + ","):
                 return val
         for val, text in valid:
             if paon_norm in text.upper():
                 return val
+        raise ValueError(f"No address matching house number/name '{paon}' found in dropdown")

-    return valid[0][0]
+    if uprn:
+        raise ValueError(f"UPRN '{uprn}' not found in dropdown")
+
+    # No identifier provided - default to first address
+    return valid[0][0]

Based on learnings: In uk_bin_collection/**/*.py, prefer explicit failures over silent defaults to ensure format changes are detected early.
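
For reference, the proposed control flow can be written out as a standalone function. This is only a sketch under stated assumptions: `match_address` is a hypothetical stand-in for the PR's `_match_address()`, the dropdown is modelled as a list of `(value, text)` pairs, and the UPRN comparison (zero-padding both sides to 12 digits) is a guess at what "exact zero-padded UPRN" means, since that part of the code is not shown in this thread.

```python
def match_address(options, uprn=None, paon=None):
    """Return the UPRN of the option matching `uprn` or `paon`.

    `options` is a list of (value, text) pairs from the address dropdown.
    Raises ValueError when a supplied identifier matches nothing.
    """
    valid = [(val, text) for val, text in options if val]
    if not valid:
        raise ValueError("No addresses found in dropdown")

    if uprn:
        # Assumption: UPRNs are compared after zero-padding to 12 digits.
        wanted = str(uprn).strip().zfill(12)
        for val, _ in valid:
            if val.strip().zfill(12) == wanted:
                return val

    if paon:
        paon_norm = str(paon).strip().upper()
        # First pass: the option text starts with the house number/name.
        for val, text in valid:
            text_upper = text.upper()
            if text_upper.startswith(paon_norm + " ") or text_upper.startswith(paon_norm + ","):
                return val
        # Second pass: the house number/name appears anywhere in the text.
        for val, text in valid:
            if paon_norm in text.upper():
                return val
        raise ValueError(f"No address matching house number/name '{paon}' found in dropdown")

    if uprn:
        raise ValueError(f"UPRN '{uprn}' not found in dropdown")

    # No identifier provided: default to the first address.
    return valid[0][0]


# Placeholder dropdown contents; the addresses and UPRNs are made up.
options = [
    ("", "Select an address"),
    ("000000000001", "95 Example Road, Halifax"),
    ("000000000002", "97 Example Road, Halifax"),
]
print(match_address(options, paon="97"))
```

With this shape, a mistyped house number surfaces as a `ValueError` instead of silently returning the first address in the postcode.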

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` around
lines 24 - 34, The current PAON/UPRN matching logic (variables paon, paon_norm
and list valid) silently returns valid[0][0] when no match is found; change it
so that if the caller supplied a user identifier (paon or UPRN) and no matching
entry in valid is found, the function raises a clear error (e.g., ValueError or
a custom exception) indicating no match for the provided identifier, and only
fall back to returning valid[0][0] when neither paon nor UPRN were supplied
(i.e., both are falsy). Locate the matching block that iterates over valid and
update control flow to perform the raise on missing user-supplied matches and
preserve the existing default behavior only for truly absent identifiers.

Comment on lines +108 to +117
                    try:
                        parsed = datetime.strptime(date_text, "%A %d %B %Y")
                        data["bins"].append(
                            {
                                "type": bin_type,
                                "collectionDate": parsed.strftime(date_format),
                            }
                        )
                    except ValueError:
                        continue

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent date parse failures may hide format changes.

When datetime.strptime fails, the exception is caught and silently discarded. If the council changes their date format, this code will silently return incomplete data rather than alerting to the issue.

Consider logging a warning or raising after accumulating failures:

🛡️ Proposed fix with warning
+import logging
+
+logger = logging.getLogger(__name__)
+
 # In the parsing loop:
                     try:
                         parsed = datetime.strptime(date_text, "%A %d %B %Y")
                         data["bins"].append(
                             {
                                 "type": bin_type,
                                 "collectionDate": parsed.strftime(date_format),
                             }
                         )
                     except ValueError:
-                        continue
+                        logger.warning(f"Failed to parse date '{date_text}' for bin type '{bin_type}'")
+                        continue

Based on learnings: In uk_bin_collection/**/*.py, prefer explicit failures over silent defaults or swallowed errors to ensure format changes are detected early.
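
The other option mentioned above, raising after accumulating failures, could look roughly like the sketch below. The `parse_collections` helper, the `(bin_type, date_text)` input shape, the bin type names, and the `DATE_FORMAT` value are assumptions for illustration; only the `"%A %d %B %Y"` source format comes from the code under review.

```python
from datetime import datetime

SOURCE_FORMAT = "%A %d %B %Y"  # format used in the strptime call under review
DATE_FORMAT = "%d/%m/%Y"       # assumption: the project's output date format


def parse_collections(rows):
    """Convert (bin_type, date_text) pairs into bin entries.

    Collects every unparseable date and raises once at the end, so a change
    in the council's date format is reported instead of dropping rows.
    """
    bins = []
    failures = []
    for bin_type, date_text in rows:
        try:
            parsed = datetime.strptime(date_text, SOURCE_FORMAT)
        except ValueError:
            failures.append((bin_type, date_text))
            continue
        bins.append(
            {"type": bin_type, "collectionDate": parsed.strftime(DATE_FORMAT)}
        )
    if failures:
        raise ValueError(
            f"Could not parse {len(failures)} date(s) "
            f"with format {SOURCE_FORMAT!r}: {failures}"
        )
    return bins


# Hypothetical row in the expected format.
parsed_bins = parse_collections([("Recycling", "Tuesday 12 May 2026")])
print(parsed_bins)
```

Compared with logging a warning, this variant fails the whole lookup when any row is unreadable, which is stricter but makes a format change impossible to miss.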

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` around
lines 108 - 117, The date parsing block currently swallows ValueError from
datetime.strptime for date_text which can hide format changes; update the code
in CalderdaleCouncil (the parsing loop that uses datetime.strptime, date_text,
date_format, bin_type and appends to data["bins"]) to record parse failures
instead of silently continuing — either log a warning with context (include the
failing date_text and bin_type) or collect failures and raise a descriptive
exception after the loop so callers notice; ensure the chosen logger or
exception includes which date_text and expected format (date_format) caused the
failure.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7751eb98-b133-4987-9a32-c76acef6d9a3

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 02f3700.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py

Comment on lines +54 to +62
        resp = session.post(
            base_url,
            data={
                "postcode": user_postcode,
                "email-address": "",
                "find": "Find an address for this postcode",
            },
        )
        resp.raise_for_status()

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add timeout to HTTP requests to prevent hanging.

The session.post() call has no timeout. If the council's server is slow or unresponsive, this will block indefinitely.

🛡️ Proposed fix to add timeout
         resp = session.post(
             base_url,
             data={
                 "postcode": user_postcode,
                 "email-address": "",
                 "find": "Find an address for this postcode",
             },
+            timeout=30,
         )

Also apply to the second POST at lines 73-83.
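
One way to keep both POSTs consistent is to route them through a small helper that always passes a timeout. The sketch below is an illustration, not code from the PR: `post_form`, `DEFAULT_TIMEOUT`, and the fake session classes are hypothetical names, and the URL is a placeholder. The helper is written against anything with a `requests.Session`-style `post(url, data=..., timeout=...)` method; the fake session only exists so the example runs without network access.

```python
DEFAULT_TIMEOUT = 30  # seconds; same value as the proposed fix


def post_form(session, url, data, timeout=DEFAULT_TIMEOUT):
    """POST form data with an explicit timeout and fail on HTTP errors."""
    resp = session.post(url, data=data, timeout=timeout)
    resp.raise_for_status()
    return resp


class _FakeResponse:
    """Stand-in for a successful HTTP response."""

    def raise_for_status(self):
        pass


class _FakeSession:
    """Stand-in for requests.Session that records each POST call."""

    def __init__(self):
        self.calls = []

    def post(self, url, data=None, timeout=None):
        self.calls.append({"url": url, "data": data, "timeout": timeout})
        return _FakeResponse()


session = _FakeSession()
post_form(
    session,
    "https://example.invalid/form",  # placeholder, not the council's URL
    {
        "postcode": "HX3 5EQ",
        "email-address": "",
        "find": "Find an address for this postcode",
    },
)
print(session.calls[0]["timeout"])
```

With a real `requests.Session`, a slow server would then raise a timeout exception rather than blocking the caller indefinitely.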

📝 Committable suggestion

Suggested change
-        resp = session.post(
-            base_url,
-            data={
-                "postcode": user_postcode,
-                "email-address": "",
-                "find": "Find an address for this postcode",
-            },
-        )
-        resp.raise_for_status()
+        resp = session.post(
+            base_url,
+            data={
+                "postcode": user_postcode,
+                "email-address": "",
+                "find": "Find an address for this postcode",
+            },
+            timeout=30,
+        )
+        resp.raise_for_status()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` around
lines 54 - 62, The POST requests using session.post (the call that posts
base_url with data including "postcode": user_postcode and the second POST later
in the file) lack a timeout and can hang; update both session.post invocations
to pass a sensible timeout argument (e.g., timeout=10) so requests will raise on
slow/unresponsive servers, keep using resp.raise_for_status() afterward, and
ensure the change references the same variables (base_url, user_postcode) and
response handling.

UPRN is no longer required. Users can provide postcode + house number
as an alternative. Matches by house number in address dropdown text.
Removes Selenium dependency - uses pure HTTP POST to JSP form.
Backward compatible - existing UPRN lookups still work.
@InertiaUK InertiaUK force-pushed the feat/calderdale-postcode-lookup branch from 02f3700 to 004741e Compare May 12, 2026 15:21
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py (1)

1-1: 💤 Low value

Remove unused import re.

The re module is imported but never used in this file.

-import re
 from datetime import datetime
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` at line 1,
Remove the unused import by deleting the standalone "import re" statement at the
top of the file (the unused import in CalderdaleCouncil.py); ensure no other
references to the "re" module exist in functions or classes in that file (e.g.,
any methods in the CalderdaleCouncil class) before removing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cfd36d21-e774-4ab6-a1fd-5166f4eb277f

📥 Commits

Reviewing files that changed from the base of the PR and between 02f3700 and 004741e.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • uk_bin_collection/tests/input.json
