feat: CalderdaleCouncil - add postcode + house number lookup, remove Selenium by InertiaUK · Pull Request #2061 · robbrad/UKBinCollectionData

InertiaUK · 2026-05-12T14:00:24Z

Summary

Users can now provide either UPRN or postcode + house number — whichever they have.
UPRN takes priority when provided (backward compatible with existing users).
Removes Selenium dependency — uses pure HTTP POST to the JSP form. The reCAPTCHA was JavaScript-only on the Selenium path; the server-side JSP endpoint has no captcha.
When only postcode + house number is given, the scraper matches by house number/name in the address dropdown text.
Falls back to first address if no match found.

How it works

POST postcode → get address dropdown
If UPRN provided → match by UPRN value in dropdown options
If no UPRN → match by house number/name in option text
POST matched UPRN → get collection table
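
The selection logic in steps 2 and 3 can be sketched offline, without contacting the council site. This is a minimal illustration, not the PR's code: it assumes the first response contains a `select` with `id="uprn"` (as the walkthrough further down describes) and uses the standard-library `html.parser` in place of BeautifulSoup. The sample markup, the class `OptionCollector`, and the helper `pick_uprn` are hypothetical. Unlike the PR, which falls back to the first address, this sketch returns `None` when nothing matches.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect (value, text) pairs from <option> tags inside <select id="uprn">."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._in_select = False
        self._value = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == "uprn":
            self._in_select = True
        elif tag == "option" and self._in_select:
            self._value = attrs.get("value", "")
            self._text = []

    def handle_data(self, data):
        if self._value is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._value is not None:
            self.options.append((self._value, "".join(self._text).strip()))
            self._value = None
        elif tag == "select":
            self._in_select = False


def pick_uprn(html, uprn=None, paon=None):
    """Choose an option by UPRN first (step 2), else by house number/name (step 3)."""
    parser = OptionCollector()
    parser.feed(html)
    valid = [(v, t) for v, t in parser.options if v]  # skip the empty placeholder option
    if uprn:
        for value, _ in valid:
            if value == str(uprn):
                return value
    if paon:
        prefix = str(paon).strip().upper()
        for value, text in valid:
            if text.upper().startswith((prefix + " ", prefix + ",")):
                return value
    return None


# Hypothetical markup standing in for the response to the postcode POST (step 1).
SAMPLE = """
<select id="uprn" name="uprn">
  <option value="">Select an address</option>
  <option value="000000000001">93 Example Road, Halifax</option>
  <option value="000000000002">95 Example Road, Halifax</option>
</select>
"""

matched = pick_uprn(SAMPLE, paon="95")
print(matched)  # the value that would be sent in the second POST (step 4)
```
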

Testing

UPRN path (backward compat): HX3 5EQ + UPRN 100051326778 ✅
Postcode + house number: HX3 5EQ + paon 95 ✅
API end-to-end via bin-resolve-v2.php ✅

Summary by CodeRabbit

Bug Fixes
- Calderdale bin collection now fetches data via direct HTTP (no browser automation), improving reliability.
- Address matching enhanced: postcode + house number are sufficient; UPRN accepted but optional.
Documentation
- Test data and notes updated to reflect the new Calderdale configuration and required inputs.

coderabbitai · 2026-05-12T14:01:37Z

Walkthrough

Calderdale scraper replaced Selenium with requests + BeautifulSoup, added a UPRN/PAON address-matching helper, performs two POSTs to fetch address and collection table, parses bin types and next collection dates, and updated tests to remove web_driver configuration.

Changes

Calderdale Selenium-to-HTTP Migration

Layer / File(s)	Summary
Import and address matching setup `uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py`	Replaced Selenium imports with `requests` and `BeautifulSoup`. Added `_match_address()` to select dropdown options by exact zero-padded UPRN or case-insensitive PAON prefix/substring matching, raising on no options.
HTTP-based collection data parsing `uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py`	Rewrote `parse_data()` to use two sequential HTTP POSTs: first to get the address `select#uprn` and determine matched UPRN via `_match_address()`, second to retrieve `table#collection`. Parses bin type and next collection date from paragraphs containing "will be your next collection", sorts by parsed date, and uses `resp.raise_for_status()` plus `ValueError` checks for missing elements.
Test configuration update `uk_bin_collection/tests/input.json`	Updated `CalderdaleCouncil` test entry with new example `postcode`, `house_number`, and `uprn`; removed `web_driver`/Selenium config; revised `wiki_note` to state postcode + house number are sufficient and UPRN is accepted but not required.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

robbrad/UKBinCollectionData#1996: Implements UPRN/PAON matching fallback logic in other council address dropdowns.
robbrad/UKBinCollectionData#1659: Replaces Selenium lookups with direct HTTP fetch-and-parse logic and updates test input metadata.
robbrad/UKBinCollectionData#1742: Similar refactor converting Selenium-based parsing to HTTP/JSON requests and parsing.

Suggested reviewers

dp247

Poem

🐰 I hopped from Selenium to HTTP light,
POSTs and BeautifulSoup parse through the night.
UPRN or house number, the match is now neat,
Dates sorted and tidy — no browser to meet.
Hooray for clean code and a faster bin beat!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main changes: adding postcode + house number lookup capability and removing the Selenium dependency from CalderdaleCouncil.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


codecov · 2026-05-12T14:05:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (004741e).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2061   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152


coderabbitai · 2026-05-12T14:06:31Z

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details

{}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py (1)

1-1: 💤 Low value

Remove unused import.

The re module is imported but not used anywhere in this file.

-import re

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` at line 1,
Remove the unused import of the re module from CalderdaleCouncil.py: delete the
line "import re" (the module is not referenced anywhere in the file, so simply
removing that import from the top of the file and running tests/lint will
resolve the warning).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py`:
- Around line 108-117: The date parsing block currently swallows ValueError from
datetime.strptime for date_text which can hide format changes; update the code
in CalderdaleCouncil (the parsing loop that uses datetime.strptime, date_text,
date_format, bin_type and appends to data["bins"]) to record parse failures
instead of silently continuing — either log a warning with context (include the
failing date_text and bin_type) or collect failures and raise a descriptive
exception after the loop so callers notice; ensure the chosen logger or
exception includes which date_text and expected format (date_format) caused the
failure.
- Around line 24-34: The current PAON/UPRN matching logic (variables paon,
paon_norm and list valid) silently returns valid[0][0] when no match is found;
change it so that if the caller supplied a user identifier (paon or UPRN) and no
matching entry in valid is found, the function raises a clear error (e.g.,
ValueError or a custom exception) indicating no match for the provided
identifier, and only fall back to returning valid[0][0] when neither paon nor
UPRN were supplied (i.e., both are falsy). Locate the matching block that
iterates over valid and update control flow to perform the raise on missing
user-supplied matches and preserve the existing default behavior only for truly
absent identifiers.

---

Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py`:
- Line 1: Remove the unused import of the re module from CalderdaleCouncil.py:
delete the line "import re" (the module is not referenced anywhere in the file,
so simply removing that import from the top of the file and running tests/lint
will resolve the warning).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 846cae2c-2bfc-4f5a-9545-4b8b1b0cac6e

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 601d10f.

📒 Files selected for processing (1)

uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py

coderabbitai · 2026-05-12T14:06:34Z

+    if paon:
+        paon_norm = str(paon).strip().upper()
+        for val, text in valid:
+            text_upper = text.upper()
+            if text_upper.startswith(paon_norm + " ") or text_upper.startswith(paon_norm + ","):
+                return val
+        for val, text in valid:
+            if paon_norm in text.upper():
+                return val

+    return valid[0][0]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Silent fallback may mask misconfigurations.

When a user-supplied UPRN or PAON fails to match any option, the function silently returns the first address. This could lead to returning incorrect collection data without any warning.

Consider raising an error when a user-provided identifier fails to match, and only defaulting to the first address when neither UPRN nor PAON were supplied:

🛡️ Proposed fix

     if paon:
         paon_norm = str(paon).strip().upper()
         for val, text in valid:
             text_upper = text.upper()
             if text_upper.startswith(paon_norm + " ") or text_upper.startswith(paon_norm + ","):
                 return val
         for val, text in valid:
             if paon_norm in text.upper():
                 return val
+        raise ValueError(f"No address matching house number/name '{paon}' found in dropdown")

-    return valid[0][0]
+    if uprn:
+        raise ValueError(f"UPRN '{uprn}' not found in dropdown")
+
+    # No identifier provided - default to first address
+    return valid[0][0]

Based on learnings: In uk_bin_collection/**/*.py, prefer explicit failures over silent defaults to ensure format changes are detected early.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` around lines 24 - 34, The current PAON/UPRN matching logic (variables paon, paon_norm and list valid) silently returns valid[0][0] when no match is found; change it so that if the caller supplied a user identifier (paon or UPRN) and no matching entry in valid is found, the function raises a clear error (e.g., ValueError or a custom exception) indicating no match for the provided identifier, and only fall back to returning valid[0][0] when neither paon nor UPRN were supplied (i.e., both are falsy). Locate the matching block that iterates over valid and update control flow to perform the raise on missing user-supplied matches and preserve the existing default behavior only for truly absent identifiers.
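
For illustration, the stricter control flow this comment asks for might look like the sketch below. It is a standalone approximation rather than the PR's `_match_address()`: the function name `match_address`, the `(value, text)` input shape, and the 12-digit zero-padding are assumptions based on the walkthrough and the quoted diff.

```python
def match_address(options, uprn=None, paon=None):
    """Return the option value for the supplied UPRN or PAON, failing loudly on a miss.

    `options` is a list of (value, text) pairs taken from the address dropdown.
    """
    valid = [(value, text) for value, text in options if value]
    if not valid:
        raise ValueError("No addresses found in dropdown")

    if uprn:
        # Assumption: dropdown values are zero-padded to 12 digits.
        wanted = str(uprn).strip().zfill(12)
        for value, _ in valid:
            if value.zfill(12) == wanted:
                return value
        if not paon:
            raise ValueError(f"UPRN '{uprn}' not found in dropdown")

    if paon:
        paon_norm = str(paon).strip().upper()
        # Pass 1: the option text starts with the house number/name.
        for value, text in valid:
            if text.upper().startswith((paon_norm + " ", paon_norm + ",")):
                return value
        # Pass 2: looser substring match, as in the quoted diff.
        for value, text in valid:
            if paon_norm in text.upper():
                return value
        raise ValueError(
            f"No address matching house number/name '{paon}' found in dropdown"
        )

    # Neither identifier was supplied: keep the original default.
    return valid[0][0]
```

With this shape, a UPRN that misses still falls through to the PAON check when a house number is also supplied, and the first-address default only applies when the caller gave no identifier at all.
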

coderabbitai · 2026-05-12T14:06:34Z

+                    try:
+                        parsed = datetime.strptime(date_text, "%A %d %B %Y")
+                        data["bins"].append(
+                            {
+                                "type": bin_type,
+                                "collectionDate": parsed.strftime(date_format),
+                            }
+                        )
+                    except ValueError:
+                        continue


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent date parse failures may hide format changes.

When datetime.strptime fails, the exception is caught and silently discarded. If the council changes their date format, this code will silently return incomplete data rather than alerting to the issue.

Consider logging a warning or raising after accumulating failures:

🛡️ Proposed fix with warning

+import logging
+
+logger = logging.getLogger(__name__)
+
 # In the parsing loop:
 try:
     parsed = datetime.strptime(date_text, "%A %d %B %Y")
     data["bins"].append(
         {
             "type": bin_type,
             "collectionDate": parsed.strftime(date_format),
         }
     )
 except ValueError:
-    continue
+    logger.warning(f"Failed to parse date '{date_text}' for bin type '{bin_type}'")
+    continue

Based on learnings: In uk_bin_collection/**/*.py, prefer explicit failures over silent defaults or swallowed errors to ensure format changes are detected early.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` around lines 108 - 117, The date parsing block currently swallows ValueError from datetime.strptime for date_text which can hide format changes; update the code in CalderdaleCouncil (the parsing loop that uses datetime.strptime, date_text, date_format, bin_type and appends to data["bins"]) to record parse failures instead of silently continuing — either log a warning with context (include the failing date_text and bin_type) or collect failures and raise a descriptive exception after the loop so callers notice; ensure the chosen logger or exception includes which date_text and expected format (date_format) caused the failure.
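
The other option mentioned here, collecting failures and raising after the loop, could be sketched roughly as follows. This is an assumption-laden illustration: `parse_collections` and its `(bin_type, date_text)` input are hypothetical, and `DATE_FORMAT` stands in for the project's shared `date_format`, whose actual value is not shown in this thread.

```python
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # assumption: stand-in for the project's date_format
SOURCE_FORMAT = "%A %d %B %Y"  # input format used in the quoted diff


def parse_collections(rows):
    """Convert (bin_type, date_text) pairs into bin entries sorted by date.

    Unparseable dates are collected and reported together after the loop.
    """
    parsed, failures = [], []
    for bin_type, date_text in rows:
        try:
            when = datetime.strptime(date_text, SOURCE_FORMAT)
        except ValueError:
            failures.append((bin_type, date_text))
            continue
        parsed.append((when, bin_type))

    if failures:
        details = ", ".join(f"{b!r}: {d!r}" for b, d in failures)
        raise ValueError(
            f"Could not parse dates with format {SOURCE_FORMAT!r}: {details}"
        )

    parsed.sort(key=lambda item: item[0])
    return [
        {"type": bin_type, "collectionDate": when.strftime(DATE_FORMAT)}
        for when, bin_type in parsed
    ]


rows = [("Recycling", "Monday 18 May 2026"), ("General waste", "Friday 15 May 2026")]
bins = parse_collections(rows)
```

Raising once, after every row has been tried, reports all offending values together, which is more useful when the council changes its date format for every bin type at the same time.
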

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py`:
- Around line 54-62: The POST requests using session.post (the call that posts
base_url with data including "postcode": user_postcode and the second POST later
in the file) lack a timeout and can hang; update both session.post invocations
to pass a sensible timeout argument (e.g., timeout=10) so requests will raise on
slow/unresponsive servers, keep using resp.raise_for_status() afterward, and
ensure the change references the same variables (base_url, user_postcode) and
response handling.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7751eb98-b133-4987-9a32-c76acef6d9a3

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 02f3700.

📒 Files selected for processing (2)

uk_bin_collection/tests/input.json
uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py

coderabbitai · 2026-05-12T14:11:29Z

+        resp = session.post(
+            base_url,
+            data={
+                "postcode": user_postcode,
+                "email-address": "",
+                "find": "Find an address for this postcode",
+            },
+        )
+        resp.raise_for_status()


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add timeout to HTTP requests to prevent hanging.

The session.post() call has no timeout. If the council's server is slow or unresponsive, this will block indefinitely.

🛡️ Proposed fix to add timeout

 resp = session.post(
     base_url,
     data={
         "postcode": user_postcode,
         "email-address": "",
         "find": "Find an address for this postcode",
     },
+    timeout=30,
 )

Also apply to the second POST at line 73-83.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

resp = session.post(
    base_url,
    data={
        "postcode": user_postcode,
        "email-address": "",
        "find": "Find an address for this postcode",
    },
)
resp.raise_for_status()

resp = session.post(
    base_url,
    data={
        "postcode": user_postcode,
        "email-address": "",
        "find": "Find an address for this postcode",
    },
    timeout=30,
)
resp.raise_for_status()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` around lines 54 - 62, The POST requests using session.post (the call that posts base_url with data including "postcode": user_postcode and the second POST later in the file) lack a timeout and can hang; update both session.post invocations to pass a sensible timeout argument (e.g., timeout=10) so requests will raise on slow/unresponsive servers, keep using resp.raise_for_status() afterward, and ensure the change references the same variables (base_url, user_postcode) and response handling.
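
Since the same fix applies to both POSTs, one way to avoid repeating the `timeout` argument is a small wrapper, sketched below. The helper name `post_form` is hypothetical and not part of the PR; it only assumes that `session` behaves like `requests.Session`, whose `post()` accepts a `timeout` keyword.

```python
def post_form(session, url, data, timeout=30):
    """POST form data with an explicit timeout and fail on HTTP error statuses.

    `session` is expected to behave like requests.Session.
    """
    resp = session.post(url, data=data, timeout=timeout)
    resp.raise_for_status()
    return resp


# Usage with the real library (not run here; base_url and user_postcode
# are the names used in the quoted diff):
#   import requests
#   with requests.Session() as session:
#       resp = post_form(session, base_url, {"postcode": user_postcode})
```

With `requests`, a server that does not respond within the limit raises `requests.exceptions.Timeout` instead of blocking indefinitely, so callers see a clear failure.
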

UPRN is no longer required. Users can provide postcode + house number as an alternative. Matches by house number in address dropdown text. Removes Selenium dependency - uses pure HTTP POST to JSP form. Backward compatible - existing UPRN lookups still work.

coderabbitai

🧹 Nitpick comments (1)

uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py (1)

1-1: 💤 Low value

Remove unused import re.

The re module is imported but never used in this file.

-import re
 from datetime import datetime

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` at line 1,
Remove the unused import by deleting the standalone "import re" statement at the
top of the file (the unused import in CalderdaleCouncil.py); ensure no other
references to the "re" module exist in functions or classes in that file (e.g.,
any methods in the CalderdaleCouncil class) before removing.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py`:
- Line 1: Remove the unused import by deleting the standalone "import re"
statement at the top of the file (the unused import in CalderdaleCouncil.py);
ensure no other references to the "re" module exist in functions or classes in
that file (e.g., any methods in the CalderdaleCouncil class) before removing.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cfd36d21-e774-4ab6-a1fd-5166f4eb277f

📥 Commits

Reviewing files that changed from the base of the PR and between 02f3700 and 004741e.

📒 Files selected for processing (2)

uk_bin_collection/tests/input.json
uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py

🚧 Files skipped from review as they are similar to previous changes (1)

uk_bin_collection/tests/input.json

InertiaUK force-pushed the feat/calderdale-postcode-lookup branch from 02f3700 to 004741e · May 12, 2026 15:21

InertiaUK wants to merge 1 commit into robbrad:master from InertiaUK:feat/calderdale-postcode-lookup
