fix: LondonBoroughOfRichmondUponThames — support UPRN directly as PID by InertiaUK · Pull Request #2044 · robbrad/UKBinCollectionData

InertiaUK · 2026-05-12T09:41:33Z

The scraper required a PID parameter, passed via the URL or the house_number field, but the PID is actually the property's UPRN. The old config had no UPRN or postcode, only a street name in house_number.

Rewritten to:

- Accept a UPRN directly (used as the PID to construct the My Richmond URL)
- Accept postcode + house number for a 3-step form lookup: postcode → street (USRN) → property (UPRN/PID)

Removed the Selenium dependency; the scraper now uses plain requests. Updated input.json with the UPRN and postcode for the test address.
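The input handling described above can be sketched as a small resolver. This is a minimal sketch, not the PR's code: `resolve_pid` and the injected `lookup_pid` callable are hypothetical names, `lookup_pid` stands in for the postcode → USRN → UPRN form flow, and giving the UPRN priority when both inputs are present is an assumption.

```python
from typing import Callable, Optional


def resolve_pid(
    uprn: Optional[str],
    postcode: Optional[str],
    paon: Optional[str],
    lookup_pid: Callable[[str, str], str],
) -> str:
    """Return the My Richmond property ID (PID) for a property.

    Hypothetical helper. A UPRN, when supplied, is used directly as the PID
    (assumed to take priority). Otherwise the postcode and house number (PAON)
    are passed to lookup_pid, a stand-in for the multi-step form lookup.
    """
    if uprn and uprn.strip():
        return uprn.strip()
    if postcode and postcode.strip() and paon and paon.strip():
        return lookup_pid(postcode.strip(), paon.strip())
    raise ValueError("Richmond: supply a UPRN, or a postcode plus house number")
```

Injecting the lookup keeps the UPRN path free of network calls, so only the postcode route pays for the extra form requests.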

Summary by CodeRabbit

Refactor
- Richmond upon Thames bin collection scraper refactored: property lookup now accepts UPRN or postcode + house number, improving reliability of waste collection retrieval.
Documentation
- Updated local configuration/instructions to reflect UPRN or postcode+house number usage and new URL handling.

coderabbitai · 2026-05-12T09:41:47Z


Walkthrough

The Richmond upon Thames scraper now accepts a UPRN or derives a PID from postcode + house number via a multi-step My Richmond lookup, fetches the property page, and parses waste collection sections into bins with collectionDate fields.

Changes

Richmond upon Thames UPRN/Postcode Lookup Refactor

Layer / File(s)	Summary
PID lookup entry point and Richmond URL infrastructure `uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py` (`lines 4-122`)	Module constants (`BASE_URL`, `MY_RICHMOND_URL`, `MY_RICHMOND_PROPERTY_URL`, `HEADERS`) and imports added; `parse_data` reworked to accept `uprn` or `postcode`+`paon`/`number`, derive `pid` via `_lookup_pid`, build the property URL, and fetch HTML. New `_fetch` helper and `_lookup_pid` implement the multi-step My Richmond lookup flow.
Waste HTML extraction and bin date parsing `uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py` (`lines 123-189`)	`_parse_waste` extracts waste sections from the property HTML by iterating `<h4>` sections and collecting dates from `<ul><li>` with a `<p>` fallback; `_extract_waste_block` regexes updated to match current `my-waste` div or legacy `my_waste` anchor.
Test configuration for UPRN and postcode parameters `uk_bin_collection/tests/input.json` (`lines 1470-1475`)	`LondonBoroughOfRichmondUponThames` test entry updated to provide `postcode` and `uprn` with `skip_get_url: true`, a new `url`, and an updated `wiki_note` describing UPRN-as-PID or postcode + house number usage.
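The `_parse_waste` behaviour in the table (one `<h4>` per section, dates from `<ul><li>` with a `<p>` fallback) can be illustrated with the standard-library `html.parser` in place of the BeautifulSoup calls the scraper presumably uses. This is a sketch under assumptions: the markup in the test, the `"%A %d %B %Y"` input date format and the `"%d/%m/%Y"` output format are all assumed, not taken from the live My Richmond page.

```python
from datetime import datetime
from html.parser import HTMLParser


class WasteSectionParser(HTMLParser):
    """Collect <li> and <p> text under the most recent <h4> heading.

    Assumes flat markup: an <h4> per waste stream, followed by its dates in
    <ul><li> items or, failing that, in <p> elements.
    """

    CAPTURE_TAGS = ("h4", "li", "p")

    def __init__(self):
        super().__init__()
        self.sections = []    # one dict per <h4>: {"type": str, "li": [...], "p": [...]}
        self._capture = None  # tag whose text is currently being collected
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.CAPTURE_TAGS:
            self._capture = tag
            self._buf = []

    def handle_data(self, data):
        if self._capture:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._capture:
            return
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        self._capture = None
        if tag == "h4":
            self.sections.append({"type": text, "li": [], "p": []})
        elif self.sections and text:
            self.sections[-1][tag].append(text)


def parse_waste(html, in_fmt="%A %d %B %Y", out_fmt="%d/%m/%Y"):
    """Turn waste sections into {"bins": [{"type", "collectionDate"}, ...]}."""
    parser = WasteSectionParser()
    parser.feed(html)
    parser.close()
    bins = []
    for section in parser.sections:
        # Prefer list items; fall back to paragraphs only when there are none.
        for text in section["li"] or section["p"]:
            try:
                when = datetime.strptime(text, in_fmt)
            except ValueError:
                continue  # not a date in the expected format, e.g. a note
            bins.append(
                {"type": section["type"], "collectionDate": when.strftime(out_fmt)}
            )
    return {"bins": bins}
```

Text that does not parse as a date is skipped rather than raised, so explanatory notes under a heading do not break the run; a stricter scraper might instead fail when a section yields no dates at all.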

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐰 A Richmond lookup, three steps in the dance,
Tokens and streets align at just a glance,
UPRN found, the property page unfurls,
Bins and dates collected for curious squirrels,
Hooray — cartwheels for tidy, trash-day whirls!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: enabling direct UPRN support as PID for the Richmond council scraper, which is the primary objective of the PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


codecov · 2026-05-12T09:44:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (e5f0076).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2044   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py (1)

86-86: 💤 Low value

Unused loop variable street_name.

Per static analysis hint, rename to _street_name to indicate it's intentionally unused.

♻️ Rename unused variable

-        for usrn, street_name in street_usrns:
+        for usrn, _street_name in street_usrns:

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`
at line 86, The loop in LondonBoroughOfRichmondUponThames.py uses an unused
second tuple variable named street_name; rename it to _street_name in the for
loop (for usrn, _street_name in street_usrns:) to make the intent explicit to
linters and readers, and ensure there are no subsequent references to
street_name elsewhere in the method or function before committing the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`:
- Around line 113-117: The current substring check (paon_lower in text) can
incorrectly match "1" inside "10"/"11"; update the option-matching logic in the
loop over uprn_select.find_all("option") to use a stricter match: either use a
word-boundary regex (e.g. re.search(r'\b' + re.escape(paon_lower) + r'\b',
text)) or compare the first token (text.split()[0] == paon_lower) / prefix with
delimiter checks, and only return val when that stricter match succeeds; ensure
to import re if you choose the regex approach.
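The difference between the current substring check and the two stricter options in the comment can be shown on a few option labels. This is a sketch: `substring_match`, `bounded_match`, `first_token_match` and `pick_uprn` are hypothetical helpers, and the `(value, text)` tuples stand in for the `<option>` elements of `uprn_select`.

```python
import re


def substring_match(paon, option_text):
    """Current check: true if the house number appears anywhere in the text."""
    return paon.strip().lower() in option_text.strip().lower()


def bounded_match(paon, option_text):
    """Suggested check: the house number must appear as a whole word."""
    pattern = rf"\b{re.escape(paon.strip().lower())}\b"
    return re.search(pattern, option_text.strip().lower()) is not None


def first_token_match(paon, option_text):
    """Alternative check: the house number must be the first token."""
    tokens = option_text.strip().lower().replace(",", " ").split()
    return bool(tokens) and tokens[0] == paon.strip().lower()


def pick_uprn(options, paon, matcher=bounded_match):
    """Return the value of the first (value, text) option whose text matches."""
    for val, text in options:
        if val and matcher(paon, text):
            return val
    return None
```

Neither stricter check is perfect: the word-boundary version still matches "1" inside "Flat 1, 10 Example Road", while the first-token version rejects that label entirely, so the right choice depends on how Richmond formats its address options.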



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2ff688f1-bd09-41f7-9528-8ae94870fe2a

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 7f8f2b5.

📒 Files selected for processing (2)

uk_bin_collection/tests/input.json
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py

coderabbitai · 2026-05-12T09:44:58Z

+            for opt in uprn_select.find_all("option"):
+                val = opt.get("value", "")
+                text = opt.text.strip().lower()
+                if val and paon_lower in text:
+                    return val


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Substring match may select the wrong property.

Using paon_lower in text means house number "1" will also match "10", "11", "100", etc. Consider a stricter match (word boundary or exact prefix):

🛠️ Suggested fix using word-boundary check

             for opt in uprn_select.find_all("option"):
                 val = opt.get("value", "")
                 text = opt.text.strip().lower()
-                if val and paon_lower in text:
+                # Match as a word boundary to avoid "1" matching "10", "11", etc.
+                if val and re.search(rf"\b{re.escape(paon_lower)}\b", text):
                     return val



coderabbitai

♻️ Duplicate comments (1)

uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py (1)

113-117: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use a bounded match before returning the UPRN.

paon_lower in text can still pick the wrong option, e.g. 1 matching 10 or 11, which means the scraper may fetch another property's bin dates.

🛠️ Safer match

             for opt in uprn_select.find_all("option"):
                 val = opt.get("value", "")
                 text = opt.text.strip().lower()
-                if val and paon_lower in text:
+                if val and re.search(rf"\b{re.escape(paon_lower)}\b", text):
                     return val

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`
around lines 113 - 117, The current loop in LondonBoroughOfRichmondUponThames.py
uses a substring check (paon_lower in text) which can false-match (e.g., "1"
matching "10"); change the match in the uprn_select.find_all("option") loop to a
bounded match using the variables val, text and paon_lower — for example use a
regex word-boundary match (with re.escape(paon_lower)) or split the option text
into tokens and compare tokens for equality (or token-prefix if you need "1A"
rules) before returning val; ensure you import re if using regex and update the
condition that determines the return so only exact/word-bounded PAON matches
pass.

🧹 Nitpick comments (1)

uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py (1)

87-91: ⚡ Quick win

Fail explicitly if the postcode form tokens disappear.

This currently dereferences form.find(...).get("value") directly, so a small Richmond markup change becomes an opaque AttributeError instead of a clear scraper failure.

🛠️ Suggested guard

             form = usrn_select.find_parent("form")
-            token = form.find(
-                "input", {"name": "__RequestVerificationToken"}
-            ).get("value")
-            ufprt = form.find("input", {"name": "ufprt"}).get("value")
+            token_input = (
+                form.find("input", {"name": "__RequestVerificationToken"})
+                if form
+                else None
+            )
+            ufprt_input = form.find("input", {"name": "ufprt"}) if form else None
+            if not form or not token_input or not ufprt_input:
+                raise RuntimeError(
+                    "Richmond: postcode form tokens were not found; the form layout may have changed."
+                )
+            token = token_input.get("value")
+            ufprt = ufprt_input.get("value")

Based on learnings: council parsers in uk_bin_collection/**/*.py should fail explicitly when remote HTML no longer matches expectations.
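The fail-fast guard can also be sketched so that the error names exactly which token is missing, as the prompt below asks. This sketch swaps the scraper's BeautifulSoup traversal for the standard-library `html.parser` and works on a form HTML fragment; `extract_form_tokens`, `_InputCollector` and `REQUIRED_TOKENS` are hypothetical names, and treating an empty `value` as missing is an assumption.

```python
from html.parser import HTMLParser

# Hidden inputs named in the review comment.
REQUIRED_TOKENS = ("__RequestVerificationToken", "ufprt")


class _InputCollector(HTMLParser):
    """Record the value of every <input name=...> in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attr_map = dict(attrs)
            name = attr_map.get("name")
            if name:
                self.values[name] = attr_map.get("value")


def extract_form_tokens(form_html, required=REQUIRED_TOKENS):
    """Return {name: value} for each required input, or fail with a clear error.

    A missing input, or one with an empty value, is reported by name.
    """
    collector = _InputCollector()
    collector.feed(form_html)
    collector.close()
    missing = [name for name in required if not collector.values.get(name)]
    if missing:
        raise RuntimeError(
            "Richmond: postcode form token(s) not found: "
            + ", ".join(missing)
            + "; the form layout may have changed."
        )
    return {name: collector.values[name] for name in required}
```

Self-closing `<input ... />` tags are handled too, because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.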

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`
around lines 87 - 91, The code currently assumes form.find(...).get("value")
always succeeds for the inputs named "__RequestVerificationToken" and "ufprt"
(starting from the usrn_select → form traversal), which will raise an opaque
AttributeError if the markup changes; update the scraping block that computes
token and ufprt (referencing usrn_select and form) to explicitly check that form
is not None and that form.find(...) returns a non-None element for both names
before calling .get("value"), and if any are missing raise a clear, descriptive
exception (e.g., RuntimeError or ValueError) indicating which token is missing
and that the postcode form structure has changed so the scraper can fail fast
and provide actionable debug info.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9c7e63a8-868c-4a16-94fa-20e3543e54f7

📥 Commits

Reviewing files that changed from the base of the PR and between 7f8f2b5 and e5f0076.

📒 Files selected for processing (2)

uk_bin_collection/tests/input.json
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py

🚧 Files skipped from review as they are similar to previous changes (1)

uk_bin_collection/tests/input.json

coderabbitai Bot reviewed May 12, 2026


fix: london borough of richmond - support uprn directly as pid

e5f0076

InertiaUK force-pushed the fix/richmond-uprn-as-pid branch from 7f8f2b5 to e5f0076 on May 12, 2026 15:23

InertiaUK wants to merge 1 commit into robbrad:master from InertiaUK:fix/richmond-uprn-as-pid

InertiaUK commented May 12, 2026 • edited by coderabbitai Bot
coderabbitai Bot commented May 12, 2026 • edited
codecov Bot commented May 12, 2026 • edited

1 participant
