Skip to content

fix: LondonBoroughOfRichmondUponThames — support UPRN directly as PID#2044

Open
InertiaUK wants to merge 1 commit into
robbrad:masterfrom
InertiaUK:fix/richmond-uprn-as-pid
Open

fix: LondonBoroughOfRichmondUponThames — support UPRN directly as PID#2044
InertiaUK wants to merge 1 commit into
robbrad:masterfrom
InertiaUK:fix/richmond-uprn-as-pid

Conversation

@InertiaUK
Copy link
Copy Markdown
Contributor

@InertiaUK InertiaUK commented May 12, 2026

The scraper required a PID parameter passed via URL or house_number field, but the PID is actually the property's UPRN. The old config had no UPRN or postcode, only a street name in house_number.

Rewritten to:

  1. Accept UPRN directly (used as PID to construct the My Richmond URL)
  2. Accept postcode + house number for a 3-step form lookup: postcode → street (USRN) → property (UPRN/PID)

Removed Selenium dependency — pure requests now. Updated input.json with UPRN and postcode for the test address.

Summary by CodeRabbit

  • Refactor
    • Richmond upon Thames bin collection scraper refactored: property lookup now accepts UPRN or postcode + house number, improving reliability of waste collection retrieval.
  • Documentation
    • Updated local configuration/instructions to reflect UPRN or postcode+house number usage and new URL handling.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 12, 2026

📝 Walkthrough

Walkthrough

The Richmond upon Thames scraper now accepts a UPRN or derives a PID from postcode + house number via a multi-step My Richmond lookup, fetches the property page, and parses waste collection sections into bins with collectionDate fields.

Changes

Richmond upon Thames UPRN/Postcode Lookup Refactor

Layer / File(s) Summary
PID lookup entry point and Richmond URL infrastructure
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py (lines 4-122)
Module constants (BASE_URL, MY_RICHMOND_URL, MY_RICHMOND_PROPERTY_URL, HEADERS) and imports added; parse_data reworked to accept uprn or postcode+paon/number, derive pid via _lookup_pid, build the property URL, and fetch HTML. New _fetch helper and _lookup_pid implement the multi-step My Richmond lookup flow.
Waste HTML extraction and bin date parsing
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py (lines 123-189)
_parse_waste extracts waste sections from the property HTML by iterating <h4> sections and collecting dates from <ul><li> with a <p> fallback; _extract_waste_block regexes updated to match current my-waste div or legacy my_waste anchor.
Test configuration for UPRN and postcode parameters
uk_bin_collection/tests/input.json (lines 1470-1475)
LondonBoroughOfRichmondUponThames test entry updated to provide postcode and uprn with skip_get_url: true, a new url, and an updated wiki_note describing UPRN-as-PID or postcode + house number usage.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 A Richmond lookup, three steps in the dance,
Tokens and streets align at just a glance,
UPRN found, the property page unfurls,
Bins and dates collected for curious squirrels,
Hooray — cartwheels for tidy, trash-day whirls!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: enabling direct UPRN support as PID for the Richmond council scraper, which is the primary objective of the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (e5f0076).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2044   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py (1)

86-86: 💤 Low value

Unused loop variable street_name.

Per static analysis hint, rename to _street_name to indicate it's intentionally unused.

♻️ Rename unused variable
-        for usrn, street_name in street_usrns:
+        for usrn, _street_name in street_usrns:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`
at line 86, The loop in LondonBoroughOfRichmondUponThames.py uses an unused
second tuple variable named street_name; rename it to _street_name in the for
loop (for usrn, _street_name in street_usrns:) to make the intent explicit to
linters and readers, and ensure there are no subsequent references to
street_name elsewhere in the method or function before committing the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`:
- Around line 113-117: The current substring check (paon_lower in text) can
incorrectly match "1" inside "10"/"11"; update the option-matching logic in the
loop over uprn_select.find_all("option") to use a stricter match: either use a
word-boundary regex (e.g. re.search(r'\b' + re.escape(paon_lower) + r'\b',
text)) or compare the first token (text.split()[0] == paon_lower) / prefix with
delimiter checks, and only return val when that stricter match succeeds; ensure
to import re if you choose the regex approach.

---

Nitpick comments:
In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`:
- Line 86: The loop in LondonBoroughOfRichmondUponThames.py uses an unused
second tuple variable named street_name; rename it to _street_name in the for
loop (for usrn, _street_name in street_usrns:) to make the intent explicit to
linters and readers, and ensure there are no subsequent references to
street_name elsewhere in the method or function before committing the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2ff688f1-bd09-41f7-9528-8ae94870fe2a

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 7f8f2b5.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py

Comment on lines +113 to +117
for opt in uprn_select.find_all("option"):
val = opt.get("value", "")
text = opt.text.strip().lower()
if val and paon_lower in text:
return val
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Substring match may select the wrong property.

Using paon_lower in text means house number "1" will also match "10", "11", "100", etc. Consider a stricter match (word boundary or exact prefix):

🛠️ Suggested fix using word-boundary check
             for opt in uprn_select.find_all("option"):
                 val = opt.get("value", "")
                 text = opt.text.strip().lower()
-                if val and paon_lower in text:
+                # Match as a word boundary to avoid "1" matching "10", "11", etc.
+                if val and re.search(rf"\b{re.escape(paon_lower)}\b", text):
                     return val
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for opt in uprn_select.find_all("option"):
val = opt.get("value", "")
text = opt.text.strip().lower()
if val and paon_lower in text:
return val
for opt in uprn_select.find_all("option"):
val = opt.get("value", "")
text = opt.text.strip().lower()
# Match as a word boundary to avoid "1" matching "10", "11", etc.
if val and re.search(rf"\b{re.escape(paon_lower)}\b", text):
return val
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`
around lines 113 - 117, The current substring check (paon_lower in text) can
incorrectly match "1" inside "10"/"11"; update the option-matching logic in the
loop over uprn_select.find_all("option") to use a stricter match: either use a
word-boundary regex (e.g. re.search(r'\b' + re.escape(paon_lower) + r'\b',
text)) or compare the first token (text.split()[0] == paon_lower) / prefix with
delimiter checks, and only return val when that stricter match succeeds; ensure
to import re if you choose the regex approach.

@InertiaUK InertiaUK force-pushed the fix/richmond-uprn-as-pid branch from 7f8f2b5 to e5f0076 Compare May 12, 2026 15:23
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py (1)

113-117: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use a bounded match before returning the UPRN.

paon_lower in text can still pick the wrong option, e.g. 1 matching 10 or 11, which means the scraper may fetch another property's bin dates.

🛠️ Safer match
             for opt in uprn_select.find_all("option"):
                 val = opt.get("value", "")
                 text = opt.text.strip().lower()
-                if val and paon_lower in text:
+                if val and re.search(rf"\b{re.escape(paon_lower)}\b", text):
                     return val
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`
around lines 113 - 117, The current loop in LondonBoroughOfRichmondUponThames.py
uses a substring check (paon_lower in text) which can false-match (e.g., "1"
matching "10"); change the match in the uprn_select.find_all("option") loop to a
bounded match using the variables val, text and paon_lower — for example use a
regex word-boundary match (with re.escape(paon_lower)) or split the option text
into tokens and compare tokens for equality (or token-prefix if you need "1A"
rules) before returning val; ensure you import re if using regex and update the
condition that determines the return so only exact/word-bounded PAON matches
pass.
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py (1)

87-91: ⚡ Quick win

Fail explicitly if the postcode form tokens disappear.

This currently dereferences form.find(...).get("value") directly, so a small Richmond markup change becomes an opaque AttributeError instead of a clear scraper failure.

🛠️ Suggested guard
             form = usrn_select.find_parent("form")
-            token = form.find(
-                "input", {"name": "__RequestVerificationToken"}
-            ).get("value")
-            ufprt = form.find("input", {"name": "ufprt"}).get("value")
+            token_input = (
+                form.find("input", {"name": "__RequestVerificationToken"})
+                if form
+                else None
+            )
+            ufprt_input = form.find("input", {"name": "ufprt"}) if form else None
+            if not form or not token_input or not ufprt_input:
+                raise RuntimeError(
+                    "Richmond: postcode form tokens were not found; the form layout may have changed."
+                )
+            token = token_input.get("value")
+            ufprt = ufprt_input.get("value")

Based on learnings: council parsers in uk_bin_collection/**/*.py should fail explicitly when remote HTML no longer matches expectations.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`
around lines 87 - 91, The code currently assumes form.find(...).get("value")
always succeeds for the inputs named "__RequestVerificationToken" and "ufprt"
(starting from the usrn_select → form traversal), which will raise an opaque
AttributeError if the markup changes; update the scraping block that computes
token and ufprt (referencing usrn_select and form) to explicitly check that form
is not None and that form.find(...) returns a non-None element for both names
before calling .get("value"), and if any are missing raise a clear, descriptive
exception (e.g., RuntimeError or ValueError) indicating which token is missing
and that the postcode form structure has changed so the scraper can fail fast
and provide actionable debug info.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`:
- Around line 113-117: The current loop in LondonBoroughOfRichmondUponThames.py
uses a substring check (paon_lower in text) which can false-match (e.g., "1"
matching "10"); change the match in the uprn_select.find_all("option") loop to a
bounded match using the variables val, text and paon_lower — for example use a
regex word-boundary match (with re.escape(paon_lower)) or split the option text
into tokens and compare tokens for equality (or token-prefix if you need "1A"
rules) before returning val; ensure you import re if using regex and update the
condition that determines the return so only exact/word-bounded PAON matches
pass.

---

Nitpick comments:
In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`:
- Around line 87-91: The code currently assumes form.find(...).get("value")
always succeeds for the inputs named "__RequestVerificationToken" and "ufprt"
(starting from the usrn_select → form traversal), which will raise an opaque
AttributeError if the markup changes; update the scraping block that computes
token and ufprt (referencing usrn_select and form) to explicitly check that form
is not None and that form.find(...) returns a non-None element for both names
before calling .get("value"), and if any are missing raise a clear, descriptive
exception (e.g., RuntimeError or ValueError) indicating which token is missing
and that the postcode form structure has changed so the scraper can fail fast
and provide actionable debug info.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9c7e63a8-868c-4a16-94fa-20e3543e54f7

📥 Commits

Reviewing files that changed from the base of the PR and between 7f8f2b5 and e5f0076.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • uk_bin_collection/tests/input.json

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant