fix: LondonBoroughOfRichmondUponThames — support UPRN directly as PID#2044
fix: LondonBoroughOfRichmondUponThames — support UPRN directly as PID#2044InertiaUK wants to merge 1 commit into
Conversation
📝 WalkthroughWalkthroughThe Richmond upon Thames scraper now accepts a UPRN or derives a PID from postcode + house number via a multi-step My Richmond lookup, fetches the property page, and parses waste collection sections into bins with collectionDate fields. ChangesRichmond upon Thames UPRN/Postcode Lookup Refactor
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2044 +/- ##
=======================================
Coverage 86.67% 86.67%
=======================================
Files 9 9
Lines 1141 1141
=======================================
Hits 989 989
Misses 152 152 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py (1)
86-86: 💤 Low valueUnused loop variable
street_name.Per static analysis hint, rename to
_street_nameto indicate it's intentionally unused.♻️ Rename unused variable
- for usrn, street_name in street_usrns: + for usrn, _street_name in street_usrns:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py` at line 86, The loop in LondonBoroughOfRichmondUponThames.py uses an unused second tuple variable named street_name; rename it to _street_name in the for loop (for usrn, _street_name in street_usrns:) to make the intent explicit to linters and readers, and ensure there are no subsequent references to street_name elsewhere in the method or function before committing the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`:
- Around line 113-117: The current substring check (paon_lower in text) can
incorrectly match "1" inside "10"/"11"; update the option-matching logic in the
loop over uprn_select.find_all("option") to use a stricter match: either use a
word-boundary regex (e.g. re.search(r'\b' + re.escape(paon_lower) + r'\b',
text)) or compare the first token (text.split()[0] == paon_lower) / prefix with
delimiter checks, and only return val when that stricter match succeeds; ensure
to import re if you choose the regex approach.
---
Nitpick comments:
In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`:
- Line 86: The loop in LondonBoroughOfRichmondUponThames.py uses an unused
second tuple variable named street_name; rename it to _street_name in the for
loop (for usrn, _street_name in street_usrns:) to make the intent explicit to
linters and readers, and ensure there are no subsequent references to
street_name elsewhere in the method or function before committing the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2ff688f1-bd09-41f7-9528-8ae94870fe2a
📒 Files selected for processing (2)
uk_bin_collection/tests/input.jsonuk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py
| for opt in uprn_select.find_all("option"): | ||
| val = opt.get("value", "") | ||
| text = opt.text.strip().lower() | ||
| if val and paon_lower in text: | ||
| return val |
There was a problem hiding this comment.
Substring match may select the wrong property.
Using paon_lower in text means house number "1" will also match "10", "11", "100", etc. Consider a stricter match (word boundary or exact prefix):
🛠️ Suggested fix using word-boundary check
for opt in uprn_select.find_all("option"):
val = opt.get("value", "")
text = opt.text.strip().lower()
- if val and paon_lower in text:
+ # Match as a word boundary to avoid "1" matching "10", "11", etc.
+ if val and re.search(rf"\b{re.escape(paon_lower)}\b", text):
return val📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for opt in uprn_select.find_all("option"): | |
| val = opt.get("value", "") | |
| text = opt.text.strip().lower() | |
| if val and paon_lower in text: | |
| return val | |
| for opt in uprn_select.find_all("option"): | |
| val = opt.get("value", "") | |
| text = opt.text.strip().lower() | |
| # Match as a word boundary to avoid "1" matching "10", "11", etc. | |
| if val and re.search(rf"\b{re.escape(paon_lower)}\b", text): | |
| return val |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`
around lines 113 - 117, The current substring check (paon_lower in text) can
incorrectly match "1" inside "10"/"11"; update the option-matching logic in the
loop over uprn_select.find_all("option") to use a stricter match: either use a
word-boundary regex (e.g. re.search(r'\b' + re.escape(paon_lower) + r'\b',
text)) or compare the first token (text.split()[0] == paon_lower) / prefix with
delimiter checks, and only return val when that stricter match succeeds; ensure
to import re if you choose the regex approach.
7f8f2b5 to
e5f0076
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py (1)
113-117:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse a bounded match before returning the UPRN.
paon_lower in textcan still pick the wrong option, e.g.1matching10or11, which means the scraper may fetch another property's bin dates.🛠️ Safer match
for opt in uprn_select.find_all("option"): val = opt.get("value", "") text = opt.text.strip().lower() - if val and paon_lower in text: + if val and re.search(rf"\b{re.escape(paon_lower)}\b", text): return val🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py` around lines 113 - 117, The current loop in LondonBoroughOfRichmondUponThames.py uses a substring check (paon_lower in text) which can false-match (e.g., "1" matching "10"); change the match in the uprn_select.find_all("option") loop to a bounded match using the variables val, text and paon_lower — for example use a regex word-boundary match (with re.escape(paon_lower)) or split the option text into tokens and compare tokens for equality (or token-prefix if you need "1A" rules) before returning val; ensure you import re if using regex and update the condition that determines the return so only exact/word-bounded PAON matches pass.
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py (1)
87-91: ⚡ Quick winFail explicitly if the postcode form tokens disappear.
This currently dereferences
form.find(...).get("value")directly, so a small Richmond markup change becomes an opaqueAttributeErrorinstead of a clear scraper failure.🛠️ Suggested guard
form = usrn_select.find_parent("form") - token = form.find( - "input", {"name": "__RequestVerificationToken"} - ).get("value") - ufprt = form.find("input", {"name": "ufprt"}).get("value") + token_input = ( + form.find("input", {"name": "__RequestVerificationToken"}) + if form + else None + ) + ufprt_input = form.find("input", {"name": "ufprt"}) if form else None + if not form or not token_input or not ufprt_input: + raise RuntimeError( + "Richmond: postcode form tokens were not found; the form layout may have changed." + ) + token = token_input.get("value") + ufprt = ufprt_input.get("value")Based on learnings: council parsers in
uk_bin_collection/**/*.pyshould fail explicitly when remote HTML no longer matches expectations.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py` around lines 87 - 91, The code currently assumes form.find(...).get("value") always succeeds for the inputs named "__RequestVerificationToken" and "ufprt" (starting from the usrn_select → form traversal), which will raise an opaque AttributeError if the markup changes; update the scraping block that computes token and ufprt (referencing usrn_select and form) to explicitly check that form is not None and that form.find(...) returns a non-None element for both names before calling .get("value"), and if any are missing raise a clear, descriptive exception (e.g., RuntimeError or ValueError) indicating which token is missing and that the postcode form structure has changed so the scraper can fail fast and provide actionable debug info.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`:
- Around line 113-117: The current loop in LondonBoroughOfRichmondUponThames.py
uses a substring check (paon_lower in text) which can false-match (e.g., "1"
matching "10"); change the match in the uprn_select.find_all("option") loop to a
bounded match using the variables val, text and paon_lower — for example use a
regex word-boundary match (with re.escape(paon_lower)) or split the option text
into tokens and compare tokens for equality (or token-prefix if you need "1A"
rules) before returning val; ensure you import re if using regex and update the
condition that determines the return so only exact/word-bounded PAON matches
pass.
---
Nitpick comments:
In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`:
- Around line 87-91: The code currently assumes form.find(...).get("value")
always succeeds for the inputs named "__RequestVerificationToken" and "ufprt"
(starting from the usrn_select → form traversal), which will raise an opaque
AttributeError if the markup changes; update the scraping block that computes
token and ufprt (referencing usrn_select and form) to explicitly check that form
is not None and that form.find(...) returns a non-None element for both names
before calling .get("value"), and if any are missing raise a clear, descriptive
exception (e.g., RuntimeError or ValueError) indicating which token is missing
and that the postcode form structure has changed so the scraper can fail fast
and provide actionable debug info.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9c7e63a8-868c-4a16-94fa-20e3543e54f7
📒 Files selected for processing (2)
uk_bin_collection/tests/input.jsonuk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py
🚧 Files skipped from review as they are similar to previous changes (1)
- uk_bin_collection/tests/input.json
The scraper required a PID parameter passed via URL or house_number field, but the PID is actually the property's UPRN. The old config had no UPRN or postcode, only a street name in house_number.
Rewritten to:
Removed Selenium dependency — pure requests now. Updated input.json with UPRN and postcode for the test address.
Summary by CodeRabbit