fix: SouthOxfordshireCouncil - session establishment + bin type cleanup #1983
InertiaUK wants to merge 2 commits into
Conversation
Rotherham's own bin-day page directs residents to PDF calendars only; there is no usable web lookup at rotherham.gov.uk. The Rotherham Bins mobile app uses the shared Imactivate backend at `bins.azurewebsites.net`. Earlier this year that endpoint had no Rotherham data deployed; as of April 2026 it does:

- `getaddress?postcode=S60+1JD&localauthority=Rotherham` returns the full address list with `PremiseID` per row.
- `getcollections?premisesid=<id>&localauthority=Rotherham` returns 6 future collections (PINK / BLACK / GREEN bins) per property.

The previous scraper only accepted a `premisesid` (or treated a numeric `uprn` as one), but standard UPRNs are not Imactivate PremiseIDs and yield empty results. Rewritten to accept `postcode` + `paon`, resolve the PremiseID via getaddress (matching against Address1 / Address2 / Street with a substring fallback for compound addresses like "Flat 3, 22A"), then fetch the schedule. `premisesid` is still honoured if explicitly passed.

`input.json` updated to use postcode `S60 1JD` + paon `77` so the parity check stays green and the integration test can hit the live API.

Verified: returns the full upcoming bin schedule.
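The two-step lookup described in this commit message can be sketched roughly as follows. This is an illustration only, not the scraper's actual code: the `https://` scheme, the `/api` path prefix, the function names, and the injected `fetch_json` callable are all assumptions, and only the exact-match pass is shown (the fallback for compound addresses is omitted).

```python
from typing import Any, Callable
from urllib.parse import urlencode

# Host taken from the commit message; scheme and "/api" prefix are assumed.
BASE_URL = "https://bins.azurewebsites.net/api"


def build_url(endpoint: str, **params: str) -> str:
    """Build an endpoint URL; urlencode turns 'S60 1JD' into 'S60+1JD'."""
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


def resolve_premise_id(rows: list[dict[str, Any]], paon: str) -> str:
    """Return the PremiseID of the first row whose address field equals paon."""
    target = paon.strip().lower()
    for row in rows:
        for key in ("Address2", "Address1", "Street"):
            if str(row.get(key, "")).strip().lower() == target:
                return str(row["PremiseID"])
    raise ValueError(f"No address matching {paon!r}")


def fetch_schedule(
    postcode: str, paon: str, fetch_json: Callable[[str], Any]
) -> Any:
    """Resolve the PremiseID via getaddress, then request getcollections.

    fetch_json is a hypothetical helper that GETs a URL and returns parsed JSON.
    """
    rows = fetch_json(
        build_url("getaddress", postcode=postcode, localauthority="Rotherham")
    )
    premise_id = resolve_premise_id(rows, paon)
    return fetch_json(
        build_url(
            "getcollections", premisesid=premise_id, localauthority="Rotherham"
        )
    )
```

Passing the HTTP call in as `fetch_json` keeps the resolution logic testable without touching the live API; in practice it could be a thin wrapper around `requests.get(url).json()`.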
…strip supplementary notes from bin type

Azure App Gateway intermittently returns 403 on direct requests without a JSESSIONID session cookie. Fix: use `requests.Session()` to visit the page first (obtaining JSESSIONID), then set the SVBINZONE cookie and fetch the bin data.

Also fixes bin type strings incorrectly including supplementary notes (e.g. "Don't forget...", "Extra garden waste...") that appear after the actual bin description in the `binextra` divs.
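A minimal sketch of the session-first pattern this commit describes, under stated assumptions: `fetch_bin_page` is a hypothetical name, the cookie value format is left to the caller because the commit message does not specify it, and `session` is expected to behave like a `requests.Session()` (only `get` and `cookies.set` are used).

```python
def fetch_bin_page(session, url: str, cookie_value: str, timeout: int = 30) -> str:
    """Fetch the bin page after establishing a server-side session.

    session: an object with the requests.Session interface (assumed).
    cookie_value: the SVBINZONE value for the property; its exact format
    is not given in the commit message, so it is passed in unchanged.
    """
    # First request carries no cookies; the server response is expected to
    # set JSESSIONID on the session's cookie jar.
    warmup = session.get(url, timeout=timeout)
    warmup.raise_for_status()

    # Second request reuses JSESSIONID and adds the property cookie.
    session.cookies.set("SVBINZONE", cookie_value)
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

With the real library this would be called as `fetch_bin_page(requests.Session(), url, value)`. Calling `raise_for_status()` on both responses means a 403 surfaces as an exception rather than as an empty parse later on.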
📝 Walkthrough
The changes migrate two council bin collection scrapers to improved data sources: Rotherham switches from a UPRN-based endpoint to postcode/house-number resolution via the Imactivate backend with address lookup, while SouthOxfordshire improves session persistence and bin-type parsing logic.
Sequence Diagram
sequenceDiagram
participant Client
participant Imactivate as Imactivate API
participant AddressAPI as Address Resolution
participant CollectionAPI as Collection Data API
Client->>Imactivate: POST /getaddress<br/>(postcode, paon)
Imactivate->>AddressAPI: Resolve address
AddressAPI-->>Imactivate: Return matched addresses
Imactivate-->>Client: Return PremiseID<br/>(from matching fields)
Client->>CollectionAPI: GET /collection<br/>(localauthority=Rotherham,<br/>PremiseID)
CollectionAPI-->>Client: Return collection schedules
Client->>Client: Sort by collectionDate<br/>& return bins
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 2
🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/RotherhamCouncil.py (2)
54-61: Comment vs. code drift.
The comment says "Match against Address2 (house number/name) first, then Street", but the loop also checks `Address1`. Either drop `Address1` or update the comment so the precedence is documented accurately.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/RotherhamCouncil.py` around lines 54 - 61, The comment above the loop no longer matches the checked keys: the loop iterates for key in ("Address2", "Address1", "Street") but the comment only mentions Address2 then Street; update the comment to accurately describe the precedence (Address2, then Address1, then Street) or remove Address1 from the tuple if it was added accidentally; locate the loop that iterates over rows and keys ("Address2","Address1","Street") (which returns row.get("PremiseID") on match) and either change the comment to state "Match against Address2 (house number/name) first, then Address1, then Street" or drop "Address1" from the tuple to preserve the original comment.
134-139: Silently dropping rows on date-parse failure can hide upstream format changes.
If Imactivate ever changes its `CollectionDate` format, every row will silently `continue` and `parse_data` will return an empty `bins` list with no signal. Tighten the catch to `(ValueError, TypeError)` and at minimum log the offending payload so a format change is noticed.
-    try:
-        iso_date = date_str.split("T")[0]
-        parsed = datetime.strptime(iso_date, "%Y-%m-%d")
-        formatted = parsed.strftime(date_format)
-    except Exception:
-        continue
+    try:
+        iso_date = date_str.split("T")[0]
+        parsed = datetime.strptime(iso_date, "%Y-%m-%d")
+        formatted = parsed.strftime(date_format)
+    except (ValueError, TypeError) as exc:
+        print(f"Rotherham: skipping unparseable date {date_str!r}: {exc}")
+        continue
Based on learnings: in `uk_bin_collection/**/*.py`, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors so format changes are detected early.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/RotherhamCouncil.py` around lines 134 - 139, The current broad except in the date parsing block swallows all errors and silently skips rows; tighten the except to catch only (ValueError, TypeError) for the date parsing in the parse_data routine (the block that computes iso_date, parsed, formatted) and, when caught, log the offending payload/row (include the raw date_str and the full row or item) via the module/logger used in this file and then re-raise or explicitly raise a descriptive exception so upstream callers detect a format change instead of returning empty bins; this ensures issues in CollectionDate format are visible while still only catching parse-related errors.
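The stricter variant described in the prompt above (raise instead of skip) might look like the sketch below. `parse_collection_date` is a hypothetical helper name, and the `%d/%m/%Y` default for `date_format` is an assumption about the project's shared constant. `AttributeError` is caught in addition to the two exception types named in the review because calling `.split` on a non-string value such as `None` raises it.

```python
from datetime import datetime


def parse_collection_date(date_str, date_format: str = "%d/%m/%Y") -> str:
    """Convert an ISO-style timestamp such as '2026-04-14T00:00:00'.

    Raises ValueError with the offending value on any unexpected input,
    so a change in the upstream date format is noticed immediately
    instead of producing an empty result.
    """
    try:
        iso_date = date_str.split("T")[0]
        parsed = datetime.strptime(iso_date, "%Y-%m-%d")
    except (ValueError, TypeError, AttributeError) as exc:
        raise ValueError(
            f"Unexpected CollectionDate format: {date_str!r}"
        ) from exc
    return parsed.strftime(date_format)
```

Chaining with `from exc` keeps the original parsing error in the traceback while the message carries the raw value that failed.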
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5ae0774f-1c16-4dee-9f11-5174ceb29910
📒 Files selected for processing (3)
- uk_bin_collection/tests/input.json
- uk_bin_collection/uk_bin_collection/councils/RotherhamCouncil.py
- uk_bin_collection/uk_bin_collection/councils/SouthOxfordshireCouncil.py
    # Looser substring fallback so addresses like "Flat 3, 22A" match
    # against a paon of "22A".
    for row in rows:
        blob = " ".join(
            str(row.get(k, "")).strip()
            for k in ("Address1", "Address2", "Street")
        ).lower()
        if target and target in blob:
            return str(row.get("PremiseID"))

    raise ValueError(
        f"No address matching '{paon}' for postcode {postcode}"
    )
Substring fallback can silently match the wrong property.
`target in blob` does a raw substring test, so a paon of "5" will match an address blob like "15 High Street" and the first row wins. Because the upstream loop already handles exact matches, this fallback should be tightened (token / word-boundary match) to avoid silently returning a wrong PremiseID for users with short numeric house numbers. The `target and` guard on line 70 is also redundant: `target` was already verified non-empty at line 47.
Per the retrieved learning, prefer explicit failure over a silent wrong-match here.
♻️ Suggested tighter match
-    # Looser substring fallback so addresses like "Flat 3, 22A" match
-    # against a paon of "22A".
-    for row in rows:
-        blob = " ".join(
-            str(row.get(k, "")).strip()
-            for k in ("Address1", "Address2", "Street")
-        ).lower()
-        if target and target in blob:
-            return str(row.get("PremiseID"))
+    # Looser token-based fallback so addresses like "Flat 3, 22A" match
+    # against a paon of "22A", without "5" matching "15"/"25"/"50".
+    import re
+    for row in rows:
+        tokens = re.findall(
+            r"[\w]+",
+            " ".join(
+                str(row.get(k, "")) for k in ("Address1", "Address2", "Street")
+            ).lower(),
+        )
+        if target in tokens:
+            return str(row.get("PremiseID"))
Based on learnings: in the UKBinCollectionData project, prefer explicit failures over silent error handling when parsing council bin collection data, to ensure format changes (or here, ambiguous matches) are detected immediately rather than potentially going unnoticed.
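To make the difference from the substring test concrete, here is a small standalone sketch of the token-based match. The helper names `paon_matches` and `resolve_by_token` are hypothetical, and the uniqueness check goes one step beyond the suggestion above (which returns the first matching row); it is included as one possible way to fail explicitly on ambiguous matches.

```python
import re
from typing import Any


def paon_matches(paon: str, row: dict[str, Any]) -> bool:
    """True if paon appears as a whole token in the row's address fields."""
    target = paon.strip().lower()
    text = " ".join(
        str(row.get(k, "")) for k in ("Address1", "Address2", "Street")
    ).lower()
    # \w+ splits on whitespace and punctuation, so "Flat 3, 22A" yields
    # the tokens "flat", "3", "22a".
    return target in re.findall(r"\w+", text)


def resolve_by_token(rows: list[dict[str, Any]], paon: str) -> str:
    """Return the PremiseID only when exactly one row matches."""
    matches = [row for row in rows if paon_matches(paon, row)]
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one address matching {paon!r}, "
            f"found {len(matches)}"
        )
    return str(matches[0]["PremiseID"])
```

Two limitations follow from the tokenisation: a multi-word paon such as a house name can never equal a single token, and a paon of "3" would still match "Flat 3, 22A", so the fallback only makes sense after the exact-match pass has already failed.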
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/uk_bin_collection/councils/RotherhamCouncil.py` around
lines 63 - 75, The substring fallback in the rows loop (building blob from
Address1/Address2/Street and checking `target in blob`) can falsely match short
paon values (e.g., "5" matching "15 High Street"); replace this raw substring
test with a token/word-boundary match (e.g., split blob into
whitespace/punctuation-separated tokens or use a regex with \b around the
normalized `target`) so only whole PAON tokens match, remove the redundant
`target and` guard, and keep the explicit ValueError raise if no proper
token-boundary match for `paon` is found (refer to the loop over `rows`, the
`blob` construction, `paon`/`target`, and the returned `PremiseID`).
    # Strip supplementary notes (e.g. "Don't forget...", "Extra garden waste...")
    # that follow the bin-type description.
    type_parts = []
    for part in bin_info[type_start:]:
        if "don't" in part.lower() or part.startswith("Extra"):
            break
        type_parts.append(part)
    bin_type = str.capitalize(" ".join(type_parts))
Make the "Extra" stop-marker check case-insensitive for consistency.
The `"don't" in part.lower()` check is case-insensitive, but `part.startswith("Extra")` is case-sensitive. If the site ever returns "extra garden waste…" or "EXTRA…", the supplementary note will leak back into `bin_type` and silently regress the cleanup this PR is introducing. Align both checks on a single normalized form.
🧹 Proposed fix
     type_parts = []
     for part in bin_info[type_start:]:
-        if "don't" in part.lower() or part.startswith("Extra"):
+        lowered = part.lower()
+        if "don't" in lowered or lowered.startswith("extra"):
             break
         type_parts.append(part)
     bin_type = str.capitalize(" ".join(type_parts))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/uk_bin_collection/councils/SouthOxfordshireCouncil.py`
around lines 91 - 98, The stop-marker check is inconsistent: it uses
case-insensitive `"don't" in part.lower()` but case-sensitive
`part.startswith("Extra")`, so normalize the part before checking; in the loop
that builds type_parts (variables: bin_info, type_start, type_parts, bin_type)
convert each part to lowercase once (e.g., part_lower = part.lower()) and use
part_lower.startswith("extra") alongside the existing `"don't" in part_lower`
check to break, ensuring supplemental notes like "extra…" are removed regardless
of case.
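The normalized check can be exercised in isolation with a small sketch like the one below. `clean_bin_type` is a hypothetical wrapper around the loop under review, and the sample strings in the usage note are shortened from the examples quoted in this PR.

```python
def clean_bin_type(parts: list[str]) -> str:
    """Join bin-type fragments, stopping at the first supplementary note."""
    type_parts = []
    for part in parts:
        lowered = part.lower()
        # Both stop markers are compared against the same lowercased form.
        if "don't" in lowered or lowered.startswith("extra"):
            break
        type_parts.append(part)
    return " ".join(type_parts).capitalize()
```

For example, `clean_bin_type(["grey bin,", "food bin", "EXTRA garden waste collected"])` drops the trailing note regardless of its case. One caveat: the `"don't"` marker uses a straight apostrophe, so a note written with a typographic apostrophe would not trigger the stop.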
Included in May 2026 Release PR #1992. Closing.
What
Two fixes for South Oxfordshire District Council:

1. Intermittent 403 from Azure App Gateway
Direct requests to `eform.southoxon.gov.uk/ebase/BINZONE_DESKTOP.eb` are intermittently rejected (HTTP 403) without a valid `JSESSIONID` session cookie. Fix: use `requests.Session()` to visit the endpoint first (establishing the session), then set the `SVBINZONE` UPRN cookie and fetch the bin data.

2. Bin type strings including supplementary notes
The page's `binextra` divs sometimes contain extra notes after the bin description (e.g. "Don't forget - you can put out double your garden waste...", "Extra garden waste collected for brown bin subscriber."). The previous join of all `stripped_strings` elements included these notes in the bin type. Fix: stop building the type string at the first element containing "don't" or starting with "Extra".

Test
Tested against UPRN `10033002851` (from `input.json`). Returns clean bin types:
- Green bin, textiles, food bin and garden waste bin
- Grey bin, small electrical items and food bin

Checklist
- `input.json` already has `"skip_get_url": true`; no change needed