fix: SouthOxfordshireCouncil - session establishment + bin type cleanup by InertiaUK · Pull Request #1983 · robbrad/UKBinCollectionData

InertiaUK · 2026-04-27T15:54:42Z

What

Two fixes for South Oxfordshire District Council:

1. Intermittent 403 from Azure App Gateway

Direct requests to eform.southoxon.gov.uk/ebase/BINZONE_DESKTOP.eb are intermittently rejected (HTTP 403) without a valid JSESSIONID session cookie. Fix: use requests.Session() to visit the endpoint first (establishing the session), then set the SVBINZONE UPRN cookie and fetch the bin data.

2. Bin type strings including supplementary notes

The page's binextra divs sometimes contain extra notes after the bin description (e.g. "Don't forget - you can put out double your garden waste...", "Extra garden waste collected for brown bin subscriber."). The previous join of all stripped_strings elements included these notes in the bin type. Fix: stop building the type string at the first element containing "don't" or starting with "Extra".

Test

Tested against UPRN 10033002851 (from input.json). Returns clean bin types:

Green bin, textiles, food bin and garden waste bin
Grey bin, small electrical items and food bin

Checklist

input.json already has "skip_get_url": true — no change needed

Summary by CodeRabbit

Release Notes

Updates
- Rotherham: Collection lookup now uses postcode and house number instead of UPRN.
Bug Fixes
- South Oxfordshire: Fixed intermittent connection issues affecting bin collection lookups and improved bin type descriptions for clarity.
- Rotherham: Enhanced reliability and robustness through improved data source integration.

Rotherham's own bin-day page directs residents to PDF calendars only — there is no usable web lookup at rotherham.gov.uk. The Rotherham Bins mobile app uses the shared Imactivate backend at `bins.azurewebsites.net`. Earlier this year that endpoint had no Rotherham data deployed; as of April 2026 it does: - `getaddress?postcode=S60+1JD&localauthority=Rotherham` returns the full address list with `PremiseID` per row. - `getcollections?premisesid=<id>&localauthority=Rotherham` returns 6 future collections (PINK / BLACK / GREEN bins) per property. The previous scraper only accepted a `premisesid` (or treated a numeric `uprn` as one), but standard UPRNs are not Imactivate PremiseIDs and yield empty results. Rewritten to accept `postcode` + `paon`, resolve the PremiseID via getaddress (matching against Address1 / Address2 / Street with a substring fallback for compound addresses like "Flat 3, 22A"), then fetch the schedule. `premisesid` is still honoured if explicitly passed. `input.json` updated to use postcode `S60 1JD` + paon `77` so the parity check stays green and the integration test can hit the live API. Verified: returns the full upcoming bin schedule.

…strip supplementary notes from bin type Azure App Gateway intermittently returns 403 on direct requests without a JSESSIONID session cookie. Fix: use requests.Session() to visit the page first (obtaining JSESSIONID), then set the SVBINZONE cookie and fetch the bin data. Also fixes bin type strings incorrectly including supplementary notes (e.g. "Don't forget...", "Extra garden waste...") that appear after the actual bin description in the binextra divs.

coderabbitai · 2026-04-27T15:54:56Z

📝 Walkthrough

Walkthrough

The changes migrate two council bin collection scrapers to improved data sources: Rotherham switches from UPRN-based endpoint to postcode/house-number resolution via Imactivate backend with address lookup, while SouthOxfordshire improves session persistence and bin-type parsing logic.

Changes

Cohort / File(s)	Summary
Rotherham Configuration `uk_bin_collection/tests/input.json`	Updates RotherhamCouncil entry from UPRN-based URL pattern to postcode/house-number approach with Imactivate backend; sets `skip_get_url` to `true` and rewrites `wiki_note` describing new input requirements.
Rotherham Implementation `uk_bin_collection/uk_bin_collection/councils/RotherhamCouncil.py`	Switches data source to Imactivate's shared backend, adds BASE/LA/UA constants and request headers/timeouts, implements premise resolution via postcode+paon lookup through `/getaddress` endpoint, maintains backward UPRN compatibility, improves payload handling, and sorts results by collection date.
SouthOxfordshire Implementation `uk_bin_collection/uk_bin_collection/councils/SouthOxfordshireCouncil.py`	Introduces persistent `requests.Session` with two-step fetch to handle session state, refines bin-type parsing by truncating supplementary notes before capitalization.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Imactivate as Imactivate API
    participant AddressAPI as Address Resolution
    participant CollectionAPI as Collection Data API
    
    Client->>Imactivate: POST /getaddress<br/>(postcode, paon)
    Imactivate->>AddressAPI: Resolve address
    AddressAPI-->>Imactivate: Return matched addresses
    Imactivate-->>Client: Return PremiseID<br/>(from matching fields)
    
    Client->>CollectionAPI: GET /collection<br/>(localauthority=Rotherham,<br/>PremiseID)
    CollectionAPI-->>Client: Return collection schedules
    Client->>Client: Sort by collectionDate<br/>& return bins

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat: add direct mobile API support to North Hertfordshire District Council scraper #1715: Converts council scraper from web scraping to direct API-backed implementation with address-to-premise resolution and shared backend integration.

Suggested reviewers

dp247

Poem

🐰 A rabbit's rejoicing!

From UPRN to postcode we spring,
Imactivate's service now takes the ring,
Sessions persist and addresses align,
Two councils improved, their data divine! 🥕

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The PR title focuses exclusively on SouthOxfordshireCouncil, but the changeset includes significant updates to RotherhamCouncil as well.	Update the title to reflect all major changes, such as: 'fix: Update RotherhamCouncil to Imactivate backend and fix SouthOxfordshireCouncil session handling' or split into separate PRs.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

uk_bin_collection/uk_bin_collection/councils/RotherhamCouncil.py (2)

54-61: Comment vs. code drift.

The comment says "Match against Address2 (house number/name) first, then Street", but the loop also checks Address1. Either drop Address1 or update the comment so the precedence is documented accurately.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/RotherhamCouncil.py` around
lines 54 - 61, The comment above the loop no longer matches the checked keys:
the loop iterates for key in ("Address2", "Address1", "Street") but the comment
only mentions Address2 then Street; update the comment to accurately describe
the precedence (Address2, then Address1, then Street) or remove Address1 from
the tuple if it was added accidentally; locate the loop that iterates over rows
and keys ("Address2","Address1","Street") (which returns row.get("PremiseID") on
match) and either change the comment to state "Match against Address2 (house
number/name) first, then Address1, then Street" or drop "Address1" from the
tuple to preserve the original comment.

134-139: Silently dropping rows on date-parse failure can hide upstream format changes.

If Imactivate ever changes its CollectionDate format, every row will silently continue and parse_data will return an empty bins list with no signal. Tighten the catch to (ValueError, TypeError) and at minimum log the offending payload so a format change is noticed.

-            try:
-                iso_date = date_str.split("T")[0]
-                parsed = datetime.strptime(iso_date, "%Y-%m-%d")
-                formatted = parsed.strftime(date_format)
-            except Exception:
-                continue
+            try:
+                iso_date = date_str.split("T")[0]
+                parsed = datetime.strptime(iso_date, "%Y-%m-%d")
+                formatted = parsed.strftime(date_format)
+            except (ValueError, TypeError) as exc:
+                print(f"Rotherham: skipping unparseable date {date_str!r}: {exc}")
+                continue

Based on learnings: in uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors so format changes are detected early.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/RotherhamCouncil.py` around
lines 134 - 139, The current broad except in the date parsing block swallows all
errors and silently skips rows; tighten the except to catch only (ValueError,
TypeError) for the date parsing in the parse_data routine (the block that
computes iso_date, parsed, formatted) and, when caught, log the offending
payload/row (include the raw date_str and the full row or item) via the
module/logger used in this file and then re-raise or explicitly raise a
descriptive exception so upstream callers detect a format change instead of
returning empty bins; this ensures issues in CollectionDate format are visible
while still only catching parse-related errors.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/RotherhamCouncil.py`:
- Around line 63-75: The substring fallback in the rows loop (building blob from
Address1/Address2/Street and checking `target in blob`) can falsely match short
paon values (e.g., "5" matching "15 High Street"); replace this raw substring
test with a token/word-boundary match (e.g., split blob into
whitespace/punctuation-separated tokens or use a regex with \b around the
normalized `target`) so only whole PAON tokens match, remove the redundant
`target and` guard, and keep the explicit ValueError raise if no proper
token-boundary match for `paon` is found (refer to the loop over `rows`, the
`blob` construction, `paon`/`target`, and the returned `PremiseID`).

In `@uk_bin_collection/uk_bin_collection/councils/SouthOxfordshireCouncil.py`:
- Around line 91-98: The stop-marker check is inconsistent: it uses
case-insensitive `"don't" in part.lower()` but case-sensitive
`part.startswith("Extra")`, so normalize the part before checking; in the loop
that builds type_parts (variables: bin_info, type_start, type_parts, bin_type)
convert each part to lowercase once (e.g., part_lower = part.lower()) and use
part_lower.startswith("extra") alongside the existing `"don't" in part_lower`
check to break, ensuring supplemental notes like "extra…" are removed regardless
of case.

---

Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/RotherhamCouncil.py`:
- Around line 54-61: The comment above the loop no longer matches the checked
keys: the loop iterates for key in ("Address2", "Address1", "Street") but the
comment only mentions Address2 then Street; update the comment to accurately
describe the precedence (Address2, then Address1, then Street) or remove
Address1 from the tuple if it was added accidentally; locate the loop that
iterates over rows and keys ("Address2","Address1","Street") (which returns
row.get("PremiseID") on match) and either change the comment to state "Match
against Address2 (house number/name) first, then Address1, then Street" or drop
"Address1" from the tuple to preserve the original comment.
- Around line 134-139: The current broad except in the date parsing block
swallows all errors and silently skips rows; tighten the except to catch only
(ValueError, TypeError) for the date parsing in the parse_data routine (the
block that computes iso_date, parsed, formatted) and, when caught, log the
offending payload/row (include the raw date_str and the full row or item) via
the module/logger used in this file and then re-raise or explicitly raise a
descriptive exception so upstream callers detect a format change instead of
returning empty bins; this ensures issues in CollectionDate format are visible
while still only catching parse-related errors.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5ae0774f-1c16-4dee-9f11-5174ceb29910

📥 Commits

Reviewing files that changed from the base of the PR and between 60bd3cc and bf6dd1f.

📒 Files selected for processing (3)

uk_bin_collection/tests/input.json
uk_bin_collection/uk_bin_collection/councils/RotherhamCouncil.py
uk_bin_collection/uk_bin_collection/councils/SouthOxfordshireCouncil.py

coderabbitai · 2026-04-27T15:59:03Z

+        # Looser substring fallback so addresses like "Flat 3, 22A" match
+        # against a paon of "22A".
+        for row in rows:
+            blob = " ".join(
+                str(row.get(k, "")).strip()
+                for k in ("Address1", "Address2", "Street")
+            ).lower()
+            if target and target in blob:
+                return str(row.get("PremiseID"))
+
+        raise ValueError(
+            f"No address matching '{paon}' for postcode {postcode}"
+        )


⚠️ Potential issue | 🟡 Minor

Substring fallback can silently match the wrong property.

target in blob does a raw substring test, so a paon of "5" will match an address blob like "15 High Street" and the first row wins. Because the upstream loop already handles exact matches, this fallback should be tightened (token / word‑boundary match) to avoid silently returning a wrong PremiseID for users with short numeric house numbers. The target and guard on line 70 is also redundant — target was already verified non‑empty at line 47.

Per the retrieved learning, prefer explicit failure over a silent wrong‑match here.

♻️ Suggested tighter match

- # Looser substring fallback so addresses like "Flat 3, 22A" match - # against a paon of "22A". - for row in rows: - blob = " ".join( - str(row.get(k, "")).strip() - for k in ("Address1", "Address2", "Street") - ).lower() - if target and target in blob: - return str(row.get("PremiseID")) + # Looser token-based fallback so addresses like "Flat 3, 22A" match + # against a paon of "22A", without "5" matching "15"/"25"/"50". + import re + for row in rows: + tokens = re.findall( + r"[\w]+", + " ".join( + str(row.get(k, "")) for k in ("Address1", "Address2", "Street") + ).lower(), + ) + if target in tokens: + return str(row.get("PremiseID"))

Based on learnings: in the UKBinCollectionData project, prefer explicit failures over silent error handling when parsing council bin collection data, to ensure format changes (or here, ambiguous matches) are detected immediately rather than potentially going unnoticed.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Looser substring fallback so addresses like "Flat 3, 22A" match

# against a paon of "22A".

for row in rows:

blob = " ".join(

str(row.get(k, "")).strip()

for k in ("Address1", "Address2", "Street")

).lower()

if target and target in blob:

return str(row.get("PremiseID"))

raise ValueError(

f"No address matching '{paon}' for postcode {postcode}"

)

# Looser token-based fallback so addresses like "Flat 3, 22A" match

# against a paon of "22A", without "5" matching "15"/"25"/"50".

import re

for row in rows:

tokens = re.findall(

r"[\w]+",

" ".join(

str(row.get(k, "")) for k in ("Address1", "Address2", "Street")

).lower(),

)

if target in tokens:

return str(row.get("PremiseID"))

raise ValueError(

f"No address matching '{paon}' for postcode {postcode}"

)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/RotherhamCouncil.py` around lines 63 - 75, The substring fallback in the rows loop (building blob from Address1/Address2/Street and checking `target in blob`) can falsely match short paon values (e.g., "5" matching "15 High Street"); replace this raw substring test with a token/word-boundary match (e.g., split blob into whitespace/punctuation-separated tokens or use a regex with \b around the normalized `target`) so only whole PAON tokens match, remove the redundant `target and` guard, and keep the explicit ValueError raise if no proper token-boundary match for `paon` is found (refer to the loop over `rows`, the `blob` construction, `paon`/`target`, and the returned `PremiseID`).

coderabbitai · 2026-04-27T15:59:03Z

+                # Strip supplementary notes (e.g. "Don't forget...", "Extra garden waste...")
+                # that follow the bin-type description.
+                type_parts = []
+                for part in bin_info[type_start:]:
+                    if "don't" in part.lower() or part.startswith("Extra"):
+                        break
+                    type_parts.append(part)
+                bin_type = str.capitalize(" ".join(type_parts))


⚠️ Potential issue | 🟡 Minor

Make the "Extra" stop-marker check case-insensitive for consistency.

The "don't" in part.lower() check is case-insensitive, but part.startswith("Extra") is case-sensitive. If the site ever returns "extra garden waste…" or "EXTRA…", the supplementary note will leak back into bin_type and silently regress the cleanup this PR is introducing. Align both checks on a single normalized form.

🧹 Proposed fix

type_parts = [] for part in bin_info[type_start:]: - if "don't" in part.lower() or part.startswith("Extra"): + lowered = part.lower() + if "don't" in lowered or lowered.startswith("extra"): break type_parts.append(part) bin_type = str.capitalize(" ".join(type_parts))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/SouthOxfordshireCouncil.py` around lines 91 - 98, The stop-marker check is inconsistent: it uses case-insensitive `"don't" in part.lower()` but case-sensitive `part.startswith("Extra")`, so normalize the part before checking; in the loop that builds type_parts (variables: bin_info, type_start, type_parts, bin_type) convert each part to lowercase once (e.g., part_lower = part.lower()) and use part_lower.startswith("extra") alongside the existing `"don't" in part_lower` check to break, ensuring supplemental notes like "extra…" are removed regardless of case.

robbrad · 2026-05-01T20:22:26Z

Included in May 2026 Release PR #1992. Closing.

InertiaUK added 2 commits April 25, 2026 13:19

coderabbitai Bot reviewed Apr 27, 2026

View reviewed changes

robbrad mentioned this pull request May 1, 2026

May 2026 Release #1992

Merged

robbrad closed this May 1, 2026

robbrad mentioned this pull request May 2, 2026

May 2026 Release #1996

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: SouthOxfordshireCouncil - session establishment + bin type cleanup#1983

fix: SouthOxfordshireCouncil - session establishment + bin type cleanup#1983
InertiaUK wants to merge 2 commits into
robbrad:masterfrom
InertiaUK:fix/south-oxfordshire-session-and-bin-type

InertiaUK commented Apr 27, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 27, 2026

Uh oh!

coderabbitai Bot Apr 27, 2026

Uh oh!

robbrad commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

InertiaUK commented Apr 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Test

Checklist

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

robbrad commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

InertiaUK commented Apr 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading