fix: EastLothianCouncil - rewrite for Drupal migration (ASP site gone)#1942

Closed
InertiaUK wants to merge 1 commit into robbrad:master from InertiaUK:fix/EastLothianCouncil

Conversation

@InertiaUK
Contributor

@InertiaUK InertiaUK commented Apr 10, 2026

East Lothian migrated from their old ASP.NET site to a Drupal-based platform. The old scraper relied on ASP form state that no longer exists.

The new site uses a 3-step form flow with CSRF tokens. The rewrite follows the form steps: POST the postcode, select the address from the results, then parse the schedule table from the final page. CSRF tokens are extracted from hidden inputs on each step.

Tested with an EH postcode in East Lothian.
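
The token handling described above can be sketched in isolation. This is a minimal stdlib-only illustration of pulling a Drupal form_build_id out of a step's HTML; the hidden-input name and the helper itself are assumptions based on this description, and the actual scraper uses requests and BeautifulSoup rather than html.parser.

```python
from html.parser import HTMLParser


class FormBuildIdExtractor(HTMLParser):
    """Collect the value of the hidden form_build_id input, if present."""

    def __init__(self):
        super().__init__()
        self.form_build_id = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("name") == "form_build_id":
            self.form_build_id = attr_map.get("value")


def extract_form_build_id(html: str) -> str:
    """Return the Drupal form token for the current step, or raise."""
    parser = FormBuildIdExtractor()
    parser.feed(html)
    if parser.form_build_id is None:
        raise ValueError("form_build_id not found - page structure may have changed")
    return parser.form_build_id
```

Each of the three POSTs would re-run this extraction against the previous response before submitting the next step.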

Summary by CodeRabbit

Bug Fixes

  • Improved reliability of East Lothian Council bin collection schedule retrieval with enhanced address matching accuracy.
  • More robust extraction of collection dates and schedule information from council systems.
  • Enhanced error reporting when schedules cannot be retrieved.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 10, 2026

📝 Walkthrough

Walkthrough

The EastLothianCouncil module's collection data parsing was refactored to replace a two-step direct AJAX endpoint flow with a session-based multi-request interaction against a new base URL. The new logic captures form identifiers, fetches an address selection list via postcode, matches addresses using normalized text comparison, and parses collection dates from <time datetime> elements instead of previously parsed human-readable strings.
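
The date-source change mentioned here can be shown on its own. A minimal sketch, assuming the project's output date_format is the usual UK %d/%m/%Y; the helper name is hypothetical:

```python
from datetime import datetime

# Assumed output format; the project defines its own date_format constant
DATE_FORMAT = "%d/%m/%Y"


def parse_collection_date(datetime_attr: str) -> str:
    # The <time datetime="..."> attribute is machine-readable ISO, so no
    # ordinal-suffix stripping ("3rd", "21st") is needed, unlike the old
    # human-readable strings
    return datetime.strptime(datetime_attr, "%Y-%m-%d").strftime(DATE_FORMAT)
```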

Changes

Cohort / File(s): EastLothianCouncil request and parsing logic
uk_bin_collection/uk_bin_collection/councils/EastLothianCouncil.py
Summary: Rewrote the parse_data() method to implement a session-based flow: captures form_build_id from the base page, posts the postcode to retrieve the address list, performs normalized address matching with special token handling, derives the action URL from the form, posts the uprn to fetch the schedule, and parses collection dates from <time> elements rather than parsed date strings. Added form/build identifier validation and enhanced error handling.
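
The "normalized address matching" step can be sketched like this. A hypothetical stand-alone version; the real method's normalization rules and special token handling may differ:

```python
import re


def normalise(address: str) -> str:
    # Lowercase, drop punctuation, collapse runs of whitespace
    stripped = re.sub(r"[^\w\s]", "", address.lower())
    return re.sub(r"\s+", " ", stripped).strip()


def match_address(target: str, options: dict) -> str:
    """Map a user-supplied address to a UPRN from the council's option list.

    options maps display text to UPRN. An exact normalized match is tried
    first, then a substring fallback.
    """
    want = normalise(target)
    for text, uprn in options.items():
        if normalise(text) == want:
            return uprn
    for text, uprn in options.items():
        if want in normalise(text):
            return uprn
    raise ValueError(f"No address matching '{target}' in council address list")
```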

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • dp247

Poem

🐰 Hop! Skip! The forms are rearranged,
Session flows now gracefully exchanged,
Dates jump from strings to datetime's embrace,
Addresses matched with normalization's grace!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title clearly describes the main change: a rewrite of the EastLothianCouncil scraper to handle the migration from ASP to Drupal, with the parenthetical note explaining why the change was necessary.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/EastLothianCouncil.py (2)

95-97: Consider extracting the domain as a constant.

The domain https://collectiondates.eastlothian.gov.uk appears at both line 22 (in base_url) and line 97. Extracting it would reduce duplication.

♻️ Suggested refactor
+        domain = "https://collectiondates.eastlothian.gov.uk"
+        base_url = f"{domain}/waste-collection-schedule"
         ...
         action_url = form.get("action", "")
         if action_url.startswith("/"):
-            action_url = "https://collectiondates.eastlothian.gov.uk" + action_url
+            action_url = domain + action_url
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/EastLothianCouncil.py` around
lines 95 - 97, Extract the repeated domain string
"https://collectiondates.eastlothian.gov.uk" into a single constant (e.g.,
EASTLOTHIAN_DOMAIN or BASE_DOMAIN) at module scope in EastLothianCouncil.py and
replace the hard-coded occurrences used to build base_url and to prefix
action_url in the form-processing code; update the logic that currently does
action_url = "https://collectiondates.eastlothian.gov.uk" + action_url to use
the new constant (preserve the leading-slash handling) and replace the base_url
construction to reference the same constant so both locations share the single
source of truth.

27-31: Consider using a context manager for the session.

The requests.Session() is created but never explicitly closed. While Python's garbage collector will eventually handle it, using a context manager ensures connections are properly released.

♻️ Suggested refactor using context manager

The session usage spans the entire method, so wrap the logic in a with block:

-        session = requests.Session()
-
-        # Step 1: GET the page to obtain session cookie and form_build_id
-        response = session.get(base_url, headers=headers)
+        with requests.Session() as session:
+            # Step 1: GET the page to obtain session cookie and form_build_id
+            response = session.get(base_url, headers=headers)

Then indent the remaining session-dependent code within the with block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/EastLothianCouncil.py` around
lines 27 - 31, The session created with requests.Session() in
EastLothianCouncil.py should be used as a context manager to ensure it is
closed: replace the standalone session = requests.Session() with a with
requests.Session() as session: block and indent all subsequent session-dependent
calls (e.g., session.get(base_url, headers=headers), response handling, and any
further requests) inside that block so connections are properly released when
the method (or function) finishes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2dcd0dc2-834e-4c6c-baa0-4eb9868d61b6

📥 Commits

Reviewing files that changed from the base of the PR and between 60bd3cc and 6cf1af0.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/EastLothianCouncil.py

Comment on lines +118 to 142

        # Parse collection dates from <time datetime="YYYY-MM-DD"> elements
        days = soup.find_all("li", class_="waste-collection__day")
        for day in days:
            time_el = day.find("time")
            type_el = day.find("span", class_="waste-collection__day--type")

            if not time_el or not type_el:
                continue

            date_str = time_el.get("datetime", "")
            waste_type = type_el.text.strip()

            if not date_str or not waste_type:
                continue

            try:
                collection_date = datetime.strptime(date_str, "%Y-%m-%d")
                bindata["bins"].append(
                    {
                        "type": waste_type,
                        "collectionDate": collection_date.strftime(date_format),
                    }
                )
            except ValueError:
                continue
Contributor


⚠️ Potential issue | 🟡 Minor

Silent failures may mask parsing issues.

The code silently continues when elements are missing (lines 124-125, 130-131) or when date parsing fails (lines 141-142). If the council changes their HTML structure, this would return an empty result with no indication of what went wrong.

Based on learnings from this repository: prefer explicit failures over silent defaults to ensure format changes are detected early.

🛡️ Suggested improvement: fail if no valid entries are parsed
         # Parse collection dates from <time datetime="YYYY-MM-DD"> elements
         days = soup.find_all("li", class_="waste-collection__day")
+        if not days:
+            raise ValueError("No collection day elements found - page structure may have changed")
+
         for day in days:
             time_el = day.find("time")
             type_el = day.find("span", class_="waste-collection__day--type")

             if not time_el or not type_el:
                 continue

             date_str = time_el.get("datetime", "")
             waste_type = type_el.text.strip()

             if not date_str or not waste_type:
                 continue

             try:
                 collection_date = datetime.strptime(date_str, "%Y-%m-%d")
                 bindata["bins"].append(
                     {
                         "type": waste_type,
                         "collectionDate": collection_date.strftime(date_format),
                     }
                 )
             except ValueError:
                 continue

+        if not bindata["bins"]:
+            raise ValueError(
+                f"No valid collection dates parsed for {user_postcode} - page structure may have changed"
+            )
+
         bindata["bins"].sort(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/EastLothianCouncil.py` around
lines 118 - 142, The parsing loop over days (variable days) currently swallows
missing elements (time_el/type_el) and parsing errors (datetime.strptime
ValueError) which can hide HTML changes; update the loop in
EastLothianCouncil.py to record/log parsing problems and fail fast: when time_el
or type_el are missing or date_str/waste_type are empty, append a descriptive
error entry or raise an exception (including the offending day's HTML/snippet)
instead of continue, and when datetime.strptime raises ValueError capture the
error and include the raw date_str in the error/exception; after the loop, if
bindata["bins"] is empty, raise an explicit error indicating no valid collection
entries were parsed so CI/consumers surface format changes.

@robbrad robbrad mentioned this pull request May 1, 2026
@robbrad
Copy link
Copy Markdown
Owner

robbrad commented May 1, 2026

Included in May 2026 Release PR #1992. Closing.

@robbrad robbrad closed this May 1, 2026
@robbrad robbrad mentioned this pull request May 2, 2026