fix: EastLothianCouncil - rewrite for Drupal migration (ASP site gone)#1942
fix: EastLothianCouncil - rewrite for Drupal migration (ASP site gone)#1942InertiaUK wants to merge 1 commit into
Conversation
📝 WalkthroughWalkthroughThe EastLothianCouncil module's collection data parsing was refactored to replace a two-step direct AJAX endpoint flow with a session-based multi-request interaction against a new base URL. The new logic captures form identifiers, fetches an address selection list via postcode, matches addresses using normalized text comparison, and parses collection dates from Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/EastLothianCouncil.py (2)
95-97: Consider extracting the domain as a constant.The domain
https://collectiondates.eastlothian.gov.ukappears at both line 22 (inbase_url) and line 97. Extracting it would reduce duplication.♻️ Suggested refactor
+ domain = "https://collectiondates.eastlothian.gov.uk" + base_url = f"{domain}/waste-collection-schedule" ... action_url = form.get("action", "") if action_url.startswith("/"): - action_url = "https://collectiondates.eastlothian.gov.uk" + action_url + action_url = domain + action_url🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/EastLothianCouncil.py` around lines 95 - 97, Extract the repeated domain string "https://collectiondates.eastlothian.gov.uk" into a single constant (e.g., EASTLOTHIAN_DOMAIN or BASE_DOMAIN) at module scope in EastLothianCouncil.py and replace the hard-coded occurrences used to build base_url and to prefix action_url in the form-processing code; update the logic that currently does action_url = "https://collectiondates.eastlothian.gov.uk" + action_url to use the new constant (preserve the leading-slash handling) and replace the base_url construction to reference the same constant so both locations share the single source of truth.
27-31: Consider using a context manager for the session.The
requests.Session()is created but never explicitly closed. While Python's garbage collector will eventually handle it, using a context manager ensures connections are properly released.♻️ Suggested refactor using context manager
The session usage spans the entire method, so wrap the logic in a
withblock:- session = requests.Session() - - # Step 1: GET the page to obtain session cookie and form_build_id - response = session.get(base_url, headers=headers) + with requests.Session() as session: + # Step 1: GET the page to obtain session cookie and form_build_id + response = session.get(base_url, headers=headers)Then indent the remaining session-dependent code within the
withblock.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/EastLothianCouncil.py` around lines 27 - 31, The session created with requests.Session() in EastLothianCouncil.py should be used as a context manager to ensure it is closed: replace the standalone session = requests.Session() with a with requests.Session() as session: block and indent all subsequent session-dependent calls (e.g., session.get(base_url, headers=headers), response handling, and any further requests) inside that block so connections are properly released when the method (or function) finishes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/EastLothianCouncil.py`:
- Around line 118-142: The parsing loop over days (variable days) currently
swallows missing elements (time_el/type_el) and parsing errors
(datetime.strptime ValueError) which can hide HTML changes; update the loop in
EastLothianCouncil.py to record/log parsing problems and fail fast: when time_el
or type_el are missing or date_str/waste_type are empty, append a descriptive
error entry or raise an exception (including the offending day's HTML/snippet)
instead of continue, and when datetime.strptime raises ValueError capture the
error and include the raw date_str in the error/exception; after the loop, if
bindata["bins"] is empty, raise an explicit error indicating no valid collection
entries were parsed so CI/consumers surface format changes.
---
Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/EastLothianCouncil.py`:
- Around line 95-97: Extract the repeated domain string
"https://collectiondates.eastlothian.gov.uk" into a single constant (e.g.,
EASTLOTHIAN_DOMAIN or BASE_DOMAIN) at module scope in EastLothianCouncil.py and
replace the hard-coded occurrences used to build base_url and to prefix
action_url in the form-processing code; update the logic that currently does
action_url = "https://collectiondates.eastlothian.gov.uk" + action_url to use
the new constant (preserve the leading-slash handling) and replace the base_url
construction to reference the same constant so both locations share the single
source of truth.
- Around line 27-31: The session created with requests.Session() in
EastLothianCouncil.py should be used as a context manager to ensure it is
closed: replace the standalone session = requests.Session() with a with
requests.Session() as session: block and indent all subsequent session-dependent
calls (e.g., session.get(base_url, headers=headers), response handling, and any
further requests) inside that block so connections are properly released when
the method (or function) finishes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2dcd0dc2-834e-4c6c-baa0-4eb9868d61b6
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/EastLothianCouncil.py
| # Parse collection dates from <time datetime="YYYY-MM-DD"> elements | ||
| days = soup.find_all("li", class_="waste-collection__day") | ||
| for day in days: | ||
| time_el = day.find("time") | ||
| type_el = day.find("span", class_="waste-collection__day--type") | ||
|
|
||
| if not time_el or not type_el: | ||
| continue | ||
|
|
||
| date_str = time_el.get("datetime", "") | ||
| waste_type = type_el.text.strip() | ||
|
|
||
| if not date_str or not waste_type: | ||
| continue | ||
|
|
||
| try: | ||
| collection_date = datetime.strptime( | ||
| remove_ordinal_indicator_from_date_string(waste_value), | ||
| "%A %d %B %Y", | ||
| collection_date = datetime.strptime(date_str, "%Y-%m-%d") | ||
| bindata["bins"].append( | ||
| { | ||
| "type": waste_type, | ||
| "collectionDate": collection_date.strftime(date_format), | ||
| } | ||
| ) | ||
|
|
||
| bindata["bins"].append({ | ||
| "type": waste_label.replace(" is:", ""), | ||
| "collectionDate": collection_date.strftime(date_format), | ||
| }) | ||
| except ValueError: | ||
| continue |
There was a problem hiding this comment.
Silent failures may mask parsing issues.
The code silently continues when elements are missing (lines 124-125, 130-131) or when date parsing fails (lines 141-142). If the council changes their HTML structure, this would return an empty result with no indication of what went wrong.
Based on learnings from this repository: prefer explicit failures over silent defaults to ensure format changes are detected early.
🛡️ Suggested improvement: fail if no valid entries are parsed
# Parse collection dates from <time datetime="YYYY-MM-DD"> elements
days = soup.find_all("li", class_="waste-collection__day")
+ if not days:
+ raise ValueError("No collection day elements found - page structure may have changed")
+
for day in days:
time_el = day.find("time")
type_el = day.find("span", class_="waste-collection__day--type")
if not time_el or not type_el:
continue
date_str = time_el.get("datetime", "")
waste_type = type_el.text.strip()
if not date_str or not waste_type:
continue
try:
collection_date = datetime.strptime(date_str, "%Y-%m-%d")
bindata["bins"].append(
{
"type": waste_type,
"collectionDate": collection_date.strftime(date_format),
}
)
except ValueError:
continue
+ if not bindata["bins"]:
+ raise ValueError(
+ f"No valid collection dates parsed for {user_postcode} - page structure may have changed"
+ )
+
bindata["bins"].sort(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/uk_bin_collection/councils/EastLothianCouncil.py` around
lines 118 - 142, The parsing loop over days (variable days) currently swallows
missing elements (time_el/type_el) and parsing errors (datetime.strptime
ValueError) which can hide HTML changes; update the loop in
EastLothianCouncil.py to record/log parsing problems and fail fast: when time_el
or type_el are missing or date_str/waste_type are empty, append a descriptive
error entry or raise an exception (including the offending day's HTML/snippet)
instead of continue, and when datetime.strptime raises ValueError capture the
error and include the raw date_str in the error/exception; after the loop, if
bindata["bins"] is empty, raise an explicit error indicating no valid collection
entries were parsed so CI/consumers surface format changes.
|
Included in May 2026 Release PR #1992. Closing. |
East Lothian migrated from their old ASP.NET site to a Drupal-based platform. The old scraper relied on ASP form state that no longer exists.
The new site uses a 3-step form flow with CSRF tokens. The rewrite follows the form steps: POST the postcode, select the address from the results, then parse the schedule table from the final page. CSRF tokens are extracted from hidden inputs on each step.
Tested with an EH postcode in East Lothian.
Summary by CodeRabbit
Bug Fixes