fix: EnvironmentFirst/Eastbourne/Lewes - handle restructured page #1979

InertiaUK wants to merge 1 commit into
Conversation
The environmentfirst.co.uk results page (used directly by EnvironmentFirst,
EastbourneBoroughCouncil and LewesDistrictCouncil) was restructured in
April 2026:
- The dates are now wrapped in a `<strong>` inside the same `<p>` that
carries the label, with no intervening separator paragraphs. The
previous indices `page_text[2]` (rubbish), `[4]` (recycling), `[6]`
(garden) no longer line up — they are now `[1]/[2]/[3]`.
- The date string itself gained a leading weekday, so values like
`Tuesday 28th April 2026` no longer match the existing
`'%d %B %Y'` strptime format and fail with `ValueError: time data
'...' does not match format`.
Rewritten to identify each row by the surrounding label
(rubbish/recycling/garden) rather than a fixed index, and to extract
the date with a regex (`\d{1,2}(?:st|nd|rd|th)?\s+\w+\s+\d{4}`) so the
weekday prefix is ignored. Verified against UPRN 100060011258
(Eastbourne) and 100061930155 (Lewes) — full schedules return.
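
The date-extraction change can be sketched standalone (a minimal sketch: the regex is the one from this PR, but `extract_collection_date`, the inline ordinal stripping, and the `%d/%m/%Y` output format are illustrative stand-ins for the project's `remove_ordinal_indicator_from_date_string` helper and `date_format` constant):

```python
import re
from datetime import datetime

# Regex from the PR: 1-2 digit day, optional ordinal suffix, month name, year.
DATE_PATTERN = re.compile(r"\d{1,2}(?:st|nd|rd|th)?\s+\w+\s+\d{4}")


def extract_collection_date(text: str) -> str:
    """Pull a date such as '28th April 2026' out of text like
    'Tuesday 28th April 2026' (the weekday prefix is simply not matched)
    and normalise it to dd/mm/YYYY."""
    match = DATE_PATTERN.search(text)
    if not match:
        raise ValueError(f"no date found in {text!r}")
    # Strip the ordinal indicator (st/nd/rd/th) so strptime can parse it.
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)", "", match.group(0))
    return datetime.strptime(cleaned, "%d %B %Y").strftime("%d/%m/%Y")


print(extract_collection_date("Tuesday 28th April 2026"))  # 28/04/2026
```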
📝 Walkthrough

Three council bin collection parsers were refactored from rigid index-based HTML extraction to flexible, content-driven parsing. They now dynamically iterate over the paragraph elements within `div.collect`.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/EnvironmentFirst.py (1)
1-1: Recommend extracting the shared parser to avoid maintaining three copies.

The leading comment ("Legacy script. Copied to Lewes and Eastbourne.") and the byte-for-byte identical parse blocks across `EnvironmentFirst.py`, `LewesDistrictCouncil.py` and `EastbourneBoroughCouncil.py` mean this fix has had to be applied three times, and any future page change will need three identical edits. Consider hoisting the `div.collect` → `<p>` → `<strong>` parsing into a single helper (e.g. in `common.py` or a new `_environmentfirst.py` module) and have all three council classes delegate to it. The PR description itself notes the page layout has shifted twice in twelve months, so this is a recurring cost.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/EnvironmentFirst.py` at line 1, The three council modules (EnvironmentFirst.py, LewesDistrictCouncil.py, EastbourneBoroughCouncil.py) duplicate the same DOM parsing logic that walks div.collect → <p> → <strong>; extract that block into a single helper function (e.g. parse_environmentfirst_schedule) placed in a shared module (common.py or _environmentfirst.py) and have each council class call that helper instead of duplicating the parse code; update the council classes to import and delegate to the new function (keep the same input/output contract so callers in EnvironmentFirst, LewesDistrictCouncil, and EastbourneBoroughCouncil need only replace the inline parse block with a single function call).
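
A minimal sketch of what that hoisted helper's core could look like (hypothetical: the function name and keyword-to-type mapping mirror the reviewer's suggestion and this PR's parser; the HTML walking is left to each caller so the core stays a pure, easily tested function):

```python
import re
from datetime import datetime
from typing import Optional

DATE_PATTERN = re.compile(r"(\d{1,2}(?:st|nd|rd|th)?\s+[A-Za-z]+\s+\d{4})")
BIN_KEYWORDS = (("rubbish", "Rubbish"), ("recycling", "Recycling"), ("garden", "Garden"))


def classify_and_parse(label: str, strong_text: str,
                       date_format: str = "%d/%m/%Y") -> Optional[dict]:
    """Pure core of a hypothetical shared parser: given a paragraph's
    label text and its <strong> text, return a bin entry or None."""
    label = label.lower()
    bin_type = next((t for k, t in BIN_KEYWORDS if k in label), None)
    if bin_type is None:
        return None
    match = DATE_PATTERN.search(strong_text)
    if not match:
        return None
    # Drop the ordinal suffix so strptime accepts the date.
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)", "", match.group(1))
    return {
        "type": bin_type,
        "collectionDate": datetime.strptime(cleaned, "%d %B %Y").strftime(date_format),
    }


# Each council class would then only do the BeautifulSoup walking:
#   for p in collect_div.find_all("p"):
#       strong = p.find("strong")
#       if strong:
#           entry = classify_and_parse(p.get_text(" ", strip=True),
#                                      strong.get_text(" ", strip=True))
#           if entry:
#               data["bins"].append(entry)
```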
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/LewesDistrictCouncil.py`:
- Around line 35-71: The parser currently returns an empty data["bins"] silently
when parsing fails; update LewesDistrictCouncil.py so that if the collect_div
variable is None you raise an informative exception (e.g.,
RuntimeError("div.collect not found")), and after the parsing loop if
data["bins"] is still empty raise another informative exception (e.g.,
RuntimeError("no collection entries parsed from div.collect")). Also stop
swallowing date parse errors: replace the except ValueError block around
datetime.strptime(...) so it re-raises (or raises a new RuntimeError) including
the offending cleaned string and context (mentioning
remove_ordinal_indicator_from_date_string and the date_pattern) instead of
continue, so format changes are surfaced immediately.
📒 Files selected for processing (3)
- uk_bin_collection/uk_bin_collection/councils/EastbourneBoroughCouncil.py
- uk_bin_collection/uk_bin_collection/councils/EnvironmentFirst.py
- uk_bin_collection/uk_bin_collection/councils/LewesDistrictCouncil.py
```diff
+        collect_div = soup.find("div", {"class": "collect"})
+        if collect_div is None:
+            return data
-        # Parse the correct lines (find them, remove the ordinal indicator and make them the correct format date) and
-        # then add them to the dictionary
-        rubbish_day = datetime.strptime(
-            remove_ordinal_indicator_from_date_string(
-                page_text[2].find_next("strong").text
-            ),
-            "%d %B %Y",
-        ).strftime(date_format)
-        dict_data = {
-            "type": "Rubbish",
-            "collectionDate": rubbish_day,
-        }
-        data["bins"].append(dict_data)
-        recycling_day = datetime.strptime(
-            remove_ordinal_indicator_from_date_string(
-                page_text[4].find_next("strong").text
-            ),
-            "%d %B %Y",
-        ).strftime(date_format)
-        dict_data = {
-            "type": "Recycling",
-            "collectionDate": recycling_day,
-        }
-        data["bins"].append(dict_data)
+        date_pattern = re.compile(
+            r"(\d{1,2}(?:st|nd|rd|th)?\s+[A-Za-z]+\s+\d{4})"
+        )
-        if len(page_text) > 5:
-            garden_day = datetime.strptime(
-                remove_ordinal_indicator_from_date_string(
-                    page_text[6].find_next("strong").text
-                ),
-                "%d %B %Y",
-            ).strftime(date_format)
-            dict_data = {
-                "type": "Garden",
-                "collectionDate": garden_day,
-            }
-            data["bins"].append(dict_data)
+        for p in collect_div.find_all("p"):
+            strong = p.find("strong")
+            if not strong:
+                continue
+            label = p.get_text(" ", strip=True).lower()
+            if "rubbish" in label:
+                bin_type = "Rubbish"
+            elif "recycling" in label:
+                bin_type = "Recycling"
+            elif "garden" in label:
+                bin_type = "Garden"
+            else:
+                continue
+            match = date_pattern.search(strong.get_text(" ", strip=True))
+            if not match:
+                continue
+            cleaned = remove_ordinal_indicator_from_date_string(match.group(1))
+            try:
+                collection_date = datetime.strptime(
+                    cleaned, "%d %B %Y"
+                ).strftime(date_format)
+            except ValueError:
+                continue
+            data["bins"].append(
+                {
+                    "type": bin_type,
+                    "collectionDate": collection_date,
+                }
+            )
```
Avoid silently returning empty schedules when parsing fails.

If the site changes shape again, this implementation degrades to an empty `bins` list without raising; users will see "no upcoming collections" instead of a clear error. Consider raising when `div.collect` is found but no entries could be parsed (and arguably when `div.collect` itself is missing). The per-`<p>` skip on a missing `<strong>` or non-matching keyword is fine, but the `ValueError` on date parsing is exactly the kind of format-shift signal that should not be swallowed silently.

Based on learnings: in `uk_bin_collection/**/*.py`, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors so that format changes are detected early.
🛡️ Suggested minimal hardening

```diff
         data = {"bins": []}
         collect_div = soup.find("div", {"class": "collect"})
         if collect_div is None:
-            return data
+            raise ValueError(
+                "Could not locate <div class='collect'> – page layout may have changed."
+            )
@@
             cleaned = remove_ordinal_indicator_from_date_string(match.group(1))
-            try:
-                collection_date = datetime.strptime(
-                    cleaned, "%d %B %Y"
-                ).strftime(date_format)
-            except ValueError:
-                continue
+            collection_date = datetime.strptime(
+                cleaned, "%d %B %Y"
+            ).strftime(date_format)
             data["bins"].append(
                 {
                     "type": bin_type,
                     "collectionDate": collection_date,
                 }
             )
+
+        if not data["bins"]:
+            raise ValueError(
+                "No bin collections parsed from div.collect – page layout may have changed."
+            )
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
collect_div = soup.find("div", {"class": "collect"})
if collect_div is None:
    raise ValueError(
        "Could not locate <div class='collect'> – page layout may have changed."
    )
date_pattern = re.compile(
    r"(\d{1,2}(?:st|nd|rd|th)?\s+[A-Za-z]+\s+\d{4})"
)
for p in collect_div.find_all("p"):
    strong = p.find("strong")
    if not strong:
        continue
    label = p.get_text(" ", strip=True).lower()
    if "rubbish" in label:
        bin_type = "Rubbish"
    elif "recycling" in label:
        bin_type = "Recycling"
    elif "garden" in label:
        bin_type = "Garden"
    else:
        continue
    match = date_pattern.search(strong.get_text(" ", strip=True))
    if not match:
        continue
    cleaned = remove_ordinal_indicator_from_date_string(match.group(1))
    collection_date = datetime.strptime(
        cleaned, "%d %B %Y"
    ).strftime(date_format)
    data["bins"].append(
        {
            "type": bin_type,
            "collectionDate": collection_date,
        }
    )
if not data["bins"]:
    raise ValueError(
        "No bin collections parsed from div.collect – page layout may have changed."
    )
```
Included in May 2026 Release PR #1992. Closing.