
fix: EnvironmentFirst/Eastbourne/Lewes - handle restructured page #1979

Closed
InertiaUK wants to merge 1 commit into robbrad:master from InertiaUK:fix/environmentfirst-page-restructure


Conversation

@InertiaUK
Contributor

@InertiaUK InertiaUK commented Apr 25, 2026

The environmentfirst.co.uk results page (used directly by EnvironmentFirst and via the same parser by EastbourneBoroughCouncil + LewesDistrictCouncil) was restructured in April 2026:

  • The bin dates are now wrapped in a `<strong>` inside the same `<p>` that carries the label. The previous separator paragraphs are gone, so the fixed indices `page_text[2]` / `[4]` / `[6]` no longer line up — they're now `[1]`/`[2]`/`[3]`.
  • The date string also gained a leading weekday name. Values like `Tuesday 28th April 2026` no longer match `'%d %B %Y'` and the parser raises `ValueError: time data 'Tuesday 28 April 2026' does not match format '%d %B %Y'`.

Rewritten to identify each row by the surrounding label (rubbish / recycling / garden) instead of a fixed paragraph index, and to extract the date with a regex (\d{1,2}(?:st|nd|rd|th)?\s+\w+\s+\d{4}) so the weekday prefix is harmlessly skipped.
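For illustration, a minimal sketch of the date extraction described above. The inline `re.sub` stands in for the project's `remove_ordinal_indicator_from_date_string` helper, and the output format string is an assumption:

```python
import re
from datetime import datetime

# Regex from the PR: day with optional ordinal suffix, month word, 4-digit year.
# The search skips any leading weekday name ("Tuesday ...") automatically.
DATE_PATTERN = re.compile(r"(\d{1,2}(?:st|nd|rd|th)?\s+\w+\s+\d{4})")


def extract_date(text: str) -> str:
    """Pull the date out of strings like 'Tuesday 28th April 2026'."""
    match = DATE_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no date found in {text!r}")
    # Stand-in for remove_ordinal_indicator_from_date_string.
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)", "", match.group(1))
    return datetime.strptime(cleaned, "%d %B %Y").strftime("%d/%m/%Y")


print(extract_date("Tuesday 28th April 2026"))  # 28/04/2026
```

Because the pattern searches rather than anchors, it tolerates both the old (`28 April 2026`) and new (`Tuesday 28th April 2026`) shapes of the page.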

Verified against UPRN 100060011258 (Eastbourne) and 100061930155 (Lewes) — both return their full schedules including the garden waste line.

Summary by CodeRabbit

Bug Fixes

  • Bin collection date extraction for Eastbourne Borough Council, Environment First, and Lewes District Council now tolerates variations in the results page's HTML structure and date formats (leading weekday names, ordinal suffixes), with better recovery when entries are missing or malformed.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 25, 2026

📝 Walkthrough


Three council bin collection parsers were refactored from rigid index-based HTML extraction to flexible, content-driven parsing. They now dynamically iterate over paragraph elements within div.collect containers, determine bin types via keyword matching, and handle ordinal date suffixes with regex normalization.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Council Bin Collection Parser Refactoring**<br>`uk_bin_collection/councils/EastbourneBoroughCouncil.py`, `EnvironmentFirst.py`, `LewesDistrictCouncil.py` | Replaced fragile index-based extraction (`page_text[2]`, `[4]`, `[6]`) with iteration over `<p>` tags in `div.collect`. Bin types are now detected via keyword matching (rubbish, recycling, garden), dates are extracted using a regex supporting ordinal suffixes, and normalization/parsing is applied per entry with graceful handling of missing or invalid data. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • dp247

Poem

🐰 Three parsers hop with newfound grace,
No more index-based disgrace,
With regex and keywords in hand,
Our bin dates now parse oh so grand!
Ordinals normalized—what a sight,
Robust parsing done right!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: fixing parsers for three councils to handle a restructured results page. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/EnvironmentFirst.py (1)

1-1: Recommend extracting the shared parser to avoid maintaining three copies.

The leading comment ("Legacy script. Copied to Lewes and Eastbourne.") and the byte-for-byte identical parse blocks across EnvironmentFirst.py, LewesDistrictCouncil.py and EastbourneBoroughCouncil.py mean this fix has had to be applied three times — and any future page change will need three identical edits. Consider hoisting the `div.collect` → `<p>` → `<strong>` parsing into a single helper (e.g. in common.py or a new _environmentfirst.py module) and have all three council classes delegate to it. The PR description itself notes the page layout has shifted twice in twelve months, so this is a recurring cost.
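A sketch of what that shared helper might look like. The function name, module placement, error messages, and default date format are all illustrative, not the project's actual API, and the inline ordinal regex stands in for `remove_ordinal_indicator_from_date_string`:

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"(\d{1,2}(?:st|nd|rd|th)?\s+[A-Za-z]+\s+\d{4})")
BIN_KEYWORDS = (("rubbish", "Rubbish"), ("recycling", "Recycling"), ("garden", "Garden"))


def parse_environmentfirst_schedule(soup, date_format="%d/%m/%Y"):
    """Walk div.collect -> <p> -> <strong> and build the {'bins': [...]} payload."""
    data = {"bins": []}
    collect_div = soup.find("div", {"class": "collect"})
    if collect_div is None:
        raise ValueError("div.collect not found - page layout may have changed")
    for p in collect_div.find_all("p"):
        strong = p.find("strong")
        if not strong:
            continue
        label = p.get_text(" ", strip=True).lower()
        # Map the paragraph's label text to a bin type via keyword matching.
        bin_type = next((name for key, name in BIN_KEYWORDS if key in label), None)
        if bin_type is None:
            continue
        match = DATE_PATTERN.search(strong.get_text(" ", strip=True))
        if not match:
            continue
        # Stand-in for remove_ordinal_indicator_from_date_string.
        cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)", "", match.group(1))
        data["bins"].append({
            "type": bin_type,
            "collectionDate": datetime.strptime(cleaned, "%d %B %Y").strftime(date_format),
        })
    if not data["bins"]:
        raise ValueError("no collection entries parsed from div.collect")
    return data
```

Each council's parse method would then reduce to a single delegating call, so any future page restructure needs one edit instead of three.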

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/EnvironmentFirst.py` at line 1,
The three council modules (EnvironmentFirst.py, LewesDistrictCouncil.py,
EastbourneBoroughCouncil.py) duplicate the same DOM parsing logic that walks
div.collect → <p> → <strong>; extract that block into a single helper function
(e.g. parse_environmentfirst_schedule) placed in a shared module (common.py or
_environmentfirst.py) and have each council class call that helper instead of
duplicating the parse code; update the council classes to import and delegate to
the new function (keep the same input/output contract so callers in
EnvironmentFirst, LewesDistrictCouncil, and EastbourneBoroughCouncil need only
replace the inline parse block with a single function call).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/LewesDistrictCouncil.py`:
- Around line 35-71: The parser currently returns an empty data["bins"] silently
when parsing fails; update LewesDistrictCouncil.py so that if the collect_div
variable is None you raise an informative exception (e.g.,
RuntimeError("div.collect not found")), and after the parsing loop if
data["bins"] is still empty raise another informative exception (e.g.,
RuntimeError("no collection entries parsed from div.collect")). Also stop
swallowing date parse errors: replace the except ValueError block around
datetime.strptime(...) so it re-raises (or raises a new RuntimeError) including
the offending cleaned string and context (mentioning
remove_ordinal_indicator_from_date_string and the date_pattern) instead of
continue, so format changes are surfaced immediately.

---

Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/EnvironmentFirst.py`:
- Line 1: The three council modules (EnvironmentFirst.py,
LewesDistrictCouncil.py, EastbourneBoroughCouncil.py) duplicate the same DOM
parsing logic that walks div.collect → <p> → <strong>; extract that block into a
single helper function (e.g. parse_environmentfirst_schedule) placed in a shared
module (common.py or _environmentfirst.py) and have each council class call that
helper instead of duplicating the parse code; update the council classes to
import and delegate to the new function (keep the same input/output contract so
callers in EnvironmentFirst, LewesDistrictCouncil, and EastbourneBoroughCouncil
need only replace the inline parse block with a single function call).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c9dee2f8-c62f-452f-9ec8-5310a89c84c6

📥 Commits

Reviewing files that changed from the base of the PR and between 60bd3cc and c3182a6.

📒 Files selected for processing (3)
  • uk_bin_collection/uk_bin_collection/councils/EastbourneBoroughCouncil.py
  • uk_bin_collection/uk_bin_collection/councils/EnvironmentFirst.py
  • uk_bin_collection/uk_bin_collection/councils/LewesDistrictCouncil.py

Comment on lines +35 to +71
+        collect_div = soup.find("div", {"class": "collect"})
+        if collect_div is None:
+            return data

-        # Parse the correct lines (find them, remove the ordinal indicator and make them the correct format date) and
-        # then add them to the dictionary
-        rubbish_day = datetime.strptime(
-            remove_ordinal_indicator_from_date_string(
-                page_text[2].find_next("strong").text
-            ),
-            "%d %B %Y",
-        ).strftime(date_format)
-        dict_data = {
-            "type": "Rubbish",
-            "collectionDate": rubbish_day,
-        }
-        data["bins"].append(dict_data)
-        recycling_day = datetime.strptime(
-            remove_ordinal_indicator_from_date_string(
-                page_text[4].find_next("strong").text
-            ),
-            "%d %B %Y",
-        ).strftime(date_format)
-        dict_data = {
-            "type": "Recycling",
-            "collectionDate": recycling_day,
-        }
-        data["bins"].append(dict_data)
+        date_pattern = re.compile(
+            r"(\d{1,2}(?:st|nd|rd|th)?\s+[A-Za-z]+\s+\d{4})"
+        )

-        if len(page_text) > 5:
-            garden_day = datetime.strptime(
-                remove_ordinal_indicator_from_date_string(
-                    page_text[6].find_next("strong").text
-                ),
-                "%d %B %Y",
-            ).strftime(date_format)
-            dict_data = {
-                "type": "Garden",
-                "collectionDate": garden_day,
-            }
-            data["bins"].append(dict_data)
+        for p in collect_div.find_all("p"):
+            strong = p.find("strong")
+            if not strong:
+                continue
+            label = p.get_text(" ", strip=True).lower()
+            if "rubbish" in label:
+                bin_type = "Rubbish"
+            elif "recycling" in label:
+                bin_type = "Recycling"
+            elif "garden" in label:
+                bin_type = "Garden"
+            else:
+                continue
+            match = date_pattern.search(strong.get_text(" ", strip=True))
+            if not match:
+                continue
+            cleaned = remove_ordinal_indicator_from_date_string(match.group(1))
+            try:
+                collection_date = datetime.strptime(
+                    cleaned, "%d %B %Y"
+                ).strftime(date_format)
+            except ValueError:
+                continue
+            data["bins"].append(
+                {
+                    "type": bin_type,
+                    "collectionDate": collection_date,
+                }
+            )
Contributor


⚠️ Potential issue | 🟠 Major

Avoid silently returning empty schedules when parsing fails.

If the site changes shape again, this implementation degrades to an empty bins list without raising — users will see "no upcoming collections" instead of a clear error. Consider raising when div.collect is found but no entries could be parsed (and arguably when div.collect itself is missing). The per-<p> skip on missing <strong>/non-matching keyword is fine, but the ValueError on date parsing is exactly the kind of format-shift signal that should not be swallowed silently.

Based on learnings: in uk_bin_collection/**/*.py, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors so that format changes are detected early.

🛡️ Suggested minimal hardening
         data = {"bins": []}
         collect_div = soup.find("div", {"class": "collect"})
         if collect_div is None:
-            return data
+            raise ValueError(
+                "Could not locate <div class='collect'> – page layout may have changed."
+            )
@@
             cleaned = remove_ordinal_indicator_from_date_string(match.group(1))
-            try:
-                collection_date = datetime.strptime(
-                    cleaned, "%d %B %Y"
-                ).strftime(date_format)
-            except ValueError:
-                continue
+            collection_date = datetime.strptime(
+                cleaned, "%d %B %Y"
+            ).strftime(date_format)
             data["bins"].append(
                 {
                     "type": bin_type,
                     "collectionDate": collection_date,
                 }
             )
+
+        if not data["bins"]:
+            raise ValueError(
+                "No bin collections parsed from div.collect – page layout may have changed."
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        collect_div = soup.find("div", {"class": "collect"})
        if collect_div is None:
            raise ValueError(
                "Could not locate <div class='collect'> – page layout may have changed."
            )
        date_pattern = re.compile(
            r"(\d{1,2}(?:st|nd|rd|th)?\s+[A-Za-z]+\s+\d{4})"
        )
        for p in collect_div.find_all("p"):
            strong = p.find("strong")
            if not strong:
                continue
            label = p.get_text(" ", strip=True).lower()
            if "rubbish" in label:
                bin_type = "Rubbish"
            elif "recycling" in label:
                bin_type = "Recycling"
            elif "garden" in label:
                bin_type = "Garden"
            else:
                continue
            match = date_pattern.search(strong.get_text(" ", strip=True))
            if not match:
                continue
            cleaned = remove_ordinal_indicator_from_date_string(match.group(1))
            collection_date = datetime.strptime(
                cleaned, "%d %B %Y"
            ).strftime(date_format)
            data["bins"].append(
                {
                    "type": bin_type,
                    "collectionDate": collection_date,
                }
            )
        if not data["bins"]:
            raise ValueError(
                "No bin collections parsed from div.collect – page layout may have changed."
            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/LewesDistrictCouncil.py` around
lines 35 - 71, The parser currently returns an empty data["bins"] silently when
parsing fails; update LewesDistrictCouncil.py so that if the collect_div
variable is None you raise an informative exception (e.g.,
RuntimeError("div.collect not found")), and after the parsing loop if
data["bins"] is still empty raise another informative exception (e.g.,
RuntimeError("no collection entries parsed from div.collect")). Also stop
swallowing date parse errors: replace the except ValueError block around
datetime.strptime(...) so it re-raises (or raises a new RuntimeError) including
the offending cleaned string and context (mentioning
remove_ordinal_indicator_from_date_string and the date_pattern) instead of
continue, so format changes are surfaced immediately.

@robbrad robbrad mentioned this pull request May 1, 2026
@robbrad
Copy link
Copy Markdown
Owner

robbrad commented May 1, 2026

Included in May 2026 Release PR #1992. Closing.

@robbrad robbrad closed this May 1, 2026
@robbrad robbrad mentioned this pull request May 2, 2026