Skip to content

Support Lancaster (UK) food waste collection#1895

Closed
WillFantom wants to merge 2 commits into
robbrad:masterfrom
WillFantom:master
Closed

Support Lancaster (UK) food waste collection#1895
WillFantom wants to merge 2 commits into
robbrad:masterfrom
WillFantom:master

Conversation

@WillFantom
Copy link
Copy Markdown
Contributor

@WillFantom WillFantom commented Mar 23, 2026

Recently Lancaster City Council (UK) started doing food waste collection :)

Now they have added it to the web page thing. They also decided to mess up the text that is given for other recycling. This adds support for the food waste collection and changes the logic to trim the text for the collection type to bypass some of the issues the new formatting gives.

This also updates the logic to better match that found here.

Summary by CodeRabbit

  • Bug Fixes
    • More reliable identification and normalization of bin collection types for Lancaster City Council.
    • Trimming of extraneous suffixes from service names for cleaner labels.
    • Stricter detection of missing date/type entries to avoid silent omissions.
    • Improved error handling for unexpected date formats with clearer failure signals.
    • Bin collections now consistently sorted chronologically by collection date.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 23, 2026

📝 Walkthrough

Walkthrough

Added module-level BIN_TYPES and SUFFIXES. parse_data now determines each bin's collection_type by prefix matching or suffix removal, raises on missing date/type elements and on unexpected date formats (ValueError includes context), and sorts data["bins"] by parsed collectionDate.

Changes

Cohort / File(s) Summary
Lancaster City Council Parsing Updates
uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py
Added BIN_TYPES and SUFFIXES constants. parse_data now derives collection_type via prefix matching with fallback suffix removal, raises when date/type <li> elements are missing, narrows parsing error handling to raise ValueError with contextual text, and sorts data["bins"] by collectionDate.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • dp247

Poem

🐇 I hopped through lines at break of dawn,

Found types in prefixes, trimmed the lawn,
Dates all ordered, no more stray,
Parsers tidy, bins at play,
A little hop — the job is done ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Support Lancaster (UK) food waste collection' is partially related to the changeset but does not capture the main objective: it adds support for food waste collection AND refactors the parsing logic to handle changed formatting on council web pages.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (f5adb70) to head (6d5d7d0).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1895   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py (1)

74-79: Raise a parser-specific error here.

Exception makes expected markup drift look the same as any unrelated runtime failure. Using a specific parse error such as ValueError gives callers and tests a much clearer contract.

Minimal cleanup
-            raise Exception("Could not find scheduled collections section on the page")
+            raise ValueError("Could not find scheduled collections section on the page")

-            raise Exception("No collection services found")
+            raise ValueError("No collection services found")

Based on learnings, in uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors. This ensures format changes are detected early. Use clear exception types, document error causes, and add tests to verify that invalid inputs raise as expected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py` around
lines 74 - 79, Replace the generic Exception raises in LancasterCityCouncil.py
parsing logic with a parser-specific error to make failures explicit: change the
two raise Exception(...) occurrences triggered when services is None and when
services_sub is empty to a more specific type (e.g., ValueError or a custom
ParseError class) and update any tests or callers that expect these failures to
catch the new error; locate the checks around the variables services and
services_sub inside the LancasterCityCouncil parsing function and replace the
exception type and message accordingly so callers can distinguish parse errors
from unrelated runtime exceptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py`:
- Around line 113-115: The new sort uses date_format via a star import, causing
Ruff F405; replace the wildcard import with an explicit import of date_format
from the common module and ensure datetime is imported explicitly too if not
already; update the import statement to explicitly import date_format (and
datetime if missing) so the lambda using
datetime.strptime(x.get("collectionDate"), date_format) no longer relies on
import *.
- Around line 83-111: The parser currently swallows malformed rows by skipping
when date_text is blank or when a ValueError occurs; update the parsing logic in
the LancasterCityCouncil parsing method so that instead of continuing silently
you raise a clear exception (e.g., ValueError or a custom ParseError) with
contextual details (include the offending date_text, index i, and raw
services_sub snippet) whenever date_text is missing, datetime.strptime fails, or
expected sibling elements (type_text) are absent; ensure the raised message
references the symbols BIN_TYPES, SUFFIXES, and date_format and that
callers/tests expect and assert these failures rather than allowing partial
schedules to be returned.

---

Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py`:
- Around line 74-79: Replace the generic Exception raises in
LancasterCityCouncil.py parsing logic with a parser-specific error to make
failures explicit: change the two raise Exception(...) occurrences triggered
when services is None and when services_sub is empty to a more specific type
(e.g., ValueError or a custom ParseError class) and update any tests or callers
that expect these failures to catch the new error; locate the checks around the
variables services and services_sub inside the LancasterCityCouncil parsing
function and replace the exception type and message accordingly so callers can
distinguish parse errors from unrelated runtime exceptions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e2a00efc-5947-417e-b636-3cc26df81693

📥 Commits

Reviewing files that changed from the base of the PR and between f5adb70 and 5db65ef.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py

Comment thread uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py Outdated
Recently Lancaster City Council (UK) started doing food waste
collection :) Now they have added it to the web page thing. They also
decided to mess up the text that is given for other recycling. This adds
support for the food waste collection and changes the logic to trim the
text for the collection type to bypass some of the issues the new
formatting gives.

This also updates the logic to better match that found in the similar
project “mampfes/hacs_waste_collection_schedule”.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py (2)

6-6: ⚠️ Potential issue | 🟠 Major

Replace wildcard import to resolve Ruff F405 for date_format.

Line 6 still uses import *, while date_format is referenced at Line 106 and Line 114; Ruff correctly flags this as F405.

Minimal fix
-from uk_bin_collection.uk_bin_collection.common import *
+from uk_bin_collection.uk_bin_collection.common import date_format
#!/bin/bash
# Verify wildcard import and date_format references in this file.
rg -n 'from uk_bin_collection\.uk_bin_collection\.common import \*|date_format' \
  uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py

Expected result after fix: no wildcard import line; date_format remains explicitly referenced.

Also applies to: 106-106, 114-114

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py` at line
6, The file currently uses a wildcard import "from
uk_bin_collection.uk_bin_collection.common import *" which triggers Ruff F405
for date_format; replace the wildcard import with an explicit import of the
symbols actually used (at minimum import date_format) from
uk_bin_collection.uk_bin_collection.common so LancasterCityCouncil (the
class/functions referencing date_format at the usages) can reference date_format
directly; scan the file for any other names coming from common and add them to
the explicit import list to avoid future F405 errors.

83-111: ⚠️ Potential issue | 🟠 Major

Stop silently dropping malformed schedule rows.

Line 86 and Line 109-Line 111 still allow malformed rows to be skipped, which can return partial schedules instead of surfacing a parser break.

Proposed fix
-                date_text = (
-                    services_sub[i + 1].text.strip() if services_sub[i + 1] else None
-                )
-                if date_text:
-                    try:
-                        dt = datetime.strptime(date_text, "%d/%m/%Y").date()
-                        type_text = services_sub[i + 2].text.strip()
+                try:
+                    date_text = services_sub[i + 1].text.strip()
+                    type_text = services_sub[i + 2].text.strip()
+                    if not date_text or not type_text:
+                        raise ValueError(
+                            f"Incomplete Lancaster row at index {i}: "
+                            f"date_text={date_text!r}, type_text={type_text!r}"
+                        )
+                    dt = datetime.strptime(date_text, "%d/%m/%Y").date()
                         collection_type = next(
                             (
                                 bin_type
                                 for bin_type in BIN_TYPES
                                 if type_text.startswith(bin_type)
                             ),
                             None,
                         )
                         if collection_type is None:
                             collection_type = type_text
                             for suffix in SUFFIXES:
                                 collection_type = collection_type.removesuffix(suffix)
-                        if collection_type is not None:
-                            data["bins"].append(
-                                {
-                                    "type": collection_type,
-                                    "collectionDate": dt.strftime(date_format),
-                                }
-                            )
-                    except ValueError:
-                        # Skip invalid date or missing elements
-                        continue
+                    if not collection_type:
+                        raise ValueError(
+                            f"Unresolved Lancaster collection type at index {i}: "
+                            f"type_text={type_text!r}, BIN_TYPES={BIN_TYPES}, SUFFIXES={SUFFIXES}"
+                        )
+                    data["bins"].append(
+                        {
+                            "type": collection_type,
+                            "collectionDate": dt.strftime(date_format),
+                        }
+                    )
+                except (IndexError, AttributeError, ValueError) as exc:
+                    snippet = [li.get_text(strip=True) for li in services_sub[i : i + 3]]
+                    raise ValueError(
+                        f"Unexpected Lancaster scheduled row at index {i} with "
+                        f"date_text={locals().get('date_text', None)!r}; "
+                        f"expected format date={date_format}; row={snippet}"
+                    ) from exc

Based on learnings, in uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py` around
lines 83 - 111, The parser currently swallows malformed rows in
LancasterCityCouncil.py by continuing on ValueError or missing elements;
instead, in the block handling services_sub (look for services_sub, BIN_TYPES,
SUFFIXES, data["bins"], and date_format) ensure you validate presence of
services_sub[i+1] and services_sub[i+2] and raise a descriptive exception if
they are missing, and when datetime.strptime(date_text, "%d/%m/%Y") fails raise
a ValueError with contextual info (e.g., include date_text and surrounding
type_text) rather than continue, so callers get explicit failures for unexpected
schedule formats.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py`:
- Line 6: The file currently uses a wildcard import "from
uk_bin_collection.uk_bin_collection.common import *" which triggers Ruff F405
for date_format; replace the wildcard import with an explicit import of the
symbols actually used (at minimum import date_format) from
uk_bin_collection.uk_bin_collection.common so LancasterCityCouncil (the
class/functions referencing date_format at the usages) can reference date_format
directly; scan the file for any other names coming from common and add them to
the explicit import list to avoid future F405 errors.
- Around line 83-111: The parser currently swallows malformed rows in
LancasterCityCouncil.py by continuing on ValueError or missing elements;
instead, in the block handling services_sub (look for services_sub, BIN_TYPES,
SUFFIXES, data["bins"], and date_format) ensure you validate presence of
services_sub[i+1] and services_sub[i+2] and raise a descriptive exception if
they are missing, and when datetime.strptime(date_text, "%d/%m/%Y") fails raise
a ValueError with contextual info (e.g., include date_text and surrounding
type_text) rather than continue, so callers get explicit failures for unexpected
schedule formats.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3da11b44-af3c-41b5-b8b0-786f579d592d

📥 Commits

Reviewing files that changed from the base of the PR and between 5db65ef and 01e187c.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py

If date parsing errors occur, expose the error correctly. Also, only
import required parts of the common lib.
@WillFantom
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 24, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py (2)

90-119: ⚠️ Potential issue | 🟠 Major

Reject blank collection types before appending.

Line 113 only guards against None, but an empty type_text still reaches the append path as "". That hides a malformed row and produces a bin with no type.

Minimal fix
                 if collection_type is None:
                     collection_type = type_text
                     for suffix in SUFFIXES:
                         collection_type = collection_type.removesuffix(suffix)
                 if collection_type is not None:
+                    if not collection_type:
+                        raise ValueError(
+                            "Unexpected blank Lancaster collection type after normalization: "
+                            f"index={i}, raw_type={type_text!r}"
+                        )
                     data["bins"].append(
                         {
                             "type": collection_type,
                             "collectionDate": dt.strftime(date_format),
                         }
Based on learnings, council parsers in this repo should fail explicitly on unexpected formats rather than returning partial data.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py` around
lines 90 - 119, The code currently only checks for collection_type is None
before appending but allows empty strings (from type_item.text) to be treated as
valid, producing bins with no type; update the logic in the LancasterCityCouncil
parser (use variables date_item, type_item, BIN_TYPES, SUFFIXES,
collection_type) to explicitly reject empty or whitespace-only collection types
after trimming (e.g., if not collection_type: raise ValueError with context
including date_text and type_text) instead of appending, so malformed rows cause
a clear failure rather than producing a blank "type" in data["bins"].

81-88: ⚠️ Potential issue | 🟠 Major

Raise on truncated service rows.

Line 82 still silently drops a trailing 1-2 <li> group. Because that guard skips the body, the missing-element checks at Lines 85-88 never run, so a page shape change can still return a partial schedule instead of failing.

Minimal fix
         for i in range(0, len(services_sub), 3):
+            if i + 2 >= len(services_sub):
+                raise ValueError(
+                    "Incomplete Lancaster scheduled row: "
+                    f"index={i}, items="
+                    f"{[item.get_text(strip=True) for item in services_sub[i:]]}"
+                )
             if i + 2 < len(services_sub):
                 date_item = services_sub[i + 1]
                 type_item = services_sub[i + 2]
                 if date_item is None:
                     raise Exception(f"Missing collection date element at index {i + 1}")
                 if type_item is None:
                     raise Exception(f"Missing collection type element at index {i + 2}")
Based on learnings, council parsers in this repo should fail explicitly on unexpected formats rather than returning partial data.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py` around
lines 81 - 88, The loop in LancasterCityCouncil (iterating services_sub)
currently uses "if i + 2 < len(services_sub)" which silently skips a trailing
1–2 <li> items and prevents the subsequent missing-element checks from running;
change the guard so that instead of skipping incomplete groups you detect a
truncated tail and raise an explicit error (e.g., when i+1 or i+2 are out of
range or when date_item/type_item is None) so that the parser fails fast on
unexpected page shape; specifically update the loop that processes services_sub
in LancasterCityCouncil (the block that sets date_item and type_item) to throw a
descriptive Exception when the group is incomplete rather than skipping it so
the missing-element checks always run.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py`:
- Around line 90-119: The code currently only checks for collection_type is None
before appending but allows empty strings (from type_item.text) to be treated as
valid, producing bins with no type; update the logic in the LancasterCityCouncil
parser (use variables date_item, type_item, BIN_TYPES, SUFFIXES,
collection_type) to explicitly reject empty or whitespace-only collection types
after trimming (e.g., if not collection_type: raise ValueError with context
including date_text and type_text) instead of appending, so malformed rows cause
a clear failure rather than producing a blank "type" in data["bins"].
- Around line 81-88: The loop in LancasterCityCouncil (iterating services_sub)
currently uses "if i + 2 < len(services_sub)" which silently skips a trailing
1–2 <li> items and prevents the subsequent missing-element checks from running;
change the guard so that instead of skipping incomplete groups you detect a
truncated tail and raise an explicit error (e.g., when i+1 or i+2 are out of
range or when date_item/type_item is None) so that the parser fails fast on
unexpected page shape; specifically update the loop that processes services_sub
in LancasterCityCouncil (the block that sets date_item and type_item) to throw a
descriptive Exception when the group is incomplete rather than skipping it so
the missing-element checks always run.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2f5b6279-ab32-4af2-98a1-1fd6e2504e9a

📥 Commits

Reviewing files that changed from the base of the PR and between 01e187c and 6d5d7d0.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/LancasterCityCouncil.py

@robbrad
Copy link
Copy Markdown
Owner

robbrad commented Mar 28, 2026

Merged into the combined March 2026 release PR #1898. Closing to reduce release noise. Thank you for the contribution!

@robbrad robbrad closed this Mar 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants