Fix HuntingdonDistrictCouncil scraper crash by daaaaan · Pull Request #1833 · robbrad/UKBinCollectionData

daaaaan · 2026-01-30T13:07:30Z

Summary

Fix AttributeError: 'NoneType' object has no attribute 'get_text' crash
Skip collection items without `<strong>` date tags (e.g., "does not receive X collection" messages)
Improve bin type parsing using regex instead of word position indexing
Add support for food waste collection pattern
Fail fast with clear exceptions when page format changes (instead of silent failures)

Problem

The scraper crashes when the page includes "does not receive X collection" messages that lack `<strong>` tags. This affects properties without certain waste services.

Changes

Crash Fix & Parsing

Check for `<strong>` tag existence before calling .get_text()
Use regex to extract bin type reliably from standard format
Handle food waste's different text pattern: "Your next weekly food collection is on..."
Results now show proper types: "Domestic waste", "Dry recycling waste", "Garden waste", "Food waste"
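
The parsing changes above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: it works on already-extracted text instead of BeautifulSoup nodes, and the regex patterns, the standard-format wording, and the names `extract_bin_type` and `collect` are all assumptions. Only the food-waste sentence opening and the resulting type labels come from the description above.

```python
import re
from typing import List, Optional, Tuple

# Assumed patterns; the wording on the real page may differ.
STANDARD_RE = re.compile(r"next\s+(.+?)\s+waste\s+collection", re.IGNORECASE)
FOOD_RE = re.compile(r"next\s+weekly\s+food\s+collection", re.IGNORECASE)


def extract_bin_type(item_text: str) -> Optional[str]:
    """Return a normalized bin type label, or None if no pattern matches."""
    # Food waste uses a different sentence, so it is checked first.
    if FOOD_RE.search(item_text):
        return "Food waste"
    match = STANDARD_RE.search(item_text)
    if match is None:
        return None
    return f"{match.group(1).strip().capitalize()} waste"


def collect(items: List[Tuple[str, Optional[str]]]) -> List[Tuple[Optional[str], str]]:
    """Pair bin types with date text.

    Each item is (item_text, date_text). A date_text of None stands in for
    an item whose <strong> tag was missing, which is skipped instead of
    triggering an AttributeError.
    """
    bins = []
    for item_text, date_text in items:
        if date_text is None:
            continue
        bins.append((extract_bin_type(item_text), date_text))
    return bins
```

Matching on the phrase itself, not on word positions, is what lets multi-word types such as "dry recycling" come through intact.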

Error Handling Improvements

Raise ValueError with raw text when bin type regex fails (instead of assigning "Unknown")
Raise ValueError with date text when datetime parsing fails (instead of silently skipping)
Validate UPRN/URL upfront and fail fast with clear message if neither provided
Error messages include debugging info (raw text, expected format) to aid troubleshooting
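
A fail-fast version of the two parsing steps might look like the sketch below. The page date format (`PAGE_DATE_FORMAT`) and the helper names are assumptions made for illustration; the output format is inferred from the sample output in the Testing section, and the actual messages in the PR may be worded differently.

```python
from datetime import datetime
from typing import Optional

PAGE_DATE_FORMAT = "%A %d %B %Y"  # assumed format of the date text on the page
OUTPUT_DATE_FORMAT = "%d/%m/%Y"   # matches the collectionDate values in the sample output


def require_bin_type(bin_type: Optional[str], raw_text: str) -> str:
    """Raise with the raw text instead of falling back to "Unknown"."""
    if bin_type is None:
        raise ValueError(f"Could not extract bin type from text: {raw_text!r}")
    return bin_type


def parse_collection_date(date_text: str) -> str:
    """Parse the page's date text, raising with the offending text on failure."""
    try:
        parsed = datetime.strptime(date_text.strip(), PAGE_DATE_FORMAT)
    except ValueError as exc:
        raise ValueError(
            f"Could not parse collection date {date_text!r}; "
            f"expected format {PAGE_DATE_FORMAT!r}"
        ) from exc
    return parsed.strftime(OUTPUT_DATE_FORMAT)
```

Including the raw text and the expected format in the message means a page format change can usually be diagnosed from the traceback alone.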

Testing

Tested with multiple UPRNs - output now returns correct bin types without crashing:

{
  "bins": [
    {"type": "Domestic waste", "collectionDate": "02/02/2026"},
    {"type": "Dry recycling waste", "collectionDate": "09/02/2026"},
    {"type": "Food waste", "collectionDate": "30/03/2026"}
  ]
}

Error handling verified:

Missing UPRN/URL: ValueError: Missing or invalid UPRN and no URL provided...
Format changes will raise clear exceptions with the problematic text

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- More reliable extraction of bin types and collection dates; skips incomplete or malformed entries to avoid incorrect displays.
- Improved detection of missing or changed page sections to prevent silent failures and surface clear error messages.
Improvements
- Stronger validation with explicit identifier (UPRN) support and URL fallback, plus more robust network and parsing error handling.
- Normalized bin type labels and consistent date formatting for clearer schedule display.

coderabbitai · 2026-01-30T13:07:51Z

📝 Walkthrough

Reworks the Huntingdonshire scraper to fetch pages with requests (timeout/status handling), enforce UPRN or URL, parse collection dates from `<strong>` tags, extract/normalize bin types via regex, return a dict {"bins":[...]}, and raise ValueError on fetch or parse failures (adds re import).

Changes

| Cohort / File(s) | Summary |
|---|---|
| HuntingdonDistrictCouncil parser: uk_bin_collection/uk_bin_collection/councils/HuntingdonDistrictCouncil.py | Replaces previous parsing with a robust flow: uses requests.get(..., timeout) and HTTP status checks; enforces UPRN or URL via check_uprn; prefers UPRN-based URLs with legacy fallback; validates presence of results container; iterates items, skips entries missing a `<strong>` date, parses/normalizes bin type with regex (e.g., "X waste" or "Food waste") and parses date into collectionDate; raises ValueError on missing container, fetch failures, parsing errors; returns {"bins":[{type,collectionDate}, ...]}; adds import re and updates parse_data(self, page, **kwargs) -> dict signature. |
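
The fetch step described in the summary could be sketched like this. The function name `fetch_page`, the default timeout value, and the injectable `get` parameter are assumptions added for illustration and testability; the summary only confirms that `requests.get` is called with a timeout, that the HTTP status is checked, and that fetch failures surface as `ValueError`.

```python
import requests


def fetch_page(url: str, timeout: float = 10.0, get=requests.get) -> str:
    """Fetch a page and return its HTML, converting request errors to ValueError.

    `get` defaults to requests.get; it is a parameter only so the function
    can be exercised without network access (an assumption of this sketch).
    """
    try:
        response = get(url, timeout=timeout)
        # Turns 4xx/5xx responses into requests.HTTPError.
        response.raise_for_status()
    except requests.RequestException as exc:
        raise ValueError(f"Failed to fetch {url}: {exc}") from exc
    return response.text
```

Catching `requests.RequestException` covers timeouts, connection errors, and the `HTTPError` raised by `raise_for_status()` in one place.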

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

dp247

Poem

🐇 I nibble through regex threads tonight,
I chase each date from strong-tag light.
URLs and UPRNs align,
Bins and dates in tidy line.
Hooray — the parser hops just right!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Fix HuntingdonDistrictCouncil scraper crash' accurately summarizes the main objective of the PR: fixing a crash in the Huntingdon District Council scraper caused by missing `<strong>` tags. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


- Fix AttributeError when 'does not receive X collection' messages lack `<strong>` tags
- Use HTTPS instead of HTTP for secure requests
- Add timeout and error handling for HTTP requests
- Add null check for page structure changes
- Handle malformed dates gracefully
- Improve bin type extraction using regex instead of word position indexing
- Add support for food waste collection (different text pattern)
- Add comprehensive docstrings

Results now show proper types: "Domestic waste", "Dry recycling waste", "Garden waste", "Food waste"

Fixes crash: 'NoneType' object has no attribute 'get_text'

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@uk_bin_collection/uk_bin_collection/councils/HuntingdonDistrictCouncil.py`:
- Around line 71-90: The parser currently falls back to "Unknown" for bin_type
and silently continues on ValueError when parsing collection_date; update the
logic in HuntingdonDistrictCouncil.py (the block using result, type_match,
bin_type, strong_tag, collection_date, date_format) to raise a clear exception
(e.g., ValueError or a custom ParseError) when the type regex fails or when
datetime.strptime fails instead of assigning "Unknown" or using continue, so
format changes surface immediately; include a descriptive message mentioning the
raw text or tag content to aid debugging.
- Around line 39-47: The current try block in HuntingdonDistrictCouncil (where
user_uprn = kwargs.get("uprn"), check_uprn(user_uprn), and url is built) can
silently continue when check_uprn does not raise; change it to explicitly
validate and raise if neither a valid UPRN nor a legacy URL is present: call
check_uprn and if it returns False (or if user_uprn is falsy) attempt to read
kwargs.get("url") into url_fallback, and if that is also falsy then raise
ValueError("Missing or invalid UPRN and no URL provided"); ensure the raised
error replaces the broad catch so the constructor/method in
HuntingdonDistrictCouncil fails fast on invalid input.
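
The upfront validation requested in the second comment might be sketched as below. `BASE_URL` is a placeholder, not the council's real endpoint, and the digit check stands in for the project's `check_uprn` helper, whose exact behaviour is not shown on this page; only the error message text is taken from the comment.

```python
from typing import Optional

# Placeholder endpoint for illustration only; not the real council URL.
BASE_URL = "https://example.invalid/bins?uprn={uprn}"


def resolve_url(uprn: Optional[str] = None, url: Optional[str] = None) -> str:
    """Prefer a UPRN-based URL, fall back to a legacy URL, otherwise fail fast."""
    uprn_text = str(uprn).strip() if uprn is not None else ""
    # Stand-in validity check (assumption): a UPRN is treated as digits only.
    if uprn_text.isdigit():
        return BASE_URL.format(uprn=uprn_text)
    if url:
        return url
    raise ValueError("Missing or invalid UPRN and no URL provided")
```

Raising here, before any request is made, keeps invalid input from being reported later as a confusing fetch or parse error.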

🧹 Nitpick comments (1)

uk_bin_collection/uk_bin_collection/councils/HuntingdonDistrictCouncil.py (1)
49-53: Replace wildcard import with explicit imports for better code clarity and explicit dependencies.

The code currently relies on from ... import * to access requests, check_uprn, and date_format. While this works because requests is imported in common.py, explicit imports make dependencies visible and are used consistently across other council modules in the codebase.
Suggested refactor
 import re
+import requests
 from datetime import datetime
 
 from bs4 import BeautifulSoup
 
-from uk_bin_collection.uk_bin_collection.common import *
+from uk_bin_collection.uk_bin_collection.common import check_uprn, date_format
 from uk_bin_collection.uk_bin_collection.get_bin_data import AbstractGetBinDataClass

Replace silent failures with explicit exceptions:
- Raise ValueError with raw text when bin type regex fails (instead of "Unknown")
- Raise ValueError with date text when datetime parsing fails (instead of continue)
- Validate UPRN/URL upfront and fail fast with clear message if neither provided

These changes ensure page format changes surface immediately with debugging info.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

robbrad · 2026-02-01T17:54:33Z

This PR has been merged into the February 2026 consolidated release PR #1837.

Thank you for your contribution!

daaaaan force-pushed the fix/huntingdon-district-council-scraper branch from e1ea6f9 to e8b9814 Compare January 30, 2026 13:17

coderabbitai Bot reviewed Jan 30, 2026

robbrad added a commit that referenced this pull request Feb 1, 2026

Merge PR #1833: Fix HuntingdonDistrictCouncil scraper crash (conflict resolved - using PR #1833 comprehensive solution)

ad64d32

robbrad mentioned this pull request Feb 1, 2026

February 2026 Release - Consolidated Council Fixes #1837

Merged

robbrad closed this Feb 1, 2026

daaaaan wants to merge 2 commits into robbrad:master from daaaaan:fix/huntingdon-district-council-scraper
