Fix for Bromley council, closes #1829#1832
Conversation
📝 WalkthroughWalkthroughThis change updates HTML selectors and extraction logic in the BromleyBoroughCouncil scraper. Selectors were modified to target a waste-service-grid container structure, service title extraction now uses h3 elements with specific classes, and next collection data retrieval was refactored. An unused import was removed. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Tip 🧪 Unit Test Generation v2 is now available!We have significantly improved our unit test generation capabilities. To enable: Add this to your reviews:
finishing_touches:
unit_tests:
enabled: trueTry it out by using the Have feedback? Share your thoughts on our Discord thread! Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@uk_bin_collection/uk_bin_collection/councils/BromleyBoroughCouncil.py`:
- Around line 43-46: The parser currently calls soup.find_all("div",
class_="waste-service-grid") and assigns it to waste_services but silently
continues if the list is empty; change this to fail fast by checking if
waste_services is empty right after that call and raise a descriptive exception
(e.g., ValueError or a custom ParseError) that includes context such as "no
waste-service-grid elements found" and the council identifier; update the
function containing waste_services (the parsing routine in
BromleyBoroughCouncil.py) to raise this error so callers can detect site/format
changes rather than receiving an empty result.
- Around line 49-55: The service element lookups in BromleyBoroughCouncil.py can
raise AttributeError and are brittle: replace service.find("h3",
class_="govuk-heading-m waste-service-name").get_text(...) with a CSS selector
via service.select_one("h3.govuk-heading-m.waste-service-name") and check the
returned node is not None before calling get_text; similarly ensure
next_collection = service.find("dt", string="Next collection") is validated (or
use a selector) and raise or log a clear, contextual error if either element is
missing so callers get informative failure messages; update any surrounding code
that assumes these variables (service_title, next_collection) to handle the None
case consistently.
| # Find all elements with class 'waste-service-grid' | ||
| waste_services = soup.find_all( | ||
| "h3", class_="govuk-heading-m waste-service-name" | ||
| "div", class_="waste-service-grid" | ||
| ) |
There was a problem hiding this comment.
Fail fast when no services are found.
If the selector yields zero services, the parser silently returns an empty list and masks site changes. Raise a clear exception when waste_services is empty.
🔧 Proposed fix
- waste_services = soup.find_all(
- "div", class_="waste-service-grid"
- )
+ waste_services = soup.find_all(
+ "div", class_="waste-service-grid"
+ )
+ if not waste_services:
+ raise ValueError(
+ "BromleyBoroughCouncil: no waste-service-grid elements found"
+ )🤖 Prompt for AI Agents
In `@uk_bin_collection/uk_bin_collection/councils/BromleyBoroughCouncil.py` around
lines 43 - 46, The parser currently calls soup.find_all("div",
class_="waste-service-grid") and assigns it to waste_services but silently
continues if the list is empty; change this to fail fast by checking if
waste_services is empty right after that call and raise a descriptive exception
(e.g., ValueError or a custom ParseError) that includes context such as "no
waste-service-grid elements found" and the council identifier; update the
function containing waste_services (the parsing routine in
BromleyBoroughCouncil.py) to raise this error so callers can detect site/format
changes rather than receiving an empty result.
| service_title = service.find( | ||
| "h3", class_="govuk-heading-m waste-service-name" | ||
| ).get_text(strip=True) | ||
|
|
||
| next_collection = service.find( | ||
| "dt", string="Next collection" | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n uk_bin_collection/uk_bin_collection/councils/BromleyBoroughCouncil.py | sed -n '40,65p'Repository: robbrad/UKBinCollectionData
Length of output: 1303
🏁 Script executed:
# Check if the file exists and get its full line count
wc -l uk_bin_collection/uk_bin_collection/councils/BromleyBoroughCouncil.pyRepository: robbrad/UKBinCollectionData
Length of output: 144
🏁 Script executed:
# Look for the complete function context around lines 49-55
cat -n uk_bin_collection/uk_bin_collection/councils/BromleyBoroughCouncil.py | sed -n '30,80p'Repository: robbrad/UKBinCollectionData
Length of output: 2463
🌐 Web query:
BeautifulSoup class_ parameter multiple classes matching behavior
💡 Result:
In BeautifulSoup (bs4), the class_ filter has three different behaviors depending on what you pass:
- Single class name (matches ANY class on the element)
soup.find_all("div", class_="btn")Matches elements whose class list contains "btn" (even if they also have other classes). [1]
- List/iterable of class names (OR / union match)
soup.find_all("div", class_=["btn", "primary"])Matches elements that have any one of the listed classes (btn or primary, or both). [2]
- Space-separated string (exact attribute string match; order-sensitive)
soup.find_all("div", class_="btn primary")Matches only if the class attribute string is exactly "btn primary" (so "primary btn" will not match). [1]
To require multiple classes (AND / intersection)
Use a CSS selector:
soup.select("div.btn.primary")This matches elements that have both classes, in any order. [1]
Gotcha: attrs={"class": lambda ...} may not see the whole class list
When you pass a function for attrs["class"], BeautifulSoup may call it on each individual class string, not the full list—so exclusion/combination logic can behave unexpectedly. [3]
Sources:
[1] Beautiful Soup documentation (class matching + exact-string + selector recommendation) (crummy.com)
[2] Stack Overflow discussion of list meaning OR/union for class_ (stackoverflow.com)
[3] Beautiful Soup bug report explaining callable applied per-class string (bugs.launchpad.net)
Citations:
- 1: https://www.crummy.com/software/BeautifulSoup/bs4/doc/?utm_source=openai
- 2: https://stackoverflow.com/questions/5041008/how-to-find-elements-by-class?utm_source=openai
- 3: https://bugs.launchpad.net/bugs/2075974?utm_source=openai
🏁 Script executed:
# Check the HuntingdonDistrictCouncil.py file mentioned in learnings (lines 47-54)
cat -n uk_bin_collection/uk_bin_collection/councils/HuntingdonDistrictCouncil.py | sed -n '40,60p'Repository: robbrad/UKBinCollectionData
Length of output: 972
🏁 Script executed:
# Search for similar error handling patterns in other council files
rg "\.find\(" uk_bin_collection/uk_bin_collection/councils/ -A 2 -B 1 | head -50Repository: robbrad/UKBinCollectionData
Length of output: 4597
🏁 Script executed:
# Check if there are any try-except patterns around find() calls in council files
rg "try:" uk_bin_collection/uk_bin_collection/councils/ -A 5 | head -60Repository: robbrad/UKBinCollectionData
Length of output: 6326
🏁 Script executed:
# Check if there are any HTML examples or test files that show the structure
fd -e html -e json uk_bin_collection/ | head -20Repository: robbrad/UKBinCollectionData
Length of output: 448
🏁 Script executed:
# Look for any tests or documentation about BromleyBoroughCouncil
rg "BromleyBoroughCouncil" --type py -lRepository: robbrad/UKBinCollectionData
Length of output: 53
🏁 Script executed:
# Check the broader context - is there a PR or issue reference?
git log --oneline --all -- uk_bin_collection/uk_bin_collection/councils/BromleyBoroughCouncil.py | head -10Repository: robbrad/UKBinCollectionData
Length of output: 605
Add explicit error handling and use CSS selectors for robust element matching.
Line 49–51 lacks a None check and will raise an opaque AttributeError if the element isn't found. Additionally, the class_ parameter with a space-separated string is order-sensitive, so using a CSS selector with select_one() is more robust. The code also has inconsistent error handling: line 57 checks if next_collection: but line 49 does not. Add explicit checks with informative errors at all element lookups.
🔧 Proposed fix
- service_title = service.find(
- "h3", class_="govuk-heading-m waste-service-name"
- ).get_text(strip=True)
+ title_el = service.select_one(
+ "h3.govuk-heading-m.waste-service-name"
+ )
+ if not title_el:
+ raise ValueError(
+ "BromleyBoroughCouncil: service title not found"
+ )
+ service_title = title_el.get_text(strip=True)
- next_collection = service.find(
- "dt", string="Next collection"
- )
+ next_collection = service.find(
+ "dt", string=lambda s: s and s.strip() == "Next collection"
+ )
+ if not next_collection:
+ raise ValueError(
+ "BromleyBoroughCouncil: 'Next collection' label not found"
+ )
- if next_collection:
- next_collection_date = next_collection.find_next_sibling().get_text(
- strip=True
- )
+ next_collection_value = next_collection.find_next_sibling("dd")
+ if not next_collection_value:
+ raise ValueError(
+ "BromleyBoroughCouncil: 'Next collection' value not found"
+ )
+ next_collection_date = next_collection_value.get_text(strip=True)🤖 Prompt for AI Agents
In `@uk_bin_collection/uk_bin_collection/councils/BromleyBoroughCouncil.py` around
lines 49 - 55, The service element lookups in BromleyBoroughCouncil.py can raise
AttributeError and are brittle: replace service.find("h3",
class_="govuk-heading-m waste-service-name").get_text(...) with a CSS selector
via service.select_one("h3.govuk-heading-m.waste-service-name") and check the
returned node is not None before calling get_text; similarly ensure
next_collection = service.find("dt", string="Next collection") is validated (or
use a selector) and raise or log a clear, contextual error if either element is
missing so callers get informative failure messages; update any surrounding code
that assumes these variables (service_title, next_collection) to handle the None
case consistently.
#1821 changes as they are more comprehensive)
|
This PR has been merged into the February 2026 consolidated release PR #1837. Thank you for your contribution! |
Minimal change to work with new structure of website
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.