Resolve merge messages found in SomersetCouncil, NewportCityCouncil, and TestValleyCouncil. #1660
Walkthrough
Replaces the HTTP/requests + BeautifulSoup scrapers for Newport, Somerset and Test Valley with Selenium-driven flows that use WebDriver interactions, WebDriverWait synchronization, BeautifulSoup parsing of page_source, and explicit try/except/finally driver lifecycle and error handling; updates the council docs for command/URL usage.
Sequence Diagram(s)
sequenceDiagram
participant Parse as parse_data()
participant Driver as WebDriver
participant Portal as Council Portal
participant DOM as Page DOM
participant BS as BeautifulSoup
Parse->>Driver: start / init WebDriver
Parse->>Driver: navigate to portal URL
Driver->>Portal: request page
Portal-->>Driver: page HTML
Parse->>Driver: locate postcode input, send keys
Parse->>Driver: click/search
Portal-->>Driver: addresses rendered
Parse->>Driver: select address
Portal-->>Driver: collections table available
Parse->>Driver: WebDriverWait for table
Driver->>DOM: poll for element
DOM-->>Driver: element present
Parse->>Driver: get page_source
Driver-->>Parse: HTML
Parse->>BS: parse HTML
BS-->>Parse: extract bin types & dates
Parse->>Driver: quit (finally)
Driver-->>Parse: closed
Parse-->>Parse: return {"bins":[...]}
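The lifecycle in the diagram (start the driver, fetch the page, parse it, always quit) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `scrape_with_driver`, `driver_factory`, `parse` and `FakeDriver` are hypothetical names, and the driver is duck-typed so that only `get`, `page_source` and `quit` (which Selenium's WebDriver provides) are assumed.

```python
from typing import Any, Callable


def scrape_with_driver(
    driver_factory: Callable[[], Any],
    url: str,
    parse: Callable[[str], dict],
) -> dict:
    """Open a page, parse its HTML, and always release the driver."""
    driver = None
    try:
        driver = driver_factory()  # e.g. a function returning a Selenium WebDriver
        driver.get(url)  # navigate to the portal
        return parse(driver.page_source)
    finally:
        # Runs on success and on failure, so the browser is not leaked.
        if driver is not None:
            driver.quit()


class FakeDriver:
    """Hypothetical stand-in used only to exercise the sketch without a browser."""

    def __init__(self) -> None:
        self.page_source = ""
        self.closed = False

    def get(self, url: str) -> None:
        self.page_source = f"<html>{url}</html>"

    def quit(self) -> None:
        self.closed = True
```

Creating the driver inside the `try` and guarding `quit()` with a `None` check means a failure during driver start-up does not raise a second error in `finally`.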
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##           master    #1660   +/-   ##
=======================================
  Coverage   86.79%   86.79%
=======================================
  Files           9        9
  Lines        1136     1136
=======================================
  Hits          986      986
  Misses        150      150
This got my Test Valley collections working, but I had to change 2 lines in TestValleyBoroughCouncil.py:
80: next_collection = soup.find("div", {"class": "fw-bold"}).get_text()
82: following_collection = soup.find(
Otherwise it got the same date for every bin. Thanks for the fix :)
I'll run a few more tests. I had to try a couple of different postcodes, as the one in input.json only came back with garden and food on the website. I'll double-check later, and I can add that extra change in for you too.
Use collection object rather than soup to avoid always grabbing the dates from the 1st instance for every collection. Thanks to t65shd for spotting that one.
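To illustrate why searching from the whole document repeats the first date, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree` as a stand-in for BeautifulSoup so that it runs without extra packages; the markup and dates are made up for the example and are not taken from the council site.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page: two collection cards, each with its own date.
PAGE = """
<section>
  <div class="p-2"><h3>Household waste</h3><div class="fw-bold">Monday 13 October</div></div>
  <div class="p-2"><h3>Recycling</h3><div class="fw-bold">Monday 20 October</div></div>
</section>
"""

root = ET.fromstring(PAGE)
cards = root.findall(".//div[@class='p-2']")

# Buggy pattern: searching from the document root returns the first match every time.
from_root = [root.find(".//div[@class='fw-bold']").text for _ in cards]

# Fixed pattern: searching from each card returns that card's own date.
from_card = [card.find(".//div[@class='fw-bold']").text for card in cards]
```

The same reasoning applies to BeautifulSoup: `soup.find(...)` searches the whole document, while calling `find(...)` on the per-card element limits the search to that card.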
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
uk_bin_collection/uk_bin_collection/councils/TestValleyBoroughCouncil.py (2)
84-96: Case-sensitive parse bug for “Followed by …” dates.
strptime expects the literal to match exactly. If the DOM text is “Followed by Monday 20 October”, parsing with 'followed by %A %d %B' will fail. Strip the prefix case‑insensitively, then parse only the date.

+import re
@@
-    following_collection = collection.find(
+    following_collection = collection.find(
         lambda t: (
             t.name == "div"
             and t.get_text(strip=True).lower().startswith("followed by")
         )
     ).get_text()
@@
-    following_collection_date = datetime.strptime(
-        following_collection, "followed by %A %d %B"
-    )
+    follow_text = following_collection.strip()
+    # Remove "Followed by " (any case), then parse the remainder as a date
+    follow_text = re.sub(r'^\s*followed by\s+', '', follow_text, flags=re.I)
+    following_collection_date = datetime.strptime(follow_text, "%A %d %B")
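The prefix-stripping approach suggested in this comment can be tried in isolation with a sketch like the one below. `parse_followed_by` is a hypothetical helper, not part of the PR, and passing the year explicitly is an assumption added here so the result is a complete date; the month and weekday names are assumed to be in English (the default C locale).

```python
import re
from datetime import datetime

# Matches an optional leading "Followed by " in any letter case.
_PREFIX = re.compile(r"^\s*followed by\s+", flags=re.IGNORECASE)


def parse_followed_by(text: str, year: int) -> datetime:
    """Strip the 'Followed by' prefix, then parse e.g. 'Monday 20 October'."""
    date_part = _PREFIX.sub("", text.strip())
    # The page text carries no year, so the caller supplies one (assumption).
    return datetime.strptime(f"{date_part} {year}", "%A %d %B %Y")
```

For example, `parse_followed_by("Followed by Monday 20 October", 2025)` yields a `datetime` for 20 October 2025, and the result is the same if the prefix is written in lower case.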
1-1: datetime imported incorrectly for the way it’s used.
You import the module (import datetime) but call class methods (datetime.strptime/now). This will raise AttributeError. Import the class.

-import datetime
+from datetime import datetime

Also applies to: 91-101, 97-97
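The difference between the module and the class can be checked with a short sketch. The alias `datetime_module` is used here only so that both can be inspected side by side; it is not a name from the PR.

```python
import datetime as datetime_module  # the module
from datetime import datetime  # the class inside the module

# strptime is defined on the class, not on the module.
module_has_strptime = hasattr(datetime_module, "strptime")
class_has_strptime = hasattr(datetime, "strptime")

# With the class imported, the call used in the scraper works as written.
parsed = datetime.strptime("20 October 2025", "%d %B %Y")
```

With a plain `import datetime`, the equivalent call would need to be spelled `datetime.datetime.strptime(...)`.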
🧹 Nitpick comments (5)
uk_bin_collection/uk_bin_collection/councils/TestValleyBoroughCouncil.py (5)
24-37: Use the provided url kwarg when present.
Honors configuration and eases testing; still defaults to the canonical URL.

-    driver.get(
-        "https://testvalley.gov.uk/wasteandrecycling/when-are-my-bins-collected/when-are-my-bins-collected"
-    )
+    driver.get(
+        url
+        or "https://testvalley.gov.uk/wasteandrecycling/when-are-my-bins-collected/when-are-my-bins-collected"
+    )
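The fallback in this suggestion relies on `or` returning its right-hand operand when the left one is falsy. A minimal sketch, with `resolve_url` and `DEFAULT_URL` as hypothetical names introduced only for illustration:

```python
# Canonical URL quoted in the review comment above.
DEFAULT_URL = (
    "https://testvalley.gov.uk/wasteandrecycling/"
    "when-are-my-bins-collected/when-are-my-bins-collected"
)


def resolve_url(**kwargs) -> str:
    """Return the caller's url if one was given, otherwise the default."""
    # `or` also covers url=None and url="", not just a missing key.
    return kwargs.get("url") or DEFAULT_URL
```

Using `kwargs.get("url") or DEFAULT_URL` rather than `kwargs.get("url", DEFAULT_URL)` is deliberate in this sketch: the second form would pass through an explicit `None` or empty string.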
46-49: Wait for a clickable button and tighten the locator.
Presence doesn’t guarantee interactability; the first .govuk-button may not be the search button. Use element_to_be_clickable with a more specific XPath.

-    findAddress = WebDriverWait(driver, 10).until(
-        EC.presence_of_element_located((By.CLASS_NAME, "govuk-button"))
-    )
+    findAddress = WebDriverWait(driver, 10).until(
+        EC.element_to_be_clickable(
+            (By.XPATH, "//button[contains(@class,'govuk-button')][normalize-space()='Find address']")
+        )
+    )
     findAddress.click()
75-79: Optional: narrow the collection cards selection.
div.p-2 is generic and may over-match. If stable, scope under the “Your next collections” section to reduce noise.

-    collections = soup.find_all("div", {"class": "p-2"})
+    # Example: constrain to cards within the "Your next collections" section
+    collections = soup.select("h2.mt-4.govuk-heading-s:-soup-contains('Your next collections') ~ div div.p-2")
78-78: Trim bin type text.
Avoid stray whitespace/newlines.

-    bin_type = collection.find("h3").get_text()
+    bin_type = collection.find("h3").get_text(strip=True)
125-129: Use logging instead of print and include context.
Library code shouldn’t print; prefer logging with structured context.

-    # Here you can log the exception if needed
-    print(f"An error occurred: {e}")
+    # Prefer logger (keeps HA/CLI logs clean and configurable)
+    import logging
+    logging.getLogger(__name__).exception("TestValley scraping failed")
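A small standalone sketch of the logging pattern suggested here follows. `run_scrape` is a hypothetical wrapper, and re-raising after logging is a choice made for this example rather than something the review prescribes.

```python
import logging

# Module-level logger, so applications can configure or silence it by name.
logger = logging.getLogger(__name__)


def run_scrape(scrape):
    """Call scrape() and log any failure with its traceback before re-raising."""
    try:
        return scrape()
    except Exception:
        # logger.exception logs at ERROR level and attaches the active traceback.
        logger.exception("TestValley scraping failed")
        raise
```

Unlike `print`, the record carries the traceback and the logger name, and the host application decides where it is written.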
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- uk_bin_collection/uk_bin_collection/councils/NewportCityCouncil.py (0 hunks)
- uk_bin_collection/uk_bin_collection/councils/SomersetCouncil.py (0 hunks)
- uk_bin_collection/uk_bin_collection/councils/TestValleyBoroughCouncil.py (1 hunks)
- wiki/Councils.md (3 hunks)
💤 Files with no reviewable changes (2)
- uk_bin_collection/uk_bin_collection/councils/SomersetCouncil.py
- uk_bin_collection/uk_bin_collection/councils/NewportCityCouncil.py
🔇 Additional comments (5)
wiki/Councils.md (4)
2629-2640: Documentation update for New Forest URL endpoint looks good.
The URL endpoint change from the previous form endpoint to FIND_MY_BIN_BAR.eb is clearly documented with all required parameters (-s, -u, -p, -w). This aligns with the Selenium-based migration for browser automation.
3359-3368: Whitespace cleanup in South Gloucestershire documentation.
The change removes extraneous spacing before the command, which is a minor formatting improvement with no functional impact.
3849-3860: Torbay documentation updated to reflect Selenium WebDriver requirement.
The addition of the -w parameter and its documentation ("required for Home Assistant") is consistent with the broader migration to Selenium-based browser automation across multiple council parsers. The parameter mirrors the format used by other Selenium-dependent councils in this documentation.
1-2: Verify auto-generation persistence.
The file header indicates this is auto-generated from uk_bin_collection/tests/input.json. Ensure that the source input file has been updated to reflect these documentation changes, so they persist on the next auto-generation run.
uk_bin_collection/uk_bin_collection/councils/TestValleyBoroughCouncil.py (1)
80-83: Good fix: scope limited to the current collection card.
Switching from soup.find(...) to collection.find(...) prevents cross-card leakage and aligns with t65shd’s observation. LGTM.
Fix to remove merge conflicts in Somerset Council, Newport City Council and Test Valley Borough Council.
Tested to make sure those were the correct changes to merge in.
Not sure where the docs: one came from but it looks valid.
Fixes #1651
Summary by CodeRabbit
Refactor
Documentation