
Resolve merge messages found in SomersetCouncil, NewportCityCouncil, and TestValleyCouncil.#1660

Merged
robbrad merged 8 commits into robbrad:master from geekball:fix-merge-errors
Oct 21, 2025

Conversation

@geekball
Contributor

@geekball geekball commented Oct 18, 2025

Fix to remove merge conflicts in Somerset Council, Newport City Council and Test Valley Borough Council.

Tested to make sure those were the correct changes to merge in.

Not sure where the `docs:` commit came from, but it looks valid.

Fixes #1651

Summary by CodeRabbit

  • Refactor

    • Improved data retrieval reliability for Newport City Council, Somerset Council, and Test Valley Borough Council.
  • Documentation

    • Updated council command docs: renamed South Kesteven entry and simplified its parameters; adjusted North Tyneside usage to require UPRN only and removed postcode requirement.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Oct 18, 2025

Walkthrough

Replaces the HTTP/requests + BeautifulSoup scrapers for Newport, Somerset and Test Valley with Selenium-driven flows that use WebDriver interactions, WebDriverWait synchronization, BeautifulSoup parsing of page_source, and explicit try/except/finally driver lifecycle/error handling; updates council docs for command/URL usage.

Changes

  • Council Selenium Migration — uk_bin_collection/uk_bin_collection/councils/NewportCityCouncil.py, uk_bin_collection/uk_bin_collection/councils/SomersetCouncil.py, uk_bin_collection/uk_bin_collection/councils/TestValleyBoroughCouncil.py
    Replaced multi-request/form-based scrapers with Selenium browser automation. New flow: initialize WebDriver, navigate to the portal, enter postcode (and PAON where required), select the address, wait for the collections table via WebDriverWait, parse page_source with BeautifulSoup to extract bin types and two collection dates, and return a dict with "bins". Added try/except/finally for driver cleanup and error propagation. Resolved Git merge conflict markers in the Test Valley file.
  • Documentation Updates — wiki/Councils.md
    Updated council command/URL notes: adjusted the South Kesteven command and data-source URL, updated North Tyneside usage to the new waste-collection-schedule URL and clarified the UPRN-only requirement, and revised related parameter/usage notes to reflect removed or changed flags.

Sequence Diagram(s)

sequenceDiagram
    participant Parse as parse_data()
    participant Driver as WebDriver
    participant Portal as Council Portal
    participant DOM as Page DOM
    participant BS as BeautifulSoup

    Parse->>Driver: start / init WebDriver
    Parse->>Driver: navigate to portal URL
    Driver->>Portal: request page
    Portal-->>Driver: page HTML

    Parse->>Driver: locate postcode input, send keys
    Parse->>Driver: click/search
    Portal-->>Driver: addresses rendered

    Parse->>Driver: select address
    Portal-->>Driver: collections table available

    Parse->>Driver: WebDriverWait for table
    Driver->>DOM: poll for element
    DOM-->>Driver: element present

    Parse->>Driver: get page_source
    Driver-->>Parse: HTML

    Parse->>BS: parse HTML
    BS-->>Parse: extract bin types & dates

    Parse->>Driver: quit (finally)
    Driver-->>Parse: closed

    Parse-->>Parse: return {"bins":[...]}
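The driver lifecycle the diagram traces — init, navigate, parse, then quit in a finally block — can be sketched in runnable form. `FakeDriver` and the placeholder URL below are illustrative stand-ins, not code from the repository:

```python
# Sketch of the try/except/finally driver lifecycle the walkthrough describes.
# FakeDriver stands in for a real Selenium WebDriver so the shape is runnable
# without a browser; all names here are illustrative.
class FakeDriver:
    def __init__(self):
        self.quit_called = False

    def get(self, url):
        pass  # a real driver would navigate to the portal here

    @property
    def page_source(self):
        # a real driver would return the rendered page HTML
        return "<html><body></body></html>"

    def quit(self):
        self.quit_called = True


def parse_data(make_driver):
    driver = None
    try:
        driver = make_driver()
        driver.get("https://example.invalid/portal")  # placeholder URL
        html = driver.page_source
        # ... WebDriverWait + BeautifulSoup extraction would happen here ...
        return {"bins": []}
    finally:
        # always release the browser, even when parsing raises
        if driver:
            driver.quit()


drv = FakeDriver()
result = parse_data(lambda: drv)
```

Whatever happens between navigation and parsing, the finally block guarantees the browser process is released — the property the walkthrough calls out as "driver lifecycle/error handling".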

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • dp247

Poem

🐰 I hopped from requests to a browser bright,
Keys and clicks now guide me through the night,
Waits that are patient, parsers that peek,
Conflicts resolved — the warren's sleek! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Linked Issues Check — ⚠️ Warning
    Explanation: Issue #1651 requires fixing git merge conflict markers in TestValleyBoroughCouncil.py that cause an IndentationError in Home Assistant. While the PR appears to have resolved the merge markers by selecting the Selenium-based implementation across the affected files, comments from reviewer t65shd reveal a significant logic bug where "the same date appearing for every bin" occurs, requiring two local code edits to fix (changing soup.find() to collection.find()). The PR author acknowledged this limitation and stated they "can add the extra change suggested", but the current state indicates this fix has not been committed. This leaves the PR delivering partially working code that requires user intervention to function correctly.
    Resolution: Apply the suggested fix from t65shd: modify TestValleyBoroughCouncil.py to use collection.find("div", {"class": "fw-bold"}) and collection.find() (scoped to collection elements) instead of soup.find() for extracting next_collection and following_collection. This will ensure each bin type gets its correct collection dates rather than duplicating the same date across all bins, resolving the functional issue identified in issue #1651 and making the code usable without local user modifications.
  • Title Check — ⚠️ Warning
    Explanation: The pull request title states "Resolve merge messages found in SomersetCouncil, NewportCityCouncil, and TestValleyCouncil," suggesting the PR is primarily about fixing git merge conflict markers. However, the actual changes reveal substantial code refactoring: all three council files have been completely rewritten to replace HTTP/BeautifulSoup-based scraping with Selenium-driven browser automation, along with documentation updates to wiki/Councils.md. While merge conflicts may have been the initial trigger, the primary substance and effort of this PR involves migrating scraping implementations, not simply resolving merge conflicts. The title is therefore misleading about the actual scope and nature of the changes.
    Resolution: The title should be revised to reflect the primary change: migrating from HTTP/BeautifulSoup scraping to Selenium-driven browser automation for the three council implementations. A more accurate title might be "Replace HTTP-based scraping with Selenium browser automation for Somerset, Newport, and Test Valley councils" or similar, which would clearly communicate the main architectural change to reviewers scanning the commit history.
✅ Passed checks (3 passed)

  • Description Check — ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Out of Scope Changes Check — ✅ Passed. The changes in this PR are narrowly focused on resolving merge conflicts across three council modules (NewportCityCouncil.py, SomersetCouncil.py, TestValleyBoroughCouncil.py) and a documentation file (wiki/Councils.md). All file modifications appear to be related to selecting between conflicting versions during merge resolution, with the Selenium-based implementations chosen as the correct version to keep. The documentation updates align with reflecting these resolved implementations and their parameter requirements. No extraneous refactoring, feature additions, or unrelated changes are evident outside the scope of merge conflict resolution.
  • Docstring Coverage — ✅ Passed. No functions found in the changes; docstring coverage check skipped.

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Oct 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.79%. Comparing base (b9fd02f) to head (b7f3688).
⚠️ Report is 40 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1660   +/-   ##
=======================================
  Coverage   86.79%   86.79%           
=======================================
  Files           9        9           
  Lines        1136     1136           
=======================================
  Hits          986      986           
  Misses        150      150           


@t65shd

t65shd commented Oct 18, 2025

This got my Test Valley collections working, but I had to change two lines in TestValleyBoroughCouncil.py:

80: next_collection = soup.find("div", {"class": "fw-bold"}).get_text()
to: next_collection = collection.find("div", {"class": "fw-bold"}).get_text()

82: following_collection = soup.find(
to: following_collection = collection.find(

Otherwise it returned the same date for every bin.

Thanks for the fix :)
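The scoping issue described above can be reproduced with a small, hypothetical HTML fragment shaped like the results cards: `soup.find` always returns the first match on the whole page, while `collection.find` searches only inside the current card.

```python
from bs4 import BeautifulSoup

# Hypothetical two-card fragment mirroring the layout described above;
# the class names follow the snippets quoted in this thread.
html = """
<div class="p-2"><h3>Refuse</h3><div class="fw-bold">Tuesday 21 October</div></div>
<div class="p-2"><h3>Recycling</h3><div class="fw-bold">Tuesday 28 October</div></div>
"""
soup = BeautifulSoup(html, "html.parser")

buggy, fixed = [], []
for collection in soup.find_all("div", {"class": "p-2"}):
    # soup.find(...) ignores the loop variable and always hits the first card
    buggy.append(soup.find("div", {"class": "fw-bold"}).get_text())
    # collection.find(...) is scoped to the current card
    fixed.append(collection.find("div", {"class": "fw-bold"}).get_text())
```

Here `buggy` repeats "Tuesday 21 October" for both bins, while `fixed` yields each card's own date — exactly the symptom and fix reported above.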

@geekball
Contributor Author

I'll run a few more tests; I had to try a couple of different postcodes, as the one in input.json only came back with garden and food on the website. I'll double-check later, and I can add that extra change in for you too.

Use collection object rather than soup to avoid always grabbing the dates from the 1st instance for every collection. Thanks to t65shd for spotting that one.
@geekball geekball marked this pull request as ready for review October 19, 2025 07:03
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
uk_bin_collection/uk_bin_collection/councils/TestValleyBoroughCouncil.py (2)

84-96: Case-sensitive parse bug for “Followed by …” dates.

strptime expects the literal to match exactly. If the DOM text is “Followed by Monday 20 October”, parsing with 'followed by %A %d %B' will fail. Strip the prefix case‑insensitively, then parse only the date.

+import re
@@
-                following_collection = collection.find(
+                following_collection = collection.find(
                     lambda t: (
                         t.name == "div"
                         and t.get_text(strip=True).lower().startswith("followed by")
                     )
                 ).get_text()
@@
-                following_collection_date = datetime.strptime(
-                    following_collection, "followed by %A %d %B"
-                )
+                follow_text = following_collection.strip()
+                # Remove "Followed by " (any case), then parse the remainder as a date
+                follow_text = re.sub(r'^\s*followed by\s+', '', follow_text, flags=re.I)
+                following_collection_date = datetime.strptime(follow_text, "%A %d %B")
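As a standalone sketch of the suggested approach — strip the prefix case-insensitively, then parse only the date — with an assumed sample of the page wording:

```python
import re
from datetime import datetime

def parse_following(text):
    # Remove a leading "followed by" in any letter case, then parse the rest.
    cleaned = re.sub(r"^\s*followed by\s+", "", text.strip(), flags=re.I)
    # The page text carries no year, so strptime defaults the year to 1900;
    # real code would then map it onto the current/next year.
    return datetime.strptime(cleaned, "%A %d %B")

# Assumed example of the DOM text discussed above.
parsed = parse_following("Followed by Monday 20 October")
```

Because the prefix is removed before parsing, the format string no longer has to anticipate the capitalisation the portal happens to render.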

1-1: datetime imported incorrectly for the way it’s used.

You import the module (import datetime) but call class methods (datetime.strptime/now). This will raise AttributeError. Import the class.

-import datetime
+from datetime import datetime

Also applies to: 91-101, 97-97
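In isolation, the distinction the reviewer points out looks like this — a sketch, not repository code: with `import datetime` the class lives at `datetime.datetime`, while `from datetime import datetime` exposes `strptime` directly.

```python
import datetime as datetime_module  # module import: classes are attributes
from datetime import datetime       # class import: strptime is directly available

a = datetime_module.datetime.strptime("20 October", "%d %B")
b = datetime.strptime("20 October", "%d %B")

# With a bare `import datetime`, calling datetime.strptime(...) raises
# AttributeError, because the *module* has no strptime attribute.
has_strptime = hasattr(datetime_module, "strptime")
```

Both calls parse the same date; only the class import makes `datetime.strptime(...)` spelled as in the council file actually resolve.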

🧹 Nitpick comments (5)
uk_bin_collection/uk_bin_collection/councils/TestValleyBoroughCouncil.py (5)

24-37: Use the provided url kwarg when present.

Honors configuration and eases testing; still defaults to the canonical URL.

-            driver.get(
-                "https://testvalley.gov.uk/wasteandrecycling/when-are-my-bins-collected/when-are-my-bins-collected"
-            )
+            driver.get(
+                url
+                or "https://testvalley.gov.uk/wasteandrecycling/when-are-my-bins-collected/when-are-my-bins-collected"
+            )

46-49: Wait for a clickable button and tighten the locator.

Presence doesn’t guarantee interactability; the first .govuk-button may not be the search button. Use element_to_be_clickable with a more specific XPath.

-            findAddress = WebDriverWait(driver, 10).until(
-                EC.presence_of_element_located((By.CLASS_NAME, "govuk-button"))
-            )
+            findAddress = WebDriverWait(driver, 10).until(
+                EC.element_to_be_clickable(
+                    (By.XPATH, "//button[contains(@class,'govuk-button')][normalize-space()='Find address']")
+                )
+            )
             findAddress.click()

75-79: Optional: narrow the collection cards selection.

div.p-2 is generic and may over-match. If stable, scope under the “Your next collections” section to reduce noise.

-            collections = soup.find_all("div", {"class": "p-2"})
+            # Example: constrain to cards within the "Your next collections" section
+            collections = soup.select("h2.mt-4.govuk-heading-s:-soup-contains('Your next collections') ~ div div.p-2")

78-78: Trim bin type text.

Avoid stray whitespace/newlines.

-                bin_type = collection.find("h3").get_text()
+                bin_type = collection.find("h3").get_text(strip=True)

125-129: Use logging instead of print and include context.

Library code shouldn’t print; prefer logging with structured context.

-            # Here you can log the exception if needed
-            print(f"An error occurred: {e}")
+            # Prefer logger (keeps HA/CLI logs clean and configurable)
+            import logging
+            logging.getLogger(__name__).exception("TestValley scraping failed")
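A self-contained sketch of the suggested logging pattern; the logger name and messages are illustrative, and output is captured in a buffer only to make the example verifiable:

```python
import io
import logging

# Route log records into a string buffer so we can inspect the output.
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
logger = logging.getLogger("demo.testvalley")  # illustrative logger name
logger.addHandler(handler)
logger.setLevel(logging.ERROR)

try:
    raise ValueError("portal layout changed")  # simulated scraper failure
except Exception:
    # logger.exception() logs at ERROR level and appends the traceback,
    # so the failure context survives without any print() calls.
    logger.exception("TestValley scraping failed")

log_text = buf.getvalue()
```

Unlike `print`, this keeps diagnostics on the logging tree, so Home Assistant or CLI users can raise or silence it through standard logging configuration.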
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11ee035 and e17bc0e.

📒 Files selected for processing (4)
  • uk_bin_collection/uk_bin_collection/councils/NewportCityCouncil.py (0 hunks)
  • uk_bin_collection/uk_bin_collection/councils/SomersetCouncil.py (0 hunks)
  • uk_bin_collection/uk_bin_collection/councils/TestValleyBoroughCouncil.py (1 hunks)
  • wiki/Councils.md (3 hunks)
💤 Files with no reviewable changes (2)
  • uk_bin_collection/uk_bin_collection/councils/SomersetCouncil.py
  • uk_bin_collection/uk_bin_collection/councils/NewportCityCouncil.py
🔇 Additional comments (5)
wiki/Councils.md (4)

2629-2640: Documentation update for New Forest URL endpoint looks good.

The URL endpoint change from the previous form endpoint to FIND_MY_BIN_BAR.eb is clearly documented with all required parameters (-s, -u, -p, -w). This aligns with the Selenium-based migration for browser automation.


3359-3368: Whitespace cleanup in South Gloucestershire documentation.

The change removes extraneous spacing before the command, which is a minor formatting improvement with no functional impact.


3849-3860: Torbay documentation updated to reflect Selenium WebDriver requirement.

The addition of the -w parameter and its documentation ("required for Home Assistant") is consistent with the broader migration to Selenium-based browser automation across multiple council parsers. The parameter mirrors the format used by other Selenium-dependent councils in this documentation.


1-2: Verify auto-generation persistence.

The file header indicates this is auto-generated from uk_bin_collection/tests/input.json. Ensure that the source input file has been updated to reflect these documentation changes, so they persist on the next auto-generation run.

uk_bin_collection/uk_bin_collection/councils/TestValleyBoroughCouncil.py (1)

80-83: Good fix: scope limited to the current collection card.

Switching from soup.find(...) to collection.find(...) prevents cross-card leakage and aligns with t65shd’s observation. LGTM.

@geekball geekball changed the title Fix merge errors Resolve merge messages found in SomersetCouncil, NewportCityCouncil, and TestValleyCouncil. Oct 20, 2025
@robbrad robbrad merged commit 9d538bc into robbrad:master Oct 21, 2025
13 of 14 checks passed
@geekball geekball deleted the fix-merge-errors branch October 31, 2025 10:56


Development

Successfully merging this pull request may close these issues.

Council python file contains git merge conflict markers

4 participants