fix: DurhamCouncil — switch to Selenium for JS-rendered bin data #2047
InertiaUK wants to merge 1 commit into
Conversation
📝 Walkthrough

DurhamCouncil scraping is migrated from direct HTTP requests to Selenium WebDriver to render JavaScript-populated bin collection data. The test config now documents a web_driver endpoint; the implementation creates a WebDriver, waits for the bin elements, parses driver.page_source with BeautifulSoup, extracts dates via regex, and ensures driver.quit() in a finally block.

Changes: DurhamCouncil Selenium migration
Sequence Diagram

```mermaid
sequenceDiagram
    participant parse_data as parse_data method
    participant webdriver as WebDriver
    participant javascript as Durham website JS
    participant beautifulsoup as BeautifulSoup
    participant bins as bins data
    parse_data->>webdriver: create_webdriver(headless, web_driver)
    webdriver->>javascript: navigate to URL with uprn
    javascript->>webdriver: render .binsrubbish and .binsrecycling
    webdriver->>webdriver: wait for elements to be present
    parse_data->>beautifulsoup: parse driver.page_source
    beautifulsoup->>parse_data: return rendered HTML tree
    parse_data->>bins: extract dates with regex and append {type, collectionDate}
    parse_data->>webdriver: finally: driver.quit()
```
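The sequence above can be sketched end to end in plain Python. Everything here is an illustrative stand-in, not the project's actual code: `FakeDriver` replaces the real Selenium WebDriver, the HTML extraction uses the stdlib `re` module instead of BeautifulSoup, and the `date_format` default is assumed. Only the overall shape (create driver, navigate, read `page_source`, always `quit()` in `finally`) mirrors the walkthrough.

```python
import re
from datetime import datetime


class FakeDriver:
    """Illustrative stand-in for a Selenium WebDriver."""

    def __init__(self, page_source):
        self.page_source = page_source
        self.quit_called = False

    def get(self, url):
        pass  # a real driver would navigate and run the page's JS here

    def quit(self):
        self.quit_called = True


def parse_data(driver, uprn, date_format="%d/%m/%Y"):
    """Sketch of the Selenium-based flow: navigate, parse, always quit."""
    data = {"bins": []}
    try:
        driver.get(f"https://www.durham.gov.uk/bincollections?uprn={uprn}")
        # Real code inserts a WebDriverWait(...) here before reading page_source.
        for css_class, bin_type in [("binsrubbish", "Rubbish"), ("binsrecycling", "Recycling")]:
            m = re.search(rf'class="{css_class}"[^>]*>([^<]*)<', driver.page_source)
            if m:
                date_match = re.search(r"\d\d? [A-Za-z]+ \d{4}", m[1])
                if date_match:
                    date = datetime.strptime(date_match[0], "%d %B %Y")
                    data["bins"].append(
                        {"type": bin_type, "collectionDate": date.strftime(date_format)}
                    )
        return data
    finally:
        driver.quit()  # runs even if parsing raises


driver = FakeDriver('<div class="binsrubbish">Next: 3 March 2025</div>')
print(parse_data(driver, "100110"))
# -> {'bins': [{'type': 'Rubbish', 'collectionDate': '03/03/2025'}]}
```

The `finally` block is the key design point: the WebDriver is released whether parsing succeeds or raises.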
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (warning)
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##           master    #2047   +/- ##
=======================================
  Coverage   86.67%   86.67%
=======================================
  Files           9        9
  Lines        1141     1141
=======================================
  Hits          989      989
  Misses        152      152
```

☔ View full report in Codecov by Sentry.
Force-pushed 1d207b3 to e81ef13
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py`:
- Around line 44-53: The current parsing silently ignores non-matching but
non-empty collection_text; update the logic in the DurhamCouncil parser (the
block using variables collection_text, results, bin_type, date_format, and
appending to data["bins"]) to explicitly handle known non-date states (e.g.
recognised phrases like "no collections" or other council-specific tokens) and
for any other non-matching collection_text raise an exception (e.g. ValueError)
including the raw collection_text so scrapers fail noisily; keep the existing
successful path (re.search -> datetime.strptime -> append) unchanged but add
explicit checks before falling through to ensure unexpected formats are
surfaced.
- Around line 26-32: The current wait uses
WebDriverWait(...).until(EC.presence_of_element_located((By.CSS_SELECTOR,
".binsrubbish, .binsrecycling"))) which only ensures the placeholder divs exist
and can race before Durham's JS fills them; change the wait to assert that those
elements contain non-empty text (e.g., use EC.text_to_be_present_in_element for
a known substring or a custom lambda that checks element.text.strip() != "")
before creating BeautifulSoup(driver.page_source) in DurhamCouncil.py so soup
parses populated bin data rather than empty shells.
📒 Files selected for processing (2)

- uk_bin_collection/tests/input.json
- uk_bin_collection/uk_bin_collection/councils/DurhamCouncil.py

🚧 Files skipped from review as they are similar to previous changes (1)

- uk_bin_collection/tests/input.json
```diff
 WebDriverWait(driver, 30).until(
     EC.presence_of_element_located(
         (By.CSS_SELECTOR, ".binsrubbish, .binsrecycling")
     )
 )

 # Make a BS4 object
-soup = BeautifulSoup(page.text, features="html.parser")
+soup = BeautifulSoup(driver.page_source, features="html.parser")
```
Wait for populated bin text, not just the placeholder nodes.

The empty shell already contains these divs, so presence_of_element_located can succeed before Durham's JS has written any collection data. That makes driver.page_source racy and can still parse empty bins.
Suggested fix

```diff
-        WebDriverWait(driver, 30).until(
-            EC.presence_of_element_located(
-                (By.CSS_SELECTOR, ".binsrubbish, .binsrecycling")
-            )
-        )
+        WebDriverWait(driver, 30).until(
+            lambda d: any(
+                el.text.strip()
+                for el in d.find_elements(
+                    By.CSS_SELECTOR,
+                    ".binsrubbish, .binsrecycling, .binsgardenwaste",
+                )
+            )
+        )
```
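The lambda passed to `until(...)` is just a callable over the driver, so its logic can be exercised without a browser. A minimal sketch under stated assumptions: `FakeElement` and `FakeDriver` are hypothetical stand-ins for a WebElement and WebDriver, not part of Selenium or this project.

```python
class FakeElement:
    """Stand-in for a Selenium WebElement exposing only .text."""

    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Stand-in for a WebDriver whose find_elements returns canned elements."""

    def __init__(self, elements):
        self._elements = elements

    def find_elements(self, by, selector):
        return self._elements


def bins_are_populated(driver):
    """Wait predicate: True once any bin div carries non-empty text.

    With real Selenium this callable would be handed to
    WebDriverWait(driver, 30).until(bins_are_populated).
    """
    return any(
        el.text.strip()
        for el in driver.find_elements(
            "css selector", ".binsrubbish, .binsrecycling, .binsgardenwaste"
        )
    )


# Empty placeholder divs: the wait would keep polling.
assert not bins_are_populated(FakeDriver([FakeElement(""), FakeElement("  ")]))
# JS has filled in a date: the wait can return.
assert bins_are_populated(FakeDriver([FakeElement("Next collection: 3 March 2025")]))
```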
```diff
 if collection_text:
-    results = re.search("\\d\\d? [A-Za-z]+ \\d{4}", collection_text)
+    results = re.search(r"\d\d? [A-Za-z]+ \d{4}", collection_text)
     if results:
         date = datetime.strptime(results[0], "%d %B %Y")
-        if date:
-            data["bins"].append(
-                {
-                    "type": bin_type,
-                    "collectionDate": date.strftime(date_format),
-                }
-            )
+        data["bins"].append(
+            {
+                "type": bin_type,
+                "collectionDate": date.strftime(date_format),
+            }
+        )
```
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Don't silently drop non-empty collection text that stops matching the date regex.
If Durham changes the copy here, this returns partial data and looks like “no collections” instead of surfacing a scraper break. Handle any known non-date states explicitly, and raise on anything else.
Suggested fix

```diff
         if collection_text:
             results = re.search(r"\d\d? [A-Za-z]+ \d{4}", collection_text)
-            if results:
-                date = datetime.strptime(results[0], "%d %B %Y")
-                data["bins"].append(
-                    {
-                        "type": bin_type,
-                        "collectionDate": date.strftime(date_format),
-                    }
-                )
+            if not results:
+                raise ValueError(
+                    f"Unexpected Durham collection text for {bin_type}: {collection_text!r}"
+                )
+            date = datetime.strptime(results[0], "%d %B %Y")
+            data["bins"].append(
+                {
+                    "type": bin_type,
+                    "collectionDate": date.strftime(date_format),
+                }
+            )
```

Based on learnings: In uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.
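The raise-on-mismatch behaviour can be checked in isolation. A minimal, self-contained sketch: `parse_collection_text` is a hypothetical helper (the project parses inline rather than via a function), and the `date_format` default is assumed for illustration; the regex and `%d %B %Y` format mirror the review comment.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\d\d? [A-Za-z]+ \d{4}")


def parse_collection_text(collection_text, bin_type, date_format="%d/%m/%Y"):
    """Extract a collection date, failing loudly on unexpected copy."""
    results = DATE_RE.search(collection_text)
    if not results:
        # Surface scraper breaks instead of silently returning partial data.
        raise ValueError(
            f"Unexpected Durham collection text for {bin_type}: {collection_text!r}"
        )
    date = datetime.strptime(results[0], "%d %B %Y")
    return {"type": bin_type, "collectionDate": date.strftime(date_format)}


print(parse_collection_text("Your next collection is 3 March 2025", "Rubbish"))
# -> {'type': 'Rubbish', 'collectionDate': '03/03/2025'}
```

If Durham's copy changes to something like "No collections scheduled", this version raises with the raw text in the message instead of quietly returning partial data.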
The bin collection page at durham.gov.uk/bincollections?uprn= renders data client-side via JavaScript. The existing scraper used requests.get(), which only returned the empty HTML shell: the binsrubbish/binsrecycling/binsgardenwaste divs were present but unpopulated.

Switched to Selenium with create_webdriver() to render the JS. Waits for the bin data divs to appear, then parses with BeautifulSoup as before. The CSS class selectors and date parsing are unchanged.

Added web_driver to input.json config.
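Based on the description, the new entry in uk_bin_collection/tests/input.json presumably follows the existing per-council pattern. A hypothetical sketch only — the UPRN, URL, endpoint, and surrounding keys are illustrative placeholders, not the actual config:

```json
"DurhamCouncil": {
    "uprn": "100110123456",
    "url": "https://www.durham.gov.uk/bincollections?uprn=100110123456",
    "web_driver": "http://selenium:4444",
    "wiki_name": "Durham Council"
}
```

The web_driver value would point at a remote Selenium endpoint used when tests are not run against a local browser.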