Skip to content

fix: NorthKestevenDistrictCouncil - make own HTTP request instead of using page param#1987

Closed
InertiaUK wants to merge 1 commit into
robbrad:masterfrom
InertiaUK:fix/NorthKestevenDistrictCouncil
Closed

fix: NorthKestevenDistrictCouncil - make own HTTP request instead of using page param#1987
InertiaUK wants to merge 1 commit into
robbrad:masterfrom
InertiaUK:fix/NorthKestevenDistrictCouncil

Conversation

@InertiaUK
Copy link
Copy Markdown
Contributor

@InertiaUK InertiaUK commented Apr 30, 2026

The scraper calls page.text but page is a string, not a Response object, so it throws an AttributeError. Switched to making a direct requests.get() with the UPRN embedded in the URL and added skip_get_url + uprn to input.json.

Tested against UPRN 10006545854 (Sleaford area) on 2026-04-30.

Summary by CodeRabbit

  • Chores
    • Updated NorthKestevenDistrictCouncil configuration to use a shared endpoint for retrieving bin collection data.
    • Modified data retrieval method to fetch live information directly using UPRN parameter instead of pre-embedded URLs.
    • Updated user documentation to clarify UPRN parameter entry method.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 30, 2026

📝 Walkthrough

Walkthrough

The NorthKestevenDistrictCouncil implementation shifts from using pre-provided page content to fetching live HTML directly. The test configuration is updated with skip_get_url: true and a normalized endpoint URL. The parse_data method now validates and uses the UPRN parameter to directly query the council's bin display endpoint.

Changes

Cohort / File(s) Summary
Configuration & Test Data
uk_bin_collection/tests/input.json
Updated NorthKestevenDistrictCouncil config with skip_get_url: true, normalized URLs to /bins/display endpoint, added fixed UPRN example, and revised wiki note to direct users to provide UPRN via parameter.
Council Implementation
uk_bin_collection/uk_bin_collection/councils/NorthKestevenDistrictCouncil.py
Modified parse_data to validate UPRN and fetch live HTML via HTTP GET request to /bins/display?uprn=... with custom User-Agent header, replacing reliance on pre-fetched page content. Removed docstring and comments while preserving DOM extraction and parsing logic.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Parser as NorthKestevenDistrictCouncil
    participant HTTP as HTTP Client
    participant API as Council API
    participant Parser as DOM Parser

    Client->>Parser: parse_data(page, uprn=...)
    Parser->>Parser: check_uprn(uprn)
    Parser->>HTTP: GET /bins/display?uprn=...
    Note over HTTP: User-Agent Header
    HTTP->>API: Request with custom headers
    API-->>HTTP: HTML Response
    HTTP-->>Parser: response.text
    Parser->>Parser: Extract bin types & dates
    Parser->>Parser: Parse collection dates
    Parser-->>Client: Return bin collection dict
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

  • PR #1733: Adds User-Agent header to HTTP request in council scraper to ensure valid HTML response from target site.

Suggested Reviewers

  • dp247

Poem

🐰 A UPRN now flows with purpose clear,
Where live HTML fetches without fear,
No placeholders hiding in the URL's guise,
Just direct endpoints beneath the skies!
Binned logic hops forward with pride.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and directly summarizes the main change: the NorthKestevenDistrictCouncil scraper now makes its own HTTP request instead of relying on the page parameter, which is the core bug fix described in the PR objectives.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@uk_bin_collection/uk_bin_collection/councils/NorthKestevenDistrictCouncil.py`:
- Around line 10-18: Validate the uprn variable locally before constructing the
URL and making the HTTP request: do not rely on check_uprn() since it can
swallow bad input — instead explicitly check uprn (e.g., ensure it's present and
matches the expected format) and raise an exception (ValueError or a custom
exception) if invalid, then only build the url = f"...?uprn={uprn}" and call
requests.get when validation passes; reference the uprn variable, the
check_uprn() call, and the URL/request block to locate where to add this
early-fail validation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fc99435e-0cac-4f3c-a92c-8528165abcfd

📥 Commits

Reviewing files that changed from the base of the PR and between 60bd3cc and 39b1fd6.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/NorthKestevenDistrictCouncil.py

Comment on lines +10 to +18
uprn = kwargs.get("uprn")
check_uprn(uprn)

url = f"https://www.n-kesteven.org.uk/bins/display?uprn={uprn}"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/147.0.0.0 Safari/537.36"
}
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate uprn here instead of relying on check_uprn().

check_uprn() currently swallows invalid input, so a missing fixture can still fall through to ...?uprn=None and only fail after making a remote request. Raise locally before building the URL so malformed inputs fail fast. Based on learnings: prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.

🔧 Suggested fix
         uprn = kwargs.get("uprn")
-        check_uprn(uprn)
+        if uprn in (None, ""):
+            raise ValueError("Invalid UPRN")
 
         url = f"https://www.n-kesteven.org.uk/bins/display?uprn={uprn}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
uprn = kwargs.get("uprn")
check_uprn(uprn)
url = f"https://www.n-kesteven.org.uk/bins/display?uprn={uprn}"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/147.0.0.0 Safari/537.36"
}
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()
uprn = kwargs.get("uprn")
if uprn in (None, ""):
raise ValueError("Invalid UPRN")
url = f"https://www.n-kesteven.org.uk/bins/display?uprn={uprn}"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/147.0.0.0 Safari/537.36"
}
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/NorthKestevenDistrictCouncil.py`
around lines 10 - 18, Validate the uprn variable locally before constructing the
URL and making the HTTP request: do not rely on check_uprn() since it can
swallow bad input — instead explicitly check uprn (e.g., ensure it's present and
matches the expected format) and raise an exception (ValueError or a custom
exception) if invalid, then only build the url = f"...?uprn={uprn}" and call
requests.get when validation passes; reference the uprn variable, the
check_uprn() call, and the URL/request block to locate where to add this
early-fail validation.

@robbrad robbrad mentioned this pull request May 1, 2026
@robbrad
Copy link
Copy Markdown
Owner

robbrad commented May 1, 2026

Included in May 2026 Release PR #1992. Closing.

@robbrad robbrad closed this May 1, 2026
@robbrad robbrad mentioned this pull request May 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants