
fix: add User-Agent header to KingsLynnandWestNorfolkBC scraper #1733

Merged
robbrad merged 1 commit into robbrad:dec_release from ReeceLaww:fix/kings-lynn-west-norfolk-user-agent on Dec 7, 2025

Conversation


@ReeceLaww ReeceLaww commented Nov 27, 2025

The Kings Lynn and West Norfolk council scraper was returning empty bin data because the website (https://www.west-norfolk.gov.uk) was blocking requests that lacked a proper User-Agent header, responding with HTTP 403 Forbidden.

Root cause:

  • The scraper was sending HTTP requests with only a Cookie header
  • The council website's server requires a User-Agent header to identify the client
  • Without this header, the server rejected the request with HTTP 403 Forbidden
  • This caused BeautifulSoup to parse an error page instead of bin collection data
  • The scraper found zero bin_date_container divs, resulting in an empty bins array
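This failure mode can be reproduced in miniature with BeautifulSoup, the parser the scraper uses: an error page contains none of the divs the scraper looks for, so it silently collects nothing. The HTML below is a stand-in for the council site's real 403 response body, not captured output.

```python
# Illustrative sketch only: error_html stands in for the site's real
# 403 response body.
from bs4 import BeautifulSoup

error_html = "<html><body><h1>403 Forbidden</h1></body></html>"
soup = BeautifulSoup(error_html, "html.parser")

# The scraper keys off these containers; an error page has none,
# so the resulting bins list is empty.
containers = soup.find_all("div", class_="bin_date_container")
bins = [c.get_text(strip=True) for c in containers]
print(len(bins))  # 0
```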

Solution:

  • Added a standard Chrome User-Agent string to the request headers
  • The website now accepts the request and returns the expected HTML content
  • The scraper successfully parses bin collection dates from the response
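Based on the description above, the fixed request headers look roughly like this; the UPRN value is a placeholder for illustration, and the UA string is a standard desktop Chrome one.

```python
# Sketch of the fixed request headers. The cookie name and User-Agent
# follow the fix described above; the UPRN value is a placeholder.
user_uprn = "100012345678"  # placeholder UPRN

headers = {
    "Cookie": f"bcklwn_uprn={user_uprn}",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/91.0.4472.124 Safari/537.36"
    ),
}

# requests.get(url, headers=headers) now receives the bin-collection
# HTML instead of a 403 error page.
```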

Testing:

  • Verified with a test UPRN; the scraper now returns bin collections successfully
  • The integration test passes
  • Unit tests continue to pass (76/77; the single failure is an unrelated Chrome driver issue)

Summary by CodeRabbit

  • Chores
    • Improved HTTP request handling for enhanced compatibility with external services.



coderabbitai Bot commented Nov 27, 2025

Walkthrough

A User-Agent header is added to the HTTP request in the Kings Lynn and West Norfolk Borough Council bin collection module, while preserving the existing Cookie header carrying the UPRN value. There are no changes to parsing, query logic, or data extraction.

Changes

  Cohort: HTTP Request Headers
  File(s): uk_bin_collection/uk_bin_collection/councils/KingsLynnandWestNorfolkBC.py
  Summary: Added a User-Agent header to the HTTP request alongside the existing Cookie header

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

  • Single file modification affecting HTTP request headers only
  • No logic or parsing changes

Poem

🐰 A header new, so fresh and bright,
User-Agent sent with all our might,
The request flows with proper grace,
Through Norfolk's bins, we set the pace! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  Docstring Coverage — ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  Description Check — ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  Title Check — ✅ Passed. The title accurately and concisely describes the primary change: adding a User-Agent header to fix the KingsLynnandWestNorfolkBC scraper.



@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/KingsLynnandWestNorfolkBC.py (1)

25-28: User-Agent header fix looks good; consider extracting UA to a constant.

The added User-Agent together with the existing Cookie header is a sensible, minimal fix for the 403s. To make future maintenance easier (and to reuse this UA across any other scrapers if needed), you could move the UA string to a module-level constant and reference it here.

@@
-from uk_bin_collection.uk_bin_collection.common import *
-from uk_bin_collection.uk_bin_collection.get_bin_data import AbstractGetBinDataClass
-
-
-# import the wonderful Beautiful Soup and the URL grabber
-class CouncilClass(AbstractGetBinDataClass):
+from uk_bin_collection.uk_bin_collection.common import *
+from uk_bin_collection.uk_bin_collection.get_bin_data import AbstractGetBinDataClass
+
+USER_AGENT = (
+    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
+    "AppleWebKit/537.36 (KHTML, like Gecko) "
+    "Chrome/91.0.4472.124 Safari/537.36"
+)
+
+
+# import the wonderful Beautiful Soup and the URL grabber
+class CouncilClass(AbstractGetBinDataClass):
@@
-        headers = {
-            "Cookie": f"bcklwn_uprn={user_uprn}",
-            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
-        }
+        headers = {
+            "Cookie": f"bcklwn_uprn={user_uprn}",
+            "User-Agent": USER_AGENT,
+        }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37c8a80 and a12c533.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/KingsLynnandWestNorfolkBC.py (1 hunks)

@ReeceLaww
Author

Any update on this, please? Keen to get it merged to fix my local council. Thank you.

@robbrad robbrad changed the base branch from master to dec_release December 7, 2025 10:23
@robbrad robbrad merged commit e03a268 into robbrad:dec_release Dec 7, 2025
1 check passed
@ReeceLaww ReeceLaww deleted the fix/kings-lynn-west-norfolk-user-agent branch March 5, 2026 14:22