
fix: ValeofGlamorganCouncil — add User-Agent header to API request#2042

Closed
InertiaUK wants to merge 1 commit into robbrad:master from InertiaUK:fix/vale-of-glamorgan-useragent

Conversation

@InertiaUK (Contributor) commented May 12, 2026

The Astun Technology iShare API at myvale.valeofglamorgan.gov.uk returns HTTP 404 when called without a User-Agent header (or with the default python-requests UA). The existing scraper had various Sec-Fetch-* headers but no User-Agent.

Added a browser User-Agent header to both the initial API request and the subsequent schedule page fetch. Also cleaned up unnecessary parameters (jQuery callback timestamp, import parameter) from the API request.
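
The fix described above can be sketched as follows. This is a minimal illustration, not the scraper's actual code: the endpoint URL is a placeholder and the UA string is just one plausible browser value.

```python
import urllib.request

# Placeholder endpoint; the real iShare API URL on
# myvale.valeofglamorgan.gov.uk differs.
API_URL = "https://example.invalid/getdata.aspx"

# Browser-style User-Agent: the API returns HTTP 404 for the default
# python-requests UA, so a realistic UA string is required.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    ),
    "Accept": "*/*",
    "Accept-Language": "en-GB,en;q=0.9",
}

# Build the request with the headers attached; urllib.request.Request
# normalises header names to capitalised form internally.
req = urllib.request.Request(API_URL, headers=HEADERS)
print(req.get_header("User-agent"))
```

The same headers dict would be passed to both the initial API request and the follow-up schedule page fetch.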

Summary by CodeRabbit

  • Refactor
    • Streamlined Vale of Glamorgan Council bin collection data retrieval and improved parsing robustness.


@coderabbitai Bot (Contributor) commented May 12, 2026

📝 Walkthrough

The parse_data method in ValeofGlamorganCouncil is refactored to streamline the HTTP request headers, simplify JSONP parameters, parse the JSON response once and reuse its structure, and update error handling in the table parsing loop.

Changes

Vale of Glamorgan Council Scraper Refactor

File: uk_bin_collection/uk_bin_collection/councils/ValeofGlamorganCouncil.py

  • Request headers and parameters simplification — the HTTP headers dict is reduced from a larger browser/security set to core fields (User-Agent, Accept, Accept-Language, Referer). JSONP query parameters are simplified by removing the dynamic import and _ fields.
  • JSON response parsing and schedule extraction — the JSON response is parsed once into a parsed object; bin_week, weekly_collection, and weekly_dates are derived from that structure and reused to build schedule_url before fetching and parsing the calendar page.
  • Calendar table parsing and collections update — the table header lookup is moved into the try block alongside bin_date parsing; the exception handler is simplified to catch Exception without binding the exception variable; additional tuples from weekly_dates are appended to the collections list.
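
The parse-once pattern from the second bullet can be sketched as follows. The payload shape here is an assumption inferred from the field names in this summary, not the documented iShare response format.

```python
import json

# Hypothetical response mirroring the fields named in the walkthrough
# (Results -> waste -> roundday_residual / recycling_code); the real
# API payload may carry additional keys.
response = json.dumps(
    {
        "Results": {
            "waste": {
                "roundday_residual": "Wednesday B",
                "recycling_code": "green",
            }
        }
    }
)

# Parse the JSON once, then derive every field from the same object
# instead of re-parsing the response for each lookup.
parsed = json.loads(response)
waste = parsed["Results"]["waste"]
bin_week = str(waste["roundday_residual"]).replace(" ", "-")
weekly_collection = str(waste["recycling_code"]).capitalize()
print(bin_week, weekly_collection)  # → Wednesday-B Green
```

The derived bin_week string would then be interpolated into schedule_url before the calendar page is fetched.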

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A Vale scraper hops along the path,
Trimming headers, parsing paths,
One JSON load, no waste or dread,
Dates flow clean from what's been read,
Errors caught with graceful care—
Bins collected, fresh and fair! 📦

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: the title accurately describes the main change (adding a User-Agent header to fix API requests for ValeofGlamorganCouncil), which aligns with the PR's core objective and file modifications.
  • Linked Issues check — ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed: check skipped because no linked issues were found for this pull request.


@coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
uk_bin_collection/uk_bin_collection/councils/ValeofGlamorganCouncil.py (1)

62-74: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent exception swallowing hides parsing failures; the table_headers lookup also sits inefficiently inside the loop.

Two issues here:

  1. Bare except Exception: continue silently discards all errors. If the page format changes, failures will go unnoticed. At minimum, log the exception so format changes are detectable.

  2. table_headers lookup on line 64 is constant per page but now executes on every loop iteration. It should be moved outside the loop.

🔧 Proposed fix
+        table_headers = table.find("tr").find_all("th")
         for day in remove_alpha_characters(row[1].text.strip()).split():
             try:
                 bin_date = datetime(collection_year, collection_month, int(day))
-                table_headers = table.find("tr").find_all("th")
                 collections.append(
                     (
                         table_headers[1]
                         .text.strip()
                         .replace(" collection date", ""),
                         bin_date,
                     )
                 )
-            except Exception:
+            except (ValueError, IndexError) as e:
+                # Log or handle specific parsing failures
                 continue

Based on learnings: "prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/ValeofGlamorganCouncil.py`
around lines 62 - 74, The code currently computes table_headers inside the loop
and swallows all exceptions; move the table_headers =
table.find("tr").find_all("th") lookup to before the loop that builds
collections, validate its length/format and raise a clear ValueError if headers
are not as expected, then inside the loop only catch specific exceptions (e.g.,
ValueError, TypeError) and log the exception (using the module/class logger or
Python logging) rather than using a bare "except Exception: continue"; on truly
unexpected formats re-raise the error so failures are visible; keep the existing
variables (table_headers, bin_date, collections, collection_year,
collection_month, table) to locate and update the code.
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/ValeofGlamorganCouncil.py (1)

35-38: ⚡ Quick win

Consider adding error handling for the API response structure.

Direct key access to parsed["Results"]["waste"] will raise an unclear KeyError if the API response structure changes. A more explicit check would help diagnose issues faster.

♻️ Suggested improvement
         parsed = json.loads(response)
-        waste = parsed["Results"]["waste"]
+        try:
+            waste = parsed["Results"]["waste"]
+        except KeyError as e:
+            raise ValueError(
+                f"Unexpected API response structure, missing key: {e}"
+            ) from e
         bin_week = str(waste["roundday_residual"]).replace(" ", "-")
         weekly_collection = str(waste["recycling_code"]).capitalize()

Based on learnings: "prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors. Use clear exception types, document error causes."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/ValeofGlamorganCouncil.py`
around lines 35 - 38, Validate the API response structure before directly
indexing parsed["Results"]["waste"]: check that parsed is a dict,
parsed.get("Results") is present and is a dict, and that "waste" exists and is a
dict; if any check fails raise a clear ValueError (or custom exception) that
includes the unexpected response snippet and mentions the failing field (e.g.,
"Results" or "waste") so callers of the code that produces
parsed/bin_week/weekly_collection can diagnose the issue quickly; keep the
downstream logic using waste["roundday_residual"] and waste["recycling_code"]
intact once validation passes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 98d8b88f-a912-45d0-9758-d9d6f6ddfd26

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 21b15e7.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/ValeofGlamorganCouncil.py

@codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (21b15e7).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2042   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           

☔ View full report in Codecov by Sentry.

@InertiaUK InertiaUK closed this May 12, 2026
@InertiaUK InertiaUK deleted the fix/vale-of-glamorgan-useragent branch May 12, 2026 09:42

1 participant