fix: ValeofGlamorganCouncil — add User-Agent header to API request #2042
InertiaUK wants to merge 1 commit into
Conversation
📝 Walkthrough: Vale of Glamorgan Council Scraper Refactor
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 warning
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
uk_bin_collection/uk_bin_collection/councils/ValeofGlamorganCouncil.py (1)
Lines 62-74: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent exception swallowing hides parsing failures; the `table_headers` lookup is also inefficiently placed inside the loop. Two issues here:

- A bare `except Exception: continue` silently discards all errors. If the page format changes, failures will go unnoticed. At minimum, log the exception so format changes are detectable.
- The `table_headers` lookup on line 64 is constant per page but now executes on every loop iteration. It should be moved outside the loop.

🔧 Proposed fix

```diff
+        table_headers = table.find("tr").find_all("th")
         for day in remove_alpha_characters(row[1].text.strip()).split():
             try:
                 bin_date = datetime(collection_year, collection_month, int(day))
-                table_headers = table.find("tr").find_all("th")
                 collections.append(
                     (
                         table_headers[1]
                         .text.strip()
                         .replace(" collection date", ""),
                         bin_date,
                     )
                 )
-            except Exception:
+            except (ValueError, IndexError) as e:
+                # Log or handle specific parsing failures
                 continue
```

Based on learnings: "prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors."
🤖 Prompt for AI Agents

> Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `uk_bin_collection/uk_bin_collection/councils/ValeofGlamorganCouncil.py` around lines 62-74: the code currently computes `table_headers` inside the loop and swallows all exceptions. Move the `table_headers = table.find("tr").find_all("th")` lookup to before the loop that builds `collections`, validate its length/format and raise a clear `ValueError` if headers are not as expected, then inside the loop only catch specific exceptions (e.g., `ValueError`, `TypeError`) and log the exception (using the module/class logger or Python `logging`) rather than using a bare `except Exception: continue`; on truly unexpected formats re-raise the error so failures are visible. Keep the existing variables (`table_headers`, `bin_date`, `collections`, `collection_year`, `collection_month`, `table`) to locate and update the code.
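The hoisted-lookup plus narrow-except pattern the comment asks for can be sketched in isolation. This is a hypothetical, self-contained illustration: `parse_collection_days`, its arguments, and the stand-in data are not the scraper's real BeautifulSoup objects, only a sketch of the same control flow.

```python
# Sketch: hoist invariant work out of the loop, catch only specific
# exceptions, and log skipped tokens so format drift stays visible.
import logging
from datetime import datetime

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)


def parse_collection_days(header_text, day_tokens, year, month):
    """Build (bin_type, date) tuples, skipping unparseable day tokens."""
    # Invariant work (the header lookup/cleanup) happens once, before the loop.
    bin_type = header_text.replace(" collection date", "")
    collections = []
    for day in day_tokens:
        try:
            collections.append((bin_type, datetime(year, month, int(day))))
        except ValueError as e:
            # Narrow exception + log: the loop continues, but nothing is silent.
            logger.warning("Skipping day token %r: %s", day, e)
            continue
    return collections


print(parse_collection_days("Recycling collection date", ["3", "17", "oops"], 2024, 5))
```

The bad token is logged and skipped rather than swallowed, so a page-format change surfaces in the logs instead of producing silently empty results.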
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/ValeofGlamorganCouncil.py (1)
Lines 35-38: ⚡ Quick win

Consider adding error handling for the API response structure. Direct key access to `parsed["Results"]["waste"]` will raise an unclear `KeyError` if the API response structure changes. A more explicit check would help diagnose issues faster.

♻️ Suggested improvement

```diff
 parsed = json.loads(response)
-waste = parsed["Results"]["waste"]
+try:
+    waste = parsed["Results"]["waste"]
+except KeyError as e:
+    raise ValueError(
+        f"Unexpected API response structure, missing key: {e}"
+    ) from e
 bin_week = str(waste["roundday_residual"]).replace(" ", "-")
 weekly_collection = str(waste["recycling_code"]).capitalize()
```

Based on learnings: "prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors. Use clear exception types, document error causes."
🤖 Prompt for AI Agents

> Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `uk_bin_collection/uk_bin_collection/councils/ValeofGlamorganCouncil.py` around lines 35-38: validate the API response structure before directly indexing `parsed["Results"]["waste"]`. Check that `parsed` is a dict, that `parsed.get("Results")` is present and is a dict, and that `"waste"` exists and is a dict; if any check fails, raise a clear `ValueError` (or custom exception) that includes the unexpected response snippet and names the failing field (e.g., `"Results"` or `"waste"`) so callers of the code that produces `parsed`/`bin_week`/`weekly_collection` can diagnose the issue quickly. Keep the downstream logic using `waste["roundday_residual"]` and `waste["recycling_code"]` intact once validation passes.
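A minimal, self-contained sketch of that validation, assuming the `Results`/`waste` response shape quoted in the comment; `extract_waste` and the sample payload are illustrative, not the scraper's actual code:

```python
# Sketch: fail loudly with a clear ValueError when the expected JSON
# structure is missing, instead of surfacing a bare KeyError.
import json


def extract_waste(raw: str) -> dict:
    parsed = json.loads(raw)
    try:
        waste = parsed["Results"]["waste"]
    except (KeyError, TypeError) as e:
        # Name the missing field so the failure is diagnosable from the log.
        raise ValueError(f"Unexpected API response structure, missing key: {e}") from e
    if not isinstance(waste, dict):
        raise ValueError(f"'waste' is not an object: {waste!r}")
    return waste


good = '{"Results": {"waste": {"roundday_residual": "Wednesday A", "recycling_code": "weekly"}}}'
print(extract_waste(good)["roundday_residual"])
```

With a well-formed payload the downstream keys are reachable as before; with a malformed one the caller gets a `ValueError` naming the missing field rather than an opaque `KeyError: 'waste'`.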
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 98d8b88f-a912-45d0-9758-d9d6f6ddfd26
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/ValeofGlamorganCouncil.py
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##           master    #2042   +/-  ##
=======================================
  Coverage   86.67%   86.67%
=======================================
  Files           9        9
  Lines        1141     1141
=======================================
  Hits          989      989
  Misses        152      152
```

☔ View full report in Codecov by Sentry.
The Astun Technology iShare API at myvale.valeofglamorgan.gov.uk returns HTTP 404 when called without a User-Agent header (or with the default python-requests UA). The existing scraper had various Sec-Fetch-* headers but no User-Agent.

Added a browser User-Agent header to both the initial API request and the subsequent schedule page fetch. Also cleaned up unnecessary parameters (jQuery callback timestamp, import parameter) from the API request.