feat: add direct mobile API support to North Hertfordshire District Council scraper#1715
Conversation
WalkthroughNorth Hertfordshire’s data path was switched from page-scraping to Cloud9 mobile API calls. The test input was converted to a UPRN-centered configuration. New functions Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Council as NorthHertsCouncil
participant UPRNAPI as Addresses API
participant MobileAPI as Cloud9 Mobile API
Caller->>Council: parse_data(uprn OR postcode, paon)
alt UPRN not provided
Council->>UPRNAPI: lookup_uprn(postcode, paon)
UPRNAPI-->>Council: uprn or error
else UPRN provided
Council->>Council: use supplied uprn
end
rect rgb(230,245,255)
Note over Council,MobileAPI: Fetch collection data for UPRN (auth + headers)
Council->>MobileAPI: fetch_mobile_api(uprn)
MobileAPI-->>Council: JSON with container1..container8 or error
end
Council->>Council: parse containerN -> bins[type, ISO collectionDate], sort
Council-->>Caller: standardized bins array or error
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧬 Code graph analysis (1)uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
🪛 Ruff (0.14.4)uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py50-50: Avoid specifying long messages outside the exception class (TRY003) 53-53: Avoid specifying long messages outside the exception class (TRY003) 63-63: Avoid specifying long messages outside the exception class (TRY003) 66-69: Avoid specifying long messages outside the exception class (TRY003) 74-74: Avoid specifying long messages outside the exception class (TRY003) 78-81: Avoid specifying long messages outside the exception class (TRY003) 101-105: Avoid specifying long messages outside the exception class (TRY003) 109-113: Avoid specifying long messages outside the exception class (TRY003) 137-137: Avoid specifying long messages outside the exception class (TRY003) 140-143: Avoid specifying long messages outside the exception class (TRY003) 149-149: Avoid specifying long messages outside the exception class (TRY003) 158-158: Unused method argument: (ARG002) 197-199: Avoid specifying long messages outside the exception class (TRY003) 234-236: Avoid specifying long messages outside the exception class (TRY003) 🔇 Additional comments (5)
Tip 📝 Customizable high-level summaries are now available!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example:
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
uk_bin_collection/tests/input.json (1)
1707-1713: North Herts test config matches new UPRN flow; minor clarity nitThe
skip_get_url: true+ explicituprnsetup matches the new Cloud9 API–driven implementation and should exercise the direct-U PRN path correctly. One small clarity tweak you might consider is aligning theurl/wiki_command_url_overridewith the actual/wastecollections/{uprn}endpoint used in code (or explicitly noting that the URL here is illustrative only), to avoid confusion for future contributors readinginput.json.uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
13-23: Cloud9 API constants are clear; consider future-proofing auth/configThe API base URL and headers are nicely encapsulated, which keeps the mobile integration self-contained. Since both
MOBILE_API_AUTHandX-App-Versionare hard-coded, any upstream change (e.g. rotated auth or bumped app version) will require a code release; if you expect the council/app vendor to change these periodically, consider making them configurable (env or settings) so they can be updated without touching code.
100-124:fetch_mobile_apiis straightforward; align docstring with actual exceptionsThe implementation is simple and correct: build
/wastecollections/{uprn}, enforce a 200 response, and return parsed JSON. The docstring currently only advertisesrequests.RequestException, but the code will also raiseValueErroron non-200 responses; consider mentioningValueErrorin the docstring so callers know to handle both cases.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
uk_bin_collection/tests/input.json(1 hunks)uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass(43-146)uk_bin_collection/uk_bin_collection/common.py (3)
check_uprn(67-78)check_postcode(36-49)check_paon(52-64)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py
9-9: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
44-44: Avoid specifying long messages outside the exception class
(TRY003)
47-47: Avoid specifying long messages outside the exception class
(TRY003)
55-58: Avoid specifying long messages outside the exception class
(TRY003)
64-67: Avoid specifying long messages outside the exception class
(TRY003)
75-75: Local variable full_address is assigned to but never used
Remove assignment to unused variable full_address
(F841)
83-87: Avoid specifying long messages outside the exception class
(TRY003)
91-95: Avoid specifying long messages outside the exception class
(TRY003)
118-121: Avoid specifying long messages outside the exception class
(TRY003)
132-132: Unused method argument: page
(ARG002)
150-150: check_uprn may be undefined, or defined from star imports
(F405)
156-156: check_postcode may be undefined, or defined from star imports
(F405)
157-157: check_paon may be undefined, or defined from star imports
(F405)
171-173: Avoid specifying long messages outside the exception class
(TRY003)
198-198: date_format may be undefined, or defined from star imports
(F405)
204-206: Avoid specifying long messages outside the exception class
(TRY003)
210-210: date_format may be undefined, or defined from star imports
(F405)
🔇 Additional comments (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1)
132-213: Parse flow looks good; a few import, PII, and minor ergonomics points
- The overall flow (prefer explicit
uprn, otherwisepostcode+paon→lookup_uprn→fetch_mobile_api→ container loop) is clean and should behave well for both UPRN and legacy postcode usage.- You rely on
check_uprn,check_postcode,check_paon, anddate_formatviafrom uk_bin_collection.uk_bin_collection.common import *. Ruff is right that this makes it hard to see dependencies statically; importing those names explicitly (or adding# noqa: F403, F405if this is the project-wide pattern) would improve clarity and tooling support.- The unused
pageparameter is required by the abstract interface, but Ruff flags ARG002. If you want to satisfy the linter without changing behavior, consider renaming to_pageor adding a targeted# noqa: ARG002on the method.- The error at Lines 170–173 includes a truncated JSON dump of the API response. That’s helpful for debugging but may leak full address/UPRN details into logs if exceptions are logged verbosely; consider shortening this to a more generic message or gating the raw payload behind a debug flag to reduce PII exposure.
- For the “legacy” path,
parse_dataexpectspaoninkwargs. Please double-check that the CLI or any callers still passing house number are wiring it intopaonfor this council; otherwise that backward-compat path may never be exercised.The container loop and date handling (ISO → formatted
date_format→ sorted) look consistent with the rest of the codebase’s patterns.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
14-23: Document the hardcoded authorization credentials.While the PR notes that this mobile API authorization token is extracted from the app and unlikely to change, hardcoding credentials in source remains a maintenance risk. Consider adding a comment explaining:
- That this token is reverse-engineered from the mobile app
- The expected behavior if it does change (e.g., will all scraper instances fail simultaneously?)
- Any monitoring or fallback strategy
183-199: Add error handling for date parsing to prevent crashes on malformed data.The
datetime.fromisoformat()call at line 192 will raiseValueErrorif the API returns an unexpected date format. While the API is expected to return valid ISO format, defensive coding suggests wrapping this in try/except.Apply this fix:
# Extract collection date collection_date_str = container.get("collectionDate", "") # Skip empty collection dates if not collection_date_str: continue # Extract container description (bin type) bin_type = container.get("containerDescription", f"Container {i}") - collection_datetime = datetime.fromisoformat(collection_date_str) - - - # Parse the date - API returns ISO format like "2025-11-25T00:00:00" - bin_entry = { - "type": bin_type, - "collectionDate": collection_datetime.strftime(date_format), - } - - data["bins"].append(bin_entry) + + try: + # Parse the date - API returns ISO format like "2025-11-25T00:00:00" + collection_datetime = datetime.fromisoformat(collection_date_str) + bin_entry = { + "type": bin_type, + "collectionDate": collection_datetime.strftime(date_format), + } + data["bins"].append(bin_entry) + except ValueError: + # Skip containers with invalid date formats + continue
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass(43-146)uk_bin_collection/uk_bin_collection/common.py (3)
check_uprn(67-78)check_postcode(36-49)check_paon(52-64)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py
9-9: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
42-42: Avoid specifying long messages outside the exception class
(TRY003)
45-45: Avoid specifying long messages outside the exception class
(TRY003)
56-59: Avoid specifying long messages outside the exception class
(TRY003)
65-68: Avoid specifying long messages outside the exception class
(TRY003)
83-87: Avoid specifying long messages outside the exception class
(TRY003)
91-95: Avoid specifying long messages outside the exception class
(TRY003)
118-121: Avoid specifying long messages outside the exception class
(TRY003)
132-132: Unused method argument: page
(ARG002)
150-150: check_uprn may be undefined, or defined from star imports
(F405)
156-156: check_postcode may be undefined, or defined from star imports
(F405)
157-157: check_paon may be undefined, or defined from star imports
(F405)
171-173: Avoid specifying long messages outside the exception class
(TRY003)
198-198: date_format may be undefined, or defined from star imports
(F405)
204-206: Avoid specifying long messages outside the exception class
(TRY003)
210-210: date_format may be undefined, or defined from star imports
(F405)
🔇 Additional comments (3)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (3)
100-124: LGTM - Clean API call with appropriate error handling.The function correctly validates the status code and has a reasonable timeout. The
response.json()call could theoretically raiseJSONDecodeErrorif the API returns invalid JSON, but this is acceptable given the API is expected to always return valid JSON.
147-161: Well-structured fallback mechanism for UPRN lookup.The dual-path approach (direct UPRN or postcode/paon lookup) maintains backward compatibility while enabling the more accurate UPRN-based method. The validation using
check_uprn,check_postcode, andcheck_paonappropriately ensures inputs are valid before processing.
175-211: Robust parsing logic with appropriate defensive checks.The container iteration correctly:
- Handles up to 8 containers as documented
- Validates container existence and type
- Skips empty collection dates
- Provides sensible defaults for missing descriptions
- Sorts results by date for consistent output
Prevent empty string edge case with `split()` by gating on truthiness of `line.strip)`
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
77-79: Minor: Comment could be more precise.The comment states "matches the start of addressLine1 or addressLine2", but the code actually checks whether
paonequals the first token (word) of either line, not just whether it appears at the start as a substring.Consider updating to:
- # Check if paon matches the start of addressLine1 or addressLine2 + # Check if paon matches the first token of addressLine1 or addressLine2
192-211: Optimize datetime handling to avoid redundant parsing.The current flow parses the ISO date string to a datetime object (line 192), formats it back to a string (line 198), then parses the string again for sorting (line 210). This is inefficient and error-prone.
Consider storing the datetime object temporarily to avoid reparsing:
# Extract container description (bin type) bin_type = container.get("containerDescription", f"Container {i}") - collection_datetime = datetime.fromisoformat(collection_date_str) - + try: + collection_datetime = datetime.fromisoformat(collection_date_str) + except ValueError: + continue # Parse the date - API returns ISO format like "2025-11-25T00:00:00" bin_entry = { "type": bin_type, "collectionDate": collection_datetime.strftime(date_format), + "_sort_date": collection_datetime, # Store datetime for sorting } data["bins"].append(bin_entry) if not data["bins"]: raise ValueError( "No valid bin collection data could be extracted from the API response" ) # Sort the bin collections by date - data["bins"].sort( - key=lambda x: datetime.strptime(x.get("collectionDate"), date_format) - ) + data["bins"].sort(key=lambda x: x.pop("_sort_date"))This avoids the redundant
strptimecall and is more efficient.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass(43-146)uk_bin_collection/uk_bin_collection/common.py (3)
check_uprn(67-78)check_postcode(36-49)check_paon(52-64)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py
9-9: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
42-42: Avoid specifying long messages outside the exception class
(TRY003)
45-45: Avoid specifying long messages outside the exception class
(TRY003)
56-59: Avoid specifying long messages outside the exception class
(TRY003)
65-68: Avoid specifying long messages outside the exception class
(TRY003)
83-87: Avoid specifying long messages outside the exception class
(TRY003)
91-95: Avoid specifying long messages outside the exception class
(TRY003)
118-121: Avoid specifying long messages outside the exception class
(TRY003)
132-132: Unused method argument: page
(ARG002)
150-150: check_uprn may be undefined, or defined from star imports
(F405)
156-156: check_postcode may be undefined, or defined from star imports
(F405)
157-157: check_paon may be undefined, or defined from star imports
(F405)
171-173: Avoid specifying long messages outside the exception class
(TRY003)
198-198: date_format may be undefined, or defined from star imports
(F405)
204-206: Avoid specifying long messages outside the exception class
(TRY003)
210-210: date_format may be undefined, or defined from star imports
(F405)
🔇 Additional comments (3)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (3)
100-123: LGTM!The function properly validates the API response, includes a reasonable timeout, and provides helpful error messages. The TRY003 warnings about long error messages are a style preference—your detailed messages are user-friendly and consistent with the rest of the codebase.
147-160: Excellent backward compatibility approach!The dual input method (direct UPRN or fallback to postcode/paon lookup) is well-implemented and maintains compatibility with existing integrations while enabling the more reliable UPRN-based approach. The validation calls appropriately check inputs before proceeding.
Note: The unused
pageparameter (line 132) is required by theAbstractGetBinDataClassinterface, and the Ruff warnings about star imports are false positives—the imported functions exist incommon.py.
13-23: Verify Cloud9 API terms permit automated access before merging.The mobile app's authorization token (line 15) is extracted from the Cloud9 mobile application. A web search confirms that Cloud9 publishes Terms of Service and Acceptable Use policies governing API usage, but these terms are not publicly available—they exist in the vendor's documentation or the council's contract.
Before merging, verify:
- The exact terms of the Cloud9 API documentation or council contract to confirm automated scraping via the mobile app's auth token is permitted
- Whether the approach could trigger rate limiting or abuse detection mechanisms
- If Cloud9/North Hertfordshire have been contacted to approve this method
The implementation is otherwise clean and well-structured, but this verification step is necessary to ensure compliance and stability.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (3)
9-9: Consider replacing wildcard import with explicit imports.The wildcard import makes it unclear which symbols are used from
common(e.g.,check_uprn,check_postcode,check_paon,date_format). Explicit imports improve readability and make dependencies obvious.Apply this diff:
-from uk_bin_collection.uk_bin_collection.common import * +from uk_bin_collection.uk_bin_collection.common import ( + check_uprn, + check_postcode, + check_paon, + date_format, +)
100-124: Consider adding error handling for malformed JSON responses.If the API returns non-JSON content (e.g., HTML error page),
response.json()at line 123 will raiseJSONDecodeErrorwithout context. Wrapping it in try-except would provide clearer error messages for debugging.Apply this diff:
- return response.json() + try: + return response.json() + except requests.exceptions.JSONDecodeError as exc: + raise ValueError( + f"Mobile API returned invalid JSON. Status: {response.status_code}" + ) from exc
176-176: Consider extracting the magic number 8 to a named constant.The container count (8) is hardcoded here. If the API changes its container limit, extracting this to a module-level constant like
MAX_CONTAINERS = 8would make the code more maintainable.Apply this diff at module level:
+# Maximum number of containers supported by the API +MAX_CONTAINERS = 8 + + def lookup_uprn(postcode: str, paon: str) -> str:Then update the loop:
# Process all 8 possible containers - for i in range(1, 9): + for i in range(1, MAX_CONTAINERS + 1):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass(43-146)uk_bin_collection/uk_bin_collection/common.py (3)
check_uprn(67-78)check_postcode(36-49)check_paon(52-64)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py
9-9: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
42-42: Avoid specifying long messages outside the exception class
(TRY003)
45-45: Avoid specifying long messages outside the exception class
(TRY003)
56-59: Avoid specifying long messages outside the exception class
(TRY003)
65-68: Avoid specifying long messages outside the exception class
(TRY003)
83-87: Avoid specifying long messages outside the exception class
(TRY003)
91-95: Avoid specifying long messages outside the exception class
(TRY003)
118-121: Avoid specifying long messages outside the exception class
(TRY003)
132-132: Unused method argument: page
(ARG002)
150-150: check_uprn may be undefined, or defined from star imports
(F405)
156-156: check_postcode may be undefined, or defined from star imports
(F405)
157-157: check_paon may be undefined, or defined from star imports
(F405)
171-173: Avoid specifying long messages outside the exception class
(TRY003)
196-196: Avoid specifying long messages outside the exception class
(TRY003)
201-201: date_format may be undefined, or defined from star imports
(F405)
208-210: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (4)
26-98: LGTM: Robust UPRN lookup with excellent error messages.The function correctly validates inputs, handles API errors, matches addresses by PAON, and provides clear, actionable error messages including links to external UPRN lookup. The past review issues have been properly addressed.
192-196: Good error handling for datetime parsing.The try-except block properly handles invalid date formats and re-raises with context. This addresses the past review comment about datetime parsing, and the fail-fast approach is preferable to silently skipping invalid data.
213-215: Verify the pop-during-sort pattern is intentional.Using
popinside the lambda has a side effect during sorting—it mutates each bin entry by removing_sort_dateas the sort progresses. While this achieves the goal of removing the temporary sort key, it's unconventional and could be surprising to maintainers.Is this pattern intentional? A more explicit two-pass approach would be clearer:
# Sort by date data["bins"].sort(key=lambda x: x["_sort_date"]) # Remove temporary sort keys for bin_entry in data["bins"]: bin_entry.pop("_sort_date")This separates concerns and makes the intent obvious without relying on lambda side effects.
14-23: Verify the hardcoded authorization token is still accepted by Cloud9's API before merging.The token cannot be verified through public documentation. The provided header decodes to
cloud9:idBcV4bor5and follows standard HTTP Basic auth format. To confirm it's still valid, test the API endpoint directly or contact Cloud9/North Hertfordshire support to ensure the token hasn't been rotated. If it becomes invalid post-deployment, all requests will fail immediately.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1)
199-205: Consider skipping bins with malformedcollectionDateinstead of failing the whole parse.At the moment, any single bad
collectionDatefrom the API causesparse_datato raise and drop all other valid bins. For a third-party API this is brittle; a more resilient approach is to log/trace the bad value andcontinueso the rest of the data is still usable, for example:- try: - collection_datetime = datetime.fromisoformat(collection_date_str) - except ValueError as exc: - # re-raise with a more descriptive error message - raise ValueError("Could not parse collection date. Please check the API response.") from exc + try: + collection_datetime = datetime.fromisoformat(collection_date_str) + except ValueError: + # Skip bins with invalid date formats but keep processing others + continueThis matches the earlier suggestion to harden date parsing while maintaining graceful degradation.
🧹 Nitpick comments (5)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (5)
18-29: Re-evaluate hard-coded mobile API headers and authorization.Embedding a Basic
Authorizationheader and specificX-App-Version/X-Platformvalues directly in the code couples this scraper tightly to a particular mobile client and exposes what is effectively a credential. If possible, consider:
- Minimising the headers to only what the API truly requires (e.g.
Accept,Authorization), and- Moving the auth value (and any version identifiers that may change) into configuration or environment so they can be rotated/updated without a code change.
88-103: Defensively handle missinguprnin the addresses payload.
matching_addresses[0]["uprn"]assumes every matching record contains anuprnkey; if the API schema changes or a bad record slips through, this will raise a bareKeyError. It would be safer to use.get("uprn")and raise aValueErrorwith a clear message when the field is missing, so callers see a user-friendly error instead of an unhandled exception.
141-147: Optional: rename unusedpageparameter to avoid ARG002.
pageis intentionally unused in this API-driven implementation, but Ruff still flags it (ARG002). If you want a clean lint run, consider renaming it to_page(or similar) to document the intent and silence the warning without changing call behaviour.
154-170: Add an explicit guard when neitheruprnnorpostcode/paonare provided.Right now, if
uprnis missing and eitherpostcodeorpaonis absent, execution will still fall through tocheck_postcode(postcode)/check_paon(paon), which can yield less clear errors (or hit the postcode API with aNone/empty value). A small upfront check like:uprn = kwargs.get("uprn") if not uprn: postcode = kwargs.get("postcode") paon = kwargs.get("paon") if not postcode or not paon: raise ValueError("You must provide either 'uprn' or both 'postcode' and 'paon'.")would make the failure mode clearer and avoid unnecessary external calls.
221-224: Avoid mutating bin entries inside the sort key.Using
key=lambda x: x.pop("_sort_date")both sorts and mutates each dict, which is a bit surprising. An explicit two-step approach is easier to reason about:- data["bins"].sort( - key=lambda x: x.pop("_sort_date") - ) + data["bins"].sort(key=lambda x: x["_sort_date"]) + for bin_entry in data["bins"]: + bin_entry.pop("_sort_date", None)This keeps the mutation separate from the sort, without changing the output structure.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/common.py (3)
check_paon(52-64)check_postcode(36-49)check_uprn(67-78)uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass(43-146)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py
48-48: Avoid specifying long messages outside the exception class
(TRY003)
51-51: Avoid specifying long messages outside the exception class
(TRY003)
62-65: Avoid specifying long messages outside the exception class
(TRY003)
71-74: Avoid specifying long messages outside the exception class
(TRY003)
89-93: Avoid specifying long messages outside the exception class
(TRY003)
97-101: Avoid specifying long messages outside the exception class
(TRY003)
124-124: Avoid specifying long messages outside the exception class
(TRY003)
124-124: f-string without any placeholders
Remove extraneous f prefix
(F541)
127-130: Avoid specifying long messages outside the exception class
(TRY003)
141-141: Unused method argument: page
(ARG002)
180-182: Avoid specifying long messages outside the exception class
(TRY003)
205-205: Avoid specifying long messages outside the exception class
(TRY003)
217-219: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1)
172-220: API-driven parsing flow looks clean and readable.The loop over
container1–container8, the_sort_datehelper field, and the final sort produce a clear, deterministic bin list, and the error messages for missingwasteCollectionDates/ emptybinsare user-friendly.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1)
206-210: Consider skipping invalid dates instead of failing the entire parse.The current implementation re-raises when a date can't be parsed, which will fail the entire scrape if the API returns any malformed date. To make the scraper more resilient to partial data issues, consider logging a warning and continuing instead:
try: collection_datetime = datetime.fromisoformat(collection_date_str) except ValueError as exc: - # re-raise with a more descriptive error message - raise ValueError("Could not parse collection date. Please check the API response.") from exc + # Skip bins with invalid date format and continue processing others + continueThis aligns with the earlier suggestion in past reviews to gracefully handle invalid dates rather than failing completely.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/common.py (3)
check_paon(52-64)check_postcode(36-49)check_uprn(67-78)uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass(43-146)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py
48-48: Avoid specifying long messages outside the exception class
(TRY003)
51-51: Avoid specifying long messages outside the exception class
(TRY003)
62-65: Avoid specifying long messages outside the exception class
(TRY003)
71-74: Avoid specifying long messages outside the exception class
(TRY003)
89-93: Avoid specifying long messages outside the exception class
(TRY003)
97-101: Avoid specifying long messages outside the exception class
(TRY003)
125-125: Avoid specifying long messages outside the exception class
(TRY003)
128-131: Avoid specifying long messages outside the exception class
(TRY003)
137-137: Avoid specifying long messages outside the exception class
(TRY003)
146-146: Unused method argument: page
(ARG002)
185-187: Avoid specifying long messages outside the exception class
(TRY003)
210-210: Avoid specifying long messages outside the exception class
(TRY003)
222-224: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (8)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (8)
1-14: LGTM!The imports and documentation comments are clear and appropriate for the API-based implementation.
18-29: Confirm the hardcoded authorization token is intentionally public.The
MOBILE_API_AUTHconstant contains a hardcoded Basic authentication token. While the PR description notes this is extracted from the mobile app and unlikely to change, it's worth confirming this token is intended to be publicly visible in the repository.If this is a public API token used by the mobile app, consider adding a comment explaining its source and public nature to prevent future concerns.
32-103: LGTM!The
lookup_uprnfunction provides robust address lookup with clear error messages guiding users to findmyaddress.co.uk when issues occur. The input validation and matching logic are sound.
106-137: LGTM!The error handling is well-structured, properly catching network errors separately from JSON parsing errors. The error messages are clear and actionable.
161-174: LGTM!The UPRN handling with automatic fallback to
lookup_uprnprovides good backward compatibility while supporting the new UPRN-first approach.
177-187: LGTM!The API response validation clearly handles missing data with an actionable error message.
189-205: LGTM!The container iteration logic cleanly handles the API's eight-container structure, skipping missing or empty entries.
221-231: LGTM!The final validation ensures at least one valid collection was found, and the sorting logic correctly uses and removes the temporary
_sort_datefield.
It's definitely a bit of an edge case, but being a little more tolerant won't hurt.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
121-127: Consider validating UPRN format for robustness.While UPRNs are numeric and don't require URL encoding like postcodes, adding format validation would make the function more defensive against invalid input and provide clearer error messages.
url = f"{MOBILE_API_BASE}/wastecollections/{uprn}" + + # Validate UPRN format (typically 12 digits, but can vary) + if not uprn.isdigit(): + raise ValueError(f"Invalid UPRN format: '{uprn}'. UPRN must be numeric.") # Perform the HTTP request and surface network-layer errors clearly try:
229-231: Consider usingget()instead ofpop()in sort key for clarity.Using
pop()in the sort lambda modifies dictionaries during sorting, which works but is unconventional and may be surprising to future maintainers.More conventional approach:
- # Sort the bin collections by date - data["bins"].sort( - key=lambda x: x.pop("_sort_date") - ) + # Sort the bin collections by date + data["bins"].sort(key=lambda x: x["_sort_date"]) + + # Remove temporary sort keys + for bin_entry in data["bins"]: + bin_entry.pop("_sort_date")The current implementation is functionally correct, so this is purely a readability suggestion.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/common.py (3)
check_paon(52-64)check_postcode(36-49)check_uprn(67-78)uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass(43-146)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py
50-50: Avoid specifying long messages outside the exception class
(TRY003)
53-53: Avoid specifying long messages outside the exception class
(TRY003)
64-67: Avoid specifying long messages outside the exception class
(TRY003)
73-76: Avoid specifying long messages outside the exception class
(TRY003)
91-95: Avoid specifying long messages outside the exception class
(TRY003)
99-103: Avoid specifying long messages outside the exception class
(TRY003)
127-127: Avoid specifying long messages outside the exception class
(TRY003)
130-133: Avoid specifying long messages outside the exception class
(TRY003)
139-139: Avoid specifying long messages outside the exception class
(TRY003)
148-148: Unused method argument: page
(ARG002)
187-189: Avoid specifying long messages outside the exception class
(TRY003)
224-226: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (3)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (3)
124-139: LGTM! Error handling is now correct.The error handling properly separates network errors (
RequestException) from JSON parsing errors (JSONDecodeError), with clear error messages and proper exception chaining usingfrom exc. This addresses the previous review feedback.
163-176: LGTM! UPRN resolution logic is well-structured.The dual-mode approach (direct UPRN or postcode/paon lookup) maintains backward compatibility while encouraging the more reliable UPRN-based method. The validation calls (
check_uprn,check_postcode,check_paon) ensure data quality before API calls.
192-221: LGTM! Container parsing handles edge cases robustly.The iteration through 8 containers correctly handles:
- Missing or non-dict containers (lines 196-197)
- Empty collection dates (lines 203-204)
- Invalid date formats with try/except (lines 208-212, addressing previous review feedback)
The temporary
_sort_datekey is a clean approach for maintaining sort order.
... and handle request exceptions
* Amend loop variable for readability * Introduce temp variable to build list of collection data with sortable datetime field * Use dict comprehension to build final payload in order to avoid mutating dicts while looping
This PR replaces the existing Selenium web scraping implementation for North Hertfordshire District Council with a direct API approach that calls the Cloud9 mobile application API endpoints.
The primary property identification method now uses UPRN instead of postcode & house number. I've attempted to mitigate this breaking change by providing a fallback to a UPRN search (also helpfully provided by the mobile app).
Whilst the mobile app's API calls do make use of an
Authorizationheader, it's a value that is baked into their application code, so is unlikely to change, as it would require all installed instances to update or lose connectivity. Certainly, it seems less likely for that to change than for a DOM structure change which would cause a web scraper breakage.Summary by CodeRabbit