Skip to content

feat: add direct mobile API support to North Hertfordshire District Council scraper#1715

Merged
robbrad merged 20 commits into
robbrad:dec_releasefrom
robberwick:northherts-direct-url
Dec 7, 2025
Merged

feat: add direct mobile API support to North Hertfordshire District Council scraper#1715
robbrad merged 20 commits into
robbrad:dec_releasefrom
robberwick:northherts-direct-url

Conversation

@robberwick
Copy link
Copy Markdown
Contributor

@robberwick robberwick commented Nov 14, 2025

This PR replaces the existing Selenium web scraping implementation for North Hertfordshire District Council with a direct API approach that calls the Cloud9 mobile application API endpoints.

The primary property identification method now uses UPRN instead of postcode & house number. I've attempted to mitigate this breaking change by providing a fallback to a UPRN search (also helpfully provided by the mobile app).

Whilst the mobile app's API calls do make use of an Authorization header, it's a value that is baked into their application code, so is unlikely to change, as it would require all installed instances to update or lose connectivity. Certainly, it seems less likely for that to change than for a DOM structure change which would cause a web scraper breakage.

Summary by CodeRabbit

  • Refactor
    • North Hertfordshire now uses a UPRN-first lookup and a mobile API to retrieve waste collection schedules, replacing the previous scraping-based approach for faster, more reliable results; address-to-UPRN resolution is performed automatically when a UPRN isn’t provided.
  • Tests
    • Test input updated to a UPRN-driven configuration and guidance recommending using UPRNs (via FindMyAddress) with an option to override the lookup URL.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Nov 14, 2025

Walkthrough

North Hertfordshire’s data path was switched from page-scraping to Cloud9 mobile API calls. The test input was converted to a UPRN-centered configuration. New functions lookup_uprn and fetch_mobile_api, API constants, and updated parse_data logic to parse up to eight container collection entries were added.

Changes

Cohort / File(s) Summary
Test configuration
uk_bin_collection/tests/input.json
Updated North Hertfordshire entry to a UPRN-driven config: removed house_number, postcode, web_driver; added uprn, skip_get_url, wiki_command_url_override; changed url to a UPRN endpoint and updated wiki_note.
Council implementation
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py
Replaced Selenium/BeautifulSoup scraping with Cloud9 mobile API integration; added lookup_uprn(postcode, paon) and fetch_mobile_api(uprn); introduced MOBILE_API_BASE, MOBILE_API_AUTH, MOBILE_API_HEADERS, MOBILE_API_NUM_CONTAINERS; refactored parse_data() to accept or derive UPRN and parse up to eight container collection details into standardized bin entries, with explicit error handling for missing/ambiguous data.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Council as NorthHertsCouncil
    participant UPRNAPI as Addresses API
    participant MobileAPI as Cloud9 Mobile API

    Caller->>Council: parse_data(uprn OR postcode, paon)
    alt UPRN not provided
        Council->>UPRNAPI: lookup_uprn(postcode, paon)
        UPRNAPI-->>Council: uprn or error
    else UPRN provided
        Council->>Council: use supplied uprn
    end

    rect rgb(230,245,255)
        Note over Council,MobileAPI: Fetch collection data for UPRN (auth + headers)
        Council->>MobileAPI: fetch_mobile_api(uprn)
        MobileAPI-->>Council: JSON with container1..container8 or error
    end

    Council->>Council: parse containerN -> bins[type, ISO collectionDate], sort
    Council-->>Caller: standardized bins array or error
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Focus areas:
    • lookup_uprn: input validation, ambiguous/no-match handling
    • fetch_mobile_api: auth/headers, HTTP status and JSON validation
    • parse_data: parsing container1..container8, date parsing/ISO formatting, sorting, empty-result errors

Suggested reviewers

  • dp247

Poem

🐇 I hopped from pages to an API stream,

A UPRN beacon neat and keen,
Eight little bins in tidy array,
Cloud9 hummed the dates today,
Hop-hop — the rabbit codes and grins.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding direct mobile API support to the North Hertfordshire District Council scraper, replacing the previous Selenium-based approach.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a24600 and 987432d.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/common.py (3)
  • check_paon (52-64)
  • check_postcode (36-49)
  • check_uprn (67-78)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py

50-50: Avoid specifying long messages outside the exception class

(TRY003)


53-53: Avoid specifying long messages outside the exception class

(TRY003)


63-63: Avoid specifying long messages outside the exception class

(TRY003)


66-69: Avoid specifying long messages outside the exception class

(TRY003)


74-74: Avoid specifying long messages outside the exception class

(TRY003)


78-81: Avoid specifying long messages outside the exception class

(TRY003)


101-105: Avoid specifying long messages outside the exception class

(TRY003)


109-113: Avoid specifying long messages outside the exception class

(TRY003)


137-137: Avoid specifying long messages outside the exception class

(TRY003)


140-143: Avoid specifying long messages outside the exception class

(TRY003)


149-149: Avoid specifying long messages outside the exception class

(TRY003)


158-158: Unused method argument: page

(ARG002)


197-199: Avoid specifying long messages outside the exception class

(TRY003)


234-236: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (5)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (5)

1-16: LGTM! Clean imports and clear documentation.

The file-level comments clearly document the API approach, and all imports are appropriate for the Cloud9 mobile API integration. The transition from Selenium to direct API calls is well-structured.


18-31: LGTM! Well-organized API constants.

The constants are appropriately structured, and the comment on Line 20-22 clearly documents that the authorization header is extracted from the mobile app. This approach is standard for integrating with mobile APIs where the credentials are embedded in the application binary.


34-116: Excellent implementation - all previous issues addressed.

The lookup_uprn function is well-implemented with comprehensive error handling:

  • ✅ Input validation with clear error messages
  • ✅ Proper URL encoding using the params parameter (Line 61) - addresses the previous critical URL encoding issue
  • ✅ Robust exception handling for both network errors and JSON parsing
  • ✅ Safe string splitting with guards against whitespace-only lines (Lines 92-96) - addresses previous IndexError concern
  • ✅ Helpful error messages that guide users to findmyaddress.co.uk when matches fail
  • ✅ Exception chaining maintains full context for debugging

The static analysis TRY003 warnings about "long exception messages" are false positives in this context—descriptive, user-facing error messages are appropriate for a CLI tool.


118-150: LGTM! Robust error handling for API calls.

The fetch_mobile_api function correctly handles all error scenarios:

  • ✅ Network errors caught as RequestException (Lines 134-137) - addresses previous issue where this wasn't handled
  • ✅ JSON parsing errors caught separately (Lines 146-149) - addresses previous critical issue about incorrect exception handling
  • ✅ HTTP status validation with user-helpful messages (Lines 139-143)
  • ✅ Proper exception chaining maintains debugging context
  • ✅ Reasonable 30-second timeout

The restructured error handling correctly reflects that requests.get() raises RequestException (not JSONDecodeError), and response.json() raises JSONDecodeError.


152-247: Excellent implementation with strong backward compatibility.

The parse_data method is well-designed and addresses all previous concerns:

Strengths:

  • ✅ Supports both direct UPRN input and postcode/paon lookup for backward compatibility (Lines 173-186)
  • ✅ Datetime parsing now wrapped in try/except to handle invalid formats gracefully (Lines 218-222) - addresses previous critical issue
  • ✅ Clean sorting approach using temporary _sort_date key that's filtered out before return (Lines 228, 244)
  • ✅ Gracefully handles missing/invalid container data with continue statements
  • ✅ Validates that at least one bin is extracted (Lines 233-236)
  • ✅ Clear error messages when data is missing

Static Analysis Notes:

  • The unused page parameter (ARG002) is required by the AbstractGetBinDataClass interface and is appropriately documented in the docstring as "kept for interface compatibility"
  • TRY003 warnings about exception message length are false positives—descriptive user-facing error messages are appropriate for this CLI tool context

Tip

📝 Customizable high-level summaries are now available!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide custom instructions to shape the summary (bullet lists, tables, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example:

"Create a concise high-level summary as a bullet-point list. Then include a Markdown table showing lines added and removed by each contributing author."


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
uk_bin_collection/tests/input.json (1)

1707-1713: North Herts test config matches new UPRN flow; minor clarity nit

The skip_get_url: true + explicit uprn setup matches the new Cloud9 API–driven implementation and should exercise the direct-U PRN path correctly. One small clarity tweak you might consider is aligning the url/wiki_command_url_override with the actual /wastecollections/{uprn} endpoint used in code (or explicitly noting that the URL here is illustrative only), to avoid confusion for future contributors reading input.json.

uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)

13-23: Cloud9 API constants are clear; consider future-proofing auth/config

The API base URL and headers are nicely encapsulated, which keeps the mobile integration self-contained. Since both MOBILE_API_AUTH and X-App-Version are hard-coded, any upstream change (e.g. rotated auth or bumped app version) will require a code release; if you expect the council/app vendor to change these periodically, consider making them configurable (env or settings) so they can be updated without touching code.


100-124: fetch_mobile_api is straightforward; align docstring with actual exceptions

The implementation is simple and correct: build /wastecollections/{uprn}, enforce a 200 response, and return parsed JSON. The docstring currently only advertises requests.RequestException, but the code will also raise ValueError on non-200 responses; consider mentioning ValueError in the docstring so callers know to handle both cases.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37c8a80 and 841a67b.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json (1 hunks)
  • uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/common.py (3)
  • check_uprn (67-78)
  • check_postcode (36-49)
  • check_paon (52-64)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py

9-9: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


44-44: Avoid specifying long messages outside the exception class

(TRY003)


47-47: Avoid specifying long messages outside the exception class

(TRY003)


55-58: Avoid specifying long messages outside the exception class

(TRY003)


64-67: Avoid specifying long messages outside the exception class

(TRY003)


75-75: Local variable full_address is assigned to but never used

Remove assignment to unused variable full_address

(F841)


83-87: Avoid specifying long messages outside the exception class

(TRY003)


91-95: Avoid specifying long messages outside the exception class

(TRY003)


118-121: Avoid specifying long messages outside the exception class

(TRY003)


132-132: Unused method argument: page

(ARG002)


150-150: check_uprn may be undefined, or defined from star imports

(F405)


156-156: check_postcode may be undefined, or defined from star imports

(F405)


157-157: check_paon may be undefined, or defined from star imports

(F405)


171-173: Avoid specifying long messages outside the exception class

(TRY003)


198-198: date_format may be undefined, or defined from star imports

(F405)


204-206: Avoid specifying long messages outside the exception class

(TRY003)


210-210: date_format may be undefined, or defined from star imports

(F405)

🔇 Additional comments (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1)

132-213: Parse flow looks good; a few import, PII, and minor ergonomics points

  • The overall flow (prefer explicit uprn, otherwise postcode+paonlookup_uprnfetch_mobile_api → container loop) is clean and should behave well for both UPRN and legacy postcode usage.
  • You rely on check_uprn, check_postcode, check_paon, and date_format via from uk_bin_collection.uk_bin_collection.common import *. Ruff is right that this makes it hard to see dependencies statically; importing those names explicitly (or adding # noqa: F403, F405 if this is the project-wide pattern) would improve clarity and tooling support.
  • The unused page parameter is required by the abstract interface, but Ruff flags ARG002. If you want to satisfy the linter without changing behavior, consider renaming to _page or adding a targeted # noqa: ARG002 on the method.
  • The error at Lines 170–173 includes a truncated JSON dump of the API response. That’s helpful for debugging but may leak full address/UPRN details into logs if exceptions are logged verbosely; consider shortening this to a more generic message or gating the raw payload behind a debug flag to reduce PII exposure.
  • For the “legacy” path, parse_data expects paon in kwargs. Please double-check that the CLI or any callers still passing house number are wiring it into paon for this council; otherwise that backward-compat path may never be exercised.

The container loop and date handling (ISO → formatted date_format → sorted) look consistent with the rest of the codebase’s patterns.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)

14-23: Document the hardcoded authorization credentials.

While the PR notes that this mobile API authorization token is extracted from the app and unlikely to change, hardcoding credentials in source remains a maintenance risk. Consider adding a comment explaining:

  • That this token is reverse-engineered from the mobile app
  • The expected behavior if it does change (e.g., will all scraper instances fail simultaneously?)
  • Any monitoring or fallback strategy

183-199: Add error handling for date parsing to prevent crashes on malformed data.

The datetime.fromisoformat() call at line 192 will raise ValueError if the API returns an unexpected date format. While the API is expected to return valid ISO format, defensive coding suggests wrapping this in try/except.

Apply this fix:

             # Extract collection date
             collection_date_str = container.get("collectionDate", "")
 
             # Skip empty collection dates
             if not collection_date_str:
                 continue
 
             # Extract container description (bin type)
             bin_type = container.get("containerDescription", f"Container {i}")
-            collection_datetime = datetime.fromisoformat(collection_date_str)
-
-
-            # Parse the date - API returns ISO format like "2025-11-25T00:00:00"
-            bin_entry = {
-                "type": bin_type,
-                "collectionDate": collection_datetime.strftime(date_format),
-            }
-
-            data["bins"].append(bin_entry)
+            
+            try:
+                # Parse the date - API returns ISO format like "2025-11-25T00:00:00"
+                collection_datetime = datetime.fromisoformat(collection_date_str)
+                bin_entry = {
+                    "type": bin_type,
+                    "collectionDate": collection_datetime.strftime(date_format),
+                }
+                data["bins"].append(bin_entry)
+            except ValueError:
+                # Skip containers with invalid date formats
+                continue
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 841a67b and e1a2b80.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/common.py (3)
  • check_uprn (67-78)
  • check_postcode (36-49)
  • check_paon (52-64)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py

9-9: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


42-42: Avoid specifying long messages outside the exception class

(TRY003)


45-45: Avoid specifying long messages outside the exception class

(TRY003)


56-59: Avoid specifying long messages outside the exception class

(TRY003)


65-68: Avoid specifying long messages outside the exception class

(TRY003)


83-87: Avoid specifying long messages outside the exception class

(TRY003)


91-95: Avoid specifying long messages outside the exception class

(TRY003)


118-121: Avoid specifying long messages outside the exception class

(TRY003)


132-132: Unused method argument: page

(ARG002)


150-150: check_uprn may be undefined, or defined from star imports

(F405)


156-156: check_postcode may be undefined, or defined from star imports

(F405)


157-157: check_paon may be undefined, or defined from star imports

(F405)


171-173: Avoid specifying long messages outside the exception class

(TRY003)


198-198: date_format may be undefined, or defined from star imports

(F405)


204-206: Avoid specifying long messages outside the exception class

(TRY003)


210-210: date_format may be undefined, or defined from star imports

(F405)

🔇 Additional comments (3)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (3)

100-124: LGTM - Clean API call with appropriate error handling.

The function correctly validates the status code and has a reasonable timeout. The response.json() call could theoretically raise JSONDecodeError if the API returns invalid JSON, but this is acceptable given the API is expected to always return valid JSON.


147-161: Well-structured fallback mechanism for UPRN lookup.

The dual-path approach (direct UPRN or postcode/paon lookup) maintains backward compatibility while enabling the more accurate UPRN-based method. The validation using check_uprn, check_postcode, and check_paon appropriately ensures inputs are valid before processing.


175-211: Robust parsing logic with appropriate defensive checks.

The container iteration correctly:

  • Handles up to 8 containers as documented
  • Validates container existence and type
  • Skips empty collection dates
  • Provides sensible defaults for missing descriptions
  • Sorts results by date for consistent output

 Prevent empty string edge case with `split()` by gating on truthiness of `line.strip)`
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)

77-79: Minor: Comment could be more precise.

The comment states "matches the start of addressLine1 or addressLine2", but the code actually checks whether paon equals the first token (word) of either line, not just whether it appears at the start as a substring.

Consider updating to:

-        # Check if paon matches the start of addressLine1 or addressLine2
+        # Check if paon matches the first token of addressLine1 or addressLine2

192-211: Optimize datetime handling to avoid redundant parsing.

The current flow parses the ISO date string to a datetime object (line 192), formats it back to a string (line 198), then parses the string again for sorting (line 210). This is inefficient and error-prone.

Consider storing the datetime object temporarily to avoid reparsing:

             # Extract container description (bin type)
             bin_type = container.get("containerDescription", f"Container {i}")
-            collection_datetime = datetime.fromisoformat(collection_date_str)
-
+            try:
+                collection_datetime = datetime.fromisoformat(collection_date_str)
+            except ValueError:
+                continue

             # Parse the date - API returns ISO format like "2025-11-25T00:00:00"
             bin_entry = {
                 "type": bin_type,
                 "collectionDate": collection_datetime.strftime(date_format),
+                "_sort_date": collection_datetime,  # Store datetime for sorting
             }

             data["bins"].append(bin_entry)

         if not data["bins"]:
             raise ValueError(
                 "No valid bin collection data could be extracted from the API response"
             )

         # Sort the bin collections by date
-        data["bins"].sort(
-            key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
-        )
+        data["bins"].sort(key=lambda x: x.pop("_sort_date"))

This avoids the redundant strptime call and is more efficient.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1a2b80 and df17a45.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/common.py (3)
  • check_uprn (67-78)
  • check_postcode (36-49)
  • check_paon (52-64)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py

9-9: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


42-42: Avoid specifying long messages outside the exception class

(TRY003)


45-45: Avoid specifying long messages outside the exception class

(TRY003)


56-59: Avoid specifying long messages outside the exception class

(TRY003)


65-68: Avoid specifying long messages outside the exception class

(TRY003)


83-87: Avoid specifying long messages outside the exception class

(TRY003)


91-95: Avoid specifying long messages outside the exception class

(TRY003)


118-121: Avoid specifying long messages outside the exception class

(TRY003)


132-132: Unused method argument: page

(ARG002)


150-150: check_uprn may be undefined, or defined from star imports

(F405)


156-156: check_postcode may be undefined, or defined from star imports

(F405)


157-157: check_paon may be undefined, or defined from star imports

(F405)


171-173: Avoid specifying long messages outside the exception class

(TRY003)


198-198: date_format may be undefined, or defined from star imports

(F405)


204-206: Avoid specifying long messages outside the exception class

(TRY003)


210-210: date_format may be undefined, or defined from star imports

(F405)

🔇 Additional comments (3)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (3)

100-123: LGTM!

The function properly validates the API response, includes a reasonable timeout, and provides helpful error messages. The TRY003 warnings about long error messages are a style preference—your detailed messages are user-friendly and consistent with the rest of the codebase.


147-160: Excellent backward compatibility approach!

The dual input method (direct UPRN or fallback to postcode/paon lookup) is well-implemented and maintains compatibility with existing integrations while enabling the more reliable UPRN-based approach. The validation calls appropriately check inputs before proceeding.

Note: The unused page parameter (line 132) is required by the AbstractGetBinDataClass interface, and the Ruff warnings about star imports are false positives—the imported functions exist in common.py.


13-23: Verify Cloud9 API terms permit automated access before merging.

The mobile app's authorization token (line 15) is extracted from the Cloud9 mobile application. A web search confirms that Cloud9 publishes Terms of Service and Acceptable Use policies governing API usage, but these terms are not publicly available—they exist in the vendor's documentation or the council's contract.

Before merging, verify:

  • The exact terms of the Cloud9 API documentation or council contract to confirm automated scraping via the mobile app's auth token is permitted
  • Whether the approach could trigger rate limiting or abuse detection mechanisms
  • If Cloud9/North Hertfordshire have been contacted to approve this method

The implementation is otherwise clean and well-structured, but this verification step is necessary to ensure compliance and stability.

Comment thread uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (3)

9-9: Consider replacing wildcard import with explicit imports.

The wildcard import makes it unclear which symbols are used from common (e.g., check_uprn, check_postcode, check_paon, date_format). Explicit imports improve readability and make dependencies obvious.

Apply this diff:

-from uk_bin_collection.uk_bin_collection.common import *
+from uk_bin_collection.uk_bin_collection.common import (
+    check_uprn,
+    check_postcode,
+    check_paon,
+    date_format,
+)

100-124: Consider adding error handling for malformed JSON responses.

If the API returns non-JSON content (e.g., HTML error page), response.json() at line 123 will raise JSONDecodeError without context. Wrapping it in try-except would provide clearer error messages for debugging.

Apply this diff:

-    return response.json()
+    try:
+        return response.json()
+    except requests.exceptions.JSONDecodeError as exc:
+        raise ValueError(
+            f"Mobile API returned invalid JSON. Status: {response.status_code}"
+        ) from exc

176-176: Consider extracting the magic number 8 to a named constant.

The container count (8) is hardcoded here. If the API changes its container limit, extracting this to a module-level constant like MAX_CONTAINERS = 8 would make the code more maintainable.

Apply this diff at module level:

+# Maximum number of containers supported by the API
+MAX_CONTAINERS = 8
+
+
 def lookup_uprn(postcode: str, paon: str) -> str:

Then update the loop:

         # Process all 8 possible containers
-        for i in range(1, 9):
+        for i in range(1, MAX_CONTAINERS + 1):
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df17a45 and 1770f15.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/common.py (3)
  • check_uprn (67-78)
  • check_postcode (36-49)
  • check_paon (52-64)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py

9-9: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


42-42: Avoid specifying long messages outside the exception class

(TRY003)


45-45: Avoid specifying long messages outside the exception class

(TRY003)


56-59: Avoid specifying long messages outside the exception class

(TRY003)


65-68: Avoid specifying long messages outside the exception class

(TRY003)


83-87: Avoid specifying long messages outside the exception class

(TRY003)


91-95: Avoid specifying long messages outside the exception class

(TRY003)


118-121: Avoid specifying long messages outside the exception class

(TRY003)


132-132: Unused method argument: page

(ARG002)


150-150: check_uprn may be undefined, or defined from star imports

(F405)


156-156: check_postcode may be undefined, or defined from star imports

(F405)


157-157: check_paon may be undefined, or defined from star imports

(F405)


171-173: Avoid specifying long messages outside the exception class

(TRY003)


196-196: Avoid specifying long messages outside the exception class

(TRY003)


201-201: date_format may be undefined, or defined from star imports

(F405)


208-210: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (4)

26-98: LGTM: Robust UPRN lookup with excellent error messages.

The function correctly validates inputs, handles API errors, matches addresses by PAON, and provides clear, actionable error messages including links to external UPRN lookup. The past review issues have been properly addressed.


192-196: Good error handling for datetime parsing.

The try-except block properly handles invalid date formats and re-raises with context. This addresses the past review comment about datetime parsing, and the fail-fast approach is preferable to silently skipping invalid data.


213-215: Verify the pop-during-sort pattern is intentional.

Using pop inside the lambda has a side effect during sorting—it mutates each bin entry by removing _sort_date as the sort progresses. While this achieves the goal of removing the temporary sort key, it's unconventional and could be surprising to maintainers.

Is this pattern intentional? A more explicit two-pass approach would be clearer:

# Sort by date
data["bins"].sort(key=lambda x: x["_sort_date"])

# Remove temporary sort keys
for bin_entry in data["bins"]:
    bin_entry.pop("_sort_date")

This separates concerns and makes the intent obvious without relying on lambda side effects.


14-23: Verify the hardcoded authorization token is still accepted by Cloud9's API before merging.

The token cannot be verified through public documentation. The provided header decodes to cloud9:idBcV4bor5 and follows standard HTTP Basic auth format. To confirm it's still valid, test the API endpoint directly or contact Cloud9/North Hertfordshire support to ensure the token hasn't been rotated. If it becomes invalid post-deployment, all requests will fail immediately.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1)

199-205: Consider skipping bins with malformed collectionDate instead of failing the whole parse.

At the moment, any single bad collectionDate from the API causes parse_data to raise and drop all other valid bins. For a third-party API this is brittle; a more resilient approach is to log/trace the bad value and continue so the rest of the data is still usable, for example:

-            try:
-                collection_datetime = datetime.fromisoformat(collection_date_str)
-            except ValueError as exc:
-                # re-raise with a more descriptive error message
-                raise ValueError("Could not parse collection date. Please check the API response.") from exc
+            try:
+                collection_datetime = datetime.fromisoformat(collection_date_str)
+            except ValueError:
+                # Skip bins with invalid date formats but keep processing others
+                continue

This matches the earlier suggestion to harden date parsing while maintaining graceful degradation.

🧹 Nitpick comments (5)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (5)

18-29: Re-evaluate hard-coded mobile API headers and authorization.

Embedding a Basic Authorization header and specific X-App-Version/X-Platform values directly in the code couples this scraper tightly to a particular mobile client and exposes what is effectively a credential. If possible, consider:

  • Minimising the headers to only what the API truly requires (e.g. Accept, Authorization), and
  • Moving the auth value (and any version identifiers that may change) into configuration or environment so they can be rotated/updated without a code change.

88-103: Defensively handle missing uprn in the addresses payload.

matching_addresses[0]["uprn"] assumes every matching record contains an uprn key; if the API schema changes or a bad record slips through, this will raise a bare KeyError. It would be safer to use .get("uprn") and raise a ValueError with a clear message when the field is missing, so callers see a user-friendly error instead of an unhandled exception.


141-147: Optional: rename unused page parameter to avoid ARG002.

page is intentionally unused in this API-driven implementation, but Ruff still flags it (ARG002). If you want a clean lint run, consider renaming it to _page (or similar) to document the intent and silence the warning without changing call behaviour.


154-170: Add an explicit guard when neither uprn nor postcode/paon are provided.

Right now, if uprn is missing and either postcode or paon is absent, execution will still fall through to check_postcode(postcode) / check_paon(paon), which can yield less clear errors (or hit the postcode API with a None/empty value). A small upfront check like:

uprn = kwargs.get("uprn")
if not uprn:
    postcode = kwargs.get("postcode")
    paon = kwargs.get("paon")
    if not postcode or not paon:
        raise ValueError("You must provide either 'uprn' or both 'postcode' and 'paon'.")

would make the failure mode clearer and avoid unnecessary external calls.


221-224: Avoid mutating bin entries inside the sort key.

Using key=lambda x: x.pop("_sort_date") both sorts and mutates each dict, which is a bit surprising. An explicit two-step approach is easier to reason about:

-        data["bins"].sort(
-            key=lambda x: x.pop("_sort_date")
-        )
+        data["bins"].sort(key=lambda x: x["_sort_date"])
+        for bin_entry in data["bins"]:
+            bin_entry.pop("_sort_date", None)

This keeps the mutation separate from the sort, without changing the output structure.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1770f15 and 17e4119.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/common.py (3)
  • check_paon (52-64)
  • check_postcode (36-49)
  • check_uprn (67-78)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py

48-48: Avoid specifying long messages outside the exception class

(TRY003)


51-51: Avoid specifying long messages outside the exception class

(TRY003)


62-65: Avoid specifying long messages outside the exception class

(TRY003)


71-74: Avoid specifying long messages outside the exception class

(TRY003)


89-93: Avoid specifying long messages outside the exception class

(TRY003)


97-101: Avoid specifying long messages outside the exception class

(TRY003)


124-124: Avoid specifying long messages outside the exception class

(TRY003)


124-124: f-string without any placeholders

Remove extraneous f prefix

(F541)


127-130: Avoid specifying long messages outside the exception class

(TRY003)


141-141: Unused method argument: page

(ARG002)


180-182: Avoid specifying long messages outside the exception class

(TRY003)


205-205: Avoid specifying long messages outside the exception class

(TRY003)


217-219: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1)

172-220: API-driven parsing flow looks clean and readable.

The loop over container1container8, the _sort_date helper field, and the final sort produce a clear, deterministic bin list, and the error messages for missing wasteCollectionDates / empty bins are user-friendly.

Comment thread uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1)

206-210: Consider skipping invalid dates instead of failing the entire parse.

The current implementation re-raises when a date can't be parsed, which will fail the entire scrape if the API returns any malformed date. To make the scraper more resilient to partial data issues, consider logging a warning and continuing instead:

 try:
     collection_datetime = datetime.fromisoformat(collection_date_str)
 except ValueError as exc:
-    # re-raise with a more descriptive error message
-    raise ValueError("Could not parse collection date. Please check the API response.") from exc
+    # Skip bins with invalid date format and continue processing others
+    continue

This aligns with the earlier suggestion in past reviews to gracefully handle invalid dates rather than failing completely.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17e4119 and 9679a42.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/common.py (3)
  • check_paon (52-64)
  • check_postcode (36-49)
  • check_uprn (67-78)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py

48-48: Avoid specifying long messages outside the exception class

(TRY003)


51-51: Avoid specifying long messages outside the exception class

(TRY003)


62-65: Avoid specifying long messages outside the exception class

(TRY003)


71-74: Avoid specifying long messages outside the exception class

(TRY003)


89-93: Avoid specifying long messages outside the exception class

(TRY003)


97-101: Avoid specifying long messages outside the exception class

(TRY003)


125-125: Avoid specifying long messages outside the exception class

(TRY003)


128-131: Avoid specifying long messages outside the exception class

(TRY003)


137-137: Avoid specifying long messages outside the exception class

(TRY003)


146-146: Unused method argument: page

(ARG002)


185-187: Avoid specifying long messages outside the exception class

(TRY003)


210-210: Avoid specifying long messages outside the exception class

(TRY003)


222-224: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (8)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (8)

1-14: LGTM!

The imports and documentation comments are clear and appropriate for the API-based implementation.


18-29: Confirm the hardcoded authorization token is intentionally public.

The MOBILE_API_AUTH constant contains a hardcoded Basic authentication token. While the PR description notes this is extracted from the mobile app and unlikely to change, it's worth confirming this token is intended to be publicly visible in the repository.

If this is a public API token used by the mobile app, consider adding a comment explaining its source and public nature to prevent future concerns.


32-103: LGTM!

The lookup_uprn function provides robust address lookup with clear error messages guiding users to findmyaddress.co.uk when issues occur. The input validation and matching logic are sound.


106-137: LGTM!

The error handling is well-structured, properly catching network errors separately from JSON parsing errors. The error messages are clear and actionable.


161-174: LGTM!

The UPRN handling with automatic fallback to lookup_uprn provides good backward compatibility while supporting the new UPRN-first approach.


177-187: LGTM!

The API response validation clearly handles missing data with an actionable error message.


189-205: LGTM!

The container iteration logic cleanly handles the API's eight-container structure, skipping missing or empty entries.


221-231: LGTM!

The final validation ensures at least one valid collection was found, and the sorting logic correctly uses and removes the temporary _sort_date field.

It's definitely a bit of an edge case, but being a little more tolerant won't hurt.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)

121-127: Consider validating UPRN format for robustness.

While UPRNs are numeric and don't require URL encoding like postcodes, adding format validation would make the function more defensive against invalid input and provide clearer error messages.

     url = f"{MOBILE_API_BASE}/wastecollections/{uprn}"
+    
+    # Validate UPRN format (typically 12 digits, but can vary)
+    if not uprn.isdigit():
+        raise ValueError(f"Invalid UPRN format: '{uprn}'. UPRN must be numeric.")
 
     # Perform the HTTP request and surface network-layer errors clearly
     try:

229-231: Consider using get() instead of pop() in sort key for clarity.

Using pop() in the sort lambda modifies dictionaries during sorting, which works but is unconventional and may be surprising to future maintainers.

More conventional approach:

-        # Sort the bin collections by date
-        data["bins"].sort(
-            key=lambda x: x.pop("_sort_date")
-        )
+        # Sort the bin collections by date
+        data["bins"].sort(key=lambda x: x["_sort_date"])
+        
+        # Remove temporary sort keys
+        for bin_entry in data["bins"]:
+            bin_entry.pop("_sort_date")

The current implementation is functionally correct, so this is purely a readability suggestion.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9679a42 and 0a24600.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/common.py (3)
  • check_paon (52-64)
  • check_postcode (36-49)
  • check_uprn (67-78)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
🪛 Ruff (0.14.4)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py

50-50: Avoid specifying long messages outside the exception class

(TRY003)


53-53: Avoid specifying long messages outside the exception class

(TRY003)


64-67: Avoid specifying long messages outside the exception class

(TRY003)


73-76: Avoid specifying long messages outside the exception class

(TRY003)


91-95: Avoid specifying long messages outside the exception class

(TRY003)


99-103: Avoid specifying long messages outside the exception class

(TRY003)


127-127: Avoid specifying long messages outside the exception class

(TRY003)


130-133: Avoid specifying long messages outside the exception class

(TRY003)


139-139: Avoid specifying long messages outside the exception class

(TRY003)


148-148: Unused method argument: page

(ARG002)


187-189: Avoid specifying long messages outside the exception class

(TRY003)


224-226: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (3)
uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py (3)

124-139: LGTM! Error handling is now correct.

The error handling properly separates network errors (RequestException) from JSON parsing errors (JSONDecodeError), with clear error messages and proper exception chaining using from exc. This addresses the previous review feedback.


163-176: LGTM! UPRN resolution logic is well-structured.

The dual-mode approach (direct UPRN or postcode/paon lookup) maintains backward compatibility while encouraging the more reliable UPRN-based method. The validation calls (check_uprn, check_postcode, check_paon) ensure data quality before API calls.


192-221: LGTM! Container parsing handles edge cases robustly.

The iteration through 8 containers correctly handles:

  • Missing or non-dict containers (lines 196-197)
  • Empty collection dates (lines 203-204)
  • Invalid date formats with try/except (lines 208-212, addressing previous review feedback)

The temporary _sort_date key is a clean approach for maintaining sort order.

Comment thread uk_bin_collection/uk_bin_collection/councils/NorthHertfordshireDistrictCouncil.py Outdated
... and handle request exceptions
* Amend loop variable for readability
* Introduce temp variable to build list of collection data with sortable datetime field
* Use dict comprehension to build final payload in order to avoid mutating dicts while looping
@robbrad robbrad changed the base branch from master to dec_release December 7, 2025 10:16
@robbrad robbrad merged commit ab95aac into robbrad:dec_release Dec 7, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants