Isle of Anglesey support #1729
Conversation
Walkthrough

Added a new Isle of Anglesey Council bin collection scraper module and a corresponding test entry. The scraper obtains an auth cookie, resolves the UPRN (or uses a provided UPRN), fetches schedule data via API lookups, and extracts normalized bin collection records.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Caller
    participant Council as IsleOfAngleseyCouncil
    participant Session as HTTPSession
    participant API as AngleseyAPI
    participant Extractor
    Caller->>Council: parse_data(page, uprn OR postcode+paon)
    Council->>Council: validate inputs
    alt uprn not provided
        Council->>Session: _initialise_session()
        Session->>API: GET session endpoint
        API-->>Session: auth cookie
        Council->>API: _run_lookup(ADDRESS_LOOKUP_ID, {postcode, paon})
        API-->>Council: address results (include UPRN)
        Council->>Council: select UPRN
    end
    Council->>API: _run_lookup(SCHEDULE_LOOKUP_ID, {uprn})
    API-->>Council: schedule payload
    Council->>Extractor: _extract_bin_data(schedule)
    Extractor->>Extractor: map rows -> bins, assign year, skip invalid rows
    Extractor-->>Council: structured collection data
    Council-->>Caller: return normalized collections
```
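The input-validation branch of the flow above can be sketched as a small dispatch function. This is a hypothetical illustration, not the scraper's actual code: `choose_uprn` and `resolve_uprn` are stand-in names for the "use provided UPRN or resolve from postcode + PAON" decision.

```python
def choose_uprn(uprn=None, postcode=None, paon=None, resolve_uprn=None):
    """Return a UPRN string, resolving via postcode + PAON when none is given.

    `resolve_uprn` stands in for the scraper's address-lookup call.
    """
    if uprn:
        return str(uprn)
    if not (postcode and paon):
        raise ValueError("Provide either a UPRN or a postcode and PAON")
    return resolve_uprn(postcode, paon)
```

A caller with a known UPRN skips the address lookup entirely; otherwise both postcode and PAON are required before any network call is made.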
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Force-pushed 548d74c to 3a8e258
Actionable comments posted: 0
🧹 Nitpick comments (6)
uk_bin_collection/tests/input.json (1)
1260-1267: IsleOfAngleseyCouncil test entry looks consistent; `wiki_name` style could be friendlier.

The `LAD24CD`, `skip_get_url`, and example `uprn` are all consistent with the new scraper and its usage of direct UPRN or postcode/PAON. To match neighbouring entries, you may want to change `wiki_name` from `"IsleOfAngleseyCouncil"` to a more human-readable form like `"Isle of Anglesey"` or `"Isle of Anglesey Council"`, but this is cosmetic.

uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (5)
1-4: Replace `import *` with explicit imports and make datetime usage explicit.

Using `from uk_bin_collection.uk_bin_collection.common import *` obscures which symbols are required and triggers Ruff F403/F405 (and makes it unclear where `check_postcode`, `check_uprn`, `date_format`, and `datetime` come from). It's safer and clearer to import exactly what you use and import `datetime` from the standard library.

A minimal refactor could look like:

```diff
-import logging
-import requests
-from uk_bin_collection.uk_bin_collection.common import *
-from uk_bin_collection.uk_bin_collection.get_bin_data import AbstractGetBinDataClass
+import logging
+from datetime import datetime
+
+import requests
+from uk_bin_collection.uk_bin_collection.common import (
+    check_postcode,
+    check_uprn,
+    date_format,
+)
+from uk_bin_collection.uk_bin_collection.get_bin_data import AbstractGetBinDataClass
```

This makes dependencies explicit and should resolve the Ruff F403/F405 warnings mentioned in the static analysis.
Also applies to: 134-137, 144-144, 167-167, 176-176, 187-187
27-52: Add explicit timeouts to session HTTP calls to avoid hanging on slow councils.

`_initialise_session` and `_run_lookup` call `self._session.get`/`post` without a `timeout`, so a slow or non-responsive endpoint can hang indefinitely, unlike `AbstractGetBinDataClass.get_data`, which uses a 120s timeout.

Consider aligning with that pattern, for example:

```diff
-        response = self._session.get(SESSION_URL)
+        response = self._session.get(SESSION_URL, timeout=120)
@@
-        response = self._session.post(f"{LOOKUP_URL}?id={lookup_id}", json=payload)
+        response = self._session.post(
+            f"{LOOKUP_URL}?id={lookup_id}", json=payload, timeout=120
+        )
```

You can tune the timeout value if the Anglesey endpoints are known to be slower, but having some bound is important for robustness.
61-107: UPRN lookup uses substring match on PAON; verify this is robust enough.

`_get_uprn_from_postcode_and_paon` normalizes the PAON and then checks `paon_normalized in field.lower()` across `display`, `house`, and `flatHouse`. This is forgiving, but it may:

- Match the wrong property when multiple addresses share a common substring (e.g., `"1"` matching `"10"` and `"11"`), and
- Be sensitive to users giving partial or extra text.

If the upstream data has cleaner structure (e.g., a separate house number/name field), consider tightening this to an equality or token-based match on that field first, and only fall back to substring search if needed. At minimum, it's worth confirming against a few "tricky" postcodes with many similar addresses.
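A tiered matcher along those lines could look like the sketch below. This is a hedged illustration, not the scraper's code: the field names (`house`, `flatHouse`, `display`) follow the review's description of the API payload and should be treated as assumptions.

```python
def match_address(paon, addresses):
    """Pick an address for a PAON: exact field match first, then whole-token
    match, then the original substring behaviour as a last resort."""
    paon_norm = paon.strip().lower()

    def fields(addr):
        return [str(addr.get(k, "")).strip().lower() for k in ("house", "flatHouse", "display")]

    # Pass 1: exact equality on any field ("1" will not match "10")
    for addr in addresses:
        if paon_norm in fields(addr):
            return addr
    # Pass 2: whole-token match ("1" matches "1 High Street", not "10 High Street")
    for addr in addresses:
        if any(paon_norm in f.split() for f in fields(addr)):
            return addr
    # Pass 3: substring fallback, as in the current implementation
    for addr in addresses:
        if any(paon_norm in f for f in fields(addr) if f):
            return addr
    return None
```

With this ordering, `"1"` selects the house numbered 1 even when number 10 and 11 appear earlier in the list, while free-text queries still work via the fallback.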
109-151: parse_data flow looks sound; you can explicitly consume `page` to silence linters.

The logic around accepting either `uprn` directly or resolving from `postcode` + `paon`/`number`, validating via `check_postcode`, and then calling `_get_uprn_from_postcode_and_paon` is coherent and matches the documented kwargs.

Static analysis flags `page` as unused (ARG002). To keep the AbstractGetBinDataClass interface but appease linters, you can add a no-op use:

```diff
     def parse_data(self, page: str, **kwargs) -> dict:
 @@
-        user_uprn = kwargs.get("uprn")
+        _ = page  # required by interface; response body is not used
+        user_uprn = kwargs.get("uprn")
```

This keeps behaviour identical while making the intent explicit.
153-193: Confirm date parsing/year-rollover assumptions match the API's `Date` format.

`_extract_bin_data` assumes:

- `row["Date"]` is a day and month without a year (parsed via `"%d %B"` and combined with `current_year`), and
- Any resulting date earlier than `now` should be treated as the same day/month next year.

This is reasonable if the API always returns a year-less recurring schedule, but it will misbehave if:

- The API ever starts including a year in `Date` (the parse will fail and all rows will be skipped), or
- The schedule represents a specific fixed year (past dates being artificially bumped into next year may be misleading).

If you haven't already, it's worth double-checking a few raw API responses (especially around year-end) to ensure `Date` really is just `dd Month` and that the "bump to next year" logic aligns with the council's semantics. If not, you may need to derive the year from the payload or confine results to a rolling window instead of always adding one year.
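The year-less parse plus "bump past dates forward" rule described above can be written as a small testable function. This is a sketch of the described behaviour, not the module's actual code; `now` is injected so the rollover can be exercised deterministically.

```python
from datetime import datetime

def parse_schedule_date(date_str, now):
    """Parse a year-less "dd Month" string, assuming the current year,
    and roll any already-passed date into next year."""
    parsed = datetime.strptime(f"{date_str} {now.year}", "%d %B %Y")
    if parsed < now:
        parsed = parsed.replace(year=now.year + 1)
    return parsed
```

Around year-end this is where the behaviour matters: in late December, "3 January" parses into the current year, tests as past, and gets bumped to January of the following year.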
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
uk_bin_collection/tests/input.json (1 hunks)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/common.py (2)
check_postcode (36-49)
check_uprn (67-78)
🪛 Ruff (0.14.5)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py
3-3: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
33-33: Avoid specifying long messages outside the exception class
(TRY003)
85-85: Avoid specifying long messages outside the exception class
(TRY003)
103-107: Avoid specifying long messages outside the exception class
(TRY003)
109-109: Unused method argument: page
(ARG002)
128-128: Avoid specifying long messages outside the exception class
(TRY003)
130-132: Avoid specifying long messages outside the exception class
(TRY003)
134-134: check_postcode may be undefined, or defined from star imports
(F405)
137-137: check_uprn may be undefined, or defined from star imports
(F405)
144-144: datetime may be undefined, or defined from star imports
(F405)
164-164: Avoid specifying long messages outside the exception class
(TRY003)
167-167: datetime may be undefined, or defined from star imports
(F405)
176-176: datetime may be undefined, or defined from star imports
(F405)
187-187: date_format may be undefined, or defined from star imports
(F405)
Actionable comments posted: 0
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (1)
3-3: Replace wildcard import with explicit imports.

The wildcard import makes it difficult to track which names are being imported and used throughout the module. This contributes to the F405 static analysis warnings on lines 134, 137, 144, 167, 176, and 187.
Apply this diff to use explicit imports:

```diff
-from uk_bin_collection.uk_bin_collection.common import *
+from uk_bin_collection.uk_bin_collection.common import (
+    check_postcode,
+    check_uprn,
+    date_format,
+    datetime,
+)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
uk_bin_collection/tests/input.json (1 hunks)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- uk_bin_collection/tests/input.json
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/common.py (2)
check_postcode (36-49)
check_uprn (67-78)
🪛 Ruff (0.14.5)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py
3-3: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
33-33: Avoid specifying long messages outside the exception class
(TRY003)
85-85: Avoid specifying long messages outside the exception class
(TRY003)
103-107: Avoid specifying long messages outside the exception class
(TRY003)
109-109: Unused method argument: page
(ARG002)
128-128: Avoid specifying long messages outside the exception class
(TRY003)
130-132: Avoid specifying long messages outside the exception class
(TRY003)
134-134: check_postcode may be undefined, or defined from star imports
(F405)
137-137: check_uprn may be undefined, or defined from star imports
(F405)
144-144: datetime may be undefined, or defined from star imports
(F405)
164-164: Avoid specifying long messages outside the exception class
(TRY003)
167-167: datetime may be undefined, or defined from star imports
(F405)
176-176: datetime may be undefined, or defined from star imports
(F405)
187-187: date_format may be undefined, or defined from star imports
(F405)
🔇 Additional comments (2)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (2)
22-59: Well-structured session management.

The lazy session initialization pattern is well-implemented:

- Session is created once and reused across API calls
- Clear separation between session setup and API lookup execution
- Proper HTTP error handling with `raise_for_status()`

The nested `.get()` calls with empty dict defaults (lines 54-59) provide safe navigation of the response structure, with downstream validation ensuring data integrity.
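The safe-navigation pattern can be demonstrated in isolation. This is an illustrative sketch; the key names (`integration`, `transformed`, `rows_data`) mirror the review's description of the API response and the final type check is an added suggestion, not part of the reviewed code.

```python
def extract_rows(payload):
    """Traverse the nested response with empty-dict defaults so missing
    keys yield an empty result instead of an AttributeError, and guard
    against a present-but-null rows_data value."""
    rows = (
        payload.get("integration", {})
        .get("transformed", {})
        .get("rows_data", {})
    )
    return rows if isinstance(rows, dict) else {}
```

Note the subtlety the guard addresses: chained `.get(..., {})` protects against *missing* keys, but a key explicitly set to `null` in the JSON would still surface as `None` without the final check.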
61-193: Robust implementation with good error handling.

The implementation demonstrates several strengths:

- Flexible address resolution: supports direct UPRN or postcode+PAON lookup with clear validation (lines 120-137)
- Helpful error messages: when address matching fails, provides a list of available addresses (lines 102-107)
- Smart date handling: the year rollover logic (lines 176-182) correctly handles dates that have already passed by advancing them to the next year
- Graceful degradation: invalid schedule rows are logged and skipped rather than failing the entire request (lines 190-191)

The `page` parameter at line 109 triggers an ARG002 warning but is required by the `AbstractGetBinDataClass.parse_data` abstract method interface, so the warning can be ignored.
Actionable comments posted: 0
🧹 Nitpick comments (4)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (4)
32-40: Consider more robust error handling.

The session initialization could fail in ways not currently handled:

- If the response is not valid JSON
- If the response structure differs from expected

Consider wrapping the JSON parsing and validation in a try-except block to provide clearer error messages.

```diff
 def _initialise_session(self) -> None:
     """Initialize session by obtaining authentication cookie."""
     response = self._session.get(SESSION_URL, timeout=60)
     response.raise_for_status()
-    if not response.json().get("auth-session"):
-        raise ValueError("Failed to obtain session cookie")
+    try:
+        response_data = response.json()
+        if not response_data.get("auth-session"):
+            raise ValueError("Failed to obtain session cookie")
+    except requests.exceptions.JSONDecodeError as e:
+        raise ValueError(f"Invalid JSON response from session endpoint: {e}")
     self._have_session = True
```
42-66: Validate API response structure.

The chained `.get()` calls on lines 61-66 will return `None` if any expected key is missing from the response. This could lead to confusing errors downstream when code expects a dictionary.

Consider validating the response structure and raising a clear error if the expected keys are not present.

```diff
     # Extract the nested data structure
-    return (
+    result = (
         response.json()
         .get("integration", {})
         .get("transformed", {})
         .get("rows_data", {})
     )
+
+    if result is None:
+        raise ValueError(f"Unexpected API response structure for lookup {lookup_id}")
+
+    return result
```
116-159: Consider timezone implications.

Line 152 uses `datetime.now()` without timezone information. If the server runs in a different timezone than the council's data timezone, this could lead to incorrect date calculations in `_extract_bin_data`.

Consider using timezone-aware datetimes or documenting the timezone assumption.

```diff
+from datetime import datetime, timezone
 ...
         # (interestingly, we can retrieve arbitrary future dates by changing calcDate)
         payload = {
             "formValues": {
                 "Section 1": {
                     "calcUPRN": {"value": user_uprn},
-                    "calcDate": {"value": datetime.now().strftime("%d/%m/%Y")},
+                    "calcDate": {"value": datetime.now(timezone.utc).strftime("%d/%m/%Y")},
                     "calcLang": {"value": "en"},
                 }
             }
         }
```

Alternatively, verify whether the council API expects dates in a specific timezone and document this assumption.
162-201: Review year logic assumptions and error handling approach.

The date parsing logic makes some assumptions that may need verification:

- Year calculation (lines 184-190): the code assumes all dates are either in the current year or next year. If the API returns dates from earlier in the current year when it's now late in the year, they would be incorrectly moved to next year. Consider whether this edge case needs handling.
- Timezone consistency: line 175 uses `datetime.now()` without timezone info, which could cause issues if the server and council data use different timezones (same concern as in `parse_data`).
- Error suppression (lines 198-199): the try-except block catches errors and only logs warnings. While this prevents one bad row from breaking the entire response, it might hide systematic data quality issues. Consider whether certain errors should be fatal.

For the year logic, consider a more robust approach:

```diff
         # Parse date without year, then add current year
         collection_date = datetime.strptime(
             f"{date_str} {current_year}", "%d %B %Y"
         )
-        # If date has passed, use next year instead
-        if collection_date < now:
+        # If date is more than a month in the past, assume it's for next year
+        from datetime import timedelta
+        if collection_date < now - timedelta(days=30):
             collection_date = collection_date.replace(year=current_year + 1)
```

This prevents recent past dates from being moved to next year while still handling the year rollover correctly.
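The grace-window idea suggested above can be captured as a self-contained, testable function. This is a sketch under the stated assumptions (year-less `"dd Month"` input, injectable `now`), not the scraper's actual implementation.

```python
from datetime import datetime, timedelta

def infer_collection_date(date_str, now, grace_days=30):
    """Assume the current year for a year-less date; dates up to
    `grace_days` in the past stay put, older ones roll to next year."""
    parsed = datetime.strptime(f"{date_str} {now.year}", "%d %B %Y")
    if parsed < now - timedelta(days=grace_days):
        parsed = parsed.replace(year=now.year + 1)
    return parsed
```

With `now` in late December, a collection from five days ago stays in the current year (useful for recently-missed collections), while "3 January" still rolls forward correctly.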
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
uk_bin_collection/tests/input.json (1 hunks)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (2)
uk_bin_collection/uk_bin_collection/common.py (2)
check_postcode (36-49)
check_uprn (67-78)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass (43-146)
🪛 Ruff (0.14.5)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py
38-38: Avoid specifying long messages outside the exception class
(TRY003)
92-92: Avoid specifying long messages outside the exception class
(TRY003)
110-114: Avoid specifying long messages outside the exception class
(TRY003)
136-136: Avoid specifying long messages outside the exception class
(TRY003)
138-140: Avoid specifying long messages outside the exception class
(TRY003)
172-172: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (4)
uk_bin_collection/tests/input.json (1)
1260-1267: LGTM!

The test configuration entry is well-structured and follows the established pattern. The `skip_get_url: true` setting correctly reflects that the scraper uses API calls rather than fetching the provided URL.

uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (3)
1-19: LGTM!

The imports and module-level constants are well-organized and appropriately documented. The separation of lookup IDs into named constants improves maintainability.

27-30: LGTM!

The initialization properly sets up session management for handling multiple authenticated API requests.

68-114: LGTM with note on static analysis warnings.

The UPRN lookup logic is well-implemented with good error messages. The static analysis warnings (TRY003) about long error messages are style suggestions that can be safely ignored here; the detailed error messages provide valuable debugging information for users.

The case-insensitive matching across multiple address fields is a good approach for handling variations in address formatting.
Actionable comments posted: 0
🧹 Nitpick comments (6)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (6)
27-31: Consider optional injection of `requests.Session` for testability

Constructing the `requests.Session` directly in `__init__` is fine, but if you ever want to unit-test this scraper without hitting the real endpoint, accepting an optional `session` (defaulting to `requests.Session()`) would make it easier to mock.
32-44: Tighten JSON decode handling in `_initialise_session` for broader `requests` compatibility

Depending on the `requests` version, `response.json()` may raise a plain `ValueError` rather than `requests.exceptions.JSONDecodeError`, so the current `except` may not always fire. You can make the intent clearer and avoid conflating JSON-decode errors with your own `ValueError` by separating the decode step:

```diff
-        try:
-            if not response.json().get("auth-session"):
-                raise ValueError("Failed to obtain session cookie")
-        except requests.exceptions.JSONDecodeError as e:
-            raise ValueError("Failed to decode session response as JSON") from e
-
-        self._have_session = True
+        try:
+            data = response.json()
+        except (ValueError, requests.exceptions.JSONDecodeError) as e:
+            # Depending on `requests` version, `.json()` may raise `ValueError`
+            # or `requests.exceptions.JSONDecodeError`.
+            raise ValueError("Failed to decode session response as JSON") from e
+
+        if not data.get("auth-session"):
+            raise ValueError("Failed to obtain session cookie")
+
+        self._have_session = True
```

This also plays nicer with Ruff's TRY003 guidance by concentrating the long JSON-decode-failure message in one place.
45-73: Mirror the JSON decode pattern in `_run_lookup` for consistency and resilience

Same JSON-decode concern applies here; it's safer and clearer to decode once, catch both `ValueError` and `requests.exceptions.JSONDecodeError`, then traverse the dict:

```diff
-        # Extract the nested data structure
-        try:
-            return (
-                response.json()
-                .get("integration", {})
-                .get("transformed", {})
-                .get("rows_data", {})
-            )
-        except requests.exceptions.JSONDecodeError as e:
-            raise ValueError("Failed to decode lookup response as JSON") from e
+        # Extract the nested data structure
+        try:
+            data = response.json()
+        except (ValueError, requests.exceptions.JSONDecodeError) as e:
+            raise ValueError("Failed to decode lookup response as JSON") from e
+
+        return (
+            data.get("integration", {})
+            .get("transformed", {})
+            .get("rows_data", {})
+        )
```

This keeps error semantics consistent between session initialisation and lookups and addresses the Ruff TRY003 hint for this block as well.
74-121: Address matching logic is solid; Ruff TRY003 messages are purely stylistic here

The postcode → address list → PAON substring match across `display`/`house`/`flatHouse` is a pragmatic approach, and the "available addresses (first 5)" in the error is very user-friendly.

Ruff's TRY003 warnings on the `ValueError` messages at lines 98 and 116-120 are about style (long messages embedded in generic exceptions). If you want to silence them without losing diagnostics, you could move the long text into logging and keep the raised message shorter, but there's no functional issue here.
122-168: Input handling in `parse_data` is robust; consider localising `calcDate` to UK time

The handling of `uprn` vs `postcode` + `paon`/`number`, plus validation via `check_postcode` and `check_uprn` (per `common.py`), is clean and aligns with other council implementations in the project.

One minor edge case: `calcDate` uses `datetime.now(timezone.utc)`. If the API interprets dates in local UK time, calling this near midnight UTC could send a different calendar date than expected during BST/GMT transitions. Not critical, but you might consider using a localised UK date (e.g. `datetime.now()` on a UK-time host or `zoneinfo("Europe/London")`) if you ever see off-by-one-day behaviour around midnight.
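A minimal sketch of that localisation, using the standard-library `zoneinfo` module (Python 3.9+; on some platforms the `tzdata` package must be installed). The function name is illustrative, not from the reviewed code.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def uk_calc_date(now_utc=None):
    """Format today's date as dd/mm/YYYY in UK civil time, so the
    calendar date matches the council's around midnight UTC during BST."""
    now_utc = now_utc or datetime.now(ZoneInfo("UTC"))
    return now_utc.astimezone(ZoneInfo("Europe/London")).strftime("%d/%m/%Y")
```

During British Summer Time, 23:30 UTC is already the next calendar day in the UK, which is exactly the case where a plain UTC date would be off by one.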
169-209: Date/year rollover heuristic works but could optionally be made more explicit

The extraction loop, including skipping bad rows with a warning and `exc_info=True`, is good defensive coding.

Two optional refinements:

- You currently infer the year and bump to `current_year + 1` if the assumed-current-year date is more than 30 days in the past. That's reasonable for New-Year crossover, but if the upstream ever returns a mixture of historical and future dates, some historical ones more than 30 days old would be "pushed" into next year. If that becomes an issue, you could instead explicitly discard historical dates or rely on the API to give only future ones.
- For deterministic ordering of results, you might want to sort `bins` by parsed `collection_date` before returning, in case the API's dict order changes.

Neither is a blocker; the current logic is acceptable for the described use case.
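The sorting refinement is a one-liner. This sketch assumes the bins carry a `"collectionDate"` string in the project's `%d/%m/%Y` convention; both the key name and format are assumptions for illustration.

```python
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # assumed project-wide date convention

def sort_bins(bins):
    """Return bins ordered by parsed collection date, independent of
    the order the API happened to return them in."""
    return sorted(
        bins, key=lambda b: datetime.strptime(b["collectionDate"], DATE_FORMAT)
    )
```

Sorting on the parsed datetime rather than the raw string matters: lexicographic order on `dd/mm/YYYY` strings would put "03/01/2025" before "27/12/2024".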
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (2)
uk_bin_collection/uk_bin_collection/common.py (2)
check_postcode (36-49)
check_uprn (67-78)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass (43-146)
🪛 Ruff (0.14.5)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py
39-39: Avoid specifying long messages outside the exception class
(TRY003)
41-41: Avoid specifying long messages outside the exception class
(TRY003)
72-72: Avoid specifying long messages outside the exception class
(TRY003)
98-98: Avoid specifying long messages outside the exception class
(TRY003)
116-120: Avoid specifying long messages outside the exception class
(TRY003)
142-142: Avoid specifying long messages outside the exception class
(TRY003)
144-146: Avoid specifying long messages outside the exception class
(TRY003)
180-180: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (1)
uk_bin_collection/uk_bin_collection/councils/IsleOfAngleseyCouncil.py (1)
1-20: Imports, logger, and constants are clear and idiomaticImports, module‑level logger, and the separation of
BASE_URL/lookup IDs into constants are all in line with the rest of the project and make the scraper easy to configure/maintain.
This PR introduces support for Isle of Anglesey County Council.
I've included support for either UPRN or postcode/PAON.