Changed to use dedicated API to retrieve collection data#1742
Conversation
WalkthroughReplaced Selenium/BeautifulSoup browser scraping with a direct JSON-RPC POST to https://www.sholland.gov.uk/apiserver/ajaxlibrary (method SouthHolland.Waste.getCollectionDates) using a validated UPRN; response JSON is parsed for bin Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py (2)
31-43: Add error handling for API response failures.The code assumes a successful response with a
"result"key. JSON-RPC errors return an"error"key instead, and network/HTTP errors will cause cryptic exceptions. Consider adding defensive checks:# Parse the JSON response + response.raise_for_status() bin_collection = response.json() + if "error" in bin_collection: + raise ValueError(f"API error: {bin_collection['error']}") + # Loop through each collection in bin_collection - for collection in bin_collection["result"]: + for collection in bin_collection.get("result", []): bin_type = collection.get("typeDisplay") collection_date = collection.get("nextDate")
45-47: Consider handling missing or malformed dates in sort.If any collection entry has a
Noneor malformednextDate,strptimewill raise an exception. You could add a fallback:bindata["bins"].sort( - key=lambda x: datetime.strptime(x.get("collectionDate"), "%Y-%m-%d") + key=lambda x: datetime.strptime(x.get("collectionDate"), "%Y-%m-%d") if x.get("collectionDate") else datetime.max )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
check_uprn(67-78)
🪛 Ruff (0.14.7)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py
17-17: check_uprn may be undefined, or defined from star imports
(F405)
29-29: Probable use of requests call without timeout
(S113)
46-46: datetime may be undefined, or defined from star imports
(F405)
🔇 Additional comments (1)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py (1)
1-1: LGTM!The
requestsimport is appropriately added for the new API-based approach.
| # Make the GET request | ||
| response = requests.post(URI, json=data) |
There was a problem hiding this comment.
Add a timeout to prevent indefinite hangs; fix misleading comment.
The requests.post call lacks a timeout, which can cause the application to hang indefinitely if the server is unresponsive. Also, the comment incorrectly says "GET request" but uses POST.
- # Make the GET request
- response = requests.post(URI, json=data)
+ # Make the POST request
+ response = requests.post(URI, json=data, timeout=30)🧰 Tools
🪛 Ruff (0.14.7)
29-29: Probable use of requests call without timeout
(S113)
🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py
around lines 28-29, the inline comment incorrectly says "GET request" while the
code performs a POST and the requests.post call has no timeout; update the
comment to "Make the POST request" and add a timeout parameter (e.g. timeout=10)
to the requests.post call, and wrap the call in a try/except to catch
requests.exceptions.Timeout and requests.exceptions.RequestException to
handle/log timeouts and other network errors appropriately.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py (1)
48-50: Add error handling for date parsing to improve robustness.The sort assumes all dates are in "%Y-%m-%d" format. If the API changes its date format, this will raise an unclear
ValueError.Consider wrapping the sort in a try-except or validating the date format earlier:
-bindata["bins"].sort( - key=lambda x: datetime.strptime(x.get("collectionDate"), "%Y-%m-%d") -) +try: + bindata["bins"].sort( + key=lambda x: datetime.strptime(x.get("collectionDate"), "%Y-%m-%d") + ) +except (ValueError, TypeError) as e: + raise ValueError(f"Invalid date format in collection data: {e}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
check_uprn(67-78)
🪛 Ruff (0.14.7)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py
17-17: check_uprn may be undefined, or defined from star imports
(F405)
18-18: Avoid specifying long messages outside the exception class
(TRY003)
49-49: datetime may be undefined, or defined from star imports
(F405)
🔇 Additional comments (5)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py (5)
1-1: LGTM!The
requestsimport is appropriate for the new API-based implementation.
16-19: Past issue resolved: UPRN validation now properly enforced.The code now checks the return value of
check_uprnand raises aValueErrorwhen validation fails, preventing execution with an invalid UPRN. This addresses the previous review comment.
20-29: LGTM!The JSON-RPC 2.0 payload structure is correct, and the API endpoint configuration is appropriate.
31-32: Past issue resolved: timeout added and comment corrected.The POST request now includes a 30-second timeout to prevent indefinite hangs, and the comment accurately describes the operation. This addresses the previous review comment.
52-52: LGTM!The return statement correctly returns the bins data in the expected format.
| # Parse the JSON response | ||
| bin_collection = response.json() |
There was a problem hiding this comment.
Add error handling for HTTP and JSON-RPC responses.
The code lacks validation for HTTP errors and JSON-RPC error responses. If the API returns a non-2xx status, malformed JSON, or a JSON-RPC error object, users will encounter unclear exceptions.
Consider adding comprehensive error handling:
# Make the POST request
response = requests.post(URI, json=data, timeout=30)
+response.raise_for_status()
# Parse the JSON response
bin_collection = response.json()
+
+# Check for JSON-RPC error
+if "error" in bin_collection:
+ error = bin_collection["error"]
+ raise ValueError(f"API error: {error.get('message', 'Unknown error')}")
+
+if "result" not in bin_collection:
+ raise ValueError("Unexpected API response format: missing 'result' key")Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py
around lines 34-35, add robust HTTP and JSON-RPC error handling: after making
the request check response.status_code and call response.raise_for_status() (or
explicitly raise a descriptive exception on non-2xx), parse JSON inside a
try/except to catch JSONDecodeError and raise a clear error, then validate the
parsed JSON for JSON-RPC error payloads (if an "error" key exists raise an
exception including the error contents or message) and only proceed/return the
expected result when no HTTP error, no JSON decode failure, and no JSON-RPC
error are present.
| # Loop through each collection in bin_collection | ||
| for collection in bin_collection["result"]: | ||
| bin_type = collection.get("typeDisplay") | ||
| collection_date = collection.get("nextDate") | ||
|
|
||
| dict_data = { | ||
| "type": bin_type, | ||
| "collectionDate": collection_date | ||
| } | ||
| bindata["bins"].append(dict_data) |
There was a problem hiding this comment.
Validate collection data before processing.
The loop assumes all collections have valid typeDisplay and nextDate values. If the API returns incomplete data, this can lead to runtime errors during sorting or poor user experience with missing bin types.
Add validation for the collection data:
# Loop through each collection in bin_collection
for collection in bin_collection["result"]:
bin_type = collection.get("typeDisplay")
collection_date = collection.get("nextDate")
+
+ # Skip collections with missing required data
+ if not bin_type or not collection_date:
+ continue
dict_data = {
"type": bin_type,
"collectionDate": collection_date
}
bindata["bins"].append(dict_data)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Loop through each collection in bin_collection | |
| for collection in bin_collection["result"]: | |
| bin_type = collection.get("typeDisplay") | |
| collection_date = collection.get("nextDate") | |
| dict_data = { | |
| "type": bin_type, | |
| "collectionDate": collection_date | |
| } | |
| bindata["bins"].append(dict_data) | |
| # Loop through each collection in bin_collection | |
| for collection in bin_collection["result"]: | |
| bin_type = collection.get("typeDisplay") | |
| collection_date = collection.get("nextDate") | |
| # Skip collections with missing required data | |
| if not bin_type or not collection_date: | |
| continue | |
| dict_data = { | |
| "type": bin_type, | |
| "collectionDate": collection_date | |
| } | |
| bindata["bins"].append(dict_data) |
🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py
around lines 37 to 46, the loop unconditionally reads collection["typeDisplay"]
and collection["nextDate"] which may be missing or malformed; update the code to
validate each collection before appending: check that "typeDisplay" is present
and non-empty, check that "nextDate" exists and is a valid date string (attempt
to parse it or use a safe date validator), skip (and optionally log) any
collection missing required fields or with an unparsable date, and only append
dicts with normalized values (or a safe default if appropriate) so downstream
sorting/processing won’t error. Ensure validation uses try/except around date
parsing and avoids adding entries with invalid data.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## dec_release #1742 +/- ##
============================================
Coverage 86.79% 86.79%
============================================
Files 9 9
Lines 1136 1136
============================================
Hits 986 986
Misses 150 150 ☔ View full report in Codecov by Sentry. |
E requests.exceptions.JSONDecodeError: Extra data: line 1 column 5 (char 4) ../../../.cache/pypoetry/virtualenvs/uk-bin-collection-EwS6Gn8s-py3.12/lib/python3.12/site-packages/requests/models.py:975: JSONDecodeError
Double-check that the file exists (in case of a really early crash)if [ ! -f build/3.12/integration-test-results/junit.xml ]; then exit $RESULT |
Changed SouthHollandDistrictCouncil.py as per issue #1741 - now using dedicated endpoint to retrieve collection dates.
Summary by CodeRabbit
Refactor
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.