Changed to use dedicated API to retrieve collection data #1742

Merged
robbrad merged 2 commits into robbrad:dec_release from rae-p:SouthHollandAPIChange
Dec 7, 2025

Conversation

@rae-p
Contributor

@rae-p rae-p commented Dec 4, 2025

Changed SouthHollandDistrictCouncil.py as per issue #1741: it now uses a dedicated endpoint to retrieve collection dates.

Summary by CodeRabbit

  • Refactor

    • Streamlined bin collection lookup for South Holland with a direct data-fetching flow, improving speed and reliability of schedule retrieval.
    • Simplified control flow and response handling, returning normalized bin entries sorted by collection date.
  • Bug Fixes

    • Added UPRN validation to surface invalid-input errors earlier.


@coderabbitai
Contributor

coderabbitai Bot commented Dec 4, 2025

Walkthrough

Replaced Selenium/BeautifulSoup browser scraping with a direct JSON-RPC POST to https://www.sholland.gov.uk/apiserver/ajaxlibrary (method SouthHolland.Waste.getCollectionDates) using a validated UPRN; response JSON is parsed for bin typeDisplay and nextDate, bins built and returned sorted by collectionDate.
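As a rough illustration of the flow described in this walkthrough, the sketch below builds a JSON-RPC 2.0 request body and turns a response into sorted bin entries. The method name, the typeDisplay and nextDate fields, and the YYYY-MM-DD date format come from this PR; the helper names, the shape of "params" (including the "UPRN" key), and the sample bin names are assumptions made for illustration, not the merged code.

```python
from datetime import datetime

# Method name as given in the walkthrough.
API_METHOD = "SouthHolland.Waste.getCollectionDates"


def build_payload(uprn: str) -> dict:
    """Build a JSON-RPC 2.0 request body (hypothetical helper).

    The "params" layout and the "UPRN" key are assumptions; the PR text
    does not show the exact parameter names the endpoint expects.
    """
    return {
        "jsonrpc": "2.0",
        "id": "1",
        "method": API_METHOD,
        "params": {"UPRN": uprn},
    }


def parse_bins(rpc_response: dict) -> dict:
    """Convert the "result" list into bin entries sorted by date."""
    bins = [
        {"type": item.get("typeDisplay"), "collectionDate": item.get("nextDate")}
        for item in rpc_response["result"]
    ]
    # Dates are assumed to be YYYY-MM-DD strings, as in the reviewed code.
    bins.sort(key=lambda b: datetime.strptime(b["collectionDate"], "%Y-%m-%d"))
    return {"bins": bins}


# Made-up sample response; the bin names are placeholders.
sample = {
    "result": [
        {"typeDisplay": "Refuse", "nextDate": "2025-12-12"},
        {"typeDisplay": "Recycling", "nextDate": "2025-12-05"},
    ]
}
print(parse_bins(sample))
```

Keeping the parsing in a pure function like this makes it possible to exercise it with canned responses, without contacting the council's server.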

Changes

Cohort / File(s) | Summary
Data retrieval refactor: uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py | Removed Selenium/WebDriver and BeautifulSoup usage; added UPRN validation (check_uprn usage and ValueError on invalid input); implemented JSON-RPC POST to https://www.sholland.gov.uk/apiserver/ajaxlibrary with method SouthHolland.Waste.getCollectionDates; parse response["result"] to extract typeDisplay and nextDate; build bins list of { "type": ..., "collectionDate": ... } and sort by collectionDate; removed driver cleanup and HTML parsing logic.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify JSON-RPC payload structure and headers
  • Confirm UPRN validation behavior and error handling
  • Ensure response["result"] access handles possible missing/null values safely
  • Check date format assumptions (YYYY-MM-DD) and sorting correctness
  • Confirm removal of Selenium/BeautifulSoup leaves no dangling imports or references

Poem

🐇 I hopped from pages, then found a key,

JSON doors opened, set the bins free.
No browsers now — just tidy and bright,
Dates lined up neatly, all in a row tonight.
Hooray for fewer hops and lighter code delight!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately summarizes the main change: replacing the Selenium/BS4 web scraping approach with a direct API-based solution for retrieving collection data from South Holland District Council.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py (2)

31-43: Add error handling for API response failures.

The code assumes a successful response with a "result" key. JSON-RPC errors return an "error" key instead, and network/HTTP errors will cause cryptic exceptions. Consider adding defensive checks:

         # Parse the JSON response
+        response.raise_for_status()
         bin_collection = response.json()

+        if "error" in bin_collection:
+            raise ValueError(f"API error: {bin_collection['error']}")
+
         # Loop through each collection in bin_collection
-        for collection in bin_collection["result"]:
+        for collection in bin_collection.get("result", []):
             bin_type = collection.get("typeDisplay")
             collection_date = collection.get("nextDate")

45-47: Consider handling missing or malformed dates in sort.

If any collection entry has a None or malformed nextDate, strptime will raise an exception. You could add a fallback:

         bindata["bins"].sort(
-            key=lambda x: datetime.strptime(x.get("collectionDate"), "%Y-%m-%d")
+            key=lambda x: datetime.strptime(x.get("collectionDate"), "%Y-%m-%d") if x.get("collectionDate") else datetime.max
         )
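Note that the fallback suggested above only covers a missing date; a malformed string would still raise inside strptime. A minimal sketch of a sort key that handles both cases follows. The function name safe_date_key and the sample entries are hypothetical, and pushing unparsable entries to the end (rather than dropping them) is a design assumption.

```python
from datetime import datetime


def safe_date_key(entry: dict) -> datetime:
    """Sort key that tolerates missing or malformed collection dates.

    Entries whose date cannot be parsed sort last via datetime.max.
    """
    raw = entry.get("collectionDate")
    try:
        return datetime.strptime(raw, "%Y-%m-%d")
    except (TypeError, ValueError):
        # TypeError: raw is None or not a string.
        # ValueError: raw does not match YYYY-MM-DD.
        return datetime.max


# Made-up entries for illustration only.
bins = [
    {"type": "A", "collectionDate": None},
    {"type": "B", "collectionDate": "2025-12-10"},
    {"type": "C", "collectionDate": "10/12/2025"},
    {"type": "D", "collectionDate": "2025-12-03"},
]
bins.sort(key=safe_date_key)
print([b["type"] for b in bins])
```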
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37c8a80 and 2f23a65.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • check_uprn (67-78)
🪛 Ruff (0.14.7)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py

17-17: check_uprn may be undefined, or defined from star imports

(F405)


29-29: Probable use of requests call without timeout

(S113)


46-46: datetime may be undefined, or defined from star imports

(F405)

🔇 Additional comments (1)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py (1)

1-1: LGTM!

The requests import is appropriately added for the new API-based approach.

Comment thread uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py Outdated
Comment on lines +28 to +29
# Make the GET request
response = requests.post(URI, json=data)

⚠️ Potential issue | 🟠 Major

Add a timeout to prevent indefinite hangs; fix misleading comment.

The requests.post call lacks a timeout, which can cause the application to hang indefinitely if the server is unresponsive. Also, the comment incorrectly says "GET request" but uses POST.

-        # Make the GET request
-        response = requests.post(URI, json=data)
+        # Make the POST request
+        response = requests.post(URI, json=data, timeout=30)
🧰 Tools
🪛 Ruff (0.14.7)

29-29: Probable use of requests call without timeout

(S113)

🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py
around lines 28-29, the inline comment incorrectly says "GET request" while the
code performs a POST and the requests.post call has no timeout; update the
comment to "Make the POST request" and add a timeout parameter (e.g. timeout=10)
to the requests.post call, and wrap the call in a try/except to catch
requests.exceptions.Timeout and requests.exceptions.RequestException to
handle/log timeouts and other network errors appropriately.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py (1)

48-50: Add error handling for date parsing to improve robustness.

The sort assumes all dates are in "%Y-%m-%d" format. If the API changes its date format, this will raise an unclear ValueError.

Consider wrapping the sort in a try-except or validating the date format earlier:

-bindata["bins"].sort(
-    key=lambda x: datetime.strptime(x.get("collectionDate"), "%Y-%m-%d")
-)
+try:
+    bindata["bins"].sort(
+        key=lambda x: datetime.strptime(x.get("collectionDate"), "%Y-%m-%d")
+    )
+except (ValueError, TypeError) as e:
+    raise ValueError(f"Invalid date format in collection data: {e}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f23a65 and 569fe7f.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • check_uprn (67-78)
🪛 Ruff (0.14.7)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py

17-17: check_uprn may be undefined, or defined from star imports

(F405)


18-18: Avoid specifying long messages outside the exception class

(TRY003)


49-49: datetime may be undefined, or defined from star imports

(F405)

🔇 Additional comments (5)
uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py (5)

1-1: LGTM!

The requests import is appropriate for the new API-based implementation.


16-19: Past issue resolved: UPRN validation now properly enforced.

The code now checks the return value of check_uprn and raises a ValueError when validation fails, preventing execution with an invalid UPRN. This addresses the previous review comment.


20-29: LGTM!

The JSON-RPC 2.0 payload structure is correct, and the API endpoint configuration is appropriate.


31-32: Past issue resolved: timeout added and comment corrected.

The POST request now includes a 30-second timeout to prevent indefinite hangs, and the comment accurately describes the operation. This addresses the previous review comment.


52-52: LGTM!

The return statement correctly returns the bins data in the expected format.

Comment on lines +34 to +35
# Parse the JSON response
bin_collection = response.json()

⚠️ Potential issue | 🟠 Major

Add error handling for HTTP and JSON-RPC responses.

The code lacks validation for HTTP errors and JSON-RPC error responses. If the API returns a non-2xx status, malformed JSON, or a JSON-RPC error object, users will encounter unclear exceptions.

Consider adding comprehensive error handling:

 # Make the POST request
 response = requests.post(URI, json=data, timeout=30)
+response.raise_for_status()
 
 # Parse the JSON response
 bin_collection = response.json()
+
+# Check for JSON-RPC error
+if "error" in bin_collection:
+    error = bin_collection["error"]
+    raise ValueError(f"API error: {error.get('message', 'Unknown error')}")
+
+if "result" not in bin_collection:
+    raise ValueError("Unexpected API response format: missing 'result' key")

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py
around lines 34-35, add robust HTTP and JSON-RPC error handling: after making
the request check response.status_code and call response.raise_for_status() (or
explicitly raise a descriptive exception on non-2xx), parse JSON inside a
try/except to catch JSONDecodeError and raise a clear error, then validate the
parsed JSON for JSON-RPC error payloads (if an "error" key exists raise an
exception including the error contents or message) and only proceed/return the
expected result when no HTTP error, no JSON decode failure, and no JSON-RPC
error are present.
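The checks described in this prompt can be sketched as a small pure function that takes the raw response body. This is only an illustration under assumptions: the name extract_result is hypothetical, HTTP status checking (response.raise_for_status()) is assumed to have happened before the body reaches it, and a null "error" member is treated as "no error".

```python
import json


def extract_result(body: str) -> list:
    """Validate a JSON-RPC response body and return its "result" value."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response was not valid JSON: {exc}") from exc

    if not isinstance(payload, dict):
        raise ValueError("Unexpected API response format: expected a JSON object")

    # A JSON-RPC failure carries an "error" member instead of "result".
    error = payload.get("error")
    if error is not None:
        message = error.get("message", "Unknown error") if isinstance(error, dict) else error
        raise ValueError(f"API error: {message}")

    if "result" not in payload:
        raise ValueError("Unexpected API response format: missing 'result' key")

    return payload["result"]


print(extract_result('{"jsonrpc": "2.0", "id": "1", "result": []}'))
```

With requests, the body would be passed in as response.text after the status check, so every failure mode surfaces as a ValueError with a readable message.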

Comment on lines +37 to +46
# Loop through each collection in bin_collection
for collection in bin_collection["result"]:
    bin_type = collection.get("typeDisplay")
    collection_date = collection.get("nextDate")

    dict_data = {
        "type": bin_type,
        "collectionDate": collection_date
    }
    bindata["bins"].append(dict_data)

⚠️ Potential issue | 🟠 Major

Validate collection data before processing.

The loop assumes all collections have valid typeDisplay and nextDate values. If the API returns incomplete data, this can lead to runtime errors during sorting or poor user experience with missing bin types.

Add validation for the collection data:

 # Loop through each collection in bin_collection
 for collection in bin_collection["result"]:
     bin_type = collection.get("typeDisplay")
     collection_date = collection.get("nextDate")
+    
+    # Skip collections with missing required data
+    if not bin_type or not collection_date:
+        continue
     
     dict_data = {
         "type": bin_type,
         "collectionDate": collection_date
     }
     bindata["bins"].append(dict_data)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-# Loop through each collection in bin_collection
-for collection in bin_collection["result"]:
-    bin_type = collection.get("typeDisplay")
-    collection_date = collection.get("nextDate")
-
-    dict_data = {
-        "type": bin_type,
-        "collectionDate": collection_date
-    }
-    bindata["bins"].append(dict_data)
+# Loop through each collection in bin_collection
+for collection in bin_collection["result"]:
+    bin_type = collection.get("typeDisplay")
+    collection_date = collection.get("nextDate")
+
+    # Skip collections with missing required data
+    if not bin_type or not collection_date:
+        continue
+
+    dict_data = {
+        "type": bin_type,
+        "collectionDate": collection_date
+    }
+    bindata["bins"].append(dict_data)
🤖 Prompt for AI Agents
In uk_bin_collection/uk_bin_collection/councils/SouthHollandDistrictCouncil.py
around lines 37 to 46, the loop reads typeDisplay and nextDate with
collection.get(), so either value may be None or malformed; update the code to
validate each collection before appending: check that "typeDisplay" is present
and non-empty, check that "nextDate" exists and is a valid date string (attempt
to parse it or use a safe date validator), skip (and optionally log) any
collection missing required fields or with an unparsable date, and only append
dicts with normalized values (or a safe default if appropriate) so downstream
sorting/processing won’t error. Ensure validation uses try/except around date
parsing and avoids adding entries with invalid data.
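One way to read this prompt is as a filter applied before entries are appended. The sketch below skips any collection that lacks a type or whose date does not parse as YYYY-MM-DD. The function name valid_bins and the sample data are hypothetical, and silently skipping (instead of logging) is an assumption.

```python
from datetime import datetime


def valid_bins(collections: list) -> list:
    """Return only the entries with a bin type and a parsable date."""
    bins = []
    for collection in collections:
        bin_type = collection.get("typeDisplay")
        raw_date = collection.get("nextDate")

        # Skip entries with missing required fields.
        if not bin_type or not raw_date:
            continue

        # Skip entries whose date is not a YYYY-MM-DD string.
        try:
            datetime.strptime(raw_date, "%Y-%m-%d")
        except (TypeError, ValueError):
            continue

        bins.append({"type": bin_type, "collectionDate": raw_date})
    return bins


# Made-up sample data; only the first entry is complete and well formed.
sample_collections = [
    {"typeDisplay": "Refuse", "nextDate": "2025-12-12"},
    {"typeDisplay": "", "nextDate": "2025-12-05"},
    {"typeDisplay": "Garden", "nextDate": "soon"},
    {"typeDisplay": "Recycling"},
]
print(valid_bins(sample_collections))
```

Because every surviving entry has a date that already parsed once, the later sort on collectionDate cannot fail on these entries.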

@codecov

codecov Bot commented Dec 7, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.79%. Comparing base (37c8a80) to head (569fe7f).
⚠️ Report is 97 commits behind head on dec_release.

Additional details and impacted files
@@             Coverage Diff              @@
##           dec_release    #1742   +/-   ##
============================================
  Coverage        86.79%   86.79%           
============================================
  Files                9        9           
  Lines             1136     1136           
============================================
  Hits               986      986           
  Misses             150      150           


@robbrad
Owner

robbrad commented Dec 7, 2025

    try:
        return complexjson.loads(self.text, **kwargs)
    except JSONDecodeError as e:
        # Catch JSON-related errors and raise as requests.JSONDecodeError
        # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
        raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

E requests.exceptions.JSONDecodeError: Extra data: line 1 column 5 (char 4)

../../../.cache/pypoetry/virtualenvs/uk-bin-collection-EwS6Gn8s-py3.12/lib/python3.12/site-packages/requests/models.py:975: JSONDecodeError
---------------------------- Captured stdout setup -----------------------------
Running test for council: SouthHollandDistrictCouncil

- generated xml file: /home/runner/work/UKBinCollectionData/UKBinCollectionData/build/3.12/integration-test-results/junit.xml -
=========================== short test summary info ============================
FAILED uk_bin_collection/tests/step_defs/test_validate_council.py::test_scenario_outline[SouthHollandDistrictCouncil] - requests.exceptions.JSONDecodeError: Extra data: line 1 column 5 (char 4)
============================== 1 failed in 4.12s ===============================

# Double-check that the file exists (in case of a really early crash)

if [ ! -f build/3.12/integration-test-results/junit.xml ]; then
echo ""
> build/3.12/integration-test-results/junit.xml;
fi; \

exit $RESULT

https://github.com/robbrad/UKBinCollectionData/actions/runs/19944293890/job/57360926755?pr=1742
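The log above does not show the response body, so the cause of the failure is not known from this thread. As a hedged illustration only, the sketch below uses the standard json module to show that a short non-JSON body beginning with a three-digit number (for example a plain-text status line) produces exactly the "Extra data: line 1 column 5 (char 4)" message, whereas an HTML body produces a different one. The helper name and both sample bodies are hypothetical.

```python
import json


def decode_error_message(body: str):
    """Return the JSON decode error message for a body, or None if it parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


# The parser reads "403" as a complete JSON number, skips the space,
# then finds unexpected trailing text at character index 4.
print(decode_error_message("403 Forbidden"))
# Extra data: line 1 column 5 (char 4)

# An HTML error page fails at the very first character instead.
print(decode_error_message("<html>"))
# Expecting value: line 1 column 1 (char 0)
```

Logging response.status_code and the first few characters of response.text before calling response.json() would show which case applied in the CI run.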

@robbrad robbrad changed the base branch from master to dec_release December 7, 2025 10:57
@robbrad robbrad merged commit 129d79d into robbrad:dec_release Dec 7, 2025
11 of 13 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Dec 7, 2025