
Creating/Saving Audits #4737

Closed
jrothacker wants to merge 7 commits into main from jr/source-of-truth-create


@jrothacker
Contributor

Purpose

This PR will begin creating and updating Audits in the new Audit table.

How

The new `Audit` model should now be wired into every step of the flow, all the way through dissemination. Search and all UIs still rely on the SAC and dissemination tables.

Testing

Linting, unit tests, end-to-end tests

Notes

I truly apologize for the size of this PR.
This change includes a large refactor.
I want to do additional, thorough manual testing before we move this to production.
Given the nature of this change, I'm going to request two approvals before I merge.

@github-actions
Contributor

github-actions Bot commented Mar 3, 2025

Terraform plan for meta

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.

📝 Plan generated in Pull Request Checks #4436

@github-actions
Contributor

github-actions Bot commented Mar 3, 2025

Terraform plan for dev

Plan: 1 to add, 0 to change, 1 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.dev.module.cors.null_resource.cors_header must be replaced
-/+ resource "null_resource" "cors_header" {
!~      id       = "*******************" -> (known after apply)
!~      triggers = { # forces replacement
!~          "always_run" = "2025-03-03T19:52:20Z" -> (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 1 to destroy.

📝 Plan generated in Pull Request Checks #4436

# These come back as tuples:
# [(col1, row1, field1, link1, help-text1), (col2, row2, ...), ...]
logger.warning("%s Excel upload failed validation: %s", form_section, err)
return JsonResponse({"errors": list(err), "type": "error_row"}, status=400)

Check warning

Code scanning / CodeQL

Information exposure through an exception

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI about 1 year ago

To fix the problem, we need to ensure that the error details returned to the user do not contain sensitive information. Instead of returning the raw ValidationError details, we should log the error on the server and return a generic error message to the user. This approach maintains the ability to debug issues while protecting sensitive information from being exposed.

  1. Modify the exception handling block for ValidationError to log the error details and return a generic error message.
  2. Ensure that the logging configuration is set up to capture these error details for debugging purposes.
Suggested changeset 1
backend/audit/views/excel_file_handler.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/audit/views/excel_file_handler.py b/backend/audit/views/excel_file_handler.py
--- a/backend/audit/views/excel_file_handler.py
+++ b/backend/audit/views/excel_file_handler.py
@@ -153,3 +153,3 @@
             logger.warning("%s Excel upload failed validation: %s", form_section, err)
-            return JsonResponse({"errors": list(err), "type": "error_row"}, status=400)
+            return JsonResponse({"errors": "Validation failed. Please check your input and try again.", "type": "error_row"}, status=400)
         except MultiValueDictKeyError as err:
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Contributor Author


This is not leaking stack information. This error list contains all of the validation errors, which are displayed to the user in human-readable form after they submit the Excel file.

This was in place already, and moved as part of the larger refactor.
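To illustrate what reaches the client in this case: the handler puts `list(err)` into a `JsonResponse`, which serializes it as JSON, so each entry arrives as a plain array. The sketch below assumes, as the comment in the handler says, that the entries are `(col, row, field, link, help_text)` tuples; the sample values are hypothetical, and it uses the standard `json` module on the assumption that Django's encoder handles tuples the same way the standard encoder does.

```python
import json

# Hypothetical sample entries, shaped like the handler's comment:
# (col, row, field, link, help_text)
errors = [
    ("B", 12, "hypothetical_field_a", "https://example.invalid/help", "Must be a number"),
    ("C", 14, "hypothetical_field_b", "https://example.invalid/help", "Required field"),
]

# Same payload shape the view builds before handing it to JsonResponse.
payload = {"errors": errors, "type": "error_row"}
body = json.dumps(payload)
print(body)

# Tuples round-trip as JSON arrays; only the validation fields are present.
decoded = json.loads(body)
```

The decoded payload holds only the five validation fields per row, which is the point made in the reply above: the content is curated validation output rather than a traceback.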

Contributor

@anagradova commented Mar 4, 2025


Most of these scanning alerts seem to come from heuristic rules that key off the fact that the error content is dynamic, rather than looking at what the error could actually contain. We could probably refactor this in the future, as you mentioned, but it should not impact the system now.

raise BadRequest() from err
except KeyError as err:
logger.warning("Field error. Field: %s", err)
return JsonResponse({"errors": str(err), "type": "error_field"}, status=400)

Check warning

Code scanning / CodeQL

Information exposure through an exception

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI about 1 year ago

To fix the problem, we need to ensure that the detailed error message from the KeyError exception is not exposed to the user. Instead, we should log the detailed error message on the server and return a generic error message to the user. This can be achieved by modifying the exception handling block for KeyError.

  1. Log the detailed error message using the logger.warning method.
  2. Return a generic error message in the JSON response.
Suggested changeset 1
backend/audit/views/excel_file_handler.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/audit/views/excel_file_handler.py b/backend/audit/views/excel_file_handler.py
--- a/backend/audit/views/excel_file_handler.py
+++ b/backend/audit/views/excel_file_handler.py
@@ -159,3 +159,3 @@
             logger.warning("Field error. Field: %s", err)
-            return JsonResponse({"errors": str(err), "type": "error_field"}, status=400)
+            return JsonResponse({"errors": "A field error occurred.", "type": "error_field"}, status=400)
         except ExcelExtractionError as err:
EOF
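For context on what this alert is flagging: `str()` of a `KeyError` is just the repr of the missing key, quotes included, with no traceback attached. The sketch below uses a plain dictionary instead of the real view; the helper `field_error_payload` and the field names are hypothetical.

```python
import json


def field_error_payload(err: KeyError) -> dict:
    """Hypothetical helper mirroring the handler: the body is just str(err)."""
    return {"errors": str(err), "type": "error_field"}


try:
    row = {"present_field": 1}
    row["hypothetical_missing_field"]  # missing key, raises KeyError
except KeyError as err:
    payload = field_error_payload(err)

# str() of a KeyError is the repr of the missing key, so the client
# receives only the field name in quotes.
print(json.dumps(payload))
```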
except ExcelExtractionError as err:
if err.error_key == UNKNOWN_WORKBOOK:
return JsonResponse(
{"errors": str(err), "type": UNKNOWN_WORKBOOK}, status=400

Check warning

Code scanning / CodeQL

Information exposure through an exception

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI about 1 year ago

To fix the problem, we need to ensure that detailed error information is not exposed to the end user. Instead, we should log the detailed error message on the server and return a generic error message to the user. This can be achieved by modifying the except ExcelExtractionError block to log the error and return a generic message.

Suggested changeset 1
backend/audit/views/excel_file_handler.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/audit/views/excel_file_handler.py b/backend/audit/views/excel_file_handler.py
--- a/backend/audit/views/excel_file_handler.py
+++ b/backend/audit/views/excel_file_handler.py
@@ -161,7 +161,8 @@
         except ExcelExtractionError as err:
+            logger.warning("Excel extraction error: %s", err)
             if err.error_key == UNKNOWN_WORKBOOK:
                 return JsonResponse(
-                    {"errors": str(err), "type": UNKNOWN_WORKBOOK}, status=400
+                    {"errors": "Unknown workbook error", "type": UNKNOWN_WORKBOOK}, status=400
                 )
-            raise JsonResponse({"errors": list(err), "type": "error_row"}, status=400)
+            return JsonResponse({"errors": "Excel extraction error", "type": "error_row"}, status=400)
         except LateChangeError:
EOF
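Separate from the CodeQL finding, this patch also changes `raise JsonResponse(...)` to `return JsonResponse(...)`, which looks like a genuine bug fix: `JsonResponse` is an `HttpResponse` subclass, not an exception, so raising it ends in an unhandled `TypeError` rather than returning the intended 400 response. The sketch below uses a hypothetical stand-in class, `FakeJsonResponse`, so it runs without Django.

```python
class FakeJsonResponse:
    """Hypothetical stand-in for JsonResponse: a plain object, not an exception."""

    def __init__(self, data, status=200):
        self.data = data
        self.status_code = status


def handle_with_raise():
    # Original code path: `raise` on an object that is not a BaseException.
    raise FakeJsonResponse({"errors": ["example error"], "type": "error_row"}, status=400)


def handle_with_return():
    # Patched code path: the response is handed back to the caller.
    return FakeJsonResponse({"errors": ["example error"], "type": "error_row"}, status=400)


try:
    handle_with_raise()
except TypeError as exc:
    # Python refuses to raise non-exception objects.
    raise_error = str(exc)

response = handle_with_return()
print(raise_error)           # exceptions must derive from BaseException
print(response.status_code)  # 400
```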
return JsonResponse(
{"errors": str(err), "type": UNKNOWN_WORKBOOK}, status=400
)
raise JsonResponse({"errors": list(err), "type": "error_row"}, status=400)

Check warning

Code scanning / CodeQL

Information exposure through an exception

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI about 1 year ago

To fix the problem, we should ensure that the error details are not exposed to the end user. Instead, we should log the detailed error message on the server and return a generic error message to the user. This can be achieved by modifying the exception handling block for ExcelExtractionError to log the error and return a generic message.

Suggested changeset 1
backend/audit/views/excel_file_handler.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/audit/views/excel_file_handler.py b/backend/audit/views/excel_file_handler.py
--- a/backend/audit/views/excel_file_handler.py
+++ b/backend/audit/views/excel_file_handler.py
@@ -161,7 +161,8 @@
         except ExcelExtractionError as err:
+            logger.error("Excel extraction error: %s", err)
             if err.error_key == UNKNOWN_WORKBOOK:
                 return JsonResponse(
-                    {"errors": str(err), "type": UNKNOWN_WORKBOOK}, status=400
+                    {"errors": "Unknown workbook error", "type": UNKNOWN_WORKBOOK}, status=400
                 )
-            raise JsonResponse({"errors": list(err), "type": "error_row"}, status=400)
+            return JsonResponse({"errors": "Excel extraction error", "type": "error_row"}, status=400)
         except LateChangeError:
EOF
@jrothacker marked this pull request as ready for review March 3, 2025 20:38
Comment thread backend/audit/models/viewflow.py
@jadudm marked this pull request as draft March 6, 2025 15:23
Comment thread backend/dissemination/management/commands/migrate_audits.py Outdated
Comment thread backend/dissemination/management/commands/migrate_audits.py
_convert_additional_fields(audit_data, sac)

# update existing Audit.
if Audit.objects.filter(report_id=sac.report_id).exists():
Contributor Author


We probably want to ignore existing audits. We'll have to think on this one: in theory, existing audits should already be up to date with the SAC. But this could be helpful if we see audits drifting out of sync. Something for us to ponder.

Contributor


That is exactly why I introduced it: if audits are out of sync with the intake data for whatever reason, this could act as a fail-safe to reconcile any inconsistencies.

That said, with this change it will still likely ignore existing audits, because it ONLY runs on SACs where `migrated_to_audit=False`. But what would be a good reason not to have a condition like this in place?

Contributor Author


One reason: I wouldn't want to update a record that is already up to date. An example where this might happen is in-progress audits that haven't been migrated yet but were created during the launch phase in which we are monitoring.

I'm not even convinced I'm right, just something I want us to think about.
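One way to get both behaviors discussed in this thread is to update an existing audit only when it actually differs from the SAC-derived data. Below is a minimal sketch with plain dictionaries standing in for the `Audit` table; `sync_audit` and the record shapes are hypothetical, and real records would likely need a field-by-field comparison that ignores timestamps and similar bookkeeping fields.

```python
from typing import Any, Dict


def sync_audit(
    store: Dict[str, Dict[str, Any]], report_id: str, audit_data: Dict[str, Any]
) -> str:
    """Hypothetical reconcile step: create, skip, or update one audit record."""
    existing = store.get(report_id)
    if existing is None:
        store[report_id] = dict(audit_data)
        return "created"
    if existing == audit_data:
        # Already in sync with the SAC, so the record is left untouched.
        return "skipped"
    # Out of sync: overwrite with the SAC-derived data (the fail-safe case).
    store[report_id] = dict(audit_data)
    return "updated"


audits: Dict[str, Dict[str, Any]] = {}
actions = [
    sync_audit(audits, "report-1", {"status": "in_progress"}),
    sync_audit(audits, "report-1", {"status": "in_progress"}),
    sync_audit(audits, "report-1", {"status": "submitted"}),
]
print(actions)  # ['created', 'skipped', 'updated']
```

Skipping identical records avoids rewriting rows that are already current, while out-of-sync rows still get corrected.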

- Addressed feedback: we no longer need `_convert_additional_fields()`, since it duplicates a method in `utils.py`.
- Minor cleanup on `_populate_accesses()`. While addressing the point above, I no longer felt it needed to be a separate function.
- Added two new arguments, `--disseminated` and `--intake`. The first fetches only SAC data that has been disseminated; the second fetches only SAC data that has not yet been disseminated.
- Rather than iterating through 50k SACs at a time, this change introduces a `while` loop that migrates batches of 100 until all relevant SACs are migrated. This should mean we only need to run the command once.
- Bandit flags the string-based query construction as a potential SQL injection; however, this logic is only accessed internally.
- Added a `migrated_to_audit` flag on `SingleAuditChecklist` for determining which SACs have not yet been migrated.
- Added some logic (for local testing ONLY) which cleans up audit data and references to mimic a clean slate.
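The batching approach described above can be sketched as a loop that keeps fetching the next 100 unmigrated SACs until a query returns nothing. This version uses an in-memory SQLite table as a stand-in for the real `SingleAuditChecklist` model; the table layout, `migrate_all`, `filter_from_flags`, and the assumption that `--disseminated` and `--intake` are mutually exclusive are all hypothetical. It also shows one way to sidestep the Bandit warning: the SQL text is assembled only from fixed fragments, and every value is passed as a bound parameter.

```python
import argparse
import sqlite3
from typing import List, Optional

BATCH_SIZE = 100  # batch size described in the notes above


def filter_from_flags(argv: List[str]) -> Optional[bool]:
    """Map the CLI flags onto a dissemination filter (hypothetical mapping)."""
    parser = argparse.ArgumentParser()
    # Assumption: the two flags are mutually exclusive.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--disseminated", action="store_true")
    group.add_argument("--intake", action="store_true")
    args = parser.parse_args(argv)
    if args.disseminated:
        return True
    if args.intake:
        return False
    return None  # no flag: migrate every unmigrated SAC


def migrate_all(conn: sqlite3.Connection, disseminated: Optional[bool]) -> int:
    """Migrate unmigrated SACs in batches until a query comes back empty."""
    # SQL text is built only from fixed fragments; values are bound parameters.
    clauses = ["migrated_to_audit = 0"]
    params: list = []
    if disseminated is not None:
        clauses.append("disseminated = ?")
        params.append(1 if disseminated else 0)
    query = (
        "SELECT report_id FROM sac WHERE "
        + " AND ".join(clauses)
        + " ORDER BY report_id LIMIT ?"
    )
    total = 0
    while True:
        rows = conn.execute(query, (*params, BATCH_SIZE)).fetchall()
        if not rows:
            break  # nothing left that matches the filter
        for (report_id,) in rows:
            # A real command would create or update the Audit row here.
            conn.execute(
                "UPDATE sac SET migrated_to_audit = 1 WHERE report_id = ?",
                (report_id,),
            )
        total += len(rows)
    return total


# Usage: 250 fake SACs, odd-numbered ones marked as disseminated.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sac (report_id TEXT PRIMARY KEY, disseminated INTEGER, "
    "migrated_to_audit INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO sac (report_id, disseminated) VALUES (?, ?)",
    [(f"report-{i:04d}", i % 2) for i in range(250)],
)
migrated_disseminated = migrate_all(conn, filter_from_flags(["--disseminated"]))
migrated_intake = migrate_all(conn, filter_from_flags(["--intake"]))
print(migrated_disseminated, migrated_intake)  # 125 125
```

The loop only terminates because each batch sets `migrated_to_audit`; if a record failed to migrate and its flag were never set, the same batch would be fetched forever, so a real command would need to skip or record failures.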
@github-actions
Contributor

github-actions Bot commented Mar 6, 2025

Code Coverage

| Package | Line Rate | Branch Rate |
|---|---|---|
| . | 100% | 100% |
| api | 99% | 90% |
| audit | 97% | 87% |
| audit.cross_validation | 98% | 88% |
| audit.fixtures | 84% | 50% |
| audit.intakelib | 88% | 82% |
| audit.intakelib.checks | 92% | 85% |
| audit.intakelib.common | 98% | 82% |
| audit.intakelib.transforms | 100% | 95% |
| audit.management.commands | 78% | 17% |
| audit.migrations | 100% | 100% |
| audit.models | 91% | 64% |
| audit.templatetags | 100% | 100% |
| audit.views | 73% | 52% |
| census_historical_migration | 96% | 65% |
| census_historical_migration.migrations | 100% | 100% |
| census_historical_migration.sac_general_lib | 92% | 84% |
| census_historical_migration.transforms | 95% | 90% |
| census_historical_migration.workbooklib | 68% | 69% |
| config | 77% | 37% |
| curation | 100% | 100% |
| curation.curationlib | 93% | 100% |
| curation.migrations | 100% | 100% |
| dissemination | 92% | 72% |
| dissemination.migrations | 97% | 25% |
| dissemination.searchlib | 76% | 66% |
| dissemination.templatetags | 100% | 100% |
| djangooidc | 53% | 38% |
| djangooidc.tests | 100% | 94% |
| report_submission | 93% | 88% |
| report_submission.migrations | 100% | 100% |
| report_submission.templatetags | 74% | 100% |
| support | 91% | 66% |
| support.migrations | 100% | 100% |
| support.models | 96% | 50% |
| tools | 98% | 50% |
| users | 95% | 92% |
| users.fixtures | 100% | 83% |
| users.management | 100% | 100% |
| users.management.commands | 100% | 100% |
| users.migrations | 100% | 100% |
| **Summary** | **91% (18753 / 20579)** | **76% (2288 / 3000)** |

@jrothacker closed this Mar 7, 2025
@jrothacker deleted the jr/source-of-truth-create branch March 7, 2025 18:00

Labels

None yet

Projects

None yet


5 participants