Creating/Saving Audits by jrothacker · Pull Request #4737 · GSA-TTS/FAC

jrothacker · 2025-03-03T20:10:51Z

Purpose

This PR will begin creating and updating Audits in the new Audit table.

How

Audit should now be wired into everything all the way through to dissemination. Search and all UIs are still reliant on SAC/dissemination tables.

Testing

Linting, Unit Tests, End to End tests

Notes

I truly apologize for the size of this PR.
There is a large refactor as part of this change.
I will want to do additional thorough manual testing before we move this to production.
Due to the nature of this change, I'm going to request 2 approvals before I merge.

github-actions · 2025-03-03T20:12:00Z

Terraform plan for meta

No changes. Your infrastructure matches the configuration.


Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.

📝 Plan generated in Pull Request Checks #4436

github-actions · 2025-03-03T20:12:00Z

Terraform plan for dev

Plan: 1 to add, 0 to change, 1 to destroy.

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.dev.module.cors.null_resource.cors_header must be replaced
-/+ resource "null_resource" "cors_header" {
!~      id       = "*******************" -> (known after apply)
!~      triggers = { # forces replacement
!~          "always_run" = "2025-03-03T19:52:20Z" -> (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 1 to destroy.

📝 Plan generated in Pull Request Checks #4436

jrothacker · 2025-03-03T20:37:54Z

+            # These come back as tuples:
+            # [(col1, row1, field1, link1, help-text1), (col2, row2, ...), ...]
+            logger.warning("%s Excel upload failed validation: %s", form_section, err)
+            return JsonResponse({"errors": list(err), "type": "error_row"}, status=400)


To fix the problem, we need to ensure that the error details returned to the user do not contain sensitive information. Instead of returning the raw ValidationError details, we should log the error on the server and return a generic error message to the user. This approach maintains the ability to debug issues while protecting sensitive information from being exposed.

Modify the exception handling block for ValidationError to log the error details and return a generic error message.

Ensure that the logging configuration is set up to capture these error details for debugging purposes.
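The autofix suggestion amounts to the pattern below. This is a minimal, framework-agnostic sketch, not FAC code: `validation_error_payload` and `GENERIC_VALIDATION_MESSAGE` are hypothetical names, and returning a `(payload, status)` pair stands in for building a Django `JsonResponse`.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical user-facing text; the actual wording would be a product decision.
GENERIC_VALIDATION_MESSAGE = "Validation failed. Please check your input and try again."


def validation_error_payload(form_section, errors):
    """Log full validation details server-side and return only a generic body.

    Returns (payload, status) so the sketch stays framework-agnostic; in a
    Django view the pair would feed JsonResponse(payload, status=status).
    """
    # Detailed errors go to the server log for debugging.
    logger.warning("%s Excel upload failed validation: %s", form_section, errors)
    # The client sees only a fixed message and the error type.
    return {"errors": GENERIC_VALIDATION_MESSAGE, "type": "error_row"}, 400
```

The cost of this pattern is that the user can no longer see which rows failed.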

This is not leaking stack information. The error list contains the validation errors, which are displayed to the user in a human-readable format after they submit the Excel file.

This was in place already, and moved as part of the larger refactor.

Most of these scanner findings seem to come from heuristic rules that key off the fact that the response content is dynamic, rather than examining what the error could actually contain. We could refactor this in the future, as you mentioned, but it should not affect the system now.
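To make the reply concrete: the comment in the diff says each validation error is a tuple `(col, row, field, link, help-text)`. The sketch below shows how such tuples could be rendered as human-readable lines built only from cell positions, field names, and help text. `RowError` and `to_user_messages` are hypothetical; the real front end formats these errors in its own way.

```python
from typing import NamedTuple


class RowError(NamedTuple):
    """One validation error, in the order given by the code comment."""
    col: str
    row: int
    field: str
    link: str  # kept on the tuple but not printed in this sketch
    help_text: str


def to_user_messages(errors):
    """Turn raw error tuples into short, human-readable lines.

    Nothing here comes from a stack trace: only the cell position,
    the field name, and the help text are included.
    """
    messages = []
    for raw in errors:
        err = RowError(*raw)
        messages.append(f"{err.col}{err.row} ({err.field}): {err.help_text}")
    return messages
```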

+            raise BadRequest() from err
+        except KeyError as err:
+            logger.warning("Field error. Field: %s", err)
+            return JsonResponse({"errors": str(err), "type": "error_field"}, status=400)


To fix the problem, we need to ensure that the detailed error message from the KeyError exception is not exposed to the user. Instead, we should log the detailed error message on the server and return a generic error message to the user. This can be achieved by modifying the exception handling block for KeyError.

Log the detailed error message using the logger.warning method.

Return a generic error message in the JSON response.
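One detail worth knowing when handling `KeyError`: `str(err)` returns the repr of the missing key (with quotes), while `err.args[0]` returns the key itself. The sketch below logs the bare key and returns a fixed message; `field_error_payload` and the message text are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def field_error_payload(err: KeyError):
    """Log the missing key; return a message without the raw exception text."""
    # str(KeyError("x")) is "'x'" (the key's repr, quotes included);
    # err.args[0] gives the bare key for logging.
    missing = err.args[0] if err.args else None
    logger.warning("Field error. Field: %s", missing)
    return {"errors": "A required field is missing.", "type": "error_field"}, 400
```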

+        except ExcelExtractionError as err:
+            if err.error_key == UNKNOWN_WORKBOOK:
+                return JsonResponse(
+                    {"errors": str(err), "type": UNKNOWN_WORKBOOK}, status=400


To fix the problem, we need to ensure that detailed error information is not exposed to the end user. Instead, we should log the detailed error message on the server and return a generic error message to the user. This can be achieved by modifying the except ExcelExtractionError block to log the error and return a generic message.
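A sketch of the suggested `ExcelExtractionError` handling, assuming the exception carries an `error_key` attribute as the diff implies. The stand-in exception class, the value of `UNKNOWN_WORKBOOK`, and both messages are assumptions made to keep the example self-contained.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed value; the real constant is defined elsewhere in the codebase.
UNKNOWN_WORKBOOK = "unknown_workbook"


class ExcelExtractionError(Exception):
    """Stand-in for the real exception: carries an error_key attribute."""

    def __init__(self, message, error_key=None):
        super().__init__(message)
        self.error_key = error_key


def extraction_error_payload(err: ExcelExtractionError):
    # Full details stay in the server log.
    logger.warning("Excel extraction failed (%s): %s", err.error_key, err)
    if err.error_key == UNKNOWN_WORKBOOK:
        body = {"errors": "The uploaded workbook was not recognized.", "type": UNKNOWN_WORKBOOK}
    else:
        body = {"errors": "The workbook could not be processed.", "type": "error_row"}
    return body, 400
```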

+                return JsonResponse(
+                    {"errors": str(err), "type": UNKNOWN_WORKBOOK}, status=400
+                )
+            return JsonResponse({"errors": list(err), "type": "error_row"}, status=400)


To fix the problem, we should ensure that the error details are not exposed to the end user. Instead, we should log the detailed error message on the server and return a generic error message to the user. This can be achieved by modifying the exception handling block for ExcelExtractionError to log the error and return a generic message.
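Whichever message is chosen, the non-`UNKNOWN_WORKBOOK` branch must `return` its response rather than `raise` it. Python only allows raising `BaseException` subclasses, and a response object is not one, so `raise` fails with a `TypeError` before any response is sent. The sketch below demonstrates this with a hypothetical `FakeResponse` standing in for `JsonResponse`.

```python
class FakeResponse:
    """Stand-in for a response object such as JsonResponse (not an exception)."""

    def __init__(self, payload, status):
        self.payload = payload
        self.status = status


def handler_with_raise():
    # Raising a non-exception object fails before any response is produced.
    raise FakeResponse({"errors": [], "type": "error_row"}, 400)


def handler_with_return():
    # Returning the response is what a view needs to do.
    return FakeResponse({"errors": [], "type": "error_row"}, 400)


def outcome(handler):
    """Return the response status, or the TypeError text if the handler raised."""
    try:
        return handler().status
    except TypeError as exc:
        return f"TypeError: {exc}"
```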

jrothacker · 2025-03-06T16:00:32Z

+        _convert_additional_fields(audit_data, sac)
+
+        # update existing Audit.
+        if Audit.objects.filter(report_id=sac.report_id).exists():


We probably want to ignore existing audits. We'll have to think on this one... in theory existing audits should already be up to date with the SAC. But this could be helpful if we're seeing problems with the audits being out of sync. Something for us to ponder.

That is exactly why I introduced it: if audits are out of sync with the intake data for whatever reason, this could be a fail-safe to recover from any inconsistencies.

That said, with this change it will still likely ignore existing audits because it is ONLY running on SACs where migrated=false. But what would be a good reason not to have a condition like this in place?

One reason: I wouldn't want to update a record that is already up to date. One example where this might happen is in-progress audits that haven't been migrated yet but were created during the phase of launch when we are monitoring.

I'm not even convinced I'm right, just something I want us to think about.
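The two positions in this thread can be expressed as a single policy flag, and comparing before writing addresses the concern about updating records that are already up to date. This is a sketch only: a plain dict stands in for the `Audit` table, and `sync_audit` and its `update_existing` flag are hypothetical.

```python
def sync_audit(audits, report_id, sac_data, update_existing):
    """Create an audit for report_id, or decide what to do if one exists.

    audits: dict mapping report_id -> audit data (stand-in for the Audit table).
    update_existing: True overwrites drifted records as a fail-safe;
                     False leaves existing records untouched.
    Returns "created", "updated", or "skipped".
    """
    if report_id not in audits:
        audits[report_id] = dict(sac_data)
        return "created"
    if not update_existing:
        return "skipped"
    if audits[report_id] == sac_data:
        # Already in sync: avoid a pointless write.
        return "skipped"
    audits[report_id] = dict(sac_data)
    return "updated"
```

With `update_existing=True`, the comparison means in-sync records are never rewritten, so the fail-safe only touches audits that have actually drifted.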

- Address feedback: we no longer need `_convert_additional_fields()`, since it duplicates a method in `utils.py`.
- Minor cleanup on `_populate_accesses()`. While addressing the point above, I decided it no longer needed to be an isolated function.

- Supplied two new arguments, `--disseminated` and `--intake`. The first fetches only SAC data that has been disseminated; the second fetches only SAC data that has not yet been disseminated.
- Rather than iterating through 50k SACs at a time, this change introduces a `while` loop that continuously migrates batches of 100 until all relevant SACs are migrated. This should mean we only need to run the command once.
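The command behavior described above can be sketched as follows. Plain dicts stand in for `SingleAuditChecklist` rows, and treating `--disseminated` and `--intake` as mutually exclusive is an assumption. In a Django management command, the same arguments would be declared in `add_arguments(self, parser)`, which receives an argparse-compatible parser.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="migrate_audits")
    # Assumption: the two filters are treated as mutually exclusive.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--disseminated", action="store_true",
                       help="only migrate SACs that have been disseminated")
    group.add_argument("--intake", action="store_true",
                       help="only migrate SACs that have not been disseminated")
    return parser


def select(sacs, args):
    """Unmigrated SACs, optionally filtered by dissemination status."""
    rows = [s for s in sacs if not s["migrated_to_audit"]]
    if args.disseminated:
        rows = [s for s in rows if s["disseminated"]]
    elif args.intake:
        rows = [s for s in rows if not s["disseminated"]]
    return rows


def migrate_all(sacs, args, batch_size=100):
    """Keep migrating batches until no matching unmigrated SACs remain."""
    batches = 0
    while True:
        batch = select(sacs, args)[:batch_size]
        if not batch:
            break
        for sac in batch:
            # Real code would create or update the Audit here.
            sac["migrated_to_audit"] = True
        batches += 1
    return batches
```

Because each pass re-selects unmigrated rows, the loop only terminates if every batch is flagged as migrated. A batch that failed to flag would be selected again indefinitely, so production code would want a guard against that.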

Bandit flags string-based query construction as potential SQL injection; however, this logic is only accessed internally.
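For comparison, a query that passes values as parameters instead of formatting them into the SQL string does not rely on internal-only access for safety. The sketch uses the stdlib `sqlite3` module so it runs standalone. The `sac` table and its columns are hypothetical. Django's `cursor.execute(sql, params)` follows the same idea but uses `%s` placeholders. Identifiers such as table names cannot be bound as parameters. If the command builds those dynamically, Bandit's inline `# nosec` comment is a narrower alternative to excluding the whole file.

```python
import sqlite3


def unmigrated_report_ids(conn, disseminated):
    # Values are bound as parameters, never formatted into the SQL text,
    # so the driver handles quoting and no user input can alter the query.
    cur = conn.execute(
        "SELECT report_id FROM sac WHERE migrated_to_audit = ? AND disseminated = ? "
        "ORDER BY report_id",
        (0, 1 if disseminated else 0),
    )
    return [row[0] for row in cur.fetchall()]
```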

- Added a `migrated_to_audit` flag on `SingleAuditChecklist` to track which SACs have not yet been migrated.
- Added some logic (for local testing ONLY) that cleans up audit data and references to mimic a clean slate.
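A "local testing ONLY" cleanup is safest when it refuses to run anywhere else. The sketch below shows one way to enforce that. The `ENVIRONMENT` variable name and its `"LOCAL"` value are assumptions, `reset_audit_migration` and `NotLocalError` are hypothetical, and plain Python containers stand in for the database tables.

```python
import os


class NotLocalError(RuntimeError):
    """Raised when the cleanup is attempted outside a local environment."""


def reset_audit_migration(audits, sacs, environment=None):
    """Local-testing helper: wipe migrated audits and reset SAC flags.

    `environment` defaults to the ENVIRONMENT variable (an assumed name);
    anything other than "LOCAL" aborts before any data is touched.
    """
    env = environment if environment is not None else os.environ.get("ENVIRONMENT", "")
    if env != "LOCAL":
        raise NotLocalError(f"refusing to reset audit data in environment {env!r}")
    audits.clear()
    for sac in sacs:
        sac["migrated_to_audit"] = False
    return len(sacs)
```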

github-actions · 2025-03-06T21:19:03Z

Package	Line Rate	Branch Rate	Health
.	100%	100%	✔
api	99%	90%	✔
audit	97%	87%	✔
audit.cross_validation	98%	88%	✔
audit.fixtures	84%	50%	❌
audit.intakelib	88%	82%	➖
audit.intakelib.checks	92%	85%	➖
audit.intakelib.common	98%	82%	✔
audit.intakelib.transforms	100%	95%	✔
audit.management.commands	78%	17%	❌
audit.migrations	100%	100%	✔
audit.models	91%	64%	➖
audit.templatetags	100%	100%	✔
audit.views	73%	52%	❌
census_historical_migration	96%	65%	✔
census_historical_migration.migrations	100%	100%	✔
census_historical_migration.sac_general_lib	92%	84%	➖
census_historical_migration.transforms	95%	90%	✔
census_historical_migration.workbooklib	68%	69%	❌
config	77%	37%	❌
curation	100%	100%	✔
curation.curationlib	93%	100%	➖
curation.migrations	100%	100%	✔
dissemination	92%	72%	➖
dissemination.migrations	97%	25%	✔
dissemination.searchlib	76%	66%	❌
dissemination.templatetags	100%	100%	✔
djangooidc	53%	38%	❌
djangooidc.tests	100%	94%	✔
report_submission	93%	88%	➖
report_submission.migrations	100%	100%	✔
report_submission.templatetags	74%	100%	❌
support	91%	66%	➖
support.migrations	100%	100%	✔
support.models	96%	50%	✔
tools	98%	50%	✔
users	95%	92%	➖
users.fixtures	100%	83%	✔
users.management	100%	100%	✔
users.management.commands	100%	100%	✔
users.migrations	100%	100%	✔
Summary	91% (18753 / 20579)	76% (2288 / 3000)	➖

Creating/Saving Audits

fdaff71

jrothacker temporarily deployed to meta March 3, 2025 20:10 — with GitHub Actions Inactive

jrothacker temporarily deployed to dev March 3, 2025 20:11 — with GitHub Actions Inactive

jrothacker temporarily deployed to testing March 3, 2025 20:11 — with GitHub Actions Inactive

github-advanced-security AI found potential problems Mar 3, 2025


jrothacker marked this pull request as ready for review March 3, 2025 20:38

phildominguez-gsa requested changes Mar 4, 2025


Comment thread backend/audit/models/viewflow.py

phildominguez-gsa previously approved these changes Mar 4, 2025


jadudm marked this pull request as draft March 6, 2025 15:23

Audit migration command

2b8892b

rnovak338 dismissed phildominguez-gsa’s stale review via 2b8892b March 6, 2025 15:33

rnovak338 temporarily deployed to dev March 6, 2025 15:34 — with GitHub Actions Inactive

rnovak338 temporarily deployed to meta March 6, 2025 15:34 — with GitHub Actions Inactive

rnovak338 temporarily deployed to testing March 6, 2025 15:34 — with GitHub Actions Inactive

jrothacker commented Mar 6, 2025


Comment thread backend/dissemination/management/commands/migrate_audits.py Outdated

jrothacker commented Mar 6, 2025


Comment thread backend/dissemination/management/commands/migrate_audits.py

jrothacker commented Mar 6, 2025


Linting and feedback

1e2c435


rnovak338 temporarily deployed to dev March 6, 2025 16:25 — with GitHub Actions Inactive

rnovak338 temporarily deployed to meta March 6, 2025 16:25 — with GitHub Actions Inactive

rnovak338 temporarily deployed to testing March 6, 2025 16:25 — with GitHub Actions Inactive

rnovak338 temporarily deployed to dev March 6, 2025 16:48 — with GitHub Actions Inactive

rnovak338 temporarily deployed to meta March 6, 2025 16:48 — with GitHub Actions Inactive

rnovak338 had a problem deploying to testing March 6, 2025 16:48 — with GitHub Actions Failure

rnovak338 temporarily deployed to dev March 6, 2025 17:04 — with GitHub Actions Inactive

rnovak338 temporarily deployed to meta March 6, 2025 17:04 — with GitHub Actions Inactive

Exclude mgmt command from Bandit

7f62375


rnovak338 temporarily deployed to testing March 6, 2025 17:05 — with GitHub Actions Inactive

Remove deprecated "Schema" field

147e76c

rnovak338 temporarily deployed to dev March 6, 2025 17:16 — with GitHub Actions Inactive

rnovak338 temporarily deployed to meta March 6, 2025 17:16 — with GitHub Actions Inactive

rnovak338 temporarily deployed to testing March 6, 2025 17:16 — with GitHub Actions Inactive

New migration and migration logic cleanup

279b56c


rnovak338 temporarily deployed to dev March 6, 2025 21:07 — with GitHub Actions Inactive

rnovak338 temporarily deployed to meta March 6, 2025 21:07 — with GitHub Actions Inactive

rnovak338 temporarily deployed to testing March 6, 2025 21:08 — with GitHub Actions Inactive

jrothacker closed this Mar 7, 2025

jrothacker deleted the jr/source-of-truth-create branch March 7, 2025 18:00

jrothacker mentioned this pull request Mar 10, 2025

Source of Truth - Long Tracking Branch #4765

Merged

@@ -153,3 +153,3 @@
                         logger.warning("%s Excel upload failed validation: %s", form_section, err)
-                        return JsonResponse({"errors": list(err), "type": "error_row"}, status=400)
+                        return JsonResponse({"errors": "Validation failed. Please check your input and try again.", "type": "error_row"}, status=400)
                     except MultiValueDictKeyError as err:


anagradova commented Mar 4, 2025


5 participants
