Resubmission Linking Command Updates by jperson1 · Pull Request #5622 · GSA-TTS/FAC

jperson1 · 2026-05-06T20:21:30Z

Resubmission Linking Command Updates

I will not make such large commits a habit.

Related tickets

Directly related - [RESUB DE-DUP] Resubmission Linking Command #5546
Directly related - [RESUB DE-DUP] *Revert* Resubmission Linking Command #5550
Only sort of related, but this includes the opposite of [RESUB DE-DUP] Manual Linking of Resubmissions #5549

Description of changes

- The linking command:
  - Now uses equivalence to determine the chains. We may come back and use distance to catch stragglers.
  - The reviewable CSVs were updated to support undoing, and all the output files now go into their own subdirectory.
- A new command to undo the linkage after the fact, just in case.
  - Uses a CSV generated by the linking command.
- A new command to wipe all resubmission metadata for an audit year or a set of specific records.
  - Very convenient while testing. Will be useful if any of these linkages proves false further down the line.
- A small Makefile change from last week.
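For reviewers less familiar with the approach, here's a minimal sketch of grouping by equivalence: two submissions land in the same group only when a set of key fields match exactly. The key fields (`auditee_uei`, `audit_year`, `fy_end_date`) and the function names are hypothetical placeholders, not necessarily what the linking command compares.

```python
from collections import defaultdict

# Hypothetical key fields: submissions are "equivalent" (same chain)
# only when every one of these values matches exactly.
KEY_FIELDS = ("auditee_uei", "audit_year", "fy_end_date")


def equivalence_key(submission):
    """Build the exact-match key for one submission (a plain dict here)."""
    return tuple(submission.get(field) for field in KEY_FIELDS)


def group_by_equivalence(submissions):
    """Group submissions whose key fields are identical.

    Returns a list of groups (lists) in first-seen order. Single-member
    groups are kept; a caller could drop them if only multi-record chains matter.
    """
    groups = defaultdict(list)
    for submission in submissions:
        groups[equivalence_key(submission)].append(submission)
    return list(groups.values())
```

Exact matching is a single dictionary pass, but it can't catch near-duplicates (a typo in a name, say), which is what a later distance pass over the stragglers would be for.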

How to test

Switch to this branch and bring everything up as usual. Make sure you have some local data to play with, ideally both Census and GSA records.

The linking command

Try a few audit years. More recent years include more data, so they take a little longer. Run the command with:
python manage.py link_resubmissions --email {ADMIN_EMAIL} --audit_year 2020

Check the .md and .csv files that are produced and verify they look right. You'll see a lot of duplicated data; that's fine, and ideal, since it means the records are truly in the same resubmission chain. For the 2019-2022 range, verify that there are some Census-GSA crossovers.

Press 'c' and 'enter' to continue. You'll see each record get redisseminated quickly.

Verify a few records in search and/or through the API. You may want to refresh the materialized view to use Advanced Search. Also spot-check the internal singleauditchecklist table.
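The undo command relies on these CSVs carrying enough information to reverse a link. A minimal sketch of that idea, assuming a hypothetical column layout (`chain_index`, `report_id`, and a JSON `previous_resubmission_meta` column); the real column names and subdirectory naming aren't shown in this PR.

```python
import csv
import io
import json
from pathlib import Path

# Hypothetical column layout. Recording each submission's metadata as it was
# *before* linking is what lets a later undo step restore it.
FIELDNAMES = ["chain_index", "report_id", "previous_resubmission_meta"]


def chains_to_csv_text(chains):
    """Render chains as CSV text, one row per submission.

    `chains` is a list of lists of dicts, each with a "report_id" and an
    optional "resubmission_meta" (None or a JSON-serializable dict).
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for index, chain in enumerate(chains):
        for submission in chain:
            writer.writerow(
                {
                    "chain_index": index,
                    "report_id": submission["report_id"],
                    # json.dumps(None) == "null", so missing metadata round-trips.
                    "previous_resubmission_meta": json.dumps(
                        submission.get("resubmission_meta")
                    ),
                }
            )
    return buffer.getvalue()


def export_chains_csv(chains, out_dir, filename):
    """Write the CSV into out_dir (created if needed) and return its path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename
    # newline="" keeps the csv module's own line endings unchanged.
    with path.open("w", newline="", encoding="utf-8") as handle:
        handle.write(chains_to_csv_text(chains))
    return path
```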

The undo linking command

You've probably just run a few years. Grab one of the CSVs from those runs and run a command like:
python manage.py undo_link_resubmissions --email {ADMIN_EMAIL} --csv curation/data/{FILENAME}.csv

You'll see a line like "INFO CSV contains XYZ records." Verify that the count is right. Press 'c' and 'enter' to continue. You'll see each record get redisseminated quickly.

Verify a few records after the fact. Any record that previously had NULL resubmission data will now have version 0 and status "unknown_resubmission_status". If we redisseminated it with NULL, it would be assumed to be version 1, so after taking curative action on it we explicitly mark it as "unknown".
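The NULL-versus-v0 rule is easy to get backwards, so here's a minimal sketch of it. It assumes the CSV stores prior metadata as JSON text and that metadata is a dict with `version` and `resubmission_status` keys (both key names hypothetical). Treating empty, `null`, or unparseable cells as missing is also an assumption; the real command may skip or report such rows instead.

```python
import json

# Status string named in this PR; the "version"/"resubmission_status" keys are hypothetical.
UNKNOWN_STATUS = "unknown_resubmission_status"


def parse_previous_meta(raw):
    """Parse one CSV cell of stored metadata into a dict, or None if missing.

    Empty cells, the JSON literal null, invalid JSON, and non-object JSON
    are all treated as "no metadata" (an assumption for this sketch).
    """
    if raw is None or raw.strip() in ("", "null"):
        return None
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def restored_meta(raw):
    """Metadata to write back when undoing a link.

    NULL is never written back: dissemination would read NULL as version 1,
    so a record with no prior metadata gets an explicit v0 and unknown status.
    """
    previous = parse_previous_meta(raw)
    if previous is None:
        return {"version": 0, "resubmission_status": UNKNOWN_STATUS}
    return previous


def disseminated_version(meta):
    """Sketch of the rule described above: NULL metadata is treated as v1."""
    if meta is None:
        return 1
    return meta["version"]
```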

The reset resubmission metadata command

This command sets the resubmission version to 0 and the status to "unknown_resubmission_status" for every record it targets. It accepts either a full audit year (AY) or a list of report IDs. It's very handy for testing the above commands locally. We may eventually use it to reset records that were incorrectly linked, whether by our own actions or by user error.

Run the command like:
python manage.py reset_resubmission_metadata --email {ADMIN_EMAIL} --audit_year 2024
python manage.py reset_resubmission_metadata --email {ADMIN_EMAIL} --report_ids 2024-12-GSAFAC-0000383165 2024-12-GSAFAC-0000387718 2024-12-GSAFAC-0000398121

You'll see something like the following:
"INFO Found XYZ records for AY2024."
"INFO Found 3 records for report IDs: 2024-12-GSAFAC-0000383165, 2024-12-GSAFAC-0000387718, 2024-12-GSAFAC-0000398121."

Press 'c' and 'enter' to continue. You'll see each record get redisseminated quickly. Verify a few records after the fact.
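A minimal sketch of the argument handling, using plain `argparse` (Django management commands receive an argparse-based parser in `add_arguments`). Making `--audit_year` and `--report_ids` mutually exclusive is an assumption drawn from the "either a full AY or a list of report IDs" description; `describe_target` and `reset_meta` are hypothetical helpers that reproduce the log lines and reset values described above.

```python
import argparse

# Status string named in this PR; the "version"/"resubmission_status" keys are hypothetical.
UNKNOWN_STATUS = "unknown_resubmission_status"


def build_parser():
    """Argument parsing for a reset command (sketch, not the real command)."""
    parser = argparse.ArgumentParser(prog="reset_resubmission_metadata")
    parser.add_argument("--email", required=True)
    # Assumption: exactly one selector, a whole audit year OR explicit report IDs.
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("--audit_year", type=int)
    target.add_argument("--report_ids", nargs="+")
    return parser


def describe_target(args, found_count):
    """Build the INFO message shown before asking to continue."""
    if args.audit_year is not None:
        return f"Found {found_count} records for AY{args.audit_year}."
    ids = ", ".join(args.report_ids)
    return f"Found {found_count} records for report IDs: {ids}."


def reset_meta():
    """Values written to every targeted record: an explicit v0, not NULL."""
    return {"version": 0, "resubmission_status": UNKNOWN_STATUS}
```

Passing both selectors, or neither, makes argparse exit with a usage error instead of guessing which records to touch.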

Screenshot

- The command itself uses equivalence to determine the chains. We may come back and use distance to catch stragglers.
- The reviewable CSVs saw an update, and all the files go into their own subdirectory.
- A new command to undo the linkage after the fact, just in case.
- A new command to wipe all resubmission metadata for an audit year or a set of specific records.
- Some small changes in various places.

I will not make such large commits a habit.


jperson1 · 2026-05-07T16:41:24Z

I've tested pretty extensively in the Preview environment, and things look good. I had to bump the memory to run the linking command, and even then it couldn't get through some of the "larger" years. We may want to bump Production's memory a bit before running the commands there. Something to keep in mind.

phildominguez-gsa · 2026-05-11T12:42:19Z

-    # For each record, compute its distance to the existing sets.
-    # If it is below the threshold, insert it into an existing set.
-    # Otherwise, insert into a new set.
+def generate_clusters_from_records_by_equivalence(records, noisy=False):
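The removed comment describes the distance-based approach that still lives alongside the equivalence one. A minimal sketch of that greedy loop, assuming a hypothetical `auditee_name` field, `difflib` similarity as the distance, and "distance to a set" meaning the distance to its closest member; none of these choices are confirmed by the PR.

```python
from difflib import SequenceMatcher


def distance(a, b):
    """Hypothetical distance: 1 minus the similarity of the auditee names."""
    similarity = SequenceMatcher(
        None, a["auditee_name"].lower(), b["auditee_name"].lower()
    ).ratio()
    return 1.0 - similarity


def cluster_by_distance(submissions, threshold=0.2):
    """Greedy threshold clustering, following the removed comment.

    For each submission, compute its distance to every existing set (here:
    the distance to that set's closest member). If the best distance is
    below the threshold, add it to that set; otherwise start a new set.
    """
    clusters = []
    for submission in submissions:
        best_index, best_distance = None, None
        for index, cluster in enumerate(clusters):
            d = min(distance(submission, member) for member in cluster)
            if best_distance is None or d < best_distance:
                best_index, best_distance = index, d
        if best_distance is not None and best_distance < threshold:
            clusters[best_index].append(submission)
        else:
            clusters.append([submission])
    return clusters
```

Unlike exact-match grouping, this compares each submission against everything already clustered, so the work grows roughly quadratically with the number of submissions, and the result can depend on input order.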


PEDANTIC ALERT (also, I know you inherited this code)

Some of these names bug me: "cluster" instead of "chain" (only makes sense if you agree with my other sorting comment, though), and "record" instead of "audit" or "submission". For example, could this be generate_audit_chains_by_equivalence? It's probably annoying to swap the wording everywhere, but I figure it's worth checking, and I'm down to discuss.

jperson1 · 2026-05-14

I fully agree, and I'll take every opportunity to clean up the language around this. I'm trying to use "chain" pretty much everywhere. That may or may not be helpful, since they aren't really "chains" until the sort and annotation are done. They're aspiring chains, and that seems... fine. I've switched to "submission" in most places; that feels right. I've updated some variables to use sac or sacs, which matches a lot of the backend code. It should be better, but I'm sure I missed some spots. Nothing broke!
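To make the "aspiring chains" point concrete, here's a sketch of a sort-and-annotate step that turns an unordered group into a chain numbered from v1 (in the spirit of the "have v0 records start at v1" change). The `submitted_date`, `previous_report_id`, and `next_report_id` keys are hypothetical; ISO-format date strings are used so they sort correctly as plain strings.

```python
def sort_and_annotate(group):
    """Turn an unordered group of equivalent submissions into an ordered chain.

    Sorts by a hypothetical ISO-8601 `submitted_date` string (report_id breaks
    ties) and numbers versions from 1, so nothing in a linked chain stays at v0.
    The `version`, `previous_report_id`, and `next_report_id` keys are
    illustrative only.
    """
    chain = sorted(group, key=lambda s: (s["submitted_date"], s["report_id"]))
    annotated = []
    for position, submission in enumerate(chain):
        previous_sub = chain[position - 1] if position > 0 else None
        next_sub = chain[position + 1] if position + 1 < len(chain) else None
        annotated.append(
            {
                **submission,  # shallow copy; the input dicts are left untouched
                "version": position + 1,
                "previous_report_id": previous_sub["report_id"] if previous_sub else None,
                "next_report_id": next_sub["report_id"] if next_sub else None,
            }
        )
    return annotated
```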


github-actions · 2026-05-14T20:11:31Z

Package	Line Rate	Branch Rate	Health
.	100%	100%	✔
api	98%	86%	✔
api.serializers	97%	88%	✔
api.views	91%	96%	✔
audit	95%	80%	✔
audit.cross_validation	97%	86%	✔
audit.fixtures	84%	50%	❌
audit.formlib	92%	62%	✔
audit.intakelib	89%	83%	➖
audit.intakelib.checks	92%	86%	✔
audit.intakelib.common	98%	82%	✔
audit.intakelib.transforms	100%	95%	✔
audit.management.commands	78%	17%	❌
audit.migrations	100%	100%	✔
audit.models	91%	69%	✔
audit.templatetags	100%	100%	✔
audit.test_viewlib	100%	100%	✔
audit.views	75%	52%	❌
census_historical_migration	96%	65%	✔
census_historical_migration.migrations	100%	100%	✔
census_historical_migration.sac_general_lib	92%	84%	✔
census_historical_migration.transforms	95%	90%	✔
census_historical_migration.workbooklib	68%	69%	❌
config	78%	37%	❌
curation	94%	90%	✔
curation.curationlib	79%	51%	❌
curation.management.commands	46%	34%	❌
curation.migrations	100%	100%	✔
dissemination	90%	70%	✔
dissemination.analytics	27%	0%	❌
dissemination.forms	80%	30%	❌
dissemination.migrations	97%	25%	✔
dissemination.models	100%	100%	✔
dissemination.report_generation	21%	0%	❌
dissemination.report_generation.excel	32%	0%	❌
dissemination.searchlib	61%	44%	❌
dissemination.templatetags	52%	6%	❌
dissemination.views	67%	47%	❌
djangooidc	53%	38%	❌
djangooidc.tests	100%	94%	✔
report_submission	100%	96%	✔
report_submission.migrations	100%	100%	✔
report_submission.templatetags	74%	100%	❌
report_submission.views	78%	61%	❌
support	94%	75%	✔
support.migrations	100%	100%	✔
support.models	90%	50%	➖
tools	98%	50%	✔
users	95%	86%	✔
users.fixtures	100%	83%	✔
users.management	100%	100%	✔
users.management.commands	100%	100%	✔
users.migrations	100%	100%	✔
Summary	88% (22648 / 25643)	68% (2766 / 4050)	➖

Minimum allowed line rate is 85%
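The summary figures are plain covered/total ratios. A quick check confirms the overall line rate is above the 85% minimum, even though several packages marked ❌ fall below it:

```python
MINIMUM_LINE_RATE = 85.0  # from "Minimum allowed line rate is 85%"


def rate(covered, total):
    """Coverage as a percentage: covered lines (or branches) over the total."""
    return 100.0 * covered / total


line_rate = rate(22648, 25643)  # about 88.3, reported as 88%
branch_rate = rate(2766, 4050)  # about 68.3, reported as 68%
meets_minimum = line_rate >= MINIMUM_LINE_RATE
```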

jperson1 added 2 commits May 6, 2026 15:45

Fixing a cleanup mistake, some comment rewriting.

9cf34e9

jperson1 self-assigned this May 6, 2026

jperson1 temporarily deployed to testing May 6, 2026 20:21 — with GitHub Actions Inactive

Light refactor to add tests.

56818f7

The distance and equivalence cluster generation functions are separate, and get their own test classes.

jperson1 temporarily deployed to testing May 6, 2026 21:08 — with GitHub Actions Inactive

jperson1 temporarily deployed to preview May 7, 2026 14:15 — with GitHub Actions Inactive

jperson1 temporarily deployed to preview May 7, 2026 14:16 — with GitHub Actions Inactive

jperson1 temporarily deployed to preview May 7, 2026 14:26 — with GitHub Actions Inactive

Bump preview app memory to test the command. Maybe meaningful.

0b85488

jperson1 temporarily deployed to testing May 7, 2026 15:21 — with GitHub Actions Inactive

jperson1 temporarily deployed to preview May 7, 2026 15:21 — with GitHub Actions Inactive

jperson1 temporarily deployed to preview May 7, 2026 15:23 — with GitHub Actions Inactive

jperson1 temporarily deployed to preview May 7, 2026 15:32 — with GitHub Actions Inactive

On making new links, have v0 records start at v1. On resetting resub metadata, set to v0 rather than NULL. NULL will become v1 at dissemination time.

7dcef47

jperson1 temporarily deployed to testing May 7, 2026 15:52 — with GitHub Actions Inactive

jperson1 temporarily deployed to preview May 7, 2026 15:53 — with GitHub Actions Inactive

jperson1 temporarily deployed to preview May 7, 2026 15:54 — with GitHub Actions Inactive

jperson1 temporarily deployed to preview May 7, 2026 16:04 — with GitHub Actions Inactive

Typo fix.

fce4c33

jperson1 had a problem deploying to testing May 7, 2026 16:10 — with GitHub Actions Failure

jperson1 temporarily deployed to preview May 7, 2026 16:10 — with GitHub Actions Inactive

jperson1 temporarily deployed to preview May 7, 2026 16:12 — with GitHub Actions Inactive

jperson1 temporarily deployed to preview May 7, 2026 16:21 — with GitHub Actions Inactive

Drop preview RAM back down post-testing.

3ac80f0

jperson1 temporarily deployed to preview May 7, 2026 16:40 — with GitHub Actions Inactive

jperson1 temporarily deployed to testing May 7, 2026 16:40 — with GitHub Actions Inactive

jperson1 marked this pull request as ready for review May 7, 2026 16:41

jperson1 temporarily deployed to preview May 7, 2026 16:41 — with GitHub Actions Inactive

jperson1 temporarily deployed to preview May 7, 2026 16:51 — with GitHub Actions Inactive

phildominguez-gsa requested changes May 11, 2026


jperson1 added 10 commits May 14, 2026 11:20

Merge branch 'main' into jp/resub-linking-command

c6b12f0

`fetch_sac_resubmission_records_postgres` to `fetch_sac_disseminated_records_postgres`

6b36aa6

export_resubmission_clusters to export_resubmission_chains

0b3462e

Trying to unify language around "chains" of submissions. Replaces "clusters" and "sets" in most/all places. Also, fixes some imports. I blame PyLance.

9794919

Trying to unify "record" language into "submissions" or "sacs", as we do elsewhere.

9d37bef

More naming changes. More import fixes. Test fixes for the new language.

68ec4fb

Filter chains by list comprehension rather than `filter`, just for consistency.

63b5a1c

Handle invalid JSON coming from the CSVs

70828b1

Move the chain sorting to the end of the chain generation. Makes more sense to keep it in one spot.

856a6e2

More import fixes! That's what I get for renaming so many things at once.

8476772

jperson1 had a problem deploying to testing May 14, 2026 19:54 — with GitHub Actions Failure

LINTING

161ac73

jperson1 had a problem deploying to testing May 14, 2026 19:55 — with GitHub Actions Failure

Merge branch 'main' into jp/resub-linking-command

b4dddc9

jperson1 had a problem deploying to testing May 14, 2026 19:57 — with GitHub Actions Failure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

jperson1 wants to merge 19 commits into main from jp/resub-linking-command

2 participants
