Bulk Download URL & View#4871

Merged: jperson1 merged 8 commits into main from jp/bulk-download-url-view on Apr 10, 2025

Contributor

@jperson1 jperson1 commented Apr 8, 2025

Bulk Download URL & View

Issue: #4862

Changes:

Adds just the URL and view, which is essentially a passthrough that lets users download files from the public-data folder of our S3 bucket. There is no user feedback (unless there's a 404) and no file perusal. The intended access point is outside links (the static site).
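
The passthrough behavior described above can be sketched without Django or S3. This is only an illustration: `FAKE_BUCKET`, `serve_public_file`, and the `(status, body)` return shape are hypothetical stand-ins, not the code in this PR. The only details taken from the PR are the `public-data` prefix and the 404 on a missing file.

```python
# Hypothetical in-memory stand-in for the S3 bucket (not the real storage API).
FAKE_BUCKET = {
    "public-data/historic/2022.zip": b"zip-bytes",
    "public-data/readme.txt": b"hello",
}


def serve_public_file(relative_path, bucket=FAKE_BUCKET):
    """Return (status, body) for a file under the public-data prefix."""
    # Every request is resolved relative to the public-data folder.
    key = f"public-data/{relative_path}"
    body = bucket.get(key)
    if body is None:
        # The only user feedback the PR describes is a 404.
        return 404, b""
    return 200, body
```

Under these assumptions, `serve_public_file("historic/2022.zip")` returns the stored bytes with a 200, and any path missing from the fake bucket returns a 404 with an empty body.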

Some little tests for verification.

Also, a local linting fix. Not sure why that doesn't get hit in the PR checks or in the container.

How to test:

  1. Switch to this branch, run the app normally.
  2. Populate Minio with some goodies to download
    a. The default path is http://localhost:9002
    b. The default login is minioadmin for both the user and pass
    c. Add some files to a folder named public-data
  3. Attempt to download the files in the folder
    a. For example, http://localhost:8000/dissemination/public-data/FILE_RELATIVE_PATH.EXT
    b. If you bring in data from GDrive, try something like http://localhost:8000/dissemination/public-data/historic/2022.zip
    c. Attempts should succeed whether the file is directly in the folder or in a subfolder, like above
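
Step 3 can also be scripted. The following is a minimal sketch using only the standard library. It assumes the app is running locally at the URL shown in the steps above; `build_download_url` and `try_download` are hypothetical helper names, not part of this PR.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed local base URL, taken from the example links in the steps above.
BASE_URL = "http://localhost:8000/dissemination/public-data/"


def build_download_url(relative_path, base_url=BASE_URL):
    """Join the base URL and a file path relative to the public-data folder."""
    # quote() leaves "/" unescaped by default, so subfolders survive intact.
    return base_url + quote(relative_path.lstrip("/"))


def try_download(relative_path, timeout=10):
    """Fetch the file and return (HTTP status, number of bytes received)."""
    # urlopen raises urllib.error.HTTPError when the server answers 404.
    with urlopen(build_download_url(relative_path), timeout=timeout) as response:
        return response.status, len(response.read())
```

With the app up and Minio populated, calling `try_download("historic/2022.zip")` should return a 200 status and a non-zero byte count if the file exists.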

Recording:

Screen.Recording.2025-04-08.at.3.06.08.PM.mov

PR Checklist: Submitter

  • Link to an issue if possible. If there’s no issue, describe what your branch does. Even if there is an issue, a brief description in the PR is still useful.
  • List any special steps reviewers have to follow to test the PR. For example, adding a local environment variable, creating a local test file, etc.
  • For extra credit, submit a screen recording like this one.
  • Make sure you’ve merged main into your branch shortly before creating the PR. (You should also be merging main into your branch regularly during development.)
  • Make sure you’ve accounted for any migrations. When you’re about to create the PR, bring up the application locally and then run git status | grep migrations. If there are any results, you probably need to add them to the branch for the PR. Your PR should have only one new migration file for each of the component apps, except in rare circumstances; you may need to delete some and re-run python manage.py makemigrations to reduce the number to one. (Also, unless in exceptional circumstances, your PR should not delete any migration files.)
  • Make sure that whatever feature you’re adding has tests that cover the feature. This includes test coverage to make sure that the previous workflow still works, if applicable.
  • Make sure the full-submission.cy.js Cypress test passes, if applicable.
  • Do manual testing locally. Our tests are not good enough yet to allow us to skip this step. If that’s not applicable for some reason, check this box.
  • Verify that no Git surgery was necessary, or, if it was necessary at any point, repeat the testing after it’s finished.
  • Once a PR is merged, keep an eye on it until it’s deployed to dev, and do enough testing on dev to verify that it deployed successfully, the feature works as expected, and the happy path for the broad feature area (such as submission) still works.
  • Ensure that prior to merging, the working branch is up to date with main and the terraform plan is what you expect.

PR Checklist: Reviewer

  • Pull the branch to your local environment and run make docker-clean; make docker-first-run && docker compose up; then run docker compose exec web /bin/bash -c "python manage.py test"
  • Manually test out the changes locally, or check this box to verify that it wasn’t applicable in this case.
  • Check that the PR has appropriate tests. Look out for changes in HTML/JS/JSON Schema logic that may need to be captured in Python tests even though the logic isn’t in Python.
  • Verify that no Git surgery is necessary at any point (such as during a merge party), or, if it was, repeat the testing after it’s finished.

The larger the PR, the stricter we should be about these points.

Pre Merge Checklist: Merger

  • Ensure that prior to approving, the terraform plan is what we expect it to be. -/+ resource "null_resource" "cors_header" should be destroying and recreating itself and ~ resource "cloudfoundry_app" "clamav_api" might be updating its sha256 for the fac-file-scanner and fac-av-${ENV} by default.
  • Ensure that the branch is up to date with main.
  • Ensure that a terraform plan has been recently generated for the pull request.

Contributor

github-actions Bot commented Apr 8, 2025

Terraform plan for meta

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.

✅ Plan applied in Deploy to Development and Management Environment #978

Contributor

github-actions Bot commented Apr 8, 2025

Terraform plan for dev

Plan: 1 to add, 0 to change, 1 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.dev.module.cors.null_resource.cors_header must be replaced
-/+ resource "null_resource" "cors_header" {
!~      id       = "*******************" -> (known after apply)
!~      triggers = { # forces replacement
!~          "always_run" = "2025-04-10T13:38:47Z" -> (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 1 to destroy.

✅ Plan applied in Deploy to Development and Management Environment #978

@jperson1 jperson1 self-assigned this Apr 8, 2025
@jperson1 jperson1 requested a review from a team April 8, 2025 19:26

Ex. Given 'historic/2022.zip', attempt to serve the file located at 'BUCKET/public-data/historic/2022.zip'.
"""
relative_path = f"/public-data/{relative_path}"
Contributor

Do we want to process that value at all? E.g.

  • make sure that the only values in it are [a-zA-Z0-9.-_]
  • make sure it starts with [a-zA-Z]

I mostly don't want it to start with .., and... not sure what else we want to do here.

Contributor Author

I believe we're protected from this, since the value comes from the URL? Maybe not, that's a good callout.

Django itself doesn't do any processing on the string, so I think it's just relying on it being a valid URL. https://github.com/django/django/blob/71a19a0e475165dbc14c1fe02f552013ee670e4c/django/urls/converters.py#L39-L41

I'll add a more restrictive converter.
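
To illustrate the concern: Django's built-in `path` converter (the code linked above) matches on the regex `.+`, so the pattern by itself does not reject a value that starts with `..`. A small sketch using plain `re` rather than Django's URL resolver; `DJANGO_PATH_REGEX` and `accepted_by_path_converter` are illustrative names only.

```python
import re

# Regex used by Django's built-in "path" converter (PathConverter.regex).
DJANGO_PATH_REGEX = ".+"


def accepted_by_path_converter(value):
    """True if the whole value matches the built-in path converter's regex."""
    return re.fullmatch(DJANGO_PATH_REGEX, value) is not None
```

Here both `accepted_by_path_converter("historic/2022.zip")` and `accepted_by_path_converter("../secrets.txt")` are true. This only shows what the regex allows, not how a browser or server may normalize the URL before it reaches Django.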

Contributor Author

@jperson1 jperson1 Apr 8, 2025

Done! I used

[A-Za-z][A-Za-z0-9\/\-.\\_]+

One letter, followed by one or more letters, digits, or "/", "-", ".", "\", or "_".

I could have been less explicit, but this seemed okay since we know the actual names of all served files and this covers it.
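
A minimal sketch of how a converter built on that pattern might look and what it accepts. The class name `PublicDataPathConverter` and the helper `is_allowed` are hypothetical, and the check uses plain `re` instead of Django's resolver; only the regex itself comes from the comment above.

```python
import re


class PublicDataPathConverter:
    """Sketch of a restrictive Django path converter (class name is hypothetical)."""

    # Pattern quoted in the comment above.
    regex = r"[A-Za-z][A-Za-z0-9\/\-.\\_]+"

    def to_python(self, value):
        # The matched string is passed to the view unchanged.
        return value

    def to_url(self, value):
        return value


def is_allowed(relative_path):
    """True if the whole path matches the converter's regex."""
    return re.fullmatch(PublicDataPathConverter.regex, relative_path) is not None


# Registration would go through Django's register_converter, for example:
#   from django.urls import register_converter
#   register_converter(PublicDataPathConverter, "publicpath")  # "publicpath" is a made-up name
```

Under this pattern, `is_allowed("historic/2022.zip")` is true, while `"../secrets.txt"` (leading dot), `"2022.zip"` (leading digit), and `"file name.csv"` (space) are rejected. The pattern only constrains the first character, so a value such as `"a/../b"` still matches.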

Contributor

Excellent. Thank you. That was the spirit of what I was aiming for. If we ever expand this to cover other things, we'll know really quickly that this is the source of the problem. It assuages my paranoia, if nothing else. :D

@github-actions
Contributor

Code Coverage

Package                                       Line Rate             Branch Rate  Health
.                                             100%                  100%
api                                           99%                   87%
api.serializers                               97%                   88%
api.views                                     91%                   100%
audit                                         95%                   80%
audit.cross_validation                        97%                   85%
audit.fixtures                                84%                   50%
audit.intakelib                               89%                   83%
audit.intakelib.checks                        92%                   85%
audit.intakelib.common                        98%                   82%
audit.intakelib.transforms                    100%                  95%
audit.management.commands                     78%                   17%
audit.migrations                              100%                  100%
audit.models                                  92%                   62%
audit.templatetags                            100%                  100%
audit.views                                   74%                   55%
census_historical_migration                   96%                   65%
census_historical_migration.migrations        100%                  100%
census_historical_migration.sac_general_lib   92%                   84%
census_historical_migration.transforms        95%                   90%
census_historical_migration.workbooklib       68%                   69%
config                                        78%                   37%
curation                                      100%                  100%
curation.curationlib                          93%                   100%
curation.migrations                           100%                  100%
dissemination                                 91%                   69%
dissemination.migrations                      97%                   25%
dissemination.report_generation               29%                   0%
dissemination.report_generation.excel         32%                   0%
dissemination.searchlib                       59%                   41%
dissemination.templatetags                    100%                  100%
dissemination.views                           76%                   55%
djangooidc                                    53%                   38%
djangooidc.tests                              100%                  94%
report_submission                             100%                  95%
report_submission.migrations                  100%                  100%
report_submission.templatetags                74%                   100%
report_submission.views                       77%                   63%
support                                       93%                   74%
support.migrations                            100%                  100%
support.models                                90%                   50%
tools                                         98%                   50%
users                                         95%                   86%
users.fixtures                                100%                  83%
users.management                              100%                  100%
users.management.commands                     100%                  100%
users.migrations                              100%                  100%
Summary                                       89% (19974 / 22403)   71% (2430 / 3442)

@jperson1 jperson1 added this pull request to the merge queue Apr 10, 2025
Merged via the queue into main with commit dc24d2f Apr 10, 2025
17 checks passed
@jperson1 jperson1 deleted the jp/bulk-download-url-view branch April 10, 2025 15:57