@carloea2
Contributor

@carloea2 carloea2 commented Jan 6, 2026

What changes were proposed in this PR?

  • Enforce the single_file_upload_max_size_mib limit for multipart uploads at init by requiring both fileSizeBytes and partSizeBytes and rejecting the request when the declared file size exceeds the configured max.
  • Persist multipart sizing metadata in the DB by adding file_size_bytes and part_size_bytes to dataset_upload_session, plus constraints to keep them valid.
  • Harden uploadPart against size bypasses by computing the expected part size from the stored session metadata and rejecting any request whose Content-Length does not exactly match the expected size (including the final part).
  • Add a final server-side safety check at finish: after lakeFS reports the completed object size, compare it to the max and roll back the object if it exceeds the limit.
  • Update the frontend init call to pass fileSizeBytes and partSizeBytes when initializing multipart uploads.
  • Add a DB migration (sql/updates/18.sql) to apply the schema change on existing deployments.
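
The three size checks described above (at init, per part, and at finish) can be sketched roughly as follows. This is a minimal Python illustration, not the PR's actual code: the function names validate_init, expected_part_size, and enforce_limit_at_finish, the 1-based part numbering, and the rollback callback are all assumptions made for the example.

```python
from typing import Callable

MIB = 1024 * 1024  # the configured limit is expressed in MiB


def validate_init(file_size_bytes: int, part_size_bytes: int, max_size_mib: int) -> int:
    """Validate the declared sizes at init and return the number of parts.

    Hypothetical helper: raises ValueError on invalid sizes or when the
    declared file size exceeds the configured maximum.
    """
    if file_size_bytes <= 0 or part_size_bytes <= 0:
        raise ValueError("fileSizeBytes and partSizeBytes must be positive")
    if file_size_bytes > max_size_mib * MIB:
        raise ValueError("declared file size exceeds the configured maximum")
    # Ceiling division: the last part may be shorter than the others.
    return (file_size_bytes + part_size_bytes - 1) // part_size_bytes


def expected_part_size(file_size_bytes: int, part_size_bytes: int, part_number: int) -> int:
    """Expected byte length of a part (assumed 1-based) from stored session metadata."""
    num_parts = (file_size_bytes + part_size_bytes - 1) // part_size_bytes
    if not 1 <= part_number <= num_parts:
        raise ValueError("part number out of range")
    if part_number < num_parts:
        return part_size_bytes
    # The final part carries whatever remains after the full-size parts.
    return file_size_bytes - part_size_bytes * (num_parts - 1)


def enforce_limit_at_finish(
    actual_size_bytes: int, max_size_mib: int, rollback: Callable[[], None]
) -> None:
    """Final safety check: roll back the object if it exceeds the limit."""
    if actual_size_bytes > max_size_mib * MIB:
        rollback()  # caller-supplied cleanup, standing in for removing the stored object
        raise ValueError("completed object exceeds the configured maximum")
```

For instance, a 10-byte file declared with 4-byte parts yields three parts of 4, 4, and 2 bytes, so an uploadPart request for part 3 would be accepted only with a Content-Length of exactly 2.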

Any related issues, documentation, discussions?

Close #4147

How was this PR tested?

  • Added/updated unit tests for multipart upload validation and malicious cases, including:

    • max upload size enforced at init (over/equals boundaries + 2-part boundary)
    • header poisoning and Content-Length mismatch rejection (non-numeric/overflow/mismatch)
    • finish rollback when max is tightened before finish (oversized object must not remain accessible)
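
The Content-Length cases listed above (non-numeric, overflow, mismatch) could be exercised against a check along these lines. This is a hedged Python sketch only: check_content_length and the signed 64-bit bound are assumptions for illustration and are not taken from the PR's code.

```python
from typing import Optional

INT64_MAX = 2**63 - 1  # assumed bound, mirroring a signed 64-bit length field


def check_content_length(header_value: Optional[str], expected_size: int) -> int:
    """Parse a Content-Length header and require an exact match.

    Hypothetical helper: raises ValueError for missing, non-numeric,
    overflowing, or mismatching values.
    """
    if header_value is None:
        raise ValueError("missing Content-Length")
    text = header_value.strip()
    # Accept plain ASCII digits only: no sign, no decimals, no exotic numerals.
    if not (text.isascii() and text.isdigit()):
        raise ValueError("Content-Length is not a non-negative integer")
    value = int(text)
    if value > INT64_MAX:
        raise ValueError("Content-Length overflows a 64-bit length")
    if value != expected_size:
        raise ValueError("Content-Length does not match the expected part size")
    return value
```

Requiring equality rather than an upper bound is what closes the bypass: a client can send neither more nor fewer bytes than the session metadata implies for that part.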

Was this PR authored or co-authored using generative AI tooling?

Co-authored-by: ChatGPT

@github-actions github-actions bot added the ddl-change (Changes to the TexeraDB DDL), fix, frontend (Changes related to the frontend GUI), and service labels Jan 6, 2026
@carloea2 carloea2 changed the title from "v1" to "fix(dataset): enforce max file size for multipart upload" Jan 6, 2026
@carloea2
Contributor Author

carloea2 commented Jan 6, 2026

@xuang7 @aicam @chenlica

@chenlica chenlica requested a review from aicam January 6, 2026 04:44
@chenlica
Contributor

chenlica commented Jan 6, 2026

@xuang7 Please be the first reviewer before @aicam can do it.

Contributor

@xuang7 xuang7 left a comment

Thanks for the PR. LGTM!

Contributor

@aicam aicam left a comment

I think the life cycle of upload session records needs a better design. If needed, we can meet to discuss it.

@aicam aicam enabled auto-merge (squash) January 19, 2026 21:05
Contributor

@aicam aicam left a comment

LGTM! Thanks for the great PR.

auto-merge was automatically disabled January 19, 2026 22:01

Head branch was pushed to by a user without write access

Signed-off-by: carloea2 <carloea2@uci.edu>
@carloea2
Contributor Author

@aicam Can you run the tests? Thanks.

@aicam aicam enabled auto-merge (squash) January 19, 2026 23:56
@aicam aicam merged commit 7d42cb6 into apache:main Jan 20, 2026
10 checks passed
Labels

ddl-change (Changes to the TexeraDB DDL), fix, frontend (Changes related to the frontend GUI), service

Development

Successfully merging this pull request may close these issues.

Multipart dataset upload can bypass single_file_upload_max_size_mib limit

4 participants