@thiyaguk09
Contributor

Description

This pull request adds client-side checksum validation for object uploads. For uploads that complete through the JSON API path, it verifies both client-provided and client-calculated CRC32C and MD5 hashes against the hashes reported by the server. A mismatch is surfaced as an error, so silent data corruption is detected instead of being reported as a successful upload.

Impact

The primary impact is improved data integrity and reliability for object uploads, particularly those that use the JSON API path (e.g., small single-chunk uploads).

  • Prevents Silent Data Corruption: By validating client-side hashes (calculated or user-provided) against the server's reported hashes, the SDK proactively detects and stops data corruption that might otherwise go unnoticed.
  • Enhanced Reliability: The stream is immediately destroyed upon hash mismatch, preventing the client from assuming a corrupted file upload was successful.
  • Feature Parity: Enables the use of the X-Goog-Hash header, which is essential for ensuring the integrity of the data payload sent to the server.
  • API Usage: Introduces new configuration options (clientCrc32c, clientMd5Hash, crc32c, md5) that allow users fine-grained control over checksum validation behavior.
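
As a rough illustration of the header these options feed into, the sketch below builds an X-Goog-Hash value from base64-encoded digests. The function name `buildXGoogHashHeader` and the `ChecksumInputs` shape are hypothetical and are not taken from this PR; the only thing assumed from Cloud Storage itself is the `crc32c=...,md5=...` header format.

```typescript
// Hypothetical input shape: base64-encoded digests, either supplied by the
// caller or computed by the client while streaming the data.
interface ChecksumInputs {
  crc32c?: string;
  md5?: string;
}

// Builds the comma-separated X-Goog-Hash value, or returns undefined when no
// checksum is available (in which case no header would be sent).
function buildXGoogHashHeader(inputs: ChecksumInputs): string | undefined {
  const parts: string[] = [];
  if (inputs.crc32c) parts.push(`crc32c=${inputs.crc32c}`);
  if (inputs.md5) parts.push(`md5=${inputs.md5}`);
  return parts.length > 0 ? parts.join(',') : undefined;
}
```

How the SDK chooses between a user-provided hash and a calculated one is not shown here; the sketch only covers formatting whichever digests are available.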

Testing

  • Unit and Integration Tests Added? Yes. Extensive test coverage was added and expanded in test/resumable-upload.ts and system-test/kitchen.ts. This includes:

    • Verification of X-Goog-Hash header injection with calculated and client-provided hashes in single and multi-chunk uploads.
    • Validation tests to confirm that the stream is correctly destroyed on both CRC32C and MD5 checksum mismatches between the client and server.
    • A dedicated describe block, 'Validation of Client Checksums Against Server Response', covering success and failure scenarios.
  • Were any tests changed? Yes. Tests were expanded and refactored (e.g., checksum application logic tests).

  • Are any breaking changes necessary? No. This change introduces new features and configuration options but does not appear to break existing functionality. The changes are largely additive and internal refactoring of hash handling and validation logic.

Additional Information

  • Refactoring for Robustness: The HashStreamValidator was improved by adding the md5Digest getter, which centralizes the MD5 calculation and caching logic. The related fix in _flush prevents the runtime error that would otherwise occur when digest() is called more than once on the same hash object.
  • Internal Consistency: Private helper methods (#validateChecksum, #applyChecksumHeaders) were introduced in resumable-upload.ts to simplify and clarify the hash handling logic, especially ensuring checksum headers are applied correctly across different upload paths.
  • Dependency on Configuration: Proper client-side validation requires users to explicitly enable calculation (crc32c: true, md5: true) or provide the pre-calculated hashes.
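
The caching idea behind the md5Digest getter can be sketched as follows. This is a minimal stand-in, not the PR's HashStreamValidator: the class name `Md5Accumulator` and its `update` method are hypothetical. It relies on the fact that a Node.js `crypto` Hash object throws if `digest()` is called a second time, so the first result is stored and reused.

```typescript
import { createHash, Hash } from 'node:crypto';

// Hypothetical stand-in for the MD5 part of a hash validator.
class Md5Accumulator {
  private hash: Hash = createHash('md5');
  private cached?: string;

  // Feeds data into the hash. Must not be called after md5Digest is read,
  // because the underlying Hash object is finalized at that point.
  update(chunk: Buffer | string): void {
    this.hash.update(chunk);
  }

  // Finalizes the hash on first access and caches the base64 result, so
  // repeated reads never call digest() twice on the same Hash object.
  get md5Digest(): string {
    if (this.cached === undefined) {
      this.cached = this.hash.digest('base64');
    }
    return this.cached;
  }
}
```

With this shape, both a flush step and a later validation step can read `md5Digest` without coordinating which of them runs first.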

Checklist

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease
  • Appropriate docs were updated
  • Appropriate comments were added, particularly in complex areas or places that require background
  • No new warnings or issues will be generated from this change

Fixes #

Adds validation for client-provided (pre-calculated) and
client-calculated CRC32C and MD5 hashes when the final upload request is
made via the JSON API path (status 200).

This ensures consistency checks are performed even when the `Upload`
stream is finalized, preventing silent data corruption if the
server-reported hash (in the response body) mismatches the client's
expected hash.

Adds the 'Validation of Client Checksums Against Server Response' test
suite. Fixes test failures in client-provided hash scenarios by updating
mock responses to ensure server-reported checksums match the client's
expected values.

Refactors four duplicate test cases (CRC32C/MD5 success and failure)
into a single, parameterized test block within the 'Validation of Client
Checksums Against Server Response' suite.

This improves test clarity and reduces code duplication by dynamically
generating test scenarios for post-upload hash validation.
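
A parameterized block of this kind might look roughly like the sketch below. The scenario fields, the hash strings, and the `hashesDiffer` placeholder are all hypothetical; the actual suite's fixtures, mocks, and assertions are not shown in this conversation.

```typescript
// Hypothetical scenario table; the real suite's names and fixtures may differ.
interface Scenario {
  name: string;
  clientHash: string;
  serverHash: string;
  expectMismatch: boolean;
}

const scenarios: Scenario[] = [
  { name: 'CRC32C match', clientHash: 'crcA', serverHash: 'crcA', expectMismatch: false },
  { name: 'CRC32C mismatch', clientHash: 'crcA', serverHash: 'crcB', expectMismatch: true },
  { name: 'MD5 match', clientHash: 'md5A', serverHash: 'md5A', expectMismatch: false },
  { name: 'MD5 mismatch', clientHash: 'md5A', serverHash: 'md5B', expectMismatch: true },
];

// Placeholder for the logic under test: true means the hashes disagree.
function hashesDiffer(client: string, server: string): boolean {
  return client !== server;
}

// Runs every scenario through the same check and returns the names of the
// scenarios whose outcome did not match the expectation.
function runScenarios(cases: Scenario[]): string[] {
  const failures: string[] = [];
  for (const s of cases) {
    if (hashesDiffer(s.clientHash, s.serverHash) !== s.expectMismatch) {
      failures.push(s.name);
    }
  }
  return failures;
}
```

In a Mocha-style suite, the loop body would typically become one `it(...)` call per scenario, so each case is reported separately.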
@product-auto-label (bot) added the labels "size: l" (pull request size is large) and "api: storage" (issues related to the googleapis/nodejs-storage API) on Dec 9, 2025
This commit introduces several stability fixes for the ResumableUpload
class:

1.  **Fixes Timeouts in Unit Tests:** Updates `makeRequestStream` mocks
to fully drain the request body stream, resolving stream consumption
deadlocks and timeouts in `#startUploading` unit tests.
2.  **Fixes Multi-Part Hang:** Correctly finalizes the `pipeline` for
partial chunks (`isPartialUpload=true`) by calling `pipelineCallback()`
immediately after successful chunk upload, preventing indefinite hangs
in multi-session tests.
3.  **Fixes Single-Chunk Checksum Missing Header:** Applies the
`X-Goog-Hash` header unconditionally in single-chunk mode if a validator
is present, ensuring checksum validation is active even when
`contentLength` is unknown.
@thiyaguk09 marked this pull request as ready for review on December 10, 2025, 11:33
@thiyaguk09 requested review from a team as code owners on December 10, 2025, 11:33
      ) ||
      this.#validateChecksum(clientMd5HashToValidate, serverMd5, 'MD5')
    ) {
      return;
Contributor

Why would we exit early when there is a successful checksum check? Wouldn't we want to continue and do the cleanup below?

Contributor Author

The early exit (return;) is mandatory because if this.#validateChecksum detects a CRC32C or MD5 mismatch, it immediately calls this.destroy(error). This action places the stream in an error state.

Continuing execution without the early exit would lead to a "false success" by incorrectly running the subsequent cleanup code, which includes emitting 'metadata' and 'uploadFinished'. The return ensures that the upload process halts immediately upon data integrity failure, preventing the application from signaling a successful upload when a mismatch occurs. If both checksums pass or are skipped, the process continues normally to cleanup.
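
The control flow described in this reply can be sketched in isolation. The class below is a hypothetical stand-in, not the PR's `Upload` class: `FakeUpload`, `finish`, and the recorded `events` array exist only for illustration, and treating a missing hash as "skip" is an assumption based on the reply's wording. The server fields `crc32c` and `md5Hash` follow the names used in the Cloud Storage object resource.

```typescript
// Hypothetical minimal stand-in for the upload stream.
class FakeUpload {
  destroyedWith?: Error;
  events: string[] = [];

  destroy(err: Error): void {
    this.destroyedWith = err;
  }

  // Returns true when a mismatch was found and the stream was destroyed.
  // A missing hash on either side is treated as "skip" (assumption).
  validateChecksum(
    client: string | undefined,
    server: string | undefined,
    type: 'CRC32C' | 'MD5'
  ): boolean {
    if (!client || !server || client === server) return false;
    this.destroy(new Error(`${type} checksum mismatch: client ${client}, server ${server}`));
    return true;
  }

  finish(
    client: { crc32c?: string; md5?: string },
    server: { crc32c?: string; md5Hash?: string }
  ): void {
    if (
      this.validateChecksum(client.crc32c, server.crc32c, 'CRC32C') ||
      this.validateChecksum(client.md5, server.md5Hash, 'MD5')
    ) {
      return; // stream is already destroyed; do not signal success
    }
    // Reached only when both checks pass or are skipped.
    this.events.push('metadata', 'uploadFinished');
  }
}
```

Because the validator returns true on failure rather than on success, the early `return` guards the success path instead of skipping it.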

shubham-up-47 previously approved these changes Dec 17, 2025
Simplify `HashStreamValidator._flush` by utilizing `md5Digest` getter.
@thiyaguk09
Contributor Author

@ddelgrosso1 Just a quick reminder to take a look at this PR when you get a chance!

Labels

api: storage Issues related to the googleapis/nodejs-storage API. size: l Pull request size is large.

3 participants