-
Notifications
You must be signed in to change notification settings - Fork 891
Description
Description
The gws CLI supports uploading files (to Google Drive) via the --upload flag. However, the current implementation reads the entire file into memory twice before sending the HTTP request, and due to Rust's vector reallocation strategy, actually allocates up to 4x the file size in memory. For large files (video files or large archives), this causes an immediate memory exhaustion (OOM) crash.
In src/executor.rs (build_http_request), the CLI does:
// 1. Reads the ENTIRE file into a Vec<u8> in memory
let file_bytes = tokio::fs::read(upload_path).await.map_err(...)?;
// 2. Passes the entire byte slice to build_multipart_body
let (multipart_body, content_type) = build_multipart_body(&input.body, &file_bytes)?;Then in build_multipart_body:
let mut body = Vec::new(); // Starts empty, grows dynamically
// ... appends boundaries and metadata ...
// 3. Allocates a SECOND copy of the entire file in memory
body.extend_from_slice(file_bytes); Proof (Mock Test)
We wrote a memory-tracking allocator test mimicking the exact logic from executor.rs on a 500 MB mock file vs a streaming approach. Here are the true memory allocation results from the OS level:
Current Implementation Memory:
=== OOM VULNERABILITY MOCK TEST ===
Simulating an upload of a 500 MB file...
[Step 1] tokio::fs::read completed. Memory allocated: 500 MB
[Step 2] build_multipart_body completed. Memory allocated: 1500 MB
Total memory allocated: 2000 MB (4.00x Overhead)
Proposed Streaming Fix Memory:
=== STREAMING FIX MOCK TEST ===
Simulating an upload of a 500 MB file via chunks...
Total memory allocated: 0.06 MB (0.0001x Overhead)
Because body is initialized as an empty Vec::new() and extend_from_slice is called with a massive 500MB slice, Rust's vector capacity doubling strategy forces it to allocate far more memory than strictly necessary while growing the vector. Uploading a 5 GB file asks the OS for 20 GB of contiguous RAM, guaranteeing a crash.
Steps to Reproduce
- Create a large dummy file (example 5 GB):
dd if=/dev/zero of=large_video.mp4 bs=1G count=5
- Attempt to upload it to Google Drive using
gws:gws drive files create --upload large_video.mp4 --params '{"name": "large_video.mp4"}' - Observe the CLI crash or get killed by the OS OOM killer as memory usage spikes.
Expected Behavior
The CLI should stream the file chunks directly from the disk to the network socket, maintaining a small, constant memory footprint regardless of the file size. As proven in the mock test, this requires < 1MB of RAM overhead.
Suggested Fix
Since Google APIs require multipart/related (which reqwest::multipart::Form does not natively support, as it only does multipart/form-data), construct an async stream and pass it to reqwest::Body::wrap_stream.
The stream should yield:
Bytescontaining the opening boundary and JSON metadata part.- File chunks yielded via
tokio_util::codec::FramedRead(reading the file sequentially). Bytescontaining the closing boundary.
This guarantees O(1) memory usage (a few MBs for the chunk buffer) during uploads.