Bug: Out-Of-Memory (OOM) Crash on Large File Uploads (Google Drive/YouTube) #244

@anaslimem

Description

The gws CLI supports uploading files to Google Drive via the --upload flag. However, the current implementation reads the entire file into memory and then copies it a second time before sending the HTTP request; with Rust's vector reallocation behavior, total allocations reach up to 4x the file size. For large files (videos or large archives), this causes an immediate out-of-memory (OOM) crash.

In src/executor.rs (build_http_request), the CLI does:

```rust
// 1. Reads the ENTIRE file into a Vec<u8> in memory
let file_bytes = tokio::fs::read(upload_path).await.map_err(...)?;

// 2. Passes the entire byte slice to build_multipart_body
let (multipart_body, content_type) = build_multipart_body(&input.body, &file_bytes)?;
```

Then in build_multipart_body:

```rust
let mut body = Vec::new(); // starts empty, grows dynamically
// ... appends boundaries and metadata ...
// 3. Allocates a SECOND copy of the entire file in memory
body.extend_from_slice(file_bytes);
```

Proof (Mock Test)

We wrote a memory-tracking allocator test mimicking the exact logic from executor.rs on a 500 MB mock file vs a streaming approach. Here are the cumulative allocation figures reported by the tracking allocator:

Current Implementation Memory:

```
=== OOM VULNERABILITY MOCK TEST ===
Simulating an upload of a 500 MB file...
[Step 1] tokio::fs::read completed. Memory allocated: 500 MB
[Step 2] build_multipart_body completed. Memory allocated: 1500 MB

Total memory allocated: 2000 MB (4.00x Overhead)
```

Proposed Streaming Fix Memory:

```
=== STREAMING FIX MOCK TEST ===
Simulating an upload of a 500 MB file via chunks...

Total memory allocated: 0.06 MB (0.0001x Overhead)
```

Because body is initialized as an empty Vec::new() and then grown with appends, Rust's geometric capacity-growth strategy reallocates and copies the buffer as the vector expands, so cumulative allocations far exceed the file size. By the same pattern, uploading a 5 GB file can drive total allocations toward 20 GB, all but guaranteeing a crash.

Steps to Reproduce

  1. Create a large dummy file (e.g., 5 GB):
    dd if=/dev/zero of=large_video.mp4 bs=1G count=5
  2. Attempt to upload it to Google Drive using gws:
    gws drive files create --upload large_video.mp4 --params '{"name": "large_video.mp4"}'
  3. Observe the CLI crash or get killed by the OS OOM killer as memory usage spikes.

Expected Behavior

The CLI should stream file chunks directly from disk to the network socket, maintaining a small, constant memory footprint regardless of file size. As the mock test shows, this requires under 1 MB of RAM overhead.

Suggested Fix

Since Google APIs require multipart/related (which reqwest::multipart::Form does not natively support, as it only does multipart/form-data), construct an async stream and pass it to reqwest::Body::wrap_stream.

The stream should yield:

  1. Bytes containing the opening boundary and JSON metadata part.
  2. File chunks yielded via tokio_util::codec::FramedRead (reading the file sequentially).
  3. Bytes containing the closing boundary.

This keeps memory usage O(1) in the file size (a few MB for the chunk buffer) during uploads.
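A minimal sketch of such a stream, assuming reqwest with the stream feature plus the tokio, tokio-util, futures-util, and bytes crates (the function name, boundary token, and header layout are illustrative; error handling is elided):

```rust
use bytes::Bytes;
use futures_util::{stream, StreamExt, TryStreamExt};
use tokio_util::codec::{BytesCodec, FramedRead};

// Hypothetical helper: builds a streaming multipart/related body.
async fn multipart_related_body(
    metadata_json: String,
    upload_path: &std::path::Path,
) -> std::io::Result<(reqwest::Body, String)> {
    const BOUNDARY: &str = "gws_upload_boundary"; // any unique token works

    // 1. Opening boundary, JSON metadata part, and the file part's headers.
    let head = format!(
        "--{BOUNDARY}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n\
         {metadata_json}\r\n\
         --{BOUNDARY}\r\nContent-Type: application/octet-stream\r\n\r\n"
    );
    // 3. Closing boundary.
    let tail = format!("\r\n--{BOUNDARY}--\r\n");

    // 2. FramedRead yields the file in small chunks, so memory stays
    //    bounded by the chunk size, never the file size.
    let file = tokio::fs::File::open(upload_path).await?;
    let file_chunks = FramedRead::new(file, BytesCodec::new()).map_ok(|b| b.freeze());

    let body_stream = stream::once(async move { Ok::<_, std::io::Error>(Bytes::from(head)) })
        .chain(file_chunks)
        .chain(stream::once(async move { Ok(Bytes::from(tail)) }));

    let content_type = format!("multipart/related; boundary={BOUNDARY}");
    Ok((reqwest::Body::wrap_stream(body_stream), content_type))
}
```

The returned content_type string would replace the one currently produced by build_multipart_body, and the Body goes on the request unchanged.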
