Closed
Description
Bug: Packaging issue when handling large files plus small files (zip + large file)
I found a bug when uploading a large file together with several small files in the same upload session.
Reproduction Script
Prerequisites:
- A PID in a Dataverse instance where you have write access
- python-dotenv installed
  pip install python-dotenv
- Create a .env file with the following variables:
  - DV_URL='your_dataverse_url'
  - API_TOKEN='your_api_token'
  - PID='your_pid'
To reproduce the issue, use the following script:
# Create dummy files (large and small) for testing purposes
import os
from pathlib import Path

# Create a directory for the files if it doesn't exist
Path("./files").mkdir(parents=True, exist_ok=True)


def create_dummy_file(file_path: Path, size_in_bytes: int):
    # Fill the file with random bytes of the requested size
    with open(file_path, "wb") as f:
        f.write(os.urandom(size_in_bytes))


create_dummy_file(
    Path("./files/4gb_dummy_file.bin"), 4 * 1024 * 1024 * 1024
)  # 4 GB file
create_dummy_file(
    Path("./files/1gb_dummy_file.bin"), 1 * 1024 * 1024 * 1024
)  # 1 GB file
create_dummy_file(Path("./files/10mb_dummy_file.bin"), 10 * 1024 * 1024)  # 10 MB file
create_dummy_file(Path("./files/1kb_dummy_file.bin"), 1024)  # 1 KB file
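Side note: the helper above builds each file's full content in memory before writing it, which means roughly 4 GB of RAM for the largest file. If that is a problem on your machine, a chunked variant along these lines should produce equivalent test files. The name `create_dummy_file_chunked` and the 16 MiB default chunk size are my own choices for illustration, not part of the original script or of dvuploader.

```python
import os
from pathlib import Path


def create_dummy_file_chunked(
    file_path: Path,
    size_in_bytes: int,
    chunk_size: int = 16 * 1024 * 1024,  # assumed default: 16 MiB per write
) -> None:
    """Write `size_in_bytes` random bytes to `file_path` in fixed-size chunks."""
    remaining = size_in_bytes
    with open(file_path, "wb") as f:
        while remaining > 0:
            # Never generate more than one chunk of random data at a time
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))
            remaining -= n
```

It can be swapped in for `create_dummy_file` with the same first two arguments.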
from dotenv import load_dotenv
import os
load_dotenv()
# Load the env
DV_URL = os.getenv("DV_URL", "")
API_TOKEN = os.getenv("API_TOKEN", "")
PID = os.getenv("PID", "")
import dvuploader as dv

files = [
    *dv.add_directory("./files/"),  # Add an entire directory
]

dvuploader = dv.DVUploader(files=files)
dvuploader.upload(
    api_token=API_TOKEN,
    dataverse_url=DV_URL,
    persistent_id=PID,
    n_parallel_uploads=4,  # Whatever your instance can handle
    replace_existing=False,
)

You will then see only the 4 GB file in the upload queue; the other three files are lost (they should be zipped together):
╭───────────── DVUploader ──────────────╮
│ Server: https://demo.borealisdata.ca/ │
│ PID: doi:10.80240/FK2/JBIRNC          │
│ Files: 4                              │
╰───────────────────────────────────────╯
🔎 Checking dataset files
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ File                ┃ Status ┃ Action ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ 10mb_dummy_file.bin │ New    │ Upload │
│ 4gb_dummy_file.bin  │ New    │ Upload │
│ 1gb_dummy_file.bin  │ New    │ Upload │
│ 1kb_dummy_file.bin  │ New    │ Upload │
└─────────────────────┴────────┴────────┘
⚠️ Direct upload not supported. Falling back to Native API.
🚀 Uploading files
4gb_dummy_file.bin ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--
Note: I tried uploading to both demo.dataverse.org and demo.borealisdata.ca, and the error is the same, so the issue is not specific to the repository.
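To make the expected behaviour concrete, here is a minimal sketch of the packaging I would expect: the large file goes up on its own and the three smaller files are zipped together. This is not dvuploader's actual implementation. The function name `plan_packages` and the 2 GiB zip limit are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

GIB = 1024 ** 3
MIB = 1024 ** 2


def plan_packages(
    sizes: Dict[str, int],
    max_zip_bytes: int = 2 * GIB,  # assumed limit, not taken from dvuploader
) -> Tuple[List[str], List[List[str]]]:
    """Split files into standalone uploads and zip batches.

    Files larger than `max_zip_bytes` are uploaded on their own; the rest are
    packed greedily into batches whose total size stays within the limit.
    """
    standalone: List[str] = []
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0

    # Largest first, so oversized files are separated out before batching
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > max_zip_bytes:
            standalone.append(name)
            continue
        if current and current_size + size > max_zip_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size

    if current:
        batches.append(current)
    return standalone, batches


sizes = {
    "4gb_dummy_file.bin": 4 * GIB,
    "1gb_dummy_file.bin": 1 * GIB,
    "10mb_dummy_file.bin": 10 * MIB,
    "1kb_dummy_file.bin": 1024,
}
standalone, batches = plan_packages(sizes)
print(standalone)  # ['4gb_dummy_file.bin']
print(batches)     # [['1gb_dummy_file.bin', '10mb_dummy_file.bin', '1kb_dummy_file.bin']]
```

The point is that every input file ends up in exactly one upload unit. In the run above, only the standalone file reaches the queue and the zip batch is missing.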