Large table uploads fail #85

@JackWilb

Description

The upload to S3 completes, but the Celery task that moves the data from S3 to Arango fails. This is likely fixable by chunking the upload to Arango. Here is some relevant info from Jake:

So btw, chunking was implemented, but was reverted with this commit: 2d4685c
So here is what chunking looked like before:

```python
def put_rows(self, rows: List[Dict]) -> RowInsertionResponse:
    """Insert/update rows in the underlying arangodb collection."""
    errors = []
    # Limit the amount of rows inserted per request, to prevent timeouts
    for chunk in chunked(rows, DOCUMENT_CHUNK_SIZE):
        res = self.get_arango_collection().insert_many(chunk, overwrite=True)
        errors.extend(
            (
                RowModifyError(index=i, message=doc.error_message)
                for i, doc in enumerate(res)
                if isinstance(doc, DocumentInsertError)
            )
        )
    inserted = len(rows) - len(errors)
    return RowInsertionResponse(inserted=inserted, errors=errors)
```
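
For context, the `chunked` helper used above presumably comes from `more_itertools` (an assumption; the snippet doesn't show its import). A minimal equivalent sketch using only the standard library:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def chunked(iterable: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield successive lists of at most n items from iterable.

    Minimal stand-in for more_itertools.chunked, shown here only to
    illustrate how the rows would be split into bounded batches.
    """
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk
```

Each batch is then small enough that a single `insert_many` call stays under the request timeout, which is the point of re-introducing the reverted chunking.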
