The upload to S3 completes, but the Celery task that moves the data from S3 to Arango fails. This is likely fixable by chunking the upload to Arango. Here is some relevant info from Jake:
So btw, chunking was implemented, but was reverted with this commit: 2d4685c
So here is what chunking looked like before:
    def put_rows(self, rows: List[Dict]) -> RowInsertionResponse:
        """Insert/update rows in the underlying arangodb collection."""
        errors = []

        # Limit the amount of rows inserted per request, to prevent timeouts
        for chunk in chunked(rows, DOCUMENT_CHUNK_SIZE):
            res = self.get_arango_collection().insert_many(chunk, overwrite=True)
            errors.extend(
                RowModifyError(index=i, message=doc.error_message)
                for i, doc in enumerate(res)
                if isinstance(doc, DocumentInsertError)
            )

        inserted = len(rows) - len(errors)
        return RowInsertionResponse(inserted=inserted, errors=errors)
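For reference, the `chunked` helper used above presumably behaves like `more_itertools.chunked` (yield fixed-size slices of an iterable). A minimal stand-in sketch, in case the original helper is not available (the name and signature here are assumptions):

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def chunked(iterable: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    # islice consumes up to `size` items per pass; an empty list ends the loop
    while chunk := list(islice(it, size)):
        yield chunk
```

One thing to watch if this gets reintroduced: in the reverted snippet, `i` from `enumerate(res)` is the index within the chunk, not within `rows`, so `RowModifyError.index` would need a per-chunk offset if callers expect indices into the full `rows` list.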