Description
I am uploading files to Google Cloud Storage and then importing them to Big Query and am seeing an intermittent notFound error during the import job.
Here is a simplified version of how I am streaming files to GCS:
```js
rs.pipe(bucket.file('[filename]').createWriteStream({
  metadata: {
    contentType: 'text/csv',
    metadata: {
      custom: 'metadata'
    }
  }
})).on('finish', function () {});
```

After the file stream has finished, the next step is importing to BigQuery:
```js
bigQuery
  .dataset([datasetId])
  .table([tableId])
  .import([bucket.file('[filename]')], {
    createDisposition: 'CREATE_IF_NEEDED',
    writeDisposition: 'WRITE_TRUNCATE',
    sourceFormat: 'CSV',
    schema: {
      fields: [fields array]
    }
  }, function (err, bqJob) {
    // Executes successfully
  });
```

I am seeing the error in the response I get from job.getMetadata():
```js
bigQuery.job(bqJob.id).getMetadata(function (err, response) {});
```

```
response.status.errorResult.message: Not found: URI gs://[bucket]/[filename]
response.status.errorResult.reason: notFound
response.status.errorResult.errors: [{"reason":"notFound","message":"Not found: URI gs://[bucket]/[filename]"}]
```
After receiving this error, I have confirmed that the file in question does exist in GCS. Could there be some lag between when a stream has finished and when the file is actually available? If so, is there a better way to know when the file(s) are ready for import into BigQuery?
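As a defensive measure, one could poll until the object is reported as visible before kicking off the import. Below is a minimal sketch of such a poller; `pollUntilExists` and its options are hypothetical names (not part of any Google Cloud client library), and the commented usage assumes the callback-style `File#exists` API from `@google-cloud/storage`:

```javascript
// Poll a check function until it reports true, then invoke `done`.
// `check(cb)` must call cb(err, exists). This is a generic helper,
// not something provided by the GCS or BigQuery clients.
function pollUntilExists(check, opts, done) {
  var attempts = 0;
  var maxAttempts = opts.maxAttempts || 5;
  var delayMs = opts.delayMs || 1000;

  (function tryOnce() {
    check(function (err, exists) {
      if (err) return done(err);
      if (exists) return done(null);
      if (++attempts >= maxAttempts) {
        return done(new Error('file never became visible'));
      }
      setTimeout(tryOnce, delayMs);
    });
  })();
}

// Usage against GCS might look like (hypothetical wiring):
// pollUntilExists(
//   function (cb) { bucket.file('[filename]').exists(cb); },
//   { maxAttempts: 10, delayMs: 500 },
//   function (err) { if (!err) { /* start the BigQuery import */ } });
```

This does not explain the lag, but it turns a silent race into an explicit readiness check.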
To be clear, this flow of events usually completes successfully, but I intermittently see the notFound error. I would like to track it down, or at least find a way to avoid it. Is this a known issue? (I haven't been able to find any mention of it yet.) Is it possible that rate limiting is occurring and causing this error to be generated?
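Until the root cause is pinned down, one possible workaround is to retry the import when the job fails with a `notFound` error. A minimal backoff sketch follows; `retryWithBackoff` and the commented `startImportJob` wrapper are hypothetical names, and the error shape in `shouldRetry` assumes the `errorResult.reason` field shown in the output above:

```javascript
// Retry an async task with exponential backoff. `task(cb)` must call
// cb(err); errors for which `shouldRetry(err)` is true trigger another
// attempt, up to `maxRetries` retries.
function retryWithBackoff(task, shouldRetry, maxRetries, baseDelayMs, done) {
  var attempt = 0;

  (function run() {
    task(function (err) {
      if (!err) return done(null);
      if (attempt >= maxRetries || !shouldRetry(err)) return done(err);
      var delay = baseDelayMs * Math.pow(2, attempt++); // 1x, 2x, 4x, ...
      setTimeout(run, delay);
    });
  })();
}

// Usage sketch (assumes a wrapper that surfaces the job's errorResult):
// retryWithBackoff(
//   function (cb) { startImportJob(cb); },                 // hypothetical
//   function (err) { return err.reason === 'notFound'; },
//   3, 500,
//   function (err) { if (err) console.error('import failed', err); });
```

The backoff also keeps the retries from hammering the API if rate limiting does turn out to be involved.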