
Cloud Storage not persisting on error #1097

@techniq

Description


I've run into an issue that only appears in production, using the Google Cloud Storage API. I have an importer that loads 100 entries at a time. When developing locally (with require('fs')) and an error occurs on, say, page 5, the previous pages' worth of entries (400 total) are still persisted to disk. When using GCS, this is not the case (the file doesn't even appear in the bucket).

Here is a snippet of the problematic part of the pipeline.

    stream = importer.streamPages({perPage: perPage, importDate: importDate}, maxPages);

    let writeStream = null;
    if (process.env.NODE_ENV === 'production') {
      writeStream = bucket.file(importFileName).createWriteStream();
    } else {
      writeStream = fs.createWriteStream(`data/${importFileName}`);
    } 

    stream
      .on('error', err => this.logError(err))
      .pipe(ndjson.stringify())
      .pipe(writeStream);

Metadata

Labels

api: storage (Issues related to the Cloud Storage API.)
type: question (Request for information or clarification. Not an issue.)
