Stages discarding documents in a parallel pipeline make other stages log errors #226

@laserval

When multiple stages are doing work on a document and one of them discards the document, other stages working on the same document will attempt to persist it and fail.

Ideally, stages would know if another stage has discarded the working document, and be able to act on that (perhaps by simply ignoring the document). Documents would need to remain in the documents collection for that to work, I think, and no new stages should be able to fetch the document.
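A minimal sketch of that idea, assuming an in-memory stand-in for the documents collection. Every name here (`DiscardSketch`, `DocumentStore`, `SaveResult`, `describe`) is hypothetical and not part of Hydra's actual API. Discarding marks the document instead of removing it, so a save from a parallel stage can be answered with "discarded" rather than a 404, and fetches no longer return the document.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; none of these types exist in Hydra.
class DiscardSketch {

    /** Outcome of a save, so a stage can tell "discarded" apart from a real failure. */
    enum SaveResult { SAVED, DISCARDED, NOT_FOUND }

    /** In-memory stand-in for the documents collection (assumption for illustration). */
    static final class DocumentStore {
        private final Map<String, String> contents = new HashMap<>();
        private final Set<String> discarded = new HashSet<>();

        synchronized void insert(String id, String content) {
            contents.put(id, content);
        }

        /** Marks the document as discarded but keeps it in the collection. */
        synchronized void discard(String id) {
            if (contents.containsKey(id)) {
                discarded.add(id);
            }
        }

        /** Discarded documents are never handed out to new stages. */
        synchronized Optional<String> fetch(String id) {
            if (discarded.contains(id)) {
                return Optional.empty();
            }
            return Optional.ofNullable(contents.get(id));
        }

        /** Reports why a save was not applied instead of failing uniformly. */
        synchronized SaveResult save(String id, String content) {
            if (!contents.containsKey(id)) {
                return SaveResult.NOT_FOUND;
            }
            if (discarded.contains(id)) {
                return SaveResult.DISCARDED;
            }
            contents.put(id, content);
            return SaveResult.SAVED;
        }
    }

    /** What a stage might do with the result: ignore discards, report only real failures. */
    static String describe(SaveResult result) {
        switch (result) {
            case SAVED:
                return "saved";
            case DISCARDED:
                return "ignored: document was discarded by another stage";
            default:
                return "error: document not found";
        }
    }

    public static void main(String[] args) {
        DocumentStore store = new DocumentStore();
        store.insert("doc-1", "<p>hello</p>");
        store.discard("doc-1"); // one stage discards the document
        // A parallel stage still holding doc-1 now tries to persist it.
        System.out.println(describe(store.save("doc-1", "hello")));
    }
}
```

With a distinct `DISCARDED` result, a stage could drop its work without logging an error, while a truly missing document would still surface as a failure.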

The current behaviour yields logs filled with:

2013-07-05 13:48:53 : INFO   : Thread-188 : com.findwise.hydra.StreamLogger : Received message from stage systest-strip-html (stdin): DEBUG Saving document to RemotePipeline..
2013-07-05 13:48:53 : INFO   : Thread-188 : com.findwise.hydra.StreamLogger : Received message from stage systest-strip-html (stdin): ERROR Node gave an unexpected response: HTTP/1.1 404 Not Found
2013-07-05 13:48:53 : INFO   : Thread-188 : com.findwise.hydra.StreamLogger : Received message from stage systest-strip-html (stdin): ERROR Message: No document found matching your query
2013-07-05 13:48:53 : INFO   : Thread-188 : com.findwise.hydra.StreamLogger : Received message from stage systest-strip-html (stdin): ERROR Node gave an unexpected response: HTTP/1.1 404 Not Found
2013-07-05 13:48:53 : INFO   : Thread-188 : com.findwise.hydra.StreamLogger : Received message from stage systest-strip-html (stdin): ERROR Message: No document found matching your query
2013-07-05 13:48:53 : INFO   : Thread-188 : com.findwise.hydra.StreamLogger : Received message from stage systest-strip-html (stdin): $STACKTRACE$ 
2013-07-05 13:48:53 : INFO   : Thread-188 : com.findwise.hydra.StreamLogger : Received message from stage systest-strip-html (stdin): ERROR Unable to persist an error to the database
2013-07-05 13:48:53 : INFO   : Thread-188 : com.findwise.hydra.StreamLogger : Received message from stage systest-strip-html (stdin): java.io.IOException: Unable to save changes to core
2013-07-05 13:48:53 : INFO   : Thread-188 : com.findwise.hydra.StreamLogger : Received message from stage systest-strip-html (stdin):   at com.findwise.hydra.stage.AbstractProcessStage.run(AbstractProcessStage.java:114)
2013-07-05 13:48:53 : INFO   : Thread-188 : com.findwise.hydra.StreamLogger : Received message from stage systest-strip-html (stdin): 
2013-07-05 13:48:53 : INFO   : Thread-188 : com.findwise.hydra.StreamLogger : Received message from stage systest-strip-html (stdin): $STACKTRACE$ 
