Affected Version
v0.17.0
Description
We set up Druid Indexer nodes to test the new native parallel ingestion.
We then used the following inputSource section in an index_parallel spec to point at a "directory" (key prefix) in S3 that contains a _SUCCESS file along with a number of data files:
"inputSource": {
  "type": "s3",
  "prefixes": ["s3://smt-druid-ingestion-stage/SI-835/year=2020/month=01/day=20/hour=00/1580297687716/auction"]
}
The index_parallel task failed, and the logs showed that the inputSource section above had been rewritten to the following:
"inputSource": {
  "type": "s3",
  "uris": null,
  "prefixes": null,
  "objects": [
    {
      "bucket": "smt-druid-ingestion-stage",
      "path": "SI-835/year=2020/month=01/day=20/hour=00/1580297687716/auction/_SUCCESS"
    }
  ]
}
It looks to me like an attempt was made to filter _SUCCESS files out of the file list, but the filter condition inadvertently does the opposite: the rewritten spec lists only the _SUCCESS object and none of the data files.
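To make the suspicion concrete, here is a minimal sketch of how an inverted filter predicate would produce exactly the object list seen in the logs. This is not Druid's actual code; the function names and the `part-0000x` data file names are hypothetical and exist only for illustration.

```python
# Hypothetical illustration only; this is NOT taken from the Druid code base.

def is_success_marker(key: str) -> bool:
    """Return True if the last path segment of the key is exactly _SUCCESS."""
    return key.rsplit("/", 1)[-1] == "_SUCCESS"


def select_inverted(keys):
    # Suspected behavior: the predicate is applied without negation,
    # so only the marker file survives the filter.
    return [k for k in keys if is_success_marker(k)]


def select_intended(keys):
    # Presumably intended behavior: drop the marker, keep the data files.
    return [k for k in keys if not is_success_marker(k)]


prefix = "SI-835/year=2020/month=01/day=20/hour=00/1580297687716/auction"

# Assumed listing under the prefix; the part-* names are made up.
listing = [
    f"{prefix}/_SUCCESS",
    f"{prefix}/part-00000",
    f"{prefix}/part-00001",
]

inverted = select_inverted(listing)
intended = select_intended(listing)
```

With this listing, `inverted` contains only the `_SUCCESS` key, which matches the rewritten "objects" array in the log, while `intended` contains only the data files.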