Sampler swallows up nulls in CSV #8845

@vogievetsky

Affected Version

0.16.0 and probably 0.15.0

Description

When the sampler parses a CSV, it silently drops any column whose value is empty. Those columns should still appear in the parsed row, as null or "".

Request:

POST /druid/indexer/v1/sampler

Payload:

{
  "type": "index",
  "spec": {
    "type": "index",
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "inline",
        "data": "Make,Model,Color,Year\nHonda,,,2009\nVW,,,2010"
      }
    },
    "dataSchema": {
      "dataSource": "sample",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "csv",
          "timestampSpec": {
            "column": "!!!_no_such_column_!!!",
            "missingValue": "2010-01-01T00:00:00Z"
          },
          "dimensionsSpec": {},
          "hasHeaderRow": true
        }
      }
    }
  },
  "samplerConfig": {
    "numRows": 500,
    "timeoutMs": 15000,
    "cacheKey": "4cadbef182774626b2f2e397ec28e5f8"
  }
}

Result:

{
  "cacheKey": "4cadbef182774626b2f2e397ec28e5f8",
  "numRowsRead": 2,
  "numRowsIndexed": 2,
  "data": [
    {
      "raw": "Honda,,,2009",
      "parsed": {
        "__time": 1262304000000,
        "Year": "2009",
        "Make": "Honda"
      }
    },
    {
      "raw": "VW,,,2010",
      "parsed": {
        "__time": 1262304000000,
        "Year": "2010",
        "Make": "VW"
      }
    }
  ]
}

Where are Model and Color? Both are in the header row, but neither appears in any parsed row.
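
To make the gap concrete, here is a minimal sketch in plain Python (not Druid code) that parses the same inline data with the standard `csv` module and compares the header against the `parsed` objects from the response above. The helper names `expected_rows` and `dropped_columns` are made up for illustration. Treating an empty field as "" is an assumption about the desired behavior; null would be equally reasonable.

```python
import csv
import io

# Inline CSV data copied from the request payload above.
DATA = "Make,Model,Color,Year\nHonda,,,2009\nVW,,,2010"

# "parsed" objects copied from the sampler response above (without __time).
SAMPLER_PARSED = [
    {"Year": "2009", "Make": "Honda"},
    {"Year": "2010", "Make": "VW"},
]


def expected_rows(data):
    """Parse the CSV using its header row, keeping empty fields as ""."""
    return [dict(row) for row in csv.DictReader(io.StringIO(data))]


def dropped_columns(data, parsed_rows):
    """For each parsed row, list the header columns it is missing."""
    header = next(csv.reader(io.StringIO(data)))
    return [[col for col in header if col not in row] for row in parsed_rows]


if __name__ == "__main__":
    # Rows as one might expect the sampler to return them.
    for row in expected_rows(DATA):
        print(row)
    # Columns that the sampler response omitted, per row.
    print(dropped_columns(DATA, SAMPLER_PARSED))
```

For both rows this reports Model and Color as dropped, while a plain CSV parse keeps them as keys with empty-string values.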
